A first look at Illumina’s new NextSeq 500

Elsie replied

03-30-2015, 12:50 PM
Hi Brian,
My plots look similar, slightly better than yours, but I have decided to postpone posting them as we are moving to V2 chemistry in a few weeks. I'll redo the plots then and post them. Thanks again for all your help, it is greatly appreciated.
Elsie replied

03-09-2015, 02:50 PM
Thanks Brian. BBMap is currently running; I'll post the results. By the way, I always want to type bbduck...
thanks.
Brian Bushnell replied

03-09-2015, 02:49 PM
I'd be interested in seeing the stderr log of BBDuk... it's implausible that ALL of your reads are 100% adapter.
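
BBDuk, like the other BBTools scripts, writes its run summary to stderr, so a plain `>` redirect will not capture it. A minimal sketch of saving that log, assuming a POSIX shell; `run_logged` and `bbduk.log` are hypothetical names, not part of BBTools:

```shell
#!/bin/sh
# run_logged LOGFILE CMD [ARGS...]
# Runs CMD, writes its stderr to LOGFILE, and returns CMD's exit status.
# stdout is left untouched.
# (run_logged is a hypothetical helper, not a BBTools command.)
run_logged() {
    log=$1
    shift
    "$@" 2> "$log"
}

# Intended use, with the trimming command from this thread:
#   run_logged bbduk.log bbduk.sh in=interleaved.gz out=trimmed.fq.gz \
#       ktrim=r k=23 hdist=1 mink=11 tpe tbo minlen=90 \
#       ref=truseq.fa.gz,nextera.fa.gz
#   cat bbduk.log
```

The resulting log file can then be pasted into a reply as-is.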
Elsie replied

03-09-2015, 01:07 PM
Hi GenoMax,

Your dumb question was, unfortunately, spot on - interleaved is fine but trimmed is tiny - hard to map when there is nothing to map! I'll double check this, thank you.
GenoMax replied

03-09-2015, 11:39 AM
@Elsie: This may sound like a dumb question, but have you made sure that your interleaved.fq.gz and trimmed.fq.gz files have contents (i.e. that they are non-zero bytes in size)?

Post 3-4 sequences from your interleaved and trimmed.fq.gz files (use zmore or zless).
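
The check suggested above can be sketched as follows, assuming gzipped FASTQ with four lines per record. `fastq_head` is a hypothetical helper name; the file names are the ones mentioned in this post:

```shell
#!/bin/sh
# fastq_head FILE [N]
# Prints the first N records (4 lines each) of a gzipped FASTQ file.
# N defaults to 4. (fastq_head is a hypothetical helper name.)
fastq_head() {
    n=${2:-4}
    gzip -cd "$1" | head -n "$((n * 4))"
}

# Report the byte size, then show a few records from each file, if present.
for f in interleaved.fq.gz trimmed.fq.gz; do
    if [ -e "$f" ]; then
        ls -l "$f"
        fastq_head "$f" 4
    fi
done
```

This does the same job as paging through the files with zless, but gives output that is short enough to paste into a reply.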
Elsie replied

03-09-2015, 11:32 AM
Hi Brian, hi GenoMax,

Thanks for the replies.
Brian - that was just a typo by me, it was the correct file.
GenoMax - I am after stats.
I'm just on my way into work, 6.30am here, I'll log on again there and have another look at my commands.
thanks.
Brian Bushnell replied

03-09-2015, 09:05 AM
I think I see the problem - you're using the wrong file name for the reads:

bbduk.sh in=interleaved.gz out=trimmed.fq.gz ktrim=r k=23 hdist=1 mink=11 tpe tbo minlen=90 ref=truseq.fa.gz,nextera.fa.gz

bbmap.sh maxindex=200 in=trimmed.fq.fq mhist=mhist.txt bhist=bhist.txt qhist=qhist.txt qahist=qahist.txt

...

Reads Used: 0 (0 bases)

Normally, BBMap should throw an exception saying it can't find the input file if it does not exist, so I assume there is an empty file named "trimmed.fq.fq".
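
A guard against this kind of silent zero-read run can be sketched as below. Note that a gzip file holding no reads still occupies a few bytes, so a size check alone is not enough; counting records is safer. `count_reads` and `require_reads` are hypothetical helper names, and the sketch assumes four-line FASTQ records:

```shell
#!/bin/sh
# count_reads FILE
# Prints the number of FASTQ records in FILE (gzipped or plain),
# assuming the usual 4 lines per record.
count_reads() {
    case $1 in
        *.gz) gzip -cd "$1" ;;
        *)    cat "$1" ;;
    esac | awk 'END { print int(NR / 4) }'
}

# require_reads FILE
# Returns non-zero, with a message on stderr, if FILE is missing
# or contains no reads.
require_reads() {
    if [ ! -f "$1" ]; then
        echo "missing input: $1" >&2
        return 1
    fi
    if [ "$(count_reads "$1")" -eq 0 ]; then
        echo "no reads in: $1" >&2
        return 1
    fi
}

# Intended use before the stats run from this thread:
#   require_reads trimmed.fq.gz && bbmap.sh maxindex=200 in=trimmed.fq.gz \
#       mhist=mhist.txt bhist=bhist.txt qhist=qhist.txt qahist=qahist.txt
```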
GenoMax replied

03-09-2015, 03:50 AM
@Elsie: Are you trying to analyze NextSeq500 data or just creating stats? This thread was originally about quality of NextSeq500 reads and the procedure that Brian had posted was to create stats files (not actual alignments).

If you are actually trying to analyze real data then there is no need to create interleaved data sets. You can directly trim and then align the R1/R2 reads against the human genome. You need to specify an output file to store the aligned reads.

A minimal command line for doing the mapping would be the following. More examples are in the BBMap thread: http://seqanswers.com/forums/showthread.php?t=41057

Code:

$ bbmap.sh -Xmx30g in=trimmed.fq.gz path=/path_to_BBMap_index_top_folder_with_ref_directory/ out=sample_ID.sam qin=33

Change the path to the BBMap index to match your local setup. Add further options (there are plenty) as needed, depending on the kind of experiment you are analyzing.
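
For paired files, the trim-then-map route described above might look like the sketch below. It reuses the trimming parameters and the R1.fastq/R2.fastq names that appear elsewhere in this thread; the output file names, `INDEX_DIR`, `SAMPLE`, and the `run` dry-run wrapper are assumptions for illustration. By default it only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/bin/sh
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Hypothetical locations; adjust to the local setup.
INDEX_DIR=${INDEX_DIR:-/path_to_BBMap_index_top_folder_with_ref_directory/}
SAMPLE=${SAMPLE:-sample_ID}

# 1) Adapter-trim the pair directly, without interleaving first.
run bbduk.sh in1=R1.fastq in2=R2.fastq \
    out1=R1.trimmed.fq.gz out2=R2.trimmed.fq.gz \
    ktrim=r k=23 hdist=1 mink=11 tpe tbo minlen=90 \
    ref=truseq.fa.gz,nextera.fa.gz

# 2) Map the trimmed pair and keep the alignments in a SAM file.
run bbmap.sh -Xmx30g in=R1.trimmed.fq.gz in2=R2.trimmed.fq.gz \
    path="$INDEX_DIR" out="$SAMPLE.sam" qin=33
```

Printing the commands first makes it easy to spot a mistyped file name before any compute time is spent.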
Elsie replied

03-09-2015, 01:54 AM
No error, just NaNs, which is why I think I am missing something:
reformat.sh in1=R1.fastq in2=R2.fastq out=interleaved
gzip interleaved
bbmap.sh ref=hg19.fa
bbduk.sh in=interleaved.gz out=trimmed.fq.gz ktrim=r k=23 hdist=1 mink=11 tpe tbo minlen=90 ref=truseq.fa.gz,nextera.fa.gz
bbmap.sh maxindex=200 in=trimmed.fq.fq mhist=mhist.txt bhist=bhist.txt qhist=qhist.txt qahist=qahist.txt

BBMap version 34.56
Set match histogram output to mhist.txt
Set base content histogram output to bhist.txt
Set quality histogram output to qhist.txt
Set quality accuracy histogram output to qahist.txt
Retaining first best site only for ambiguous mappings.
No output file.
Set genome to 1

Loaded Reference: 5.025 seconds.
Loading index for chunk 1-7, build 1
Generated Index: 6.192 seconds.
Analyzed Index: 7.512 seconds.
Cleared Memory: 0.461 seconds.
Processing reads in single-ended mode.
Started read stream.
Started 16 mapping threads.
Detecting finished threads: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15

------------------ Results ------------------

Genome: 1
Key Length: 13
Max Indel: 200
Minimum Score Ratio: 0.56
Mapping Mode: normal
Reads Used: 0 (0 bases)

Mapping: 0.447 seconds.
Reads/sec: 0.00
kBases/sec: 0.00

Read 1 data:        pct reads   num reads   pct bases   num bases

mapped:             NaN%        0           NaN%        0
unambiguous:        NaN%        0           NaN%        0
ambiguous:          NaN%        0           NaN%        0
low-Q discards:     NaN%        0           NaN%        0

perfect best site:  NaN%        0           NaN%        0
semiperfect site:   NaN%        0           NaN%        0

Match Rate:         NA          NA          NaN%        0
Error Rate:         NaN%        0           NaN%        0
Sub Rate:           NaN%        0           NaN%        0
Del Rate:           NaN%        0           NaN%        0
Ins Rate:           NaN%        0           NaN%        0
N Rate:             NaN%        0           NaN%        0

Total time: 19.975 seconds.

Any advice greatly appreciated. thanks.
aeonsim replied

03-08-2015, 04:09 AM
Originally posted by fanli

Hi all,

Just wanted to add our data as well - this was from an RNA-seq library and I don't have paired HiSeq data. Still, you can see some ambitious reporting of quality scores albeit not nearly as bad as what aeonsim showed.

We're hopefully going to run a v2 kit soon and I'll update with those stats when I get them!

I'd actually say they're worse than what we had, considering you're using PE 80 bp and the first 10 or so bases of the forward reads show an average quality score drop of ~10 on the Phred scale (~30 to ~20).

However, our conclusion from our testing is that the NextSeq with V1 chemistry is OK for RNA-seq, as the reads still map fine and the coverage is high. It is not suitable for variant calling, though, especially when one is interested in de novo variants or is working at low coverage. As a result, it's currently only being used internally for RNA-seq.

We will apparently get access to the V2 kits as soon as they're available, to see if that fixes the issue.
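
For scale, a Phred score Q corresponds to an error probability of 10^(-Q/10), so the drop from about Q30 to about Q20 mentioned above is a tenfold increase in the expected per-base error rate. A small sketch of the arithmetic, where `phred_to_perr` is a hypothetical helper name:

```shell
#!/bin/sh
# phred_to_perr Q
# Prints the per-base error probability for Phred score Q: P = 10^(-Q/10).
phred_to_perr() {
    awk -v q="$1" 'BEGIN { printf "%g\n", 10 ^ (-q / 10) }'
}

phred_to_perr 30   # 0.001 (1 expected error per 1,000 bases)
phred_to_perr 20   # 0.01  (1 expected error per 100 bases)
```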

Last edited by aeonsim; 03-08-2015, 04:26 AM.
Brian Bushnell replied

03-06-2015, 10:01 AM
Originally posted by GenoMax

The index should be ok. I think Brian is concatenating all chromosomes and then creating the index so that file is not a literal equivalent of human/mouse genome (file I have looks similar to yours).

That's correct. They're called chromosomes for legacy reasons (the chunks used to be one real chromosome each) but it's more efficient to pack them.
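
One way to confirm that nothing is missing from a packed index is to total the columns of info.txt rather than count its rows. The sketch below uses the six rows Elsie posted on 03-06-2015, with the column order taken from the `#chrom scaffolds contigs length ...` header; `sum_index_info` is a hypothetical helper name, and the heredoc stands in for the real info.txt file:

```shell
#!/bin/sh
# sum_index_info: reads info.txt-style rows on stdin, skips "#" comment
# lines, and totals column 2 (scaffolds) and column 4 (length).
sum_index_info() {
    awk '!/^#/ && NF >= 4 { n++; s += $2; len += $4 }
         END { printf "chunks=%d scaffolds=%d length=%.0f\n", n, s, len }'
}

sum_index_info <<'EOF'
#chrom scaffolds contigs length defined undefined startPad stopPad
1 4 41 493337098 479857220 13479878 8000 8000
2 5 61 512852808 496275279 16577529 8000 8000
3 3 70 439025362 428441699 10583663 8000 8000
4 3 78 468391081 456374140 12016941 8000 8000
5 3 46 424587819 413803382 10784437 8000 8000
6 4 155 387396307 372786010 14610297 8000 8000
EOF
# prints: chunks=6 scaffolds=22 length=2725590475
```

With these rows, the six chunks hold 22 scaffolds in total, which is consistent with several input sequences being packed into each chunk.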
fanli replied

03-06-2015, 07:51 AM
Hi all,

Just wanted to add our data as well - this was from an RNA-seq library and I don't have paired HiSeq data. Still, you can see some ambitious reporting of quality scores albeit not nearly as bad as what aeonsim showed.

We're hopefully going to run a v2 kit soon and I'll update with those stats when I get them!
Attached Files

recalibration_plots.pdf (165.7 KB, 119 views)
GenoMax replied

03-06-2015, 04:25 AM
Originally posted by Elsie

Hi Brian,
sorry, still having issues. I've now switched to some NextSeq data generated with mouse and human data. I'm getting Nas in my histograms, I think there is something wrong with my index

What is Nas? (N's?)

The index should be ok. I think Brian is concatenating all chromosomes and then creating the index so that file is not a literal equivalent of human/mouse genome (file I have looks similar to yours).

Are you getting an error when you do the mapping?
Elsie replied

03-06-2015, 03:10 AM
Hi Brian,
sorry, still having issues. I've now switched to some NextSeq data generated with mouse and human data. I'm getting Nas in my histograms, I think there is something wrong with my index, info.txt gives me:
#Chromosome sizes
#Generated on Fri Mar 06 21:58:51 EST 2015
#Version 5
#chrom  scaffolds  contigs  length     defined    undefined  startPad  stopPad
1       4          41       493337098  479857220  13479878   8000      8000
2       5          61       512852808  496275279  16577529   8000      8000
3       3          70       439025362  428441699  10583663   8000      8000
4       3          78       468391081  456374140  12016941   8000      8000
5       3          46       424587819  413803382  10784437   8000      8000
6       4          155      387396307  372786010  14610297   8000      8000
What happened to the other chromosomes? I must be doing something wrong but I am just doing the bbmap ref command as indicated previously.
thanks.
Brian Bushnell replied

03-03-2015, 06:00 PM
You're welcome - let us know if you discover anything interesting!