NextSeq Data

**nucacidhunter** · 10-07-2014, 02:49 PM

Thanks, Brian, for posting your analysis results. I wonder whether the HiSeq reads are also from a bacterial DNA library prepared using the same protocol as the NextSeq ones.

**Brian Bushnell** · 10-07-2014, 02:54 PM

The HiSeq reads are bacterial, but from a collection of 26 different isolates mixed together to form a synthetic metagenomic community. I don't know much about the preparation protocols, but certainly the insert sizes differ substantially, so at least size selection was probably different; maybe shearing too.

**colindaven** · 10-09-2014, 06:29 AM

Interesting, thanks very much for the detailed analysis and your thoughts. So the data looks a little worse than HiSeq, I agree, but they're at an early stage with the NextSeq chemistry. Far more serious would be the use of low quality optics, which would be understandable at that price point.

Any thoughts or observations on de novo assembly or SNP calling? I believe I saw a post on SeqAnswers saying SNP calling works fine on the NextSeq at the expense of a few more indel errors (compared to HiSeq data).

We are interested in a direct comparison against the Ion Proton. These details indicate that the indel error rate here is a lot lower than what I've heard comes off the Proton. This is very important for getting good de novo assemblies, of course.

Thanks again.

**rocksd** · 07-22-2015, 08:39 AM

Originally posted by Brian Bushnell:

We recently acquired a NextSeq machine and are not very impressed with the data. I've uploaded a spreadsheet containing some of the statistics here:

https://drive.google.com/file/d/0B3llHR93L14wX2cyNmRRckh6V1k/view?usp=sharing

The first tab is a HiSeq2000 2x150bp run. The insert size was below target, so I adapter-trimmed the reads before analyzing the data (no other preprocessing was run); and the HS2000 is not really spec'd to 2x150, so as you might imagine, the quality suffers toward the end. Regardless, it's pretty good. Looking at the mapping stats, 99.55% of the reads mapped, and overall 79.85% of the reads were error-free.

The next two tabs contain a couple of lanes of NextSeq bacterial sequence. Lane 1 generally seems to be the best, with quality dropping to a minimum at lane 4. But even for lane 1, only 96.47% of the reads mapped and 49.3% were perfect matches; by lane 4, 95.91% mapped and 38.91% were perfect. So the rate of reads with errors roughly tripled from HS2000 (which does not support 2x150bp runs) to NextSeq (which supposedly does), and as you can see on the "Average Quality by Position" and "Error Rate vs Read Position" graphs, the comparison would be brutal - an order of magnitude or more - if you consider 2x100bp reads. Also, if you look at the "Quality Score Accuracy" graph, the HS2000 quality scores are fairly accurate and typically underestimate quality, while the NextSeq ones are inaccurate and overestimate quality by about 10 dB (and are quantized), so you can't easily quality-trim the NextSeq data to improve it.
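As a rough check on the two comparisons in the paragraph above, here is a minimal sketch. It uses only the percentages quoted in the post, treats each "perfect" percentage as a fraction of all reads (an assumption), and relies on the standard Phred relation Q = -10·log10(p). The function name and the Q30 example value are made up for illustration.

```python
def phred_to_error_prob(q):
    """Standard Phred relation: Q = -10 * log10(p), so p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)

# Fraction of perfect (error-free) reads quoted in the post.
perfect = {"HS2000": 0.7985, "NextSeq lane 1": 0.4930, "NextSeq lane 4": 0.3891}

# Reads with at least one error = 1 - perfect fraction.
with_errors = {name: 1 - frac for name, frac in perfect.items()}
ratios = {name: frac / with_errors["HS2000"] for name, frac in with_errors.items()}

for name in perfect:
    print(f"{name}: {with_errors[name]:.2%} with errors, {ratios[name]:.2f}x HS2000")

# Assumed example: a base reported as Q30 whose score is 10 dB too optimistic
# behaves like Q20, i.e. a 10x higher error probability.
reported_q = 30
actual_q = reported_q - 10
inflation = phred_to_error_prob(actual_q) / phred_to_error_prob(reported_q)
print(f"Q{reported_q} reported, Q{actual_q} actual: {inflation:.0f}x more errors")
```

With these numbers the share of reads containing errors is about 2.5 times the HS2000 figure for lane 1 and about 3 times for lane 4, which matches "roughly tripled".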

The "Library Uniqueness" graph, generated by sampling a kmer from each read and hashing it to see if it was seen before, is also very odd for NextSeq. It is wavy. The graph should decrease monotonically, and any increase indicates a sudden error burst. So it seems that the period (~625000 reads) may correspond to an image frame, and that the clusters around the edges of the frame are blurry, as one might expect from low-quality or miscalibrated optics.
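The idea behind the uniqueness graph described above can be sketched roughly as follows. This is not the BBTools implementation; the function name, the random choice of kmer position, and the default `k` and `interval` values are all assumptions made for illustration.

```python
import random

def uniqueness_curve(reads, k=25, interval=25000, seed=0):
    """Return [(reads_processed, percent_of_sampled_kmers_not_seen_before), ...].

    One kmer is sampled per read and looked up in a hash set. The percentage
    is reported per window of `interval` reads.
    """
    rng = random.Random(seed)
    seen = set()
    curve = []
    unique_in_window = 0
    sampled_in_window = 0
    for i, read in enumerate(reads, start=1):
        if len(read) >= k:
            start = rng.randrange(len(read) - k + 1)  # assumed: random position
            kmer = read[start:start + k]
            sampled_in_window += 1
            if kmer not in seen:
                seen.add(kmer)
                unique_in_window += 1
        if i % interval == 0 and sampled_in_window:
            curve.append((i, 100.0 * unique_in_window / sampled_in_window))
            unique_in_window = 0
            sampled_in_window = 0
    return curve

# Tiny illustration: the second window only repeats kmers from the first.
demo = uniqueness_curve(["AAAA", "CCCC", "AAAA", "CCCC"], k=4, interval=2)
print(demo)
```

In a sketch like this, a burst of sequencing errors creates kmers that were never seen before, so the percentage for that window goes up instead of continuing to fall.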

The Base Frequency vs Position graph is also interesting - NextSeq has a clear A/T ratio bias that is not present in HS data. The 3bp-wavelength sawtooth pattern probably has something to do with codon structure.

Does anyone else have data they'd like to share on NextSeq machines?

P.S. Command lines I used:

Code:

bbcountunique.sh in=reads.fq.gz reads=100000000 out=uniqueness.txt

bbduk.sh in=reads.fq.gz reads=4000000 ktrim=r k=25 hdist=1 mink=12 tbo tpe ref=nextera.fa,truseq.fa out=ktrimmed.fq.gz ow

bbmap.sh in=ktrimmed.fq.gz reads=4000000 mhist=mhist.txt ihist=ihist.txt bhist=bhist.txt idhist=idhist.txt ehist=ehist.txt qhist=qhist.txt idbins=200 qahist=qahist.txt aqhist=aqhist.txt indelhist=indelhist.txt gchist=gchist.txt

bbmerge.sh in=ktrimmed.fq.gz reads=4000000 ihist=ihist_merge.txt

Hi Brian,

We are looking into purchasing a NextSeq, but we have a concern about the quality of the reads it generates. Has your experience with the NextSeq improved since then?

Your input is highly appreciated.

James

**Brian Bushnell** · 07-22-2015, 09:19 AM

V2 chemistry has substantially higher quality than V1; it's basically fine. However, it still has some issues with the barcode-reading cycles, which have caused problems with multiplexed runs; we've had some in which certain barcodes are misread ~95% of the time, and thus get demultiplexed into the unknown bin. Last I heard, Illumina was aware of this issue and working on it; not sure what the current status is.
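To show why a misread barcode ends up in the unknown bin, here is a minimal sketch of mismatch-tolerant demultiplexing. It is not Illumina's demultiplexing software; the function names, the one-mismatch tolerance, and the sample sheet with its barcode sequences are placeholders chosen for illustration.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def assign_sample(index_read, barcodes, max_mismatches=1):
    """Return the one sample whose barcode is within max_mismatches, else 'unknown'."""
    hits = [
        name
        for name, seq in barcodes.items()
        if len(seq) == len(index_read) and hamming(seq, index_read) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else "unknown"

# Hypothetical sample sheet.
barcodes = {"sampleA": "ATCACG", "sampleB": "CGATGT"}

print(assign_sample("ATCACG", barcodes))  # exact match
print(assign_sample("ATCACT", barcodes))  # one mismatch, still assigned
print(assign_sample("ATGTCT", barcodes))  # too many mismatches
```

If the index cycles for a particular barcode are read wrongly at several positions most of the time, nearly all of that sample's reads exceed the mismatch limit and are binned as unknown.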

**rocksd** · 07-22-2015, 01:45 PM

Brian,

Thanks for your reply. Were the misread barcodes from Illumina, or were they custom ones prepared by you or your end user?

Thanks

James

**Brian Bushnell** · 07-22-2015, 01:48 PM

I think they were Illumina TruSeq, but it's possible they were custom. They worked fine on HiSeq and MiSeq, though, and on NextSeq with V1 chemistry.
