  • Brian Bushnell
    Super Moderator
    • Jan 2014
    • 2709

    NextSeq Data

    We recently acquired a NextSeq machine and are not very impressed with the data. I've uploaded a spreadsheet containing some of the statistics here:



    The first tab is a HiSeq2000 2x150bp run. The insert size was below target, so I trimmed adapters before analyzing the data (no other preprocessing was run). The HS2000 is not really spec'd for 2x150, so as you might imagine, the quality suffers toward the end. Regardless, it's pretty good: looking at the mapping stats, 99.55% of the reads mapped, and overall 79.85% of the reads were error-free.

    The next two tabs contain a couple of lanes of NextSeq bacterial sequence. Lane 1 generally seems to be the best, with quality dropping to a minimum at lane 4. But even for lane 1, only 96.47% of the reads mapped and 49.3% were perfect matches; by lane 4, 95.91% mapped and 38.91% were perfect. So the fraction of reads with errors rose roughly 2.5- to 3-fold (from 20.15% to between 50.7% and 61.09%) going from HS2000 (which does not support 2x150bp runs) to NextSeq (which supposedly does). As you can see on the "Average Quality by Position" and "Error Rate vs Read Position" graphs, the comparison would be brutal - an order of magnitude or more - if you consider 2x100bp reads. Also, if you look at the "Quality Score Accuracy" graph, the HS2000 quality scores are fairly accurate and typically underestimate quality, while the NextSeq ones are inaccurate, quantized, and overestimate quality by about 10 dB (10 Phred units, i.e. the true error rate is about 10 times the claimed one), so you can't easily quality-trim the NextSeq data to improve it.
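
As a quick check on the arithmetic above, here is a minimal sketch using only the percentages quoted in this post. The function names `error_ratio` and `phred_to_prob` are made up for illustration, and the Q30/Q20 pair is an illustrative example of a 10-unit overestimate, not a value read from the spreadsheet.

```shell
# Ratio of "reads with at least one error" between two platforms.
# $1 = percent error-free reads on platform A, $2 = same for platform B.
error_ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", (100 - b) / (100 - a) }'
}

error_ratio 79.85 49.3     # HS2000 vs NextSeq lane 1, prints 2.52
error_ratio 79.85 38.91    # HS2000 vs NextSeq lane 4, prints 3.03

# Phred scale: Q = -10 * log10(P_error), so P_error = 10^(-Q/10).
# $1 = Phred quality score.
phred_to_prob() {
    awk -v q="$1" 'BEGIN { printf "%.4f\n", 10 ^ (-q / 10) }'
}

phred_to_prob 30    # a claimed Q30 means P_error = 0.0010
phred_to_prob 20    # if the true quality is Q20, P_error = 0.0100
```

A score that is 10 units too high therefore hides a tenfold higher error probability, which is why trimming on the reported scores removes far fewer bad bases than intended.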

    The "Library Uniqueness" graph, generated by sampling a kmer from each read and hashing it to see if it was seen before, is also very odd for NextSeq: it is wavy. The graph should decrease monotonically, and any increase indicates a sudden error burst. The period (~625000 reads) may correspond to an image frame in which the clusters around the edges are blurry, as one might expect from low-quality or miscalibrated optics.
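
The sampling idea described above can be sketched in a few lines of awk. This is only an illustration of the concept, not how bbcountunique.sh is implemented: the function name `uniqueness`, the choice of the read prefix as the sampled kmer, and the bin size are all assumptions. It expects an uncompressed FASTQ file with four lines per record.

```shell
# Print "reads_seen<TAB>percent_unique_in_bin" once per bin of reads.
# $1 = FASTQ file, $2 = kmer length, $3 = bin size in reads.
uniqueness() {
    awk -v k="$2" -v bin="$3" '
        NR % 4 == 2 {                    # sequence line of each FASTQ record
            if (length($0) < k) next     # skip reads shorter than k
            kmer = substr($0, 1, k)      # sample one kmer: here, the read prefix
            n++
            if (!(kmer in seen)) {       # awk arrays are hash tables
                seen[kmer] = 1
                uniq++
            }
            if (n % bin == 0) {          # close the bin and report it
                printf "%d\t%.1f\n", n, 100 * uniq / bin
                uniq = 0
            }
        }' "$1"
}
```

Calling `uniqueness reads.fq 25 25000` would print one row per 25000 reads. In a clean library the second column should only fall as more reads are seen; a rise between consecutive bins is the kind of waviness described above. The array keeps every distinct kmer in memory, so this is only practical on a subsample.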

    The Base Frequency vs Position graph is also interesting - NextSeq has a clear A/T ratio bias that is not present in HS data. The 3bp-wavelength sawtooth pattern probably has something to do with codon structure.
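
For anyone who wants to check the A/T ratio on their own reads without the full pipeline, a rough sketch of a per-position base frequency table follows. It is not the code behind the bhist output; the function name `base_freq` and the column layout are assumptions, and N calls are left out of the denominator.

```shell
# Print "position<TAB>fracA<TAB>fracC<TAB>fracG<TAB>fracT" for each read position.
# $1 = uncompressed FASTQ file (four lines per record).
base_freq() {
    awk '
        NR % 4 == 2 {                                # sequence lines only
            len = length($0)
            if (len > maxlen) maxlen = len
            for (i = 1; i <= len; i++) count[i, substr($0, i, 1)]++
        }
        END {
            for (i = 1; i <= maxlen; i++) {
                total = count[i, "A"] + count[i, "C"] + count[i, "G"] + count[i, "T"]
                if (total == 0) continue             # position held only N calls
                printf "%d\t%.3f\t%.3f\t%.3f\t%.3f\n", i,
                    count[i, "A"] / total, count[i, "C"] / total,
                    count[i, "G"] / total, count[i, "T"] / total
            }
        }' "$1"
}
```

On unbiased data the A and T columns should track each other closely at every position, so a consistent gap between columns 2 and 5 would reproduce the bias described above.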

    Does anyone else have data they'd like to share on NextSeq machines?

    P.S. Command lines I used:

    Code:
    bbcountunique.sh in=reads.fq.gz reads=100000000 out=uniqueness.txt
    
    bbduk.sh in=reads.fq.gz reads=4000000 ktrim=r k=25 hdist=1 mink=12 tbo tpe ref=nextera.fa,truseq.fa out=ktrimmed.fq.gz ow
    
    bbmap.sh in=ktrimmed.fq.gz reads=4000000 mhist=mhist.txt ihist=ihist.txt bhist=bhist.txt idhist=idhist.txt ehist=ehist.txt qhist=qhist.txt idbins=200 qahist=qahist.txt aqhist=aqhist.txt indelhist=indelhist.txt gchist=gchist.txt
    
    bbmerge.sh in=ktrimmed.fq.gz reads=4000000 ihist=ihist_merge.txt
  • nucacidhunter
    Jafar Jabbari
    • Jan 2013
    • 1250

    #2
    Thanks, Brian, for posting your analysis results. I wonder whether the HiSeq reads are also from a bacterial DNA library prepared using the same protocol as the NextSeq ones.


    • Brian Bushnell
      Super Moderator
      • Jan 2014
      • 2709

      #3
      The HiSeq reads are bacterial, but from a collection of 26 different isolates mixed together to form a synthetic metagenomic community. I don't know much about the preparation protocols, but certainly the insert sizes differ substantially, so at least size selection was probably different; maybe shearing too.


      • colindaven
        Senior Member
        • Oct 2008
        • 417

        #4
        Interesting, thanks very much for the detailed analysis and your thoughts. So the data looks a little worse than HiSeq, I agree, but they're at an early stage with the NextSeq chemistry. Far more serious would be the use of low quality optics, which would be understandable at that price point.

        Any thoughts or observations on de novo assembly or SNP calling? I believe I saw a post on SeqAnswers saying SNP calling works fine on the NextSeq at the expense of a few more indel errors (compared to HiSeq data).

        We are interested in a direct comparison against the Ion Proton. These details indicate the indel error rate is a lot lower here than what I've heard comes off the Proton. This is very important for getting good de novo assemblies, of course.

        Thanks again.


        • rocksd
          Member
          • Jul 2010
          • 14

          #5
          Hi Brian,

          We are looking to purchase a NextSeq, but we are concerned about the quality of the reads it generates. Has your experience with the NextSeq improved since then?

          Your input is highly appreciated.

          James


          • Brian Bushnell
            Super Moderator
            • Jan 2014
            • 2709

            #6
            V2 chemistry has substantially higher quality than V1; it's basically fine. However, it still has some issues with the barcode-reading cycles, which have caused problems with multiplexed runs; we've had some in which certain barcodes are misread ~95% of the time, and thus get demultiplexed into the unknown bin. Last I heard, Illumina was aware of this issue and working on it; not sure what the current status is.
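
To illustrate why a misread barcode lands in the unknown bin, here is a simplified sketch of barcode matching. It assumes the demultiplexer accepts an index read only if it is within a fixed number of mismatches of a known barcode; the function name `demux_bin` and the example barcodes in the usage note are hypothetical, and real demultiplexing software may behave differently.

```shell
# Print the known barcode an index read is assigned to, or "unknown".
# $1 = observed index read, $2 = max allowed mismatches, rest = known barcodes.
demux_bin() {
    obs="$1"; max="$2"; shift 2
    for bc in "$@"; do
        # Count mismatching positions (Hamming distance); lengths must match.
        mm=$(awk -v a="$obs" -v b="$bc" 'BEGIN {
            if (length(a) != length(b)) { print 999; exit }
            d = 0
            for (i = 1; i <= length(a); i++) {
                if (substr(a, i, 1) != substr(b, i, 1)) d++
            }
            print d
        }')
        if [ "$mm" -le "$max" ]; then
            echo "$bc"
            return 0
        fi
    done
    echo unknown
}
```

With one mismatch allowed, `demux_bin ACGTAA 1 ACGTAC TTGCAA` prints `ACGTAC`, while `demux_bin GGGGGG 1 ACGTAC TTGCAA` prints `unknown`. A barcode whose read cycles pick up two or more errors is lost to the unknown bin even if the insert reads are fine.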


            • rocksd
              Member
              • Jul 2010
              • 14

              #7
              Brian,

              Thanks for your reply. Were the misread barcodes from Illumina, or were they custom ones prepared by you or your end user?

              Thanks

              James


              • Brian Bushnell
                Super Moderator
                • Jan 2014
                • 2709

                #8
                I think they were Illumina TruSeq, but it's possible they were custom. They worked fine on HiSeq and MiSeq, though, and on NextSeq with V1 chemistry.
