  • NovaSeq 6000 very low insert size

    Hi, I've been seeing very short insert sizes on the NovaSeq 6000 with 2x150bp whole-genome sequencing.
    Insert sizes inferred from mapped reads are much lower than the sizes of DNA fragments in the library as measured by electrophoresis (average 450 bp).
    [Image: electrophoresis.jpg (library electropherogram)]

    See the histogram below. As most insert sizes are <300 bp, mates overlap, and many overlap over their full length (insert size 150 bp). In that case the two mates don't cover 2*150 bp of distinct sequence, but only 1*150 bp. We might as well perform single-end sequencing.

    [Image: inserts.jpg (histogram of insert sizes inferred from read mapping)]
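To make the arithmetic in the paragraph above concrete, here is a minimal Python sketch (not from the original post) that computes, for a given insert size, how many distinct genomic bases a 2x150 bp pair covers and how many bases are read by both mates. It assumes ideal adapter trimming, so adapter read-through and soft-clipping are ignored; the function names are hypothetical.

```python
def pair_coverage(insert_size: int, read_len: int = 150) -> tuple[int, int]:
    """Return (distinct_bases, overlapping_bases) for one read pair.

    Assumes both mates are trimmed of adapter, so a mate never covers more
    genomic bases than the insert itself.
    """
    bases_per_mate = min(read_len, insert_size)   # genomic bases in each mate
    distinct = min(insert_size, 2 * read_len)     # union of the two mates
    overlap = 2 * bases_per_mate - distinct       # bases read by both mates
    return distinct, overlap


def sequencing_efficiency(insert_sizes: list[int], read_len: int = 150) -> float:
    """Fraction of the nominal 2 x read_len bases per pair that are distinct."""
    distinct_total = sum(pair_coverage(s, read_len)[0] for s in insert_sizes)
    return distinct_total / (len(insert_sizes) * 2 * read_len)


if __name__ == "__main__":
    for size in (100, 150, 250, 300, 450):
        print(size, pair_coverage(size))
    # Hypothetical library in which most pairs overlap
    print(round(sequencing_efficiency([150, 200, 250, 450]), 3))
```

A 150 bp insert gives (150, 150): both mates read the same 150 bases, which is the "might as well be single-end" case described above.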

    We never had this issue with HiSeq, but we've had it on NovaSeq at two different sequencing centers. The same problem affected a colleague working on a different organism with another sequencing center using NovaSeq. I've yet to see decent insert sizes obtained with this technology, but people usually don't report this metric and perhaps rarely measure it.

    Is there a bias favoring the sequencing of shorter fragments on the NovaSeq platform?

    Thanks.

    Jean




    Last edited by jeanlain; 04-13-2024, 08:00 AM.

  • #2
    Which HiSeq model did you use previously? The HiSeq 4000, NovaSeq, and NextSeq 2000 all use a newer clustering chemistry called Exclusion Amplification (ExAmp in most Illumina documentation), in which templates seed rapidly on the flow cell and amplify immediately to occupy the microwells before other templates can seed. MiSeq, NextSeq 500, and HiSeq 2000/2500 instruments use random seeding, which does not favor specific fragment sizes. The rapid seeding during ExAmp favors short fragments, which seed first: if you map insert length against position on the flow cell, you should see that inserts at the start of the lane are shorter than those at the end.
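One way to look for this pattern in your own data is to group insert sizes by flow-cell tile. The sketch below is hypothetical and rests on assumptions: read names follow the CASAVA 1.8+ layout instrument:run:flowcell:lane:tile:x:y, and you have already extracted (QNAME, TLEN) pairs from the BAM, e.g. columns 1 and 9 of `samtools view` output. Whether tile number tracks seeding order on a given instrument is also an assumption, not something established in this thread.

```python
from collections import defaultdict
from statistics import median


def lane_and_tile(qname: str) -> tuple[int, int]:
    """Extract (lane, tile) from an Illumina read name.

    Assumes the CASAVA 1.8+ layout instrument:run:flowcell:lane:tile:x:y.
    """
    fields = qname.split(":")
    if len(fields) < 7:
        raise ValueError(f"unexpected read name format: {qname!r}")
    return int(fields[3]), int(fields[4])


def insert_size_by_tile(records):
    """Median insert size per (lane, tile) from (qname, tlen) records.

    Only records with TLEN > 0 are kept, so each pair mapped to one reference
    is counted once (per the SAM spec, the leftmost mate has positive TLEN).
    """
    sizes = defaultdict(list)
    for qname, tlen in records:
        if tlen > 0:
            sizes[lane_and_tile(qname)].append(tlen)
    return {key: median(values) for key, values in sorted(sizes.items())}


if __name__ == "__main__":
    # Hypothetical records; in practice, parse `samtools view` output.
    demo = [
        ("A00123:8:HXXXXDSXX:1:1101:1000:2000", 280),
        ("A00123:8:HXXXXDSXX:1:1101:1000:2000", -280),
        ("A00123:8:HXXXXDSXX:1:1101:3000:4000", 320),
        ("A00123:8:HXXXXDSXX:1:1102:5000:6000", 410),
    ]
    for (lane, tile), med in insert_size_by_tile(demo).items():
        print(lane, tile, med)
```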



    • #3
      Thanks for the reply. I think we were using a HiSeq 2500.



      • #4
        This is an old article, but the author thoroughly explains some of the drawbacks of ExAmp, including the short-fragment bias.

        The HiSeq 4000 was Illumina's way of making the patterned flowcell technology available to non X Ten customers, and opening up patterned ...



        • #5
          Thanks.
          As you can see from my first post, the bias towards shorter fragments is very strong. Is it always that strong? I don't see many people complaining about it, but it's a big problem if half of your sequence is duplicated because of mate overlap.



          • #6
            I don't know whether anyone has measured how extreme the bias is for library fragments. If I remember correctly, Illumina published some rough numbers early on stating that adapter dimers (much shorter than library fragments) could take up 5-10x more of your reads on a patterned flow cell than on a non-patterned one, but I can't find where I originally read that.

            The closest I've found is point 3 of this post about HiSeq 4000 services, which says that 1% dimer can translate to 6% of reads, and 10% dimer to up to 84% of reads:
            Illumina HiSeq 3000 HiSeq 4000 instrument: considerations, limitations and service prices.
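The figures quoted above can be turned into an implied per-molecule advantage under a deliberately simple model, which is an assumption and not something the linked post states: each dimer molecule is k times as likely to form a cluster as a library molecule, so a dimer molar fraction p yields a read fraction r = kp / (kp + 1 - p). Solving for k gives k = r(1 - p) / (p(1 - r)). The function names below are hypothetical.

```python
def read_fraction(p: float, k: float) -> float:
    """Share of reads from dimers when they make up a molar fraction p of the
    library and each dimer is k times as likely to form a cluster."""
    return k * p / (k * p + (1.0 - p))


def implied_advantage(p: float, r: float) -> float:
    """Invert read_fraction: the k that turns molar fraction p into read fraction r."""
    return r * (1.0 - p) / (p * (1.0 - r))


if __name__ == "__main__":
    # The two data points quoted from the HiSeq 4000 services post
    for p, r in ((0.01, 0.06), (0.10, 0.84)):
        k = implied_advantage(p, r)
        print(f"{p:.0%} dimer -> {r:.0%} reads: k = {k:.1f}")
```

Under this toy model the two data points imply quite different factors (about 6 and about 47), so a single fixed advantage does not reproduce both. It also says nothing directly about the bias between, say, 300 bp and 500 bp library fragments.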


            My recommendation would be to fragment the gDNA less to create larger inserts or, if you're performing a double-sided cleanup at the end of library prep to generate the profile in the electropherogram you posted above, to adjust your ratios to eliminate more short fragments and shift the distribution to the right. If short fragments aren't present, the bias can't over-represent them.
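The logic of that recommendation (a bias can only over-represent fragments that are actually present) can be illustrated with a toy model. The sketch assumes, purely for illustration, that the chance of a fragment forming a cluster falls off as exp(-length / scale_bp); this weighting, the scale value, and the example library are hypothetical, not measured properties of ExAmp.

```python
import math


def sequenced_mean_length(library, scale_bp, min_len=0):
    """Mean fragment length among sequenced clusters under a toy length bias.

    library  -- list of (length_bp, relative_abundance) pairs
    scale_bp -- hypothetical decay length: weight = exp(-length / scale_bp)
    min_len  -- fragments shorter than this are removed before sequencing
    """
    kept = [(length, abundance * math.exp(-length / scale_bp))
            for length, abundance in library if length >= min_len]
    total = sum(weight for _, weight in kept)
    if total == 0:
        raise ValueError("no fragments left after size selection")
    return sum(length * weight for length, weight in kept) / total


if __name__ == "__main__":
    # Hypothetical library centred on 450 bp with a short-fragment shoulder
    library = [(150, 0.5), (300, 1.0), (450, 2.0), (600, 1.0), (750, 0.5)]
    mean_in_library = sum(n * a for n, a in library) / sum(a for _, a in library)
    print("library mean:", mean_in_library)
    print("sequenced mean, no cleanup:",
          round(sequenced_mean_length(library, 250), 1))
    print("sequenced mean, fragments < 350 bp removed:",
          round(sequenced_mean_length(library, 250, min_len=350), 1))
```

Even with the same bias applied in both runs, removing the short tail raises the mean length of what gets sequenced, which is the point of tightening the lower cutoff.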



            • #7
              Thanks for the recommendation. We don't prepare the libraries ourselves; we just send genomic DNA to sequencing platforms. We may ask them to maximize insert size.
              An analysis of this selection bias would be worth publishing, as the problem may be significant: it can greatly reduce the amount of useful sequence data you obtain. Not only can the number of distinct bases sequenced be much less than 2x150 per read pair, but a read pair can also end up useless if it has a mapping quality of zero because the effective sequence is too short and maps equally well to several locations. In the end, you may lose a lot of data.
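The two losses described above (overlap between mates, and pairs with mapping quality zero) can be tallied together. This is a minimal sketch, assuming you have one (insert_size, mapq) record per read pair and treating every pair below a MAPQ threshold as unusable; the default threshold and the function name are illustrative choices, not a standard.

```python
def usable_fraction(pairs, read_len=150, min_mapq=1):
    """Fraction of nominally sequenced bases that are distinct and placeable.

    pairs    -- iterable of (insert_size_bp, mapq) tuples, one per read pair
    read_len -- read length of each mate (2 x read_len bases per pair)
    min_mapq -- pairs below this mapping quality count as lost (MAPQ 0 by default)
    """
    nominal = 0
    usable = 0
    for insert_size, mapq in pairs:
        nominal += 2 * read_len
        if mapq >= min_mapq:
            # Overlapping mates cover at most insert_size distinct bases
            usable += min(insert_size, 2 * read_len)
    return usable / nominal if nominal else 0.0


if __name__ == "__main__":
    # Hypothetical pairs: one fully overlapping, one ambiguous, one ideal
    print(usable_fraction([(150, 60), (300, 0), (450, 30)]))
```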



              • #8
                Dear all, which sequencing platform do you recommend for isolated pathogenic bacteria: Illumina NovaSeq 6000 or Illumina NextSeq 550? We intend to explore virulence genes, resistance genes, and all SNPs and other variants.
