  • NovaSeq 6000 very low insert size

    Hi, I've been seeing very short insert sizes on the NovaSeq 6000, using 2x150 bp whole genome sequencing.
    Insert sizes inferred from mapped reads are much smaller than the sizes of the DNA fragments in the library, as measured by electrophoresis (average 450 bp).
    [Image: electrophoresis.jpg]

    See the histogram below. As most insert sizes are <300 bp, mates overlap, and many overlap over their full length (insert size 150 bp). In that case the two mates don't represent 2x150 bp of sequence, but rather 1x150 bp. We might as well perform single-end sequencing.
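To make the overlap arithmetic concrete, here is a minimal Python sketch. It assumes both mates are read to their full length unless the fragment is shorter, and it ignores adapter read-through and soft clipping; the function names are illustrative only.

```python
def unique_bases(insert_size: int, read_len: int = 150) -> int:
    """Distinct template bases covered by one read pair.

    Assumption: each mate covers min(read_len, insert_size) bases,
    starting from opposite ends of the fragment.
    """
    if insert_size <= 0:
        return 0
    return min(insert_size, 2 * read_len)


def mate_overlap(insert_size: int, read_len: int = 150) -> int:
    """Number of template bases sequenced by both mates."""
    if insert_size <= 0:
        return 0
    per_mate = min(read_len, insert_size)
    return max(0, 2 * per_mate - insert_size)


for size in (150, 200, 300, 450):
    print(size, unique_bases(size), mate_overlap(size))
```

With 2x150 bp reads, a 150 bp insert yields 150 distinct bases (full overlap), a 200 bp insert yields 200, and only inserts of 300 bp or more yield the full 300.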

    [Image: inserts.jpg]

    We never had this issue with HiSeq, but we've had it with NovaSeq at two different sequencing centers. The same problem affected a colleague working on a different organism with a third sequencing center that also uses NovaSeq. I've yet to see decent insert sizes obtained with this technology, but people usually don't report this metric and perhaps rarely measure it.

    Is there a bias favoring the sequencing of shorter fragments on the NovaSeq platform?

    Thanks.

    Jean




    Last edited by jeanlain; 04-13-2024, 08:00 AM.

  • #2
    Which version of HiSeq did you use previously? The HiSeq 4000, NovaSeq, and NextSeq 2000 all use a newer clustering chemistry known as Exclusion Amplification (called ExAmp in most Illumina docs), in which a template seeds a microwell on the flow cell and is amplified immediately, occupying the well before other templates can seed it. MiSeqs, NextSeq 500s, and HiSeq 2000/2500s use random seeding, which doesn't favor specific fragment sizes. The rapid seeding in ExAmp favors short fragments, which seed first. If you map read positions on the flow cell, you should see that the insert length at the start of the lane is shorter than at the end of the lane.
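One way to check the position effect described above is to group insert sizes by flow cell tile. The sketch below is a rough illustration, not a validated tool. It assumes SAM text lines (for example, the output of `samtools view`) and the usual Illumina read name layout `instrument:run:flowcell:lane:tile:x:y`. The helper name `insert_sizes_by_tile` is hypothetical. How tile numbers correspond to physical position along the lane depends on the instrument and is not handled here.

```python
from collections import defaultdict
from statistics import median


def insert_sizes_by_tile(sam_lines):
    """Return {tile: median |TLEN|} for properly paired first mates.

    Assumes QNAME looks like instrument:run:flowcell:lane:tile:x:y;
    reads whose names do not have at least 7 colon-separated fields are skipped.
    """
    sizes = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        qname, flag, tlen = fields[0], int(fields[1]), int(fields[8])
        # Keep properly paired (0x2) first mates (0x40) only;
        # drop secondary (0x100) and supplementary (0x800) alignments.
        if not flag & 0x2 or not flag & 0x40 or flag & 0x900 or tlen == 0:
            continue
        parts = qname.split(":")
        if len(parts) < 7:
            continue
        sizes[int(parts[4])].append(abs(tlen))
    return {tile: median(values) for tile, values in sorted(sizes.items())}
```

Feeding it the lines of a SAM stream returns one median insert size per tile, which can then be plotted against tile number to look for a trend.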



    • #3
      Thanks for the reply. I think we were using HiSeq 2500.



      • #4
        This is an old article, but the author thoroughly explains some of the drawbacks of ExAmp, including the short fragment bias.

        The HiSeq 4000 was Illumina's way of making the patterned flowcell technology available to non X Ten customers, and opening up patterned ...



        • #5
          Thanks.
          As you can see from my first post, the bias towards shorter fragments is very strong. Is it always that strong? I don't see many people complaining about it, but it's a big problem if half of your sequence is duplicated because the mates overlap.



          • #6
            I don't know that anyone has measured how extreme the bias is for library fragments. If I remember correctly Illumina published some rough numbers early on stating that adapter dimers (much shorter than a library) could take up 5-10x more of your reads on a patterned flow cell compared to what they did on a nonpatterned, but I don't know how to find where I read that initially.

            The closest I've found is point 3 on this post about HiSeq 4000 services, which says that 1% dimer can translate to 6% of reads, and 10% dimer up to 84% of reads.
            Illumina HiSeq 3000 HiSeq 4000 instrument: considerations, limitations and service prices.
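As a rough way to read those dimer figures, the sketch below computes the relative seeding advantage they would imply under a simple weighted-sampling model. The model is an assumption for illustration and does not come from the linked post: if dimers make up a fraction f of the library and each dimer is w times more likely to seed than a library fragment, the dimer share of reads is w·f / (w·f + 1 − f). It also assumes the quoted percentages are molar fractions.

```python
def implied_advantage(library_fraction: float, read_fraction: float) -> float:
    """Relative seeding weight w of dimers versus library fragments.

    Solves read_fraction = w*f / (w*f + (1 - f)) for w.
    This constant-weight model is an illustrative assumption.
    """
    f, r = library_fraction, read_fraction
    if not (0 < f < 1 and 0 < r < 1):
        raise ValueError("fractions must be strictly between 0 and 1")
    return (r * (1 - f)) / (f * (1 - r))


for f, r in ((0.01, 0.06), (0.10, 0.84)):
    w = implied_advantage(f, r)
    print(f"{f:.0%} dimer -> {r:.0%} of reads: implied weight ~{w:.1f}")
```

Under this model the two quoted figures imply weights of roughly 6 and 47, so a single constant advantage does not describe both. The numbers are best treated as rough guidance rather than a calibrated bias.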


            My recommendation would be to fragment gDNA less to create larger inserts, or if you're performing a double-sided cleanup at the end of library prep to generate the profile in the electropherogram you posted above, adjust your ratios to eliminate more short fragments and shift the distribution to the right. If short fragments aren't present, the bias won't allow them to be over-represented.



            • #7
              Thanks for the recommendation. We don't prepare the libraries ourselves, we just send genomic DNA to sequencing platforms. We may ask to maximize insert size.
              An analysis of the selection bias would be worth publishing, as the problem may be important: it can greatly reduce the amount of useful sequence data you obtain. Not only can the number of distinct bases sequenced be much lower than 2x150 per read pair, but a read pair may also yield nothing useful if it has a mapping quality of zero because the effective sequence is too short and maps to several locations with equal score. In the end, you may lose a lot of data.



              • #8
                Dear all, which sequencing platform would you recommend for isolated pathogenic bacteria: the Illumina NovaSeq 6000 or the Illumina NextSeq 550? We intend to explore virulence genes, resistance genes, and all SNPs and other variants.
