  • Library complexity

    Hi:
    We have been told that Illumina gDNA libraries prepared by standard protocols are less complex than believed, and that you essentially max out the information content after a limited number of reads (as few as one lane's worth, 30 million or so). To reach the required coverage, some people have taken to producing multiple libraries from the same DNA and sequencing them on independent lanes. Does anyone have data supporting this contention? Intuitively it's very difficult for me to believe this is a problem after only one lane.
    Thanks community!

  • #2
    I don't have any data addressing this question but it is clearly an important one and deserves some discussion. Could you share any more info about the source of this claim? If true, where in the library prep process do you expect the greatest loss of complexity and how could it be alleviated? I'd guess that PCR amplification would be the major source. If true, do you think libraries prepped with either no-amplification protocols or minimal amplification (4 cycles) would be more complex than libraries prepped with 12 cycles? It'd be an interesting experiment to take the same genomic DNA through ligation, then amplify it with different cycle numbers and sequence those to look at shifts in library complexity.
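The proposed cycle-number experiment can be previewed with a deliberately simple toy model, sketched here in Python. Everything in it is an assumption rather than data: each ligated molecule gets a fixed per-cycle efficiency drawn from a hypothetical 0.6 to 1.0 range, its expected copy number after c cycles is (1 + e)^c, and reads are drawn in proportion to copy number. The expected number of distinct molecules seen in n reads is then the sum over molecules of 1 - (1 - p_i)^n. The names `simulate_library` and `expected_distinct` are illustrative only.

```python
import math
import random


def expected_distinct(weights, n_reads):
    """Expected number of distinct molecules observed when n_reads are drawn
    with replacement, each read picking molecule i with probability
    weights[i] / sum(weights)."""
    total = sum(weights)
    distinct = 0.0
    for w in weights:
        p = w / total
        # 1 - (1 - p)^n, computed stably with log1p/expm1
        distinct += -math.expm1(n_reads * math.log1p(-p))
    return distinct


def simulate_library(n_molecules, cycles, seed=0, eff_range=(0.6, 1.0)):
    """Hypothetical toy library: each molecule keeps one per-cycle efficiency,
    so its expected copy number after `cycles` is (1 + e) ** cycles.
    The same seed gives the same molecules for every cycle number."""
    rng = random.Random(seed)
    effs = [rng.uniform(*eff_range) for _ in range(n_molecules)]
    return [(1.0 + e) ** cycles for e in effs]


if __name__ == "__main__":
    n_molecules = 100_000  # assumed number of unique ligated fragments
    n_reads = 200_000      # assumed sequencing depth for the toy run
    for cycles in (0, 4, 12):
        weights = simulate_library(n_molecules, cycles)
        d = expected_distinct(weights, n_reads)
        print(f"{cycles:2d} cycles: {d:,.0f} distinct of {n_molecules:,} "
              f"({1 - d / n_reads:.1%} duplicate reads)")
```

Because copy number grows as (1 + e)^c, small per-molecule efficiency differences compound with every cycle, so in this model more cycles make the read distribution more uneven and the expected number of distinct molecules falls. It ignores stochastic early-cycle dropout and sequence-dependent bias, so it says nothing about the size of the effect in a real prep.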



    • #3
      An investigator on the UCD campus thought that's what they were doing at BGI; he has changed the way his lab does things and is telling everyone else about it. So as far as I can tell there is zero data at this point, just hearsay. But I'd like to know! I completely agree that the number of amplification cycles, plus the fragmentation method, introduces so much variability that claiming "one lane" is enough seems premature, if not just wrong. Clearly each library will have a limit, but I wonder whether it was misheard, and it was one HiSeq flow cell, not one lane, that maxed out the library.



      • #4
        Very interesting. I agree that so many variables affect how completely a library is sampled that a single rule of thumb seems improbable. I am running some libraries next month that may be informative for this question. We are prepping some genomic DNA and RNA libraries from a mixed community with 4 cycles of amplification and will be running technical replicates to look at the question of sampling depth.
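One common way to look at sampling depth from such runs is a saturation curve: randomly subsample the read pairs at increasing fractions and count how many distinct fragments each subsample contains. A curve that keeps climbing roughly linearly means the library is far from saturated; a flattening curve means extra reads mostly resample known fragments. The Python sketch below assumes each read pair has already been reduced to a fragment key, for example a hypothetical (chrom, start, end, strand) tuple; `saturation_curve` is an illustrative name, not an existing tool.

```python
import random
from collections.abc import Hashable, Iterable


def saturation_curve(
    fragments: Iterable[Hashable],
    fractions: tuple[float, ...] = (0.1, 0.25, 0.5, 0.75, 1.0),
    seed: int = 0,
) -> list[tuple[int, int]]:
    """Return (reads_sampled, distinct_fragments) at each subsampling fraction.

    `fragments` holds one key per read pair. Reads from replicate lanes of
    the same library can simply be concatenated before calling this.
    """
    order = list(fragments)
    # One random order, so each deeper subsample contains the shallower ones.
    random.Random(seed).shuffle(order)
    checkpoints = sorted({max(1, round(f * len(order))) for f in fractions})
    targets = iter(checkpoints)
    target = next(targets, None)
    seen = set()
    curve = []
    for i, key in enumerate(order, start=1):
        seen.add(key)
        if i == target:
            curve.append((i, len(seen)))
            target = next(targets, None)
    return curve


if __name__ == "__main__":
    # Synthetic stand-in for real data: 50,000 reads drawn from a
    # hypothetical library of 20,000 distinct fragments.
    rng = random.Random(1)
    reads = [rng.randrange(20_000) for _ in range(50_000)]
    for n, d in saturation_curve(reads):
        print(f"{n:>6,} reads -> {d:>6,} distinct ({d / n:.1%} unique)")
```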



        • #5
          Originally posted by cnicolet
          Does anyone have data supporting this contention? Intuitively it's very difficult for me to believe this is a problem after only one lane.
          I can provide one data point.

          One lane of paired-end reads from a genomic DNA library prepared using the standard Illumina prep method (mean insert size = 220 bp). The DNA is from a vertebrate organism with a 1.2 Gbp genome. 35,255,961 paired reads were generated and aligned to the genome using bowtie (parameters: -X 280 -a --best --strata -M 1). From these, 26,776,347 properly paired alignments were identified. The output was analyzed for duplicates using the Picard tools MarkDuplicates program. Among the properly paired reads, 156,630 duplicate fragments were identified, a duplication rate of 0.56%. Picard also reports a number denoted "ESTIMATED_LIBRARY_SIZE", which in this case was 2,279,812,418. The Picard documentation is pretty sparse, so I don't know what this number truly means or how it is calculated.
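As far as I understand Picard's duplication metrics, ESTIMATED_LIBRARY_SIZE comes from the Lander-Waterman equation C/X = 1 - exp(-N/X), where N is the number of read pairs, C the number of distinct pairs, and X the number of distinct molecules in the library, solved numerically for X. The Python sketch below is my own reimplementation for illustration, not Picard's code. It assumes the 26,776,347 properly paired alignments are the read pairs Picard examined and that none were set aside as optical duplicates; under those assumptions it lands within a fraction of a percent of the 2,279,812,418 reported above.

```python
import math


def lw_residual(x, c, n):
    """Lander-Waterman C/X = 1 - exp(-N/X), rearranged to f(X) = C/X - 1 + exp(-N/X)."""
    return c / x - 1.0 + math.exp(-n / x)


def estimate_library_size(read_pairs, unique_pairs, iterations=100):
    """Solve f(X) = 0 for X by bisection.

    Returns None when there are no duplicates, since the equation then has
    no finite solution. f(X) > 0 at X = C and f(X) < 0 for very large X.
    """
    if read_pairs <= 0 or unique_pairs >= read_pairs:
        return None
    lo, hi = float(unique_pairs), 100.0 * unique_pairs
    # Widen the upper bracket until the residual changes sign.
    while lw_residual(hi, unique_pairs, read_pairs) > 0:
        hi *= 10.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if lw_residual(mid, unique_pairs, read_pairs) > 0:
            lo = mid  # root lies above mid
        else:
            hi = mid  # root lies at or below mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    pairs = 26_776_347      # properly paired alignments from the post
    duplicates = 156_630    # duplicate fragments reported by MarkDuplicates
    x = estimate_library_size(pairs, pairs - duplicates)
    print(f"estimated library size: {x:,.0f} distinct fragments")
```

Read this way, the number is the model's estimate of how many distinct fragments the library contains, roughly 85 times the number of pairs sequenced in this lane.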

          Even though this is but one example, based on these numbers I have a very hard time believing that a single lane comes anywhere close to saturating the diversity of a standard Illumina library prep.
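The same model can project what deeper sequencing of this one library would look like: after N read pairs the expected number of distinct fragments is X(1 - exp(-N/X)). The Python sketch below assumes the Lander-Waterman model holds and that every extra lane yields the same 26,776,347 usable pairs; both are assumptions, and the output is a model projection, not a measurement.

```python
import math


def expected_unique(library_size, read_pairs):
    """Lander-Waterman expectation of distinct fragments after read_pairs pairs."""
    return library_size * -math.expm1(-read_pairs / library_size)


def duplicate_fraction(library_size, read_pairs):
    """Expected fraction of read pairs that are duplicates of an earlier pair."""
    return 1.0 - expected_unique(library_size, read_pairs) / read_pairs


if __name__ == "__main__":
    library_size = 2_279_812_418  # Picard ESTIMATED_LIBRARY_SIZE from the post
    pairs_per_lane = 26_776_347   # properly paired alignments from one lane
    for lanes in (1, 2, 4, 8):
        n = lanes * pairs_per_lane
        print(f"{lanes} lane(s): {duplicate_fraction(library_size, n):.2%} duplicates, "
              f"{expected_unique(library_size, n) / library_size:.1%} of library sampled")
```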



          • #6
            Another potentially important factor is the amount of input gDNA. In libraries with extremely low input amounts, you start to see a reduction in library complexity because you have created a molecular bottleneck. We see this in both genome and transcriptome libraries with very low input. Transcriptome libraries have additional library complexity concerns. For example, extreme end bias that results from using heavily degraded or 3' amplified RNA can lead to rapid saturation.
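The input bottleneck can be made concrete with back-of-the-envelope arithmetic: the number of fragment molecules in a given mass of DNA, times the fraction that actually becomes sequenceable library, caps the library's complexity. The Python sketch below uses the common rule-of-thumb average of 650 g/mol per base pair; the 1% conversion fraction is purely hypothetical, chosen only to illustrate the effect. The duplicate estimate assumes every surviving molecule is equally likely to be sequenced, which amplification and end bias violate, so real duplicate rates would be higher.

```python
import math

AVOGADRO = 6.02214076e23
BP_MASS_G_PER_MOL = 650.0  # rule-of-thumb average mass of one double-stranded bp


def fragment_molecules(input_ng, fragment_bp):
    """Number of double-stranded fragments of length fragment_bp in input_ng of DNA."""
    grams = input_ng * 1e-9
    return grams / (fragment_bp * BP_MASS_G_PER_MOL) * AVOGADRO


def duplicate_fraction(distinct_molecules, read_pairs):
    """Lander-Waterman expected duplicate fraction after sampling read_pairs."""
    unique = distinct_molecules * -math.expm1(-read_pairs / distinct_molecules)
    return 1.0 - unique / read_pairs


if __name__ == "__main__":
    read_pairs = 30_000_000  # roughly one lane, as in the original question
    fragment_bp = 220        # insert size from the data point above
    surviving = 0.01         # hypothetical fraction converted into sequenceable library
    for input_ng in (1000, 10, 0.1):
        m = fragment_molecules(input_ng, fragment_bp) * surviving
        print(f"{input_ng:>7} ng -> ~{m:.2e} library molecules, "
              f"{duplicate_fraction(m, read_pairs):.1%} expected duplicates")
```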

            I agree with kmcarr that if the library is constructed using the standard method with the recommended amount of gDNA input, one lane should not come close to saturating the diversity of a large genome such as human.

