  • cnicolet
    Member
    • Dec 2008
    • 35

    Library complexity

    Hi:
    We have been told that Illumina gDNA libraries prepared by standard protocols are less complex than believed, and that you essentially max out the information content with a limited number of reads (as few as one lane's worth, roughly 30 million). To get the required coverage, some people have taken to producing multiple libraries from the same DNA and sequencing each on an independent lane. Does anyone have data supporting this contention? Intuitively, I find it very difficult to believe this is a problem after only one lane.
    Thanks community!
  • greigite
    Senior Member
    • Mar 2009
    • 145

    #2
    I don't have any data addressing this question, but it is clearly an important one and deserves some discussion. Could you share more about the source of this claim? If it is true, where in the library prep process do you expect the greatest loss of complexity, and how could it be alleviated? I'd guess that PCR amplification would be the major source. In that case, do you think libraries prepped with either no-amplification protocols or minimal amplification (4 cycles) would be more complex than libraries prepped with 12 cycles? It'd be an interesting experiment to take the same genomic DNA through ligation, then amplify it with different cycle numbers and sequence each to look at shifts in library complexity.


    • cnicolet
      Member
      • Dec 2008
      • 35

      #3
      An investigator on the UCD campus thought that's what they were doing at BGI, and he has changed the way his lab does things, plus he is telling everyone else about it. So to my mind there is zero data, just hearsay at this point. But I'd like to know! I completely agree that the number of amplification cycles, plus the fragmentation method, will introduce so much variability that claiming "one lane" is enough seems premature, if not just wrong. Clearly each library will have a limit, but I wonder whether it was misheard, and it was actually one HiSeq flow cell, not one lane, that maxed out the library.


      • greigite
        Senior Member
        • Mar 2009
        • 145

        #4
        Very interesting. I agree that so many variables go into how completely a library is sampled that a single rule of thumb seems improbable. I am running some libraries next month that may be informative for this question. We are prepping some genomic DNA and RNA libraries from a mixed community with 4 cycles of amplification and will be running technical replicates to look at the question of sampling depth.


        • kmcarr
          Senior Member
          • May 2008
          • 1181

          #5
          Originally posted by cnicolet
          Hi:
          We have been told that Illumina gDNA libraries prepared by standard protocols are less complex than believed, and that you essentially max out the information content with a limited number of reads (as few as one lane's worth, roughly 30 million). To get the required coverage, some people have taken to producing multiple libraries from the same DNA and sequencing each on an independent lane. Does anyone have data supporting this contention? Intuitively, I find it very difficult to believe this is a problem after only one lane.
          Thanks community!
          I can provide one data point.

          One lane of paired-end reads from a genomic DNA library prepared using the standard Illumina prep method (mean insert size = 220 bp). The DNA is from a vertebrate organism with a 1.2 Gbp genome. 35,255,961 paired reads were generated and aligned to the genome using Bowtie (parameters: -X 280 -a --best --strata -M 1). From these, 26,776,347 properly paired alignments were identified. The output was analyzed for duplicates using the Picard tools MarkDuplicates program. Among the properly paired reads, 156,630 duplicate fragments were identified, a duplication rate of 0.56%. Picard also reports a number labeled "ESTIMATED_LIBRARY_SIZE", which in this case was 2,279,812,418. The Picard documentation is pretty sparse, so I don't know what this number truly means or how it is calculated.

          Even though this is but one example, based on these numbers I have a very hard time believing that a single lane comes anywhere close to saturating the diversity of a standard Illumina library prep.
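On where ESTIMATED_LIBRARY_SIZE comes from: Picard's duplication metrics describe it as an estimate based on the Lander-Waterman equation C/X = 1 - exp(-N/X), where N is the number of read pairs examined, C is the number of unique (non-duplicate) pairs, and X is the library size being solved for. The sketch below is a Python re-implementation of that idea by bisection, not Picard's own code. `estimate_library_size` is a hypothetical helper name, and treating the properly paired alignments as N is an assumption about what was fed to Picard.

```python
import math


def estimate_library_size(read_pairs: int, unique_read_pairs: int) -> float:
    """Estimate the number of distinct molecules in a library.

    Solves the Lander-Waterman relation  C / X = 1 - exp(-N / X)  for X,
    where N = read pairs examined and C = unique read pairs, by bisection
    on r = X / C. Re-implementation sketch, not Picard's actual source.
    """
    n, c = float(read_pairs), float(unique_read_pairs)
    if not 0 < c < n:
        raise ValueError("need 0 < unique_read_pairs < read_pairs (some duplicates)")

    def f(x: float) -> float:
        # Positive while x is below the root, negative once x is above it.
        return c / x - 1.0 + math.exp(-n / x)

    lo, hi = 1.0, 100.0  # f(1 * c) = exp(-n / c) > 0, so lo brackets the root
    while f(hi * c) > 0:  # grow the upper bracket until f changes sign
        hi *= 10.0
    for _ in range(60):  # bisection on the ratio X / C
        mid = (lo + hi) / 2.0
        if f(mid * c) > 0:
            lo = mid
        else:
            hi = mid
    return c * (lo + hi) / 2.0


# Counts taken from the post above.
pairs = 26_776_347
duplicates = 156_630
size = estimate_library_size(pairs, pairs - duplicates)
print(f"Estimated library size: {size:,.0f} molecules")
```

With these inputs the sketch lands at roughly 2.28 billion molecules, in line with the 2,279,812,418 Picard reported, so the number can be read as "about how many distinct fragments the library contains" under the assumption that fragments are sampled uniformly.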


          • malachig
            Senior Member
            • Aug 2010
            • 117

            #6
            Another potentially important factor is the amount of input gDNA. In libraries with extremely low input amounts, you start to see a reduction in library complexity because you have created a molecular bottleneck. We see this in both genome and transcriptome libraries with very low input. Transcriptome libraries have additional library complexity concerns. For example, extreme end bias resulting from heavily degraded or 3'-amplified RNA can lead to rapid saturation.

            I agree with kmcarr that if the library is constructed using the standard method with the recommended amount of gDNA input, one lane should not come close to saturating the diversity of a large genome such as human...
