**greigite** · 09-27-2010, 09:53 AM

I don't have any data addressing this question but it is clearly an important one and deserves some discussion. Could you share any more info about the source of this claim? If true, where in the library prep process do you expect the greatest loss of complexity and how could it be alleviated? I'd guess that PCR amplification would be the major source. If true, do you think libraries prepped with either no-amplification protocols or minimal amplification (4 cycles) would be more complex than libraries prepped with 12 cycles? It'd be an interesting experiment to take the same genomic DNA through ligation, then amplify it with different cycle numbers and sequence those to look at shifts in library complexity.

**cnicolet** · 09-27-2010, 10:04 AM

An investigator on the UCD campus thought that's what they were doing at BGI, and he's changed the way his lab does things, plus he's telling everyone else about it. So to my mind there is zero data, just hearsay at this point. But I'd like to know! I completely agree that the number of amplification cycles, plus the fragmentation method, introduces so much variability that claiming "one lane" is enough seems premature, if not just wrong. Clearly each library will have a limit, but I'm wondering whether it was misheard, and it was one HiSeq flow cell that maxed out the library, not one lane.

**greigite** · 09-27-2010, 10:09 AM

Very interesting. I agree that so many variables affect how completely a library is sampled that a single rule of thumb seems improbable. I am running some libraries next month that may be informative for this question. We are prepping some genomic DNA and RNA libraries from a mixed community with 4 cycles of amplification and will be running technical replicates to look at the question of sampling depth.

**kmcarr** · 09-28-2010, 10:41 AM

Originally posted by cnicolet:

Hi:
We have been told that Illumina gDNA libraries prepared by standard protocols are less complex than believed, and that essentially you max out on the information content with limited numbers of reads (as few as one lane's worth, 30 million or so). In order to get required coverages, some people have taken to producing multiple libraries from the same DNA and sequencing these on independent lanes. Does anyone have data supporting this contention? Intuitively it's very difficult for me to believe this is a problem after only one lane.
Thanks community!

I can provide one data point.

The data come from one lane of paired-end reads from a genomic DNA library prepared using the standard Illumina prep method (mean insert size = 220 bp). The DNA is from a vertebrate organism with a 1.2 Gbp genome. 35,255,961 paired reads were generated and aligned to the genome using bowtie (parameters: -X 280 -a --best --strata -M 1). From these, 26,776,347 properly paired alignments were identified. The output was analyzed for duplicates using the Picard tools MarkDuplicates program. Among the properly paired reads, 156,630 duplicate fragments were identified, which is a duplication rate of 0.56%. Picard also reports a number denoted as "ESTIMATED_LIBRARY_SIZE", which in this case was 2,279,812,418. The Picard documentation is pretty sparse, so I don't know what this number truly means or how it is calculated.
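On the question of what "ESTIMATED_LIBRARY_SIZE" means: as far as I understand it, Picard derives this value from the Lander-Waterman relation C/X = 1 - exp(-N/X), where N is the number of read pairs examined, C is the number of unique (non-duplicate) pairs, and X is the number of distinct molecules in the library. The sketch below solves that equation for X by bisection. It is an illustration under that assumption, not Picard's actual code; the function name `estimate_library_size` is made up here, and treating the 26,776,347 properly paired alignments as N is also an assumption.

```python
import math


def estimate_library_size(read_pairs, unique_read_pairs):
    """Solve C/X = 1 - exp(-N/X) for X (distinct molecules) by bisection.

    read_pairs (N) and unique_read_pairs (C) are hypothetical argument
    names; the model assumes every molecule is equally likely to be sampled.
    """
    n, c = float(read_pairs), float(unique_read_pairs)
    if not 0 < c < n:
        raise ValueError("need 0 < unique_read_pairs < read_pairs")

    def f(x):
        # c/x - (1 - exp(-n/x)); positive when x is too small.
        return c / x + math.expm1(-n / x)

    lo, hi = c, 10.0 * c          # f(c) = exp(-n/c) > 0, so lo is valid
    while f(hi) > 0:              # grow the bracket until the sign flips
        hi *= 10.0
    for _ in range(200):          # plain bisection
        mid = (lo + hi) / 2.0
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Numbers from the post above.
pairs = 26_776_347
duplicates = 156_630
size = estimate_library_size(pairs, pairs - duplicates)
print(f"estimated library size: {size:,.0f}")
```

With the numbers from this lane, the result should land close to the value Picard reported, which suggests the figure is an estimate of how many distinct fragments the library contains.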

Even though this is but one example, based on these numbers I have a very hard time believing that a single lane comes anywhere close to saturating the diversity of a standard Illumina library prep.
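To put a rough number on "anywhere close to saturating", the same Lander-Waterman model can be run forward: given a library of X distinct molecules, N read pairs are expected to yield X(1 - exp(-N/X)) unique pairs. The sketch below projects the duplicate fraction for additional lanes from this one library. It assumes uniform sampling of molecules, which real libraries violate (PCR and GC bias), so treat the output as an optimistic lower bound on duplication. The function names and the choice of lane counts are illustrative.

```python
import math


def expected_unique(library_size, read_pairs):
    """Expected unique pairs when sampling read_pairs from library_size
    distinct molecules, under the uniform-sampling assumption."""
    return -library_size * math.expm1(-read_pairs / library_size)


def projected_duplicate_fraction(library_size, read_pairs):
    """Fraction of read pairs expected to be duplicates."""
    return 1.0 - expected_unique(library_size, read_pairs) / read_pairs


# Values reported in the post above.
LIBRARY_SIZE = 2_279_812_418      # Picard ESTIMATED_LIBRARY_SIZE
PAIRS_PER_LANE = 26_776_347       # properly paired alignments in one lane

for lanes in (1, 2, 4, 8, 16):
    n = lanes * PAIRS_PER_LANE
    frac = projected_duplicate_fraction(LIBRARY_SIZE, n)
    print(f"{lanes:2d} lane(s): {100 * frac:.2f}% expected duplicates")
```

Under this model the duplicate fraction grows slowly with depth, which is consistent with the view that one lane barely samples a library of this size.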

**malachig** · 09-28-2010, 12:09 PM

Another potentially important factor is the amount of input gDNA. In libraries with extremely low input amounts, you start to see a reduction in library complexity because you have created a molecular bottleneck. We see this in both genome and transcriptome libraries with very low input. Transcriptome libraries have additional library complexity concerns. For example, extreme end bias that results from using heavily degraded or 3' amplified RNA can lead to rapid saturation.

I agree with kmcarr that if the library is constructed using the standard method with the recommended amount of gDNA input, one lane should not come close to saturating the diversity of a large genome such as human...

Library complexity