The standard Roche protocol for shotgun library construction asks for 10 µg of input DNA to yield a few million templated beads for sequencing. Rule of thumb: 1 µg of 1 kb double-stranded DNA is 1 trillion (1E+12) molecules[1].
Get that? 10 trillion molecules to start with so that I can sequence less than 10 million of them. What happened to the other 9,999,990,000,000 molecules?
Not really fair to the Roche protocol? True, one usually ends up with enough library to sequence more than 10 million beads. Plug your own numbers in. My guess is the molecular yield from this technique will be no better than 0.1%.
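If you want to plug your own numbers in, here is a back-of-the-envelope sketch in Python. The 650 g/mol per base pair is the standard figure from note 1; the mass, fragment length, and bead count are just the ones above, so swap in your own:

```python
# Back-of-the-envelope molecular accounting for a sequencing library.
# Assumes ~650 g/mol per base pair of double-stranded DNA.

AVOGADRO = 6.022e23   # molecules per mole
MW_PER_BP = 650.0     # g/mol per base pair of dsDNA

def molecules(mass_ug, length_bp):
    """Number of dsDNA molecules in mass_ug micrograms of length_bp fragments."""
    grams = mass_ug * 1e-6
    return grams / (length_bp * MW_PER_BP) * AVOGADRO

input_molecules = molecules(10, 1000)   # 10 ug of 1 kb fragments
beads_sequenced = 10e6                  # ~10 million templated beads

print(f"input molecules:    {input_molecules:.3g}")                    # ~9.26e+12
print(f"fraction sequenced: {beads_sequenced / input_molecules:.1e}")  # ~1e-06
```

So the molecules that actually get sequenced are about one in a million of the input; even a generous library yield leaves the vast majority of input molecules unaccounted for.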
I do not mean to single out Roche here; I think protocols for all instrument systems are looking at fractions of a percent molecular yield. As long as one has plenty of DNA, maybe it does not matter. But sometimes DNA (or RNA) is limiting, no?
And what if there is bias in the loss process? Most of us sweat over adding a few more cycles of PCR to our library prep procedure because we know PCR can bias our results. But I have never met a single person who worried that the 99.9% (add as many nines as you care to) of DNA molecules lost during library construction might be lost in a sequence-composition-biased way.
If I get any response (other than a blank stare) from those designing these protocols about the molecular yield, it is usually that the yields in each step are not 100%. The implication, I presume, is that these yield losses are multiplicative. Fair enough: how many steps at 50% yield would I need to lose 99.9% of my DNA? That would be ten steps.
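A quick sanity check on that arithmetic, using nothing beyond the numbers in the paragraph above:

```python
import math

# How many steps at a given per-step yield does it take to
# lose 99.9% of the input molecules?
per_step_yield = 0.5
target_remaining = 0.001  # 0.1% of input left

n = math.log(target_remaining) / math.log(per_step_yield)
print(f"steps needed:   {n:.1f}")                              # ~10.0
print(f"after 10 steps: {per_step_yield ** 10:.2%} remaining")  # ~0.10%
```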
I do not think most library construction steps have yields as low as 50%. Instead, I think it more likely that:
(A) A few steps have extremely low molecular yields and
(B) The protocols we are using rely on our being able to visualize the molecules and their size distribution for purposes of quality control.
I am going to ignore (B) for the purposes of the rest of this post.
As for (A), most of the methodologies I see being developed for low amounts of starting material are focused on amplification. It might be worth taking a look at where DNA (or RNA) is being lost and tightening that up. A couple of places to look would be the percentage of ends successfully repaired after mechanical fragmentation, and chemical damage to the DNA itself. The latter may or may not be a non-issue. But think about it: how often do you worry about the redox state of your DNA? How about UV damage from the sunlight streaming in through your lab windows?
Might 90% of the molecules in a typical DNA prep be impossible to replicate without repair beyond the end repair we normally deploy? Could that number be 99% or 99.9%? Real question. I would like to know.
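For a feel for the scale, here is a toy model of my own (not taken from any protocol): assume lesions land independently at some per-base rate p, so a fraction (1-p)^L of L-bp molecules escapes damage entirely. The per-base lesion rate implied by each of those damaged fractions, for 1 kb fragments, is then:

```python
# Toy model: lesions occur independently at a per-base rate p,
# so the fraction of intact (lesion-free) L-bp molecules is (1 - p)**L.

L = 1000  # fragment length in bp (illustrative)

for damaged_fraction in (0.90, 0.99, 0.999):
    intact = 1 - damaged_fraction
    p = 1 - intact ** (1 / L)  # per-base lesion rate implied
    print(f"{damaged_fraction:.1%} damaged -> ~{p:.2e} lesions per base")
```

Only a few lesions per thousand bases would be enough to put 90% or more of 1 kb molecules out of action. Whether real preps are anywhere near that, I do not know.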
--
Phillip
(Notes)
1. Okay, yeah, using some standard numbers, like a molecular weight of 650 per base pair, the number is really 926 billion molecules, not 1 trillion. But nothing I discuss here is sensitive to tolerances below 10%, so the difference is safe to ignore...