The standard Roche protocol for shotgun library construction asks for 10 µg of input DNA to yield a few million templated beads for sequencing. Rule of thumb: 1 µg of 1 kb double-stranded DNA is 1 trillion (1E+12) molecules[1].
Get that? 10 trillion molecules to start with so that I can sequence less than 10 million of them. What happened to the other 9,999,990,000,000 molecules?
Not really fair to the Roche protocol? True, one usually ends up with enough library to sequence more than 10 million beads. Plug your own numbers in. My guess is the molecular yield from this technique will be no better than 0.1%.
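If you want to plug your own numbers in, here is a back-of-the-envelope sketch in Python. The 650 g/mol per base pair is the standard figure from note 1; the mass, fragment length, and bead count are just the ones above, so swap in your own:

```python
# Back-of-the-envelope molecular accounting for a sequencing library.
# Assumes ~650 g/mol per base pair of double-stranded DNA.

AVOGADRO = 6.022e23   # molecules per mole
MW_PER_BP = 650.0     # g/mol per base pair of dsDNA

def molecules(mass_ug, length_bp):
    """Number of dsDNA molecules in mass_ug micrograms of length_bp fragments."""
    grams = mass_ug * 1e-6
    return grams / (length_bp * MW_PER_BP) * AVOGADRO

input_molecules = molecules(10, 1000)   # 10 ug of 1 kb fragments
beads_sequenced = 10e6                  # ~10 million templated beads

print(f"input molecules:    {input_molecules:.3g}")                    # ~9.26e+12
print(f"fraction sequenced: {beads_sequenced / input_molecules:.1e}")  # ~1e-06
```

So the molecules that actually get sequenced are about one in a million of the input; even a generous library yield leaves the vast majority of input molecules unaccounted for.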
I do not mean to single out Roche here; I think protocols for all instrument systems are looking at fractions of a percent molecular yield. As long as one has plenty of DNA, maybe it does not matter. But sometimes DNA (or RNA) is limiting, no?
And what if there is bias in the loss process? Most of us sweat over adding a few more cycles of PCR to our library prep procedure because we know PCR can bias our results. But I have never met a single person who worried that the 99.9% (add as many nines as you care to) of DNA molecules lost during library construction might be lost in a sequence-composition-biased way.
If I get any response (other than a blank stare) from those designing these protocols about the molecular yield, it is usually that the yields in each step are not 100%. The implication, I presume, is that these yield losses are multiplicative. Fair enough: how many steps at 50% yield would I need to lose 99.9% of my DNA? That would be ten steps.
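A quick sanity check on that arithmetic, using nothing beyond the numbers in the paragraph above:

```python
import math

# How many steps at a given per-step yield does it take to
# lose 99.9% of the input molecules?
per_step_yield = 0.5
target_remaining = 0.001  # 0.1% of input left

n = math.log(target_remaining) / math.log(per_step_yield)
print(f"steps needed:   {n:.1f}")                              # ~10.0
print(f"after 10 steps: {per_step_yield ** 10:.2%} remaining")  # ~0.10%
```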
I do not think most library construction steps have yields as low as 50%. Instead, I think it more likely that:
(A) A few steps have extremely low molecular yields and
(B) The protocols we are using rely on our being able to visualize the molecules and their size distribution for purposes of quality control.
I am going to ignore (B) for the purposes of the rest of this post.
As for (A), most of the methodologies I see being developed for low amounts of starting material are focused on amplification. It might be worth taking a look at where DNA (or RNA) is being lost and tightening that up. A couple of places to look would be the percentage of ends successfully repaired after mechanical fragmentation, and chemical damage to the DNA itself. The latter may or may not be a non-issue. But think about it: how often do you worry about the redox state of your DNA? How about UV damage from the sunlight streaming in through your lab windows?
Might 90% of the molecules in a typical DNA prep be impossible to replicate without repair beyond the end repair we normally deploy? Could that number be 99% or 99.9%? Real question. I would like to know.
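For a feel for the scale, here is a toy model of my own (not taken from any protocol): assume lesions land independently at some per-base rate p, so a fraction (1-p)^L of L-bp molecules escapes damage entirely. The per-base lesion rate implied by each of those damaged fractions, for 1 kb fragments, is then:

```python
# Toy model: lesions occur independently at a per-base rate p,
# so the fraction of intact (lesion-free) L-bp molecules is (1 - p)**L.

L = 1000  # fragment length in bp (illustrative)

for damaged_fraction in (0.90, 0.99, 0.999):
    intact = 1 - damaged_fraction
    p = 1 - intact ** (1 / L)  # per-base lesion rate implied
    print(f"{damaged_fraction:.1%} damaged -> ~{p:.2e} lesions per base")
```

Only a few lesions per thousand bases would be enough to put 90% or more of 1 kb molecules out of action. Whether real preps are anywhere near that, I do not know.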
--
Phillip
(Notes)
1. Okay, yeah, using some standard numbers, like a molecular weight of 650 per base pair, the number is really 926 billion molecules, not 1 trillion. But nothing I discuss here is sensitive to tolerances below 10%, so the difference is safe to ignore...