Okay, I accept that as actual evidence. I still don't buy it for explaining the whole phenomenon. But at this point that is little more than hand waving. I'm unlikely to pony up the time and resources to do further testing, so this is as far as it goes, probably...
--
Phillip
-
The question is why short amplicons sequence better than large ones. The evidence is that when a library with a broad size distribution is sequenced and the reads are mapped, the average or peak size of the mapped fragments is smaller than that of the input library, indicating preferential sequencing of the smaller library fragments. This was not important in earlier days, when libraries were size-selected in a narrow range and multiplexing was not widespread. But since the introduction of gel-free library prep kits (bead-based size selection resulting in libraries with a wide distribution of fragment sizes), the widespread use of transposon-mediated broad library preps, and the increased output of the platforms, it has become more important. When pooling libraries with different insert sizes for sequencing, this should be taken into account to obtain the desired proportion of reads from each library.
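As a rough illustration of the pooling point, here is a minimal sketch of converting a mass concentration to molarity, so that libraries of different mean size can be compared per molecule rather than per nanogram. It uses the usual approximation of 660 g/mol per base pair of double-stranded DNA; the two libraries and their concentrations are hypothetical values, not data from this thread.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (standard approximation)

def molarity_nM(conc_ng_per_ul: float, mean_length_bp: float) -> float:
    """Convert a dsDNA mass concentration (ng/ul) to nanomolar."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * mean_length_bp)

# Hypothetical libraries: same mass concentration, different mean fragment size.
libraries = {"short_insert": (2.0, 400), "long_insert": (2.0, 1200)}
for name, (ng_per_ul, length_bp) in libraries.items():
    print(f"{name}: {molarity_nM(ng_per_ul, length_bp):.2f} nM")
```

Equimolar pooling only equalizes the number of molecules loaded. If shorter fragments really do yield more reads per molecule, an additional empirical weighting would be needed on top of this, and that factor is not something this sketch can supply.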
My answer, as suggested in this thread, is the “RTA-mediated hypothesis”. Short fragments are more efficient at forming clusters because, during bridge amplification, the polymerase is more likely to synthesize a full complementary strand (end to end) for short fragments than for large ones, due to the limited extension time (15 sec in MiSeq). During template generation (the first 4-5 cycles) RTA uses signal intensities from the images and calls bases from normalised intensities (taking colour cross-talk and phasing correction into account). Raw data are filtered to remove reads that do not meet the signal purity threshold, as well as overlapping and low-intensity clusters. At this step, in a population of small- and large-fragment clusters, the small ones would have higher intensity (intensity is proportional to strand number, which results from amplification efficiency) and are therefore preferentially detected and their bases called. Large fragments, because of less efficient amplification, will have lower intensity and would not be favoured by RTA. Of course, in a flow cell lane with larger fragments most of the clusters would have lower intensity compared to a lane with predominantly small fragments. But because RTA detection of clusters is relative (normalised intensity, not raw), they are still detected and their bases are called.
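To make the reasoning above concrete, here is a toy model of it, not a description of the actual clustering chemistry or of RTA. It assumes each extension either completes end to end or fails, and that a strand only contributes to the next round if it completed. The polymerase rate and per-base dropout probability are hypothetical numbers chosen purely for illustration; only the 15 sec extension time and the "30 cycles or so" come from this thread.

```python
def full_extension_probability(length_bp, rate_bp_per_s, extension_s, dropout_per_bp):
    """Chance that one extension reaches the far end of the template."""
    if length_bp > rate_bp_per_s * extension_s:
        return 0.0  # template is longer than the polymerase can copy in time
    return (1.0 - dropout_per_bp) ** length_bp

def expected_strands(length_bp, cycles, **params):
    """Expected strands per cluster if each cycle adds a copy with probability p."""
    p = full_extension_probability(length_bp, **params)
    return (1.0 + p) ** cycles

# Hypothetical parameters (assumptions, not measured values).
params = dict(rate_bp_per_s=200.0, extension_s=15.0, dropout_per_bp=0.0005)

for length in (300, 800, 1500):
    strands = expected_strands(length, cycles=30, **params)
    print(f"{length} bp: ~{strands:.3g} strands per cluster")
```

Under these assumptions the expected strand number, and so presumably the signal intensity, falls off steeply with fragment length. That is the behaviour the hypothesis relies on; the sketch says nothing about whether it actually happens on the flow cell.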
The argument against this is evidence from sequencing a large library (1-5 kb) in which the cluster density predicted by qPCR was achieved. I have two arguments against this. Firstly, cluster density is not linear with library input. For example, if 12 pM input gives 800K clusters/mm² in a flow cell lane, 8 pM input will not result in 600K clusters/mm². Secondly, quantifying large fragments with KAPA qPCR is not accurate because the standards are 400 bp and amplify more efficiently than large fragments in the 1-5 kb range, as in this case. In addition, if the extension time is not increased significantly, large fragments will drop out and only a small portion of the library will be amplified and quantified, resulting in underestimation of the quantity. The attachment in this post shows ScreenTape profiles of the input library and of the output from the qPCR reaction, showing preferential amplification of smaller fragments during PCR. The qPCR reactions were purified using 1.8x AMPure beads to remove salts, polymerase, SYBR and nucleotides.
But SAV doesn't actually depict a cluster that has been recognized by RTA any differently from one that hasn't. Nor does SAV offer any way to verify the hypothesis that short amplicons produce brighter or more robust clusters than long amplicons.
Last edited by nucacidhunter; 06-06-2014, 03:15 AM.
-
Originally posted by nucacidhunter
The result will be underestimation of library concentration because only a portion of it is quantified. The other issue would be differences in amplification efficiency between the standards (~400 bp) and the library. I have verified this by running the qPCR product on a DNA chip and comparing its size to the input amplicon size in large-insert libraries. I have found that the amplicon peak of the qPCR product is substantially lower than that of the actual input DNA.
Also, wasn't it necessary to remove the SYBR green, etc. from the qPCR reaction prior to running the chip? What method did you use?
--
Phillip
-
Originally posted by nucacidhunter
I have seen this many times and have heard from others about it as well. But I never knew that Illumina explains the observation similar to what I independently came up with.
--
Phillip
-
But I need some mechanism that allows qPCR to accurately quantitate cluster density for a pool of long amplicons -- that is what I see.
But I see no reason at all to favor the RTA-mediated explanation, for which, other than unsubstantiated claims from Illumina, there is no evidence.
-
Originally posted by nucacidhunter
I have not asked Illumina FAS about this; it is my own observation and explanation. Everyone is entitled to their opinion and, as I said above, I respect that.
Attached document shows one way to see what I have mentioned. It does not tell which cluster has been picked up by RTA, but as mentioned, it shows clusters that RTA has not picked up.
Here, as a sign of my respect, is a more fleshed-out explanation of why I think what I will call the "RTA-mediated" hypothesis of why shorter amplicons of a given library predominate in Illumina data sets is not sufficient to explain the actual phenomenon.
I have a particular set of data that makes me think this RTA-mediated hypothesis is not sufficient to explain what is going on. Here is a link to the full thread. But to summarize, we made a "large insert" TruSeq DNA library but used extra/more stringent Ampures to remove shorter fragments. Did 4 cycles of PCR on it (instead of the protocol's recommended 10) and clustered at 4pM (rather than what was normal on the MiSeq at the time -- 8pM).
Here is an Agilent chip of the library we clustered:
Again, we nailed the cluster density using our normal KAPA qPCR calculation using, if I recall correctly, the modal peak size depicted above (1892bp) in the calculation specified in the KAPA kit manual. That would include 120bp of adapters, so think of the inserts as being a modal size of 1772bp, or a little less due to some distortion from DNA mass, rather than DNA count, being assayed by the Agilent chip.
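The arithmetic above can be sketched as follows. The adapter subtraction uses the numbers from the post. The size adjustment is only my understanding of the general form of such a calculation (scale the qPCR result by the ratio of standard length to library length); the measured concentration below is a hypothetical value, and the 400 bp standard length is the approximate figure quoted elsewhere in this thread, so check the kit manual for the actual value.

```python
ADAPTER_BP = 120  # total adapter length, as stated in the post

def insert_size(fragment_bp: int, adapter_bp: int = ADAPTER_BP) -> int:
    """Insert length once the adapters are subtracted from the fragment length."""
    return fragment_bp - adapter_bp

def size_adjusted_conc(measured_pM: float, standard_bp: float, library_bp: float) -> float:
    """Scale a qPCR concentration by the standard-to-library length ratio."""
    return measured_pM * standard_bp / library_bp

print(insert_size(1892))  # 1772

# Hypothetical measured value; standard_bp=400 is an assumption taken from the thread.
adjusted = size_adjusted_conc(measured_pM=100.0, standard_bp=400, library_bp=1892)
print(f"size-adjusted concentration: {adjusted:.2f} pM")
```

The point of the adjustment is that a long library contains fewer molecules per unit of qPCR signal than the short standards do, so the choice of library length (modal versus mean) feeds directly into the loading concentration.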
However the result of the run when mapped back to a reference genome with BWA produced pair-end insert length as depicted here:
Okay, one might argue that the lower graph is on a linear scale and represents counts of DNA molecules whereas the top graph is mass based and displayed in the more-or-less log-linear scale that one typically sees from electrophoresis. Again, in the previous thread, I exported the data from the Agilent chip and transformed it so it would be on the same scale as the lower chart so they could be directly compared:
So, it still comports fairly well with the early statement "modal size of 1772bp", or a little less. Certainly no lower than 1600bp.
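For anyone wanting to do the same kind of rescaling, here is a minimal sketch of the idea, not the exact procedure used in the earlier thread. It assumes the chip signal in each size bin is proportional to DNA mass, so dividing by fragment length gives a relative molecule count. The bin sizes and signal values are made up for illustration.

```python
def mass_to_relative_counts(sizes_bp, mass_signal):
    """Convert mass-proportional signal per size bin to relative molecule counts."""
    weights = [m / s for s, m in zip(sizes_bp, mass_signal)]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical electropherogram bins (not the real library data).
sizes = [1000, 1500, 2000, 2500]
mass = [10.0, 36.0, 40.0, 20.0]

counts = mass_to_relative_counts(sizes, mass)
modal_by_mass = sizes[mass.index(max(mass))]
modal_by_count = sizes[counts.index(max(counts))]
print("modal size by mass: ", modal_by_mass)
print("modal size by count:", modal_by_count)
```

Because longer fragments carry more mass per molecule, the count-based mode sits at or below the mass-based mode, which is why the insert size is "a little less" than the chip peak suggests.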
Keeping that in mind, the RTA-mediated hypothesis fails to explain our hitting cluster density exactly while shifting the size distribution of what was sequenced lower by about 500 bp. It fails because, were that the case, the loss of clusters from 1.1 to 1.6 kb and above should have decreased the total number of read pairs. That is, these longer amplicon clusters should have been there physically, but just not detected by RTA. So our effective (RTA-calculated) cluster density should have been much lower than what we calculated using qPCR. But it wasn't.
I don't actually think that the short amplicons are displacing the longer ones from the flowcell during clustering. I think something else, something unknown, is going on. Okay, that is supposition also. But I need some mechanism that allows qPCR to accurately quantitate cluster density for a pool of long amplicons -- that is what I see.
As with all physical phenomena, there are plenty of explanations that might account for what I describe above. But I see no reason at all to favor the RTA-mediated explanation, for which, other than unsubstantiated claims from Illumina, there is no evidence.
See what I am saying here? The RTA-mediated explanation is just a story. May have been invented whole-cloth by someone at Illumina and came to be propagated as dogma without any particular evidence. Stuff like that happens all the time. Just because it is superficially reasonable, doesn't mean it is true.
--
Phillip
-
And, yes, I have had Illumina reps tell me the same story. But it does not fit with what I have seen, so I think they are wrong.
What is your basis for writing this? I can look at a thumbnail photo, but how do I tell which clusters "RTA has not picked up for various reasons"?
-
Originally posted by nucacidhunter
I have tried to explain my observation logically, based on the science behind the Illumina sequencing system, and I do not have any scientific evidence that small and large fragments are involved in flow cell battles for getting sequenced. All sizes of fragments can attach to the flow cell lawn, and during clustering small fragments will amplify and form denser clusters than large fragments. During template generation RTA passes those pure clusters (resulting from a single template strand) with higher intensity, which would mostly be small-fragment clusters. RTA normalises intensities based on all clusters, which is relative, and in a library with large fragments it will pass the ones that have higher intensity too. Looking at thumbnail photos also shows a lot of background clusters which RTA has not picked up for various reasons. I do respect everyone's opinion and I am very interested to see scientific explanations.
You write:
Looking at thumbnail photos also shows a lot of background clusters which RTA has not picked up for various reasons.
--
Phillip
-
Nope, as I write above, this does not fit with the actual results we got.
Last edited by nucacidhunter; 06-02-2014, 06:25 AM.
-
Originally posted by nucacidhunter
I wonder if it was an amplicon library comprising the same or similarly large-sized fragments, or a library with a wide distribution of fragment sizes. I have sequenced such large-sized fragments (amplicons) as well (on the MiSeq), and I load 1.5x more than for usual libraries to compensate for failed clusters. If all fragments are large, partially amplified clusters will be picked up by RTA and pass filter, because RTA compensates for low signal intensity when most of the clusters have low intensities. However, in a library with a wide size distribution, clusters from small fragments will have higher intensities and as a result low-intensity clusters from large fragments will not pass. In widely distributed Nextera libraries with an average size of 800 bp I get fragments of 950 bp, but they are a small portion of the sequences. All this indicates partial failure of large fragments' clusters.
Not sure what is happening with your actual amplicon libraries. But in our case, we just used the concentration based on qPCR and nailed the cluster density.
Originally posted by nucacidhunter
So, maybe it is not that small fragments are competing with and displacing large ones (physically less feasible), but that the operation of RTA favours the high-intensity clusters of small fragments. I wonder whether more large fragments would be sequenced if someone changed the recipes to allow for longer extension times during cluster generation.
--
Phillip
-
We have run libraries with insert sizes averaging as high as 1.1kb.
So, whatever the explanation, it needs to account for this. Which leads one to think there must be some sort of competition among amplicons. Something that would allow the shorter amplicons to displace the longer ones and prevent them from creating clusters.
-
Originally posted by nucacidhunter
I think preferential clustering of smaller fragments (amplicons) can be explained by the bridge amplification process. Library templates are denatured and mixed with hybridisation buffer. At this stage we expect that denatured fragments will stay single stranded and stretched, free of secondary structure. If strands hybridize back to their complementary strands at this step, it will thermodynamically favour the smaller fragments, hence reducing their number for the next step rather than that of the long fragments. At the next step, denatured fragments are pumped through the flow cell lane and every fragment will have a relatively equal chance of hybridizing to the oligo lawn on the flow cell surface because they (should) have the complementary adapter sequences. After hybridisation, a bridge is formed and amplification mix is pumped through to synthesize the complementary strand of the bridged fragments. Extension time is quite limited (15 sec in MiSeq) and at this step large fragments are less likely to have end-to-end synthesis of complementary strands because the polymerase pauses or drops off. The result would be that those fragments will not be amplified in the next round of amplification (I think there are 30 cycles or so) and will form weak clusters (if any) with low strand numbers, depending on the cycle in which this happens. In contrast, small fragments will have a high chance of end-to-end complementary strand synthesis and therefore will dominate the properly formed clusters, which will produce strong signals for RTA to detect and pass.
So, whatever the explanation, it needs to account for this. Which leads one to think there must be some sort of competition among amplicons. Something that would allow the shorter amplicons to displace the longer ones and prevent them from creating clusters. That way, if the shorter amplicons are removed, the longer amplicons can form good clusters. However I can't think of a reasonable mechanism of competition. So maybe something else is going on?
--
Phillip
-
For reasons unclear to me, short amplicons seem to cluster vastly better than longer amplicons.
Last edited by nucacidhunter; 05-29-2014, 05:18 PM.
-
Originally posted by dtm2451
Hello,
I am working on a deep sequencing protocol for a PCR amplicon (~650bp) using the TruSeq DNA PCR-Free Sample Preparation Kit and I am seeing extra peaks in my final bioanalyzer traces that concern me because I don't know what they might come from.
Peaks on the trace:
+Small peak at size of original insert
+Medium-sized peak that I think might correspond to insert+1adapter
+Large peak that I think might correspond to insert+2 adapters
-mini peak to the right of the "insert" peak
-mini peak to the right of the "insert+1adapter" peak
-mini peak to the right of the "insert+2adapters" peak
Does anyone know what the last three peaks might be?
I am attaching both the TapeStation on the initial PCR material and the bioanalyzer on the final library.
Thanks!
Dan
Illumina adapters are each about 60 bases long.
But you may be right. Illumina kit adapters are "Y"-adapters with only about 10 bp of double-stranded DNA and the rest (~50 bases) as double single-stranded tails. Single-stranded molecules tend to migrate slower on Agilent chips than double-stranded molecules of corresponding length. So those Y-adapters may be introducing some drag.
But that would be a lot of drag from a few hundred bases of double ssDNA. Seems unlikely to me.
Another hypothesis would be that the "1539 bp" fragment is migrating slowly because the ligase is still attached to the amplicon.
Another possibility (this is the one I like) is that the "901 bp" fragment has both adapters ligated and is running only a little larger than expected because of the Y-adapters: 656+120=776bp double-stranded length. The 1539 bp fragment would have a double insert, 656+656+120=1432, which would fit pretty well. That would suggest the A-tailing step did not work well -- left a substantial percentage of the ends blunt.
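The size bookkeeping for this hypothesis can be sketched as below. The 656 bp insert, the 120 bp of adapters, and the two observed peak sizes come from this thread; the function name and the idea of tabulating the offsets are just my illustration.

```python
INSERT_BP = 656        # amplicon length used in the post
ADAPTER_PAIR_BP = 120  # both adapters together (about 60 bases each)

def expected_length(n_inserts: int, n_adapter_pairs: int = 1) -> int:
    """Length of a ligation product with n concatenated inserts plus adapters."""
    return n_inserts * INSERT_BP + n_adapter_pairs * ADAPTER_PAIR_BP

observed_peaks = {1: 901, 2: 1539}  # peak sizes reported on the chip, by insert count

for n_inserts, observed in observed_peaks.items():
    expected = expected_length(n_inserts)
    print(f"{n_inserts} insert(s): expected {expected} bp, "
          f"observed {observed} bp, offset {observed - expected} bp")
```

Both peaks run roughly the same amount above their expected length, which is what you would expect if a fixed pair of Y-adapters, rather than the insert, is responsible for the slower migration.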
You posted this question long ago, so you could probably update us on your results. But if you used this library, it probably worked okay. For reasons unclear to me, short amplicons seem to cluster vastly better than longer amplicons. So your data set would remain fairly free from chimerics.
--
Phillip