Does index swapping (hopping) occur because this release and re-annealing is the method for generating clusters within each nanocell, with the swapping resulting from library fragments inadvertently hopping too far, into the next nanocell?
-
Originally posted by GenoMax: See Illumina's white paper on index hopping here.
Aside: Hard to believe this has been going on this long and Illumina has been largely silent about this - one would think they would have issued a protocol change for ONLY dual-index libraries on nanocell instruments.
Comment
-
Exclusion Amplification (ExAmp) has been explained in the following video.
https://www.youtube.com/watch?v=pfZp5Vgsbw0
Following is the link for the patent:
https://www.google.com.au/patents/WO2013188582A1?cl=en
Comment
-
Originally posted by nucacidhunter: Exclusion Amplification (ExAmp) has been explained in the following video.
https://www.youtube.com/watch?v=pfZp5Vgsbw0
Following is the link for the patent:
https://www.google.com.au/patents/WO2013188582A1?cl=en
I now understand the need for (a) super-clean libraries and (b) size optimised libraries - to beat the "average" diffusion rate(s) on these HiSeq3000/4000/X/NovaSeq platforms.
Here's the real question: how does one detect index swapped (hopped) reads? Do you have to have a reference? It would seem that the answer would be "yes", or as Illumina suggests in their white paper, one has to a priori have an idea of the expression levels/targets?
Comment
-
The recommended method to detect an index swap is to use "Unique Dual Indexes". With these you don't use the same i7 index in multiple pairs. A given i7 index always goes with a fixed i5 index for the run. Then if you detect an i7 index with any i5 index other than its pair, you know an index hop has occurred and the reads are discarded.
This will remove all index hops that are the result of a single recombination event. It will also remove nearly all the double recombinations. So true index hops should be largely detectable.
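The pairing rule described above can be sketched in a few lines of Python. This is only an illustration: the index sequences, sample names and function names below are made up for the example and are not taken from any Illumina kit or sample sheet.

```python
from collections import Counter

# Hypothetical unique-dual-index sheet: each i7 is paired with exactly one i5.
EXPECTED_PAIRS = {
    ("ATTACTCG", "AGGCTATA"): "sample_A",
    ("TCCGGAGA", "GCCTCTAT"): "sample_B",
    ("CGCTCATT", "AGGATAGG"): "sample_C",
}
KNOWN_I7 = {i7 for i7, _ in EXPECTED_PAIRS}
KNOWN_I5 = {i5 for _, i5 in EXPECTED_PAIRS}


def classify_read(i7, i5):
    """Return the sample name, 'hopped', or 'unknown' for one index pair."""
    sample = EXPECTED_PAIRS.get((i7, i5))
    if sample is not None:
        # Valid pair. A double swap that happens to recreate another valid
        # pair also lands here and cannot be told apart from a true read.
        return sample
    if i7 in KNOWN_I7 and i5 in KNOWN_I5:
        # Both indexes are in use on the run, but not together.
        return "hopped"
    # At least one index is not on the sheet (sequencing error, contamination).
    return "unknown"


def tally(index_pairs):
    """Count reads per class for an iterable of (i7, i5) tuples."""
    return Counter(classify_read(i7, i5) for i7, i5 in index_pairs)
```

Reads classed as "hopped" would be discarded, and their share of all reads gives a direct estimate of the hop rate without needing a reference genome.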
As to what causes index hopping, I don't think that Illumina is sure. They seem mainly to have a list of "best practices" to use to lower their frequency.
I haven't looked in detail at the process of exclusion amplification either. But I presume that it involves some non-flowcell-tethered PCR amplification.
--
Phillip
Comment
-
My understanding is that index hopping can happen at any time in a pool that contains single-stranded library fragments, a partially complementary oligo (from PCR or adapter oligos) that can pair with a strand, and ExAmp reagents. Amplification is isothermal and is at its optimum at the temperature maintained during clustering, but as with most polymerases there should be some low-level activity at non-optimal temperatures as well. These are the reasons that preparing the pool just prior to loading and keeping it on ice is highly recommended.
Comment
-
This is kind of tangential to NovaSeq, but...
I've suggested that we keep everything on ice whenever possible prior to sequencing, due to the fact that low temperatures retard any kind of activity and thus should inhibit adapter-swapping (which is a huge problem as we run a lot of highly-amplified single cells). But my explanations were too vague to be taken seriously, since I don't know the specifics of the reactions. I would love to have a very clear (and preferably lengthy, rather than concise) explanation of exactly why and when keeping pools on ice should prevent crosstalk, that I can copy and paste (attributing credit, if desired) to the people in charge of making libraries.
I think it is obvious that the longer you let a mixed batch of libraries sit around, and the higher the temperature, the more index-swapping will occur, regardless of the mechanism. But without citing a specific mechanism (and it does not really matter if it is the dominant one), nobody involved with library prep will pay attention to my concerns on the issue (meaning, no tests of ice vs no ice). All I really need is a real mechanism, which seems sufficiently important to cause a test to be run; once that occurs, I'll be satisfied, even if the results are negative and indicate that keeping pooled libraries at a high temperature for a long time seems to be optimal for preventing crosstalk. Not that I'll believe negative results unless I run the experiment myself, but at least I'll believe I did my best. I'll still report the results here.
Last edited by Brian Bushnell; 07-21-2017, 06:34 AM.
Comment
-
Originally posted by Brian Bushnell: This is kind of tangential to NovaSeq, but...
I've suggested that we keep everything on ice whenever possible prior to sequencing, due to the fact that low temperatures retard any kind of activity and thus should inhibit adapter-swapping (which is a huge problem as we run a lot of highly-amplified single cells). But my explanations were too vague to be taken seriously, since I don't know the specifics of the reactions. I would love to have a very clear (and preferably lengthy, rather than concise) explanation of exactly why and when keeping pools on ice should prevent crosstalk, that I can copy and paste (attributing credit, if desired) to the people in charge of making libraries.
I think it is obvious that the longer you let a mixed batch of libraries sit around, and the higher the temperature, the more index-swapping will occur, regardless of the mechanism. But without citing a specific mechanism (and it does not really matter if it is the dominant one), nobody involved with library prep will pay attention to my concerns on the issue (meaning, no tests of ice vs no ice). All I really need is a real mechanism, which seems sufficiently important to cause a test to be run; once that occurs, I'll be satisfied, even if the results are negative and indicate that keeping pooled libraries at a high temperature for a long time seems to be optimal for preventing crosstalk. Not that I'll believe negative results unless I run the experiment myself, but at least I'll believe I did my best. I'll still report the results here.
But I guess previous instantiations of ex-amp (HiSeq 4000/X) require the researcher to mix the "ex amp" reagent with the library pool prior to clustering on the cbot. If this reagent contains the polymerase and other reactants then it could indeed be responsible for the recommendation not to leave pools sitting around at room temp or at all.
The NovaSeq does only on-board clustering and so adds the ex-amp reagents to the denatured library pool itself. So the "letting libraries sit around as pool prohibition" should not be an issue for it. If this is one of the mechanisms of index-hopping...
--
Phillip
Comment
-
Hmm, we just finished processing our first (training) NovaSeq run and I am seeing evidence of index hops at about 2000PPM (0.2%). Or is it 1.6%?
We ran 21 (non-mouse) fecal DNA environmental samples (no-PCR libraries, made using the 550 bp method with the TruSeq no amp kit) and 3 mouse RNAseq (Illumina TruSeq polyA+) libraries. All just using single indexes.
The assay we used to detect index hops is imperfect -- 1000 reads from each sample were BLASTed against GenBank, and software attempts to determine the species of origin based on the BLAST search.
Works better for some species than others. For mouse RNA, generally >90% of reads come back identified as "mus musculus". But for sorghum genomic DNA, only about 50% of the reads come back identified as sorghum.
But nevertheless, I expect that >90% of mouse reads hopping into a non-mouse sample bin would be detected. In the 21 DNA library files we detected a range of 0-6 reads per 1000 called by the software as "mus musculus", and that averages to 0.2% across the 21 samples.
Not sure how to scale this though. There were a total of 24 samples, 21 environmental, 3 mouse RNA. The run demultiplexed to 4 billion environmental clusters and 0.5 billion mouse RNA sample clusters. In the 4 billion environmental reads 0.2% are mouse. So is that 0.2% index hopping rate? Or because there were 1/8th the number of clustered mouse amplicons as environmental amplicons should I multiply that figure by 8?
To get a mouse read in an environmental sample, it would be necessary for an index to be "donated" from a mouse sample to an environmental amplicon. In the end I only care to use the mouse sequence to identify the percentage of reads mis-assigned overall.
Okay, generally one is cautioned to move into numbers if percentages are misleading. 0.2% of 4 billion clusters is 8x10^6, or 8 million, mis-assigned clusters for the run. Those are the events I can detect. How many non-detected events would I project? Yeah, probably 1.6%.
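As a quick check of the arithmetic above, here is a minimal sketch using the figures quoted in this post. The variable and function names are mine, and the factor of 8 is simply the scaling being weighed in the question, not an established correction.

```python
def clusters_from_ppm(rate_ppm, clusters):
    """Number of clusters corresponding to a rate given in parts per million."""
    return rate_ppm * clusters // 1_000_000


ENV_CLUSTERS = 4_000_000_000   # clusters demultiplexed to environmental samples
OBSERVED_PPM = 2_000           # 0.2% of environmental reads called as mouse

detected = clusters_from_ppm(OBSERVED_PPM, ENV_CLUSTERS)

# Scaling by 8 because mouse amplicons were 1/8th as numerous as environmental
# ones (0.5 billion vs 4 billion); this is the guess being asked about.
scaled_ppm = OBSERVED_PPM * 8

print(f"detected mis-assigned clusters: {detected:,}")
print(f"scaled rate: {scaled_ppm / 10_000:.1f}%")
```

Working in integer PPM keeps the numbers exact; the two printed values correspond to the 8 million clusters and the 1.6% figure mentioned above.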
These were made to run on the HiSeq (and they were).
--
Phillip
Comment
-
You have 4.5 billion reads, and expect to detect contamination from 11% of the data (0.5B/4.5B) at a 90%-100% rate (alignment sensitivity) by observing 89% of data volume (4B/4.5B). So you should expect to detect .11*.89*(.9 to 1) = 8.8% to 9.8% of the total contamination. So, 2000 PPM observed would suggest 20400 PPM to 22700 PPM of actual cross-contamination, with a sufficiently high degree of multiplexing.
Bear in mind, though, that mouse contamination can come from other sources, and different index pairs have different rates of cross-contamination.
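The scaling in this post can be written out as a short calculation. This is a sketch of the same reasoning, not a validated model: the function names are hypothetical, and it assumes that the only detectable events are mouse reads (0.5B of 4.5B) landing in environmental bins (4B of 4.5B).

```python
def detectable_fraction(marker_reads, other_reads, sensitivity):
    """Fraction of all cross-contamination events the assay can see.

    A hop is visible only if the read comes from the marker (mouse) libraries,
    lands in one of the other (environmental) bins, and is recognised by the
    species-calling step.
    """
    total = marker_reads + other_reads
    return (marker_reads / total) * (other_reads / total) * sensitivity


def implied_total_ppm(observed_ppm, marker_reads, other_reads, sensitivity):
    """Scale an observed rate up to the implied overall rate."""
    return observed_ppm / detectable_fraction(marker_reads, other_reads, sensitivity)


MOUSE = 0.5e9
ENVIRONMENTAL = 4.0e9

for sens in (1.0, 0.9):
    ppm = implied_total_ppm(2000, MOUSE, ENVIRONMENTAL, sens)
    print(f"sensitivity {sens:.0%}: about {ppm:,.0f} PPM")
```

With unrounded fractions (1/9 and 8/9) this gives roughly 20,250 to 22,500 PPM, slightly below the 20400 to 22700 PPM range above, which was computed from the rounded values .11 and .89.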
Comment
-
I forgot to mention -- IDT has Illumina Unique Dual Indexes -- a set of 96 adapters for sale. Once we have those we can split an S2 run 96 ways and be able to detect index swaps.
What are the HiSeq 3000/4000 instrument users doing? Kind of horrifying if upwards of 2% of reads have been mis-assigned since that instrument started being used.
--
Phillip
Comment
-
Originally posted by Brian Bushnell: You have 4.5 billion reads, and expect to detect contamination from 11% of the data (0.5B/4.5B)
This seems like a really high rate of recombination, no? Do you detect an increase in chimeric inserts? Depending on the mechanism of recombination you stipulate, there might be recombination events at any stretch of similar sequence, not just in the adapters.
Originally posted by Brian Bushnell: at a 90%-100% rate (alignment sensitivity) by observing 89% of data volume (4B/4.5B). So you should expect to detect .11*.89*(.9 to 1) = 8.8% to 9.8% of the total contamination. So, 2000 PPM observed would suggest 20400 PPM to 22700 PPM of actual cross-contamination, with a sufficiently high degree of multiplexing.
Bear in mind, though, that mouse contamination can come from other sources, and different index pairs have different rates of cross-contamination.
I checked the HiSeq run for these environmental samples and we detected 0/1000 mouse hits for all 21 of the data sets.
--
Phillip
Comment
-
Originally posted by pmiguel: But I guess there is still the question of whether the index hop derives from a characteristic of the donor library, the recipient library or both?
This seems like a really high rate of recombination, no?
Do you detect an increase in chimeric inserts? Depending on the mechanism of recombination you stipulate, there might be recombination events at any stretch of similar sequence, not just in the adapters.
Comment
-
Originally posted by pmiguel: Yeah, sounds reasonable. But I guess there is still the question of whether the index hop derives from a characteristic of the donor library, the recipient library or both? Illumina is saying that the index donor library definitely plays a role when said library includes unincorporated adapters and/or adapter dimers.
This seems like a really high rate of recombination, no? Do you detect an increase in chimeric inserts? Depending on the mechanism of recombination you stipulate, there might be recombination events at any stretch of similar sequence, not just in the adapters.
I do not have any information about the length of matched region required for extension by the ExAmp mix polymerase, but with KAPA HiFi a match at the 3' terminal base plus 6 more matches at any positions within the 10 bases at the primer's 3' end was enough for extension, even under stringent cycling conditions.
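To make that rule of thumb concrete, here is a minimal sketch that checks it for a primer against a candidate site. It reflects one reading of the observation above (3' terminal base matched, plus at least 6 matches among the other 9 bases of the 3'-terminal 10); the function name and defaults are hypothetical, and nothing here is known to apply to the ExAmp polymerase.

```python
def could_extend(primer, site, window=10, extra_matches=6):
    """Check the 3'-end matching rule for a primer on a candidate site.

    Both strings are written 5'->3' in the same orientation and aligned
    position by position, i.e. `site` is the sequence a perfectly matched
    primer would have (the complement of the template strand).
    """
    if len(primer) != len(site):
        raise ValueError("primer and site must be the same length")
    if len(primer) < window:
        raise ValueError("sequences are shorter than the 3' window")

    # The 3' terminal base must match.
    if primer[-1] != site[-1]:
        return False

    # Count matches among the remaining bases of the 3' window.
    tail_pairs = zip(primer[-window:-1], site[-window:-1])
    matches = sum(a == b for a, b in tail_pairs)
    return matches >= extra_matches


# Three mismatches within the window, 3' base matched: still extendable.
print(could_extend("ACGTACGTAC", "TGCTACGTAC"))
```

Under this reading, a stray adapter or PCR oligo needs only 7 matching bases out of the last 10 to be extended, which is why partial complementarity elsewhere in the insert could matter as well.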
Comment