Hmm, we just finished processing our first (training) NovaSeq run and I am seeing evidence of index hops at about 2000PPM (0.2%). Or is it 1.6%?
We ran 21 (non-mouse) fecal DNA environmental samples (no-PCR libraries, made using the 550 bp method with the TruSeq no amp kit) and 3 mouse RNAseq (Illumina TruSeq polyA+) libraries. All just using single indexes.
The assay we used to detect index hops is imperfect -- 1000 reads from each sample were BLASTed against GenBank, and software attempts to determine the species of origin from the BLAST results.
Works better for some species than others. For mouse RNA, generally >90% of reads come back identified as "mus musculus". But for sorghum genomic DNA, only about 50% of the reads come back identified as sorghum.
Nevertheless, I expect that >90% of mouse reads hopping into a non-mouse sample bin would be detected. In the 21 DNA library files we detected a range of 0-6 reads (out of 1000 per sample) called by the software as "mus musculus", which averages to about 2 reads per 1000, or 0.2%, across the 21 samples.
Not sure how to scale this though. There were a total of 24 samples: 21 environmental, 3 mouse RNA. The run demultiplexed to 4 billion environmental clusters and 0.5 billion mouse RNA clusters. Of the 4 billion environmental reads, 0.2% are mouse. So is that a 0.2% index hopping rate? Or, because there were 1/8th as many clustered mouse amplicons as environmental amplicons, should I multiply that figure by 8?
To get a mouse read in an environmental sample, it would be necessary for an index to be "donated" from an environmental sample to a mouse amplicon. In the end I only care to use the mouse sequence to identify the percentage of reads mis-assigned overall.
Okay, generally one is cautioned to move to raw numbers when percentages are misleading. 0.2% of 4 billion clusters is 8x10^6, or 8 million, mis-assigned clusters for the run. Those are the events I can detect. How many non-detected events would I project? Yeah, probably 1.6%.
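For anyone who wants to redo the scaling arithmetic above, here is a rough Python sketch. The cluster counts and the 0.2% figure are the ones quoted in this post. The function names are made up for illustration. The "pool share" model is an assumption: a hopped index is taken to land on a fragment drawn at random from the whole pool, so only the mouse share of hops is visible. It is not an established method.

```python
# Figures quoted in the post above.
ENV_CLUSTERS = 4.0e9             # clusters demultiplexed to the 21 environmental samples
MOUSE_CLUSTERS = 0.5e9           # clusters demultiplexed to the 3 mouse RNA samples
OBSERVED_MOUSE_FRACTION = 0.002  # ~2 mouse calls per 1000 reads in environmental bins


def detected_events(observed_fraction, recipient_clusters):
    """Number of mis-assigned clusters that the mouse marker can reveal."""
    return observed_fraction * recipient_clusters


def rate_scaled_by_ratio(observed_fraction, donor_clusters, recipient_clusters):
    """The 'multiply by 8' estimate: scale by recipient/donor cluster ratio."""
    return observed_fraction * (recipient_clusters / donor_clusters)


def rate_scaled_by_pool_share(observed_fraction, donor_clusters,
                              recipient_clusters, sensitivity=1.0):
    """Assumed model: only hops involving a mouse fragment are visible, and
    mouse fragments make up donor/(donor + recipient) of the pool.
    `sensitivity` is the fraction of mouse reads the BLAST assay calls."""
    donor_share = donor_clusters / (donor_clusters + recipient_clusters)
    return observed_fraction / (sensitivity * donor_share)


if __name__ == "__main__":
    n = detected_events(OBSERVED_MOUSE_FRACTION, ENV_CLUSTERS)
    by_ratio = rate_scaled_by_ratio(
        OBSERVED_MOUSE_FRACTION, MOUSE_CLUSTERS, ENV_CLUSTERS)
    by_share = rate_scaled_by_pool_share(
        OBSERVED_MOUSE_FRACTION, MOUSE_CLUSTERS, ENV_CLUSTERS, sensitivity=0.9)
    print(f"detected mis-assigned clusters: {n:.3g}")
    print(f"scaled by 8x ratio:             {by_ratio:.2%}")
    print(f"scaled by pool share, 90% sens: {by_share:.2%}")
```

Both scalings land in the same neighborhood (between 1.5% and 2%), so the choice between multiplying by 8 or by 9 matters less than the assumption that hops are spread evenly across the pool.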
These were made to run on the HiSeq (and they were).
--
Phillip
-
Originally posted by Brian Bushnell:
This is kind of tangential to NovaSeq, but...
I've suggested that we keep everything on ice whenever possible prior to sequencing, due to the fact that low temperatures retard any kind of activity and thus should inhibit adapter-swapping (which is a huge problem as we run a lot of highly-amplified single cells). But my explanations were too vague to be taken seriously, since I don't know the specifics of the reactions. I would love to have a very clear (and preferably lengthy, rather than concise) explanation of exactly why and when keeping pools on ice should prevent crosstalk, that I can copy and paste (attributing credit, if desired) to the people in charge of making libraries.
I think it is obvious that the longer you let a mixed batch of libraries sit around, and the higher the temperature, the more index-swapping will occur, regardless of the mechanism. But without citing a specific mechanism (and it does not really matter if it is the dominant one), nobody involved with library prep will pay attention to my concerns on the issue (meaning, no tests of ice vs no ice). All I really need is a real mechanism, which seems sufficiently important to cause a test to be run; once that occurs, I'll be satisfied, even if the results are negative and indicate that keeping pooled libraries at a high temperature for a long time seems to be optimal for preventing crosstalk. Not that I'll believe negative results unless I run the experiment myself, but at least I'll believe I did my best. I'll still report the results here.
But I guess previous instantiations of ex-amp (HiSeq 4000/X) require the researcher to mix the "ex amp" reagent with the library pool prior to clustering on the cbot. If this reagent contains the polymerase and other reactants then it could indeed be responsible for the recommendation not to leave pools sitting around at room temp or at all.
The NovaSeq does only on-board clustering, so the instrument itself adds the ExAmp reagents to the denatured library pool. The prohibition on letting libraries sit around as a pool should therefore not be an issue for it. If this is one of the mechanisms of index hopping...
--
Phillip
-
This is kind of tangential to NovaSeq, but...
I've suggested that we keep everything on ice whenever possible prior to sequencing, due to the fact that low temperatures retard any kind of activity and thus should inhibit adapter-swapping (which is a huge problem as we run a lot of highly-amplified single cells). But my explanations were too vague to be taken seriously, since I don't know the specifics of the reactions. I would love to have a very clear (and preferably lengthy, rather than concise) explanation of exactly why and when keeping pools on ice should prevent crosstalk, that I can copy and paste (attributing credit, if desired) to the people in charge of making libraries.
I think it is obvious that the longer you let a mixed batch of libraries sit around, and the higher the temperature, the more index-swapping will occur, regardless of the mechanism. But without citing a specific mechanism (and it does not really matter if it is the dominant one), nobody involved with library prep will pay attention to my concerns on the issue (meaning, no tests of ice vs no ice). All I really need is a real mechanism, which seems sufficiently important to cause a test to be run; once that occurs, I'll be satisfied, even if the results are negative and indicate that keeping pooled libraries at a high temperature for a long time seems to be optimal for preventing crosstalk. Not that I'll believe negative results unless I run the experiment myself, but at least I'll believe I did my best. I'll still report the results here.
Last edited by Brian Bushnell; 07-21-2017, 06:34 AM.
-
My understanding is that index hopping can happen at any time in a pool that contains single-stranded library fragments, a partially complementary oligo (from PCR or adapter oligos) that can pair with a strand, and ExAmp reagents. Amplification is isothermal and is optimal at the temperature maintained during clustering, but as with most polymerases there should be some low-level activity at non-optimal temperatures as well. These are the reasons that preparing the pool just prior to loading and keeping it on ice are highly recommended.
-
The recommended method to detect an index swap is to use "Unique Dual Indexes". With these you don't use the same i7 index in multiple pairs: a given i7 index always goes with one fixed i5 index for the run. Then if you detect an i7 index with any i5 index other than its partner, you know an index hop has occurred, and the read is discarded.
This will remove all index hops that are the result of a single recombination event. It will also remove nearly all of the double recombinations, since a double hop only escapes detection when it happens to recreate another valid pair. So true index hops should be largely detectable.
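To make the unique-dual-index check concrete, here is a minimal Python sketch. The sample names and index sequences are placeholders for illustration, not a real sample sheet, and matching is exact (zero barcode mismatches). A production demultiplexer does considerably more than this.

```python
from collections import Counter

# Hypothetical unique dual index sheet: every i7 and every i5 is used once.
UDI_PAIRS = {
    "sample_A": ("ATTACTCG", "TATAGCCT"),
    "sample_B": ("TCCGGAGA", "ATAGAGGC"),
    "sample_C": ("CGCTCATT", "CCTATCCT"),
}


def classify(i7, i5, pairs=UDI_PAIRS):
    """Return the sample name for a valid pair, 'hopped' when both indexes
    are known but do not belong together, and 'unknown' otherwise."""
    by_pair = {pair: sample for sample, pair in pairs.items()}
    if (i7, i5) in by_pair:
        return by_pair[(i7, i5)]
    known_i7 = {pair[0] for pair in pairs.values()}
    known_i5 = {pair[1] for pair in pairs.values()}
    if i7 in known_i7 and i5 in known_i5:
        return "hopped"
    return "unknown"


def hop_rate_ppm(index_reads, pairs=UDI_PAIRS):
    """Detectable hops per million reads that carry two known indexes."""
    counts = Counter(classify(i7, i5, pairs) for i7, i5 in index_reads)
    hopped = counts["hopped"]
    assigned = sum(n for label, n in counts.items()
                   if label not in ("hopped", "unknown"))
    if hopped + assigned == 0:
        raise ValueError("no reads with two known indexes")
    return 1e6 * hopped / (hopped + assigned)


if __name__ == "__main__":
    reads = [
        ("ATTACTCG", "TATAGCCT"),  # valid pair for sample_A
        ("TCCGGAGA", "ATAGAGGC"),  # valid pair for sample_B
        ("ATTACTCG", "ATAGAGGC"),  # i7 of A with i5 of B: a detectable hop
        ("GGGGGGGG", "TATAGCCT"),  # i7 not on the sheet: unknown
    ]
    for i7, i5 in reads:
        print(i7, i5, classify(i7, i5))
```

Note that a hop which swaps both indexes to another valid pair would be classified as that sample, which is the residual "double recombination" case mentioned above.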
As to what causes index hopping, I don't think that Illumina is sure. They seem mainly to have a list of "best practices" to use to lower their frequency.
I haven't looked in detail at the process of exclusion amplification either. But I presume that it involves some non-flowcell-tethered PCR amplification.
--
Phillip
-
Originally posted by nucacidhunter:
Exclusion Amplification (ExAmp) has been explained in the following video.
https://www.youtube.com/watch?v=pfZp5Vgsbw0
Following is the link for the patent:
https://www.google.com.au/patents/WO2013188582A1?cl=en
I now understand the need for (a) super-clean libraries and (b) size optimised libraries - to beat the "average" diffusion rate(s) on these HiSeq3000/4000/X/NovaSeq platforms.
Here's the real question: how does one detect index swapped (hopped) reads? Do you have to have a reference? It would seem that the answer would be "yes", or as Illumina suggests in their white paper, one has to a priori have an idea of the expression levels/targets?
-
Exclusion Amplification (ExAmp) has been explained in the following video.
https://www.youtube.com/watch?v=pfZp5Vgsbw0
Following is the link for the patent:
https://www.google.com.au/patents/WO2013188582A1?cl=en
-
Originally posted by GenoMax:
See Illumina's white paper on index hopping here.
Aside: Hard to believe this has been going on this long and Illumina has been largely silent about this - one would think they would have issued a protocol change for ONLY dual-index libraries on nanocell instruments.
-
Does index swapping (hopping) occur because this (release and re-annealing) is the method for generating clusters within each nanocell, with the swapping being the result of DNA fragments (library fragments) inadvertently jumping/hopping too far, into the next nanocell?
-
Originally posted by pmiguel:
What went into that 8000 PPM (0.8%) calculation Brian? I mean, did you just count the number of swaps in a dual unique indexed run?
Anyone checked that figure for a HiSeq 2500 run? I know no one is complaining about index hopping on that instrument or a MiSeq, but it would happen at some rate.
--
Phillip
Code:
seal.sh in=reads.fq stats=stats.txt ambig=toss clearzone=10
I also tested a HiSeq run of the same E. coli library and calculated a 7 PPM contamination rate, but that's not really credible since I don't know what else was present on the plate in that run, so I don't necessarily have the correct references (though there was definitely some Chlamy present). In the past I've seen various rates of cross-contamination on the HiSeq 2500 (<1 PPM to >1000 PPM), and it's actually quite hard to consistently reproduce the same numbers on different runs. The cross-contamination comes from various sources, including physical contamination, though I think we've eliminated physical contamination in our current processes. NextSeq has generally yielded lower rates of cross-contamination than the HiSeq 2500, so we use that for our multiplexed single cells even though the quality is lower than HiSeq.
Last edited by Brian Bushnell; 07-14-2017, 09:34 AM.
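For readers wondering how a PPM figure like the ones above falls out of per-reference read counts, here is a minimal Python sketch. It assumes you already have the number of unambiguously assigned reads per reference. Parsing the actual Seal stats file is not shown, and the counts in the example are invented purely to illustrate the arithmetic.

```python
def contamination_ppm(reads_per_reference, expected):
    """Reads assigned to any reference other than `expected`,
    expressed per million assigned reads."""
    total = sum(reads_per_reference.values())
    if total == 0:
        raise ValueError("no assigned reads")
    foreign = total - reads_per_reference.get(expected, 0)
    return 1e6 * foreign / total


if __name__ == "__main__":
    # Hypothetical counts for one library's demultiplexed reads.
    counts = {"E. coli": 999_000, "Chlamy": 800, "other": 200}
    print(f"{contamination_ppm(counts, 'E. coli'):.0f} PPM")
```

As noted in the post, the result is only as good as the reference set: reads from anything not included as a reference are never counted as contamination.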
-
Originally posted by Brian Bushnell:
I calculated 8000 PPM of index swapping (cross-contamination) for our NovaSeq run with single indexes, and 120 PPM for dual indexes, when allowing zero barcode mismatches.
-
Originally posted by GenoMax:
The important point is JGI probably made VERY GOOD quality libraries. With patterned FC's having clean libraries (with just the right sized inserts, zero primers and dimers) are critical to minimizing these issues. Since we are talking about "B"illions of reads losing some during dedupe should not cause a major loss. 2D barcoding seems essential (perhaps should be made mandatory).
--
Phillip
-
Originally posted by Brian Bushnell:
I calculated 8000 PPM of index swapping (cross-contamination) for our NovaSeq run with single indexes, and 120 PPM for dual indexes, when allowing zero barcode mismatches.
Anyone checked that figure for a HiSeq 2500 run? I know no one is complaining about index hopping on that instrument or a MiSeq, but it would happen at some rate.
--
Phillip