**cement_head** · 07-18-2017, 07:23 AM

Does index swapping (hopping) occur because this release and re-annealing is the method for generating clusters within each nanowell, with the swapping resulting from DNA fragments (library frags) inadvertently jumping/hopping too far, into the next nanowell?

**GenoMax** · 07-18-2017, 08:03 AM

See Illumina's white paper on index hopping here.

**cement_head** · 07-18-2017, 11:24 AM

I still don't understand the ExAmp chemistry, and unless I missed something, this white paper doesn't explain it. Is it proprietary and largely unknown? Would you happen to have a link to where it is explained? Thanks.

Aside: Hard to believe this has been going on this long and Illumina has been largely silent about it. One would think they would have issued a protocol change allowing ONLY dual-index libraries on nanowell instruments.

**nucacidhunter** · 07-18-2017, 07:49 PM

Exclusion Amplification (ExAmp) has been explained in the following video.
https://www.youtube.com/watch?v=pfZp5Vgsbw0

Following is the link for the patent:
https://www.google.com.au/patents/WO2013188582A1?cl=en

**cement_head** · 07-20-2017, 09:46 AM

The video wasn't overly helpful, but if I understand the patent description, they're saying that they've essentially hyper-optimised the bridge amplification such that once a single seed molecule binds within a nanowell, after 14 rounds it will dominate the signal during SBS? If true, then it must be within the first two rounds that the seed molecules drift from one nanowell to the next (this is the average transport rate vs average amplification rate that they constantly cite in the patent). Or does the mispriming occur PRIOR to the initial hybridisation to the nanowell, and before the first round of bridge amplification?

I now understand the need for (a) super-clean libraries and (b) size optimised libraries - to beat the "average" diffusion rate(s) on these HiSeq3000/4000/X/NovaSeq platforms.

Here's the real question: how does one detect index swapped (hopped) reads? Do you have to have a reference? It would seem that the answer would be "yes", or as Illumina suggests in their white paper, one has to a priori have an idea of the expression levels/targets?

**pmiguel** · 07-20-2017, 12:34 PM

The recommended method to detect an index swap is to use "Unique Dual Indexes". With these you don't use the same i7 index in multiple pairs. A given i7 index always goes with a fixed i5 index for the run. Then if you detect an i7 index with any i5 index other than its pair, you know an index hop has occurred and the reads are discarded.

This will remove all index hops that result from a single recombination event. It will also remove nearly all of the double recombinations. So true index hops should be largely detectable.
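The unique-dual-index check described above can be sketched in a few lines of Python. This is only a minimal illustration, not Illumina's or any demultiplexer's actual implementation: the index sequences, sample names, and the `classify`/`tally` helpers are all made up for the example, and matching is exact (no mismatch tolerance).

```python
from collections import Counter

# Hypothetical sample sheet with unique dual indexes: every i7 is paired
# with exactly one i5. Sequences and sample names are invented placeholders.
EXPECTED_PAIRS = {
    ("AAAACCCC", "GGGGTTTT"): "sample_A",
    ("ACACACAC", "GTGTGTGT"): "sample_B",
    ("CCGGAATT", "TTAAGGCC"): "sample_C",
}
KNOWN_I7 = {i7 for i7, _ in EXPECTED_PAIRS}
KNOWN_I5 = {i5 for _, i5 in EXPECTED_PAIRS}


def classify(i7, i5):
    """Return the sample name, 'hopped', or 'unknown' for one index pair."""
    if (i7, i5) in EXPECTED_PAIRS:
        return EXPECTED_PAIRS[(i7, i5)]
    # Both indexes are valid, but they belong to different samples:
    # this is the signature of an index hop, so the read is discarded.
    if i7 in KNOWN_I7 and i5 in KNOWN_I5:
        return "hopped"
    # At least one index is not in the sample sheet at all.
    return "unknown"


def tally(index_pairs):
    """Count how many reads fall into each class."""
    return Counter(classify(i7, i5) for i7, i5 in index_pairs)


reads = [
    ("AAAACCCC", "GGGGTTTT"),  # sample_A, correct pair
    ("ACACACAC", "GTGTGTGT"),  # sample_B, correct pair
    ("AAAACCCC", "GTGTGTGT"),  # i7 of sample_A with i5 of sample_B
    ("TTTTTTTT", "GGGGTTTT"),  # i7 not in the sample sheet
]
counts = tally(reads)
hop_rate = counts["hopped"] / len(reads)
```

With combinatorial (non-unique) dual indexes the "hopped" branch could not be distinguished from a legitimate sample, which is why each i7 must go with one fixed i5.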

As to what causes index hopping, I don't think that Illumina is sure. They seem mainly to have a list of "best practices" to use to lower their frequency.

I haven't looked in detail at the process of exclusion amplification either. But I presume that it involves some non-flowcell-tethered PCR amplification.

--
Phillip

**nucacidhunter** · 07-21-2017, 01:38 AM

My understanding is that index hopping can happen at any time in a pool that contains single-stranded library fragments, a partially complementary oligo (from PCR or adapter oligos) that can pair with a strand, and ExAmp reagents. Amplification is isothermal and is optimal at the temperature maintained during clustering, but as with most polymerases there should be some low-level activity at non-optimal temperatures as well. This is why preparing the pool just prior to loading and keeping it on ice is highly recommended.

**Brian Bushnell** · 07-21-2017, 06:20 AM

This is kind of tangential to NovaSeq, but...

I've suggested that we keep everything on ice whenever possible prior to sequencing, due to the fact that low temperatures retard any kind of activity and thus should inhibit adapter-swapping (which is a huge problem as we run a lot of highly-amplified single cells). But my explanations were too vague to be taken seriously, since I don't know the specifics of the reactions. I would love to have a very clear (and preferably lengthy, rather than concise) explanation of exactly why and when keeping pools on ice should prevent crosstalk, that I can copy and paste (attributing credit, if desired) to the people in charge of making libraries.

I think it is obvious that the longer you let a mixed batch of libraries sit around, and the higher the temperature, the more index-swapping will occur, regardless of the mechanism. But without citing a specific mechanism (and it does not really matter if it is the dominant one), nobody involved with library prep will pay attention to my concerns on the issue (meaning, no tests of ice vs no ice). All I really need is a real mechanism, which seems sufficiently important to cause a test to be run; once that occurs, I'll be satisfied, even if the results are negative and indicate that keeping pooled libraries at a high temperature for a long time seems to be optimal for preventing crosstalk. Not that I'll believe negative results unless I run the experiment myself, but at least I'll believe I did my best. I'll still report the results here.

**pmiguel** · 07-21-2017, 07:15 AM

Yeah, I'm more of a bench scientist by background. And until I saw nucacidhunter's post above I hadn't seen any plausible mechanism as to how purified Illumina amplicon libraries would "swap indexes" due to sitting around mixed together. That is, under normal conditions DNA is very nearly inert and stable. It doesn't recombine without the help of enzyme(s).

But I guess previous instantiations of ExAmp (HiSeq 4000/X) require the researcher to mix the ExAmp reagent with the library pool prior to clustering on the cBot. If this reagent contains the polymerase and other reactants, then it could indeed be responsible for the recommendation not to leave pools sitting around at room temperature, or at all.

The NovaSeq does only on-board clustering, so the instrument itself adds the ExAmp reagents to the denatured library pool. So the prohibition on letting libraries sit around as a pool should not be an issue for it. If this is one of the mechanisms of index hopping...

--
Phillip

**pmiguel** · 07-24-2017, 10:09 AM

Hmm, we just finished processing our first (training) NovaSeq run and I am seeing evidence of index hops at about 2000PPM (0.2%). Or is it 1.6%?

We ran 21 (non-mouse) fecal DNA environmental samples (no-PCR libraries, made using the 550 bp method with the TruSeq no amp kit) and 3 mouse RNAseq (Illumina TruSeq polyA+) libraries. All just using single indexes.

The assay we used to detect index hops is imperfect: 1000 reads from each sample were BLASTed against GenBank, and software attempts to determine the species of origin based on the BLAST search.

Works better for some species than others. For mouse RNA, generally >90% of reads come back identified as "mus musculus". But for sorghum genomic DNA, only about 50% of the reads come back identified as sorghum.

But nevertheless I expect that >90% of mouse reads hopping into a non-mouse sample bin would be detected. In the 21 DNA library files we detected a range of 0-6 reads (out of 1000 per sample) called by the software as "mus musculus", and that averages to 0.2% across the 21 samples.

Not sure how to scale this though. There were a total of 24 samples, 21 environmental, 3 mouse RNA. The run demultiplexed to 4 billion environmental clusters and 0.5 billion mouse RNA sample clusters. In the 4 billion environmental reads 0.2% are mouse. So is that 0.2% index hopping rate? Or because there were 1/8th the number of clustered mouse amplicons as environmental amplicons should I multiply that figure by 8?

To get a mouse read in an environmental sample, it would be necessary for an index to be "donated" from a mouse sample to an environmental amplicon. In the end I only care to use the mouse sequence to identify the percentage of reads mis-assigned overall.

Okay, generally one is cautioned to move to absolute numbers if percentages are misleading. 0.2% of 4 billion clusters is 8x10^6, or 8 million, mis-assigned clusters for the run. Those are the events I can detect. How many undetected events would I project? Yeah, probably 1.6% (0.2% x 8).
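For anyone who wants to rerun this back-of-the-envelope scaling with their own numbers, here is a short Python sketch. The function names are invented for illustration, and the x8 scaling simply assumes that every library donates and receives indexes at the same rate, which may well not hold.

```python
def detected_misassigned(observed_rate, recipient_clusters):
    """Absolute number of mis-assigned clusters the assay can actually see."""
    return observed_rate * recipient_clusters


def scaled_rate(observed_rate, recipient_clusters, donor_clusters):
    """Scale the observed rate by the recipient/donor cluster ratio.

    Assumption (not established in the thread): hops are equally likely
    between any two libraries, so only the mouse-donor share is visible.
    """
    return observed_rate * (recipient_clusters / donor_clusters)


environmental = 4e9   # environmental clusters in the run
mouse = 0.5e9         # mouse RNA clusters in the run
observed = 0.002      # 0.2% of environmental reads called as mouse

visible = detected_misassigned(observed, environmental)
projected = scaled_rate(observed, environmental, mouse)
print(f"{visible:.0f} detectable mis-assigned clusters")
print(f"projected overall rate: {projected:.1%}")
```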

These were made to run on the HiSeq (and they were).

--
Phillip

**Brian Bushnell** · 07-24-2017, 10:28 AM

You have 4.5 billion reads, and expect to detect contamination from 11% of the data (0.5B/4.5B) at a 90%-100% rate (alignment sensitivity) by observing 89% of data volume (4B/4.5B). So you should expect to detect 0.11 x 0.89 x (0.9 to 1) = 8.8% to 9.8% of the total contamination. So, 2000 PPM observed would suggest 20400 PPM to 22700 PPM of actual cross-contamination, with a sufficiently high degree of multiplexing.
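The same estimate can be written as a small Python sketch, so the inputs are easy to change. The function names are made up for this example, and the model is only the simple one used in the post: a hop is detectable when the donor is a mouse library, the recipient is an environmental library, and the alignment calls it correctly.

```python
def detectable_fraction(donor_reads, recipient_reads, sensitivity):
    """Fraction of all cross-contamination events the assay can observe."""
    total = donor_reads + recipient_reads
    donor_share = donor_reads / total          # events donated by mouse libraries
    recipient_share = recipient_reads / total  # events landing in observed data
    return donor_share * recipient_share * sensitivity


def projected_ppm(observed_ppm, donor_reads, recipient_reads, sensitivity):
    """Scale the observed rate up to an estimated total rate."""
    return observed_ppm / detectable_fraction(
        donor_reads, recipient_reads, sensitivity
    )


# Inputs taken from the thread: 0.5B mouse reads, 4B environmental reads,
# 2000 PPM observed, 90%-100% alignment sensitivity.
low = projected_ppm(2000, 0.5e9, 4e9, sensitivity=1.0)
high = projected_ppm(2000, 0.5e9, 4e9, sensitivity=0.9)
print(f"{low:.0f} to {high:.0f} PPM")
```

Using unrounded fractions this gives roughly 20,250 to 22,500 PPM, slightly below the figures above, which were computed from the rounded 0.11 and 0.89.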

Bear in mind, though, that mouse contamination can come from other sources, and different index pairs have different rates of cross-contamination.

**pmiguel** · 07-24-2017, 10:59 AM

I forgot to mention -- IDT has Illumina Unique Dual Indexes -- a set of 96 adapters for sale. Once we have those we can split an S2 run 96 ways and be able to detect index swaps.

What are the HiSeq 3000/4000 instrument users doing? Kind of horrifying if upwards of 2% of reads have been mis-assigned since that instrument started being used.

--
Phillip

**pmiguel** · 07-24-2017, 11:22 AM

Originally posted by Brian Bushnell:

You have 4.5 billion reads, and expect to detect contamination from 11% of the data (0.5B/4.5B)

Yeah, sounds reasonable. But I guess there is still the question of whether the index hop derives from a characteristic of the donor library, the recipient library or both? Illumina is saying that the index donor library definitely plays a role when said library includes unincorporated adapters and/or adapter dimers.

This seems like a really high rate of recombination, no? Do you detect an increase in chimeric inserts? Depending on the mechanism of recombination you stipulate, there might be recombination events at any stretch of similar sequence, not just in the adapters.

Originally posted by Brian Bushnell:

at a 90%-100% rate (alignment sensitivity) by observing 89% of data volume (4B/4.5B). So you should expect to detect .11*.89*(.9 to 1) = 8.8% to 9.8% of the total contamination. So, 2000 PPM observed would suggest 20400 PPM to 22700 PPM of actual cross-contamination, with a sufficiently high degree of multiplexing.

Bear in mind, though, that mouse contamination can come from other sources, and different index pairs have different rates of cross-contamination.

These were run as single indexes. But there may be different rates, yes.

I checked the HiSeq run for these environmental samples and we detected 0/1000 mouse hits for all 21 of the data sets.

--
Phillip

**Brian Bushnell** · 07-24-2017, 12:28 PM

Originally posted by pmiguel:

But I guess there is still the question of whether the index hop derives from a characteristic of the donor library, the recipient library or both?

We have several theories for what was driving this on HiSeq... the most plausible being something like, "library A had too many unincorporated adapters, library B had too many adapter-free inserts, and after mixing them, library B adopted some of the free adapters from library A". Which would indicate that it involves both the donor and recipient library. But I'm not sure if that mechanism is important for NovaSeq.

Originally posted by pmiguel:

This seems like a really high rate of recombination, no?

Well, it's higher than what I observed for single-index libraries on our NovaSeq, but not by a huge amount.

Originally posted by pmiguel:

Do you detect an increase in chimeric inserts? Depending on the mechanism of recombination you stipulate, there might be recombination events at any stretch of similar sequence, not just in the adapters.

I have not examined this on the NovaSeq yet, but I saw a much higher rate (a several-fold increase) of chimeric pairs when examining problematic reads on HiSeq. I don't remember the exact details; it might have been that reads mapped as improper pairs had a much higher rate of invalid barcode combinations, or vice versa.

**nucacidhunter** · 07-24-2017, 09:16 PM

Originally posted by pmiguel:

Yeah, sounds reasonable. But I guess there is still the question of whether the index hop derives from a characteristic of the donor library, the recipient library or both? Illumina is saying that the index donor library definitely plays a role when said library includes unincorporated adapters and/or adapter dimers.

This seems like a really high rate of recombination, no? Do you detect an increase in chimeric inserts? Depending on the mechanism of recombination you stipulate, there might be recombination events at any stretch of similar sequence, not just in the adapters.

Any oligo (PCR primer, adapter oligo, single-end adapted or non-adapted DNA fragment) that can pair with a library fragment at its 3’ end could be extended by the ExAmp polymerase, causing index hopping (when the paired oligo is an indexed adapter) or chimera formation (when it is a single-end adapted or non-adapted DNA fragment). I would expect to see more chimeras in PCR-free libraries because they contain a high proportion of single-end adapted or non-adapted fragments, although non-adapted fragments have to go through at least 2 cycles to produce a cluster-forming fragment.

I do not have any information about the length of matched region required for extension by the ExAmp mix polymerase, but with KAPA HiFi a match at the 3’ base plus 6 more matches at any positions within the 10 bases at the primer's 3’ end was enough for extension, even under stringent cycling conditions.
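As a rough illustration of that KAPA HiFi observation, the rule of thumb can be expressed as a small Python check. This is only a sketch under one reading of the rule (the terminal 3’ base must match, plus at least 6 of the other 9 bases in the 10-base 3’ window); the function name and parameters are invented, both sequences are assumed to be pre-aligned, equal-length, and written 5’ to 3’ in the same sense, and nothing here is known to apply to the ExAmp polymerase.

```python
def may_extend(primer, target, window=10, extra_matches=6):
    """Heuristic: could this oligo be extended when paired with the target?

    `target` is the sequence a perfectly matched primer would equal
    (i.e. the reverse complement of the template site), aligned
    position-by-position with `primer`.
    """
    if len(primer) != len(target):
        raise ValueError("primer and target must be pre-aligned, equal length")
    if len(primer) < window:
        raise ValueError("sequences are shorter than the 3' window")

    p = primer[-window:].upper()
    t = target[-window:].upper()

    # The terminal 3' base must match for the polymerase to extend.
    if p[-1] != t[-1]:
        return False

    # Count matches among the remaining bases of the 3' window,
    # at any positions.
    others = sum(a == b for a, b in zip(p[:-1], t[:-1]))
    return others >= extra_matches


perfect = may_extend("ACGTACGTAC", "ACGTACGTAC")
three_prime_mismatch = may_extend("ACGTACGTAC", "ACGTACGTAA")
six_of_nine = may_extend("ACGTACGTAC", "TTTTACGTAC")
five_of_nine = may_extend("ACGTACGTAC", "TTTTTCGTAC")
```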
