
**cement_head** · 07-26-2017, 12:49 PM

Originally posted by pmiguel

The recommended method to detect an index swap is to use "Unique Dual Indexes". With these you don't use the same i7 index in multiple pairs. A given i7 index always goes with a fixed i5 index for the run. Then if you detect an i7 index with any i5 index other than its pair, you know an index hop has occurred and the reads are discarded.

This will remove all index hops that result from a single recombination event. It will also remove nearly all double recombinations; the exception is a double hop that happens to reproduce another valid i7/i5 pair. So true index hops should be largely detectable.

As to what causes index hopping, I don't think that Illumina is sure. They seem mainly to have a list of "best practices" to use to lower their frequency.

I haven't looked in detail at the process of exclusion amplification either. But I presume that it involves some non-flowcell-tethered PCR amplification.

--
Phillip

Okay, this is interesting and jibes with their basic premise. It also contradicts their Nextera i5/i7 design, wherein index codes are reused multiple times.
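A small Python sketch of the contrast between unique dual indexes and combinatorial (reused) indexes. It assumes each sample is assigned exactly one (i7, i5) pair and that reads have already been reduced to exact index calls; the sample names, index labels, and the `classify` helper are hypothetical, and real demultiplexers also tolerate index mismatches, which this ignores.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

IndexPair = Tuple[str, str]  # (i7, i5)


def classify(reads: Iterable[IndexPair], sample_sheet: Dict[str, IndexPair]) -> Counter:
    """Tally reads per sample, plus detectable hops and unknown index combinations."""
    by_pair = {pair: sample for sample, pair in sample_sheet.items()}
    known_i7 = {i7 for i7, _ in sample_sheet.values()}
    known_i5 = {i5 for _, i5 in sample_sheet.values()}
    counts: Counter = Counter()
    for i7, i5 in reads:
        if (i7, i5) in by_pair:
            counts[by_pair[(i7, i5)]] += 1  # valid pair: assigned to a sample
        elif i7 in known_i7 and i5 in known_i5:
            counts["hop"] += 1  # both indices are in use, but never together
        else:
            counts["undetermined"] += 1  # at least one index is not on the sheet
    return counts


udi = {"A": ("i7_1", "i5_1"), "B": ("i7_2", "i5_2")}
combinatorial = {"A": ("i7_1", "i5_1"), "B": ("i7_1", "i5_2"),
                 "C": ("i7_2", "i5_1"), "D": ("i7_2", "i5_2")}
# A read from sample A whose i5 was swapped for i5_2 by a single hop:
hopped = [("i7_1", "i5_2")]
print(classify(hopped, udi))            # flagged as a hop
print(classify(hopped, combinatorial))  # silently counted as sample B
```

With unique dual indexes the hopped read is flagged; with the combinatorial sheet the same read is silently assigned to sample B. Even the UDI sheet cannot catch a double hop that reproduces another valid pair, e.g. ("i7_2", "i5_2") arising from a sample A molecule.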

**cement_head** · 07-26-2017, 12:55 PM

Originally posted by Brian Bushnell

This is kind of tangential to NovaSeq, but...

I've suggested that we keep everything on ice whenever possible prior to sequencing, due to the fact that low temperatures retard any kind of activity and thus should inhibit adapter-swapping (which is a huge problem as we run a lot of highly-amplified single cells). But my explanations were too vague to be taken seriously, since I don't know the specifics of the reactions. I would love to have a very clear (and preferably lengthy, rather than concise) explanation of exactly why and when keeping pools on ice should prevent crosstalk, that I can copy and paste (attributing credit, if desired) to the people in charge of making libraries.

I think it is obvious that the longer you let a mixed batch of libraries sit around, and the higher the temperature, the more index-swapping will occur, regardless of the mechanism. But without citing a specific mechanism (and it does not really matter if it is the dominant one), nobody involved with library prep will pay attention to my concerns on the issue (meaning, no tests of ice vs no ice). All I really need is a real mechanism, which seems sufficiently important to cause a test to be run; once that occurs, I'll be satisfied, even if the results are negative and indicate that keeping pooled libraries at a high temperature for a long time seems to be optimal for preventing crosstalk. Not that I'll believe negative results unless I run the experiment myself, but at least I'll believe I did my best. I'll still report the results here.

This whole problem is starting to make more and more sense to me now. Just enough sloppiness at each step probably contributes to a perfect storm of IH (Index Hopping). And given that the MiSeq/HiSeq2500 system wasn't as sensitive to these issues, it is believable that we've all picked up bad habits.

**nucacidhunter** · 07-27-2017, 01:17 AM

Index hopping is the result of annealed oligo extension by ExAmp. I do not know the details of ExAmp, but KAPA HiFi polymerase under stringent cycling conditions is able to extend primers as long as the 3’-terminal base and 6 other bases within the 10 bases at the 3’ end are complementary, even though the rest of the oligo does not match and just hangs off the template.

Leftover adapter oligos, PCR primers, and single-adapted and non-adapted fragments can act as primers and result in index hopping, neutral products, and cluster-forming fusion fragments, respectively. So a high concentration of oligos that can act as primers, and longer incubation of the library pool, will increase these artifacts. I would also expect to see more fusions with PCR-free libraries, as the proportion of fragments that do not have adapters at both ends is higher than in PCR-amplified libraries.
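The 3′-end rule described above for KAPA HiFi can be written as a quick screen. This is a sketch of that heuristic only, assuming ungapped Watson-Crick pairing (no G·T wobble, bulges, or thermodynamics); `may_prime` and its parameters are hypothetical names, and the sketch says nothing about how ExAmp itself behaves.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def may_prime(primer: str, template_site: str,
              window: int = 10, min_other_matches: int = 6) -> bool:
    """Heuristic: could `primer` be extended from `template_site`?

    `template_site` is the template stretch (written 5'->3') that the primer's
    3' end would anneal to, so a perfectly matched primer ends with its reverse
    complement. The primer's 3'-terminal base must match, and at least
    `min_other_matches` of the other bases in the 3'-most `window` positions
    must match. Bases 5' of the window are allowed to hang off freely.
    """
    perfect = revcomp(template_site)
    if len(primer) < window or len(perfect) < window:
        raise ValueError("primer and site must both cover the whole window")
    p, t = primer[-window:], perfect[-window:]
    if p[-1] != t[-1]:
        return False  # mismatched 3'-terminal base: assume no extension
    other_matches = sum(a == b for a, b in zip(p[:-1], t[:-1]))
    return other_matches >= min_other_matches


site = "ACGTACGTAC"  # hypothetical template stretch
print(may_prime("TTTTT" + revcomp(site), site))  # unrelated 5' tail, perfect 3' end
print(may_prime("CATCGTACGT", site))             # 3 mismatches away from the 3' end
```

Under this rule, an oligo whose 5′ portion is completely unrelated still qualifies, which is why leftover oligos with only a short complementary 3′ end matter for the mechanism proposed above.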

**GW_OK** · 08-01-2017, 06:19 AM

There've been a few hypotheses that ExAmp is actually Recombinase Polymerase Amplification (RPA), developed by TwistDx.

Here's a YouTube video describing it

It makes sense to me, and it is partially described in one of Illumina's patents that James Hadfield reviewed on his blog.

**pmiguel** · 08-01-2017, 01:21 PM

Originally posted by Brian Bushnell

We have several theories for what was driving this on HiSeq... the most plausible being something like, "library A had too many unincorporated adapters,

Yes, that is likely to be an issue.

Originally posted by Brian Bushnell

library B had too many adapter-free inserts, and after mixing them, library B adopted some of the free adapters from library A".

No, adapter-free inserts will not be joined to unincorporated adapters without the intervention of a ligase.

Remember, DNA can be converted back and forth from single-stranded to double-stranded without the intervention of any enzyme if the right temperature/salt/concentration is present. The hydrogen-bond-guided interactions between the bases of reverse-complementary strands of DNA are reversible under these conditions.

The process of breaking the phosphodiester/deoxyribose backbone requires much more energy. Joining DNA strands via their backbone pretty much requires an enzyme.

Originally posted by Brian Bushnell

Which would indicate that it involves both the donor and recipient library. But I'm not sure if that mechanism is important for NovaSeq.

Probably the same. It seems the only major difference is that with the NovaSeq you don't have to add the ExAmp glop to your denatured sample; that happens in the instrument.

--
Phillip

**pmiguel** · 08-01-2017, 01:35 PM

Originally posted by nucacidhunter

Index hopping is the result of annealed oligo extension by ExAmp. I do not know the details of ExAmp, but KAPA HiFi polymerase under stringent cycling conditions is able to extend primers as long as the 3’-terminal base and 6 other bases within the 10 bases at the 3’ end are complementary, even though the rest of the oligo does not match and just hangs off the template.

Leftover adapter oligos, PCR primers, and single-adapted and non-adapted fragments can act as primers and result in index hopping, neutral products, and cluster-forming fusion fragments, respectively. So a high concentration of oligos that can act as primers, and longer incubation of the library pool, will increase these artifacts. I would also expect to see more fusions with PCR-free libraries, as the proportion of fragments that do not have adapters at both ends is higher than in PCR-amplified libraries.

I hope not! That would also tend to create massive amounts of chimerism, for instance due to repetitive elements in genomic DNA. Hopefully whoever designed ExAmp would not allow low-stringency interactions of the sort you describe for the KAPA "HiFi" polymerase to result in this sort of (undesired) recombination.

I'm not really following why we need to posit either low-stringency annealing events or actual ligations (as the mechanism described in Brian's post would require) to explain index hopping. If any amplification occurs anywhere other than tethered to the surface of the flowcell, then unincorporated adapter oligos could anneal and be extended, creating a "cross-over event" that would generate an index-hopped library molecule. If that molecule seeded a cluster, then we would have an index hop.

--
Phillip

**nucacidhunter** · 08-02-2017, 01:54 AM

Illumina’s white paper on index hopping https://www.illumina.com/content/dam...inkId=36607862 shows that spiking in adapters not used in library prep increases index hopping, and that hopping rises as more adapter is spiked in. These adapters will be dissociated into single-stranded oligos during denaturation. The oligos will be complementary to adapted library fragments over a stretch of at most ~30 nt just before the adapter index sequences, which indicates that index hopping can occur even when a relatively large overhang is present. I am not sure how many bases need to anneal for an extension event, but given the high processivity of ExAmp it might be a short stretch.

Chimerism will happen if the 3’ end of a fragment anneals to other fragments and is extended, so fragments with adapters at both ends, even highly similar ones, will not cause fusion. Under my hypothesized mechanism, PCR-free libraries would then be more prone to index hopping and chimerism. Indeed, Illumina data https://www.illumina.com/science/edu...x-hopping.html indicates higher index hopping for PCR-free libraries, but they have not investigated chimerism events.

Index hopping can also happen on flow-cell-tethered fragments, but those would only contribute if they seeded another well on the flow cell. Wells with chimeras and multiple indices will have low-quality sequences and are more likely to be filtered out during read processing.

**pmiguel** · 08-03-2017, 03:22 AM

Originally posted by nucacidhunter

Illumina’s white paper on index hopping https://www.illumina.com/content/dam...inkId=36607862 shows that spiking in adapters not used in library prep increases index hopping, and that hopping rises as more adapter is spiked in.

Yes.

Originally posted by nucacidhunter

These adapters will be dissociated into single-stranded oligos during denaturation.

I agree.

Originally posted by nucacidhunter

The oligos will be complementary to adapted library fragments over a stretch of at most ~30 nt just before the adapter index sequences, which indicates that index hopping can occur even when a relatively large overhang is present.

Yes.

Originally posted by nucacidhunter

I am not sure how many bases need to anneal for an extension event, but given the high processivity of ExAmp it might be a short stretch.

"Processivity" isn't a measure of how short an annealed segment can be for a polymerase to extend it. It's a measure of how far a polymerase will extend, that is, how many nucleotides it adds before dissociating from the template.
I don't doubt that many polymerases can extend from an oligo annealed over just a handful of bases. But an oligo annealed via a very short region of complementarity will be unstable unless the hybridization conditions permit that interaction. For example, high salt concentrations shield the negative charges on the phosphate backbones and thereby dampen the repulsion that tends to push the strands apart.
Of course it is possible to lower the stringency of primer annealing in an amplification so that just a few bases of homology can prime an extension event. But I can't think of any reason to do so during cluster formation; it would allow various kinds of undesirable mis-priming. So I doubt that Illumina would use such conditions.
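To put rough numbers on the length and salt effects described here, the sketch below uses one commonly quoted empirical approximation for oligo melting temperature, Tm ≈ 81.5 + 16.6·log10([Na+]) + 0.41·(%GC) − 675/N. This formula is an illustrative choice, not something from the thread: its coefficients vary between sources, it is unreliable for very short stretches, and it ignores Mg2+ and sequence context (nearest-neighbor models are more accurate), so treat the output as a trend only. The sequences and the `approx_tm` name are hypothetical.

```python
import math


def approx_tm(seq: str, na_molar: float = 0.05) -> float:
    """Crude salt-adjusted Tm estimate in degrees C (trend only).

    Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(%GC) - 675/N
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    gc_percent = 100.0 * (seq.count("G") + seq.count("C")) / n
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent - 675.0 / n


short = "ACGTGCAT"      # hypothetical short complementary stretch
long_ = "ACGTGCAT" * 4  # same base composition, 32 nt
for na in (0.05, 0.5):
    print(f"[Na+]={na} M  8-mer: {approx_tm(short, na):6.1f}  "
          f"32-mer: {approx_tm(long_, na):6.1f}")
```

Lengthening the complementary stretch and raising [Na+] both raise the estimate, which is the direction of the argument above; the absolute value for an 8-mer is not meaningful.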

--
Phillip

**nucacidhunter** · 08-03-2017, 03:51 AM

Maybe I am not using the correct terminology, but by processivity I meant the polymerisation speed. For instance, some brands will extend a primer at 1 kb/min while others can do 3 kb/min. Fast polymerases, especially those active at suboptimal temperatures, tend to extend less complementary primers because extension proceeds before weakly bound, unstable primers dissociate.
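The arithmetic behind this argument, as a sketch: converting a quoted bulk rate into the time needed to add the first few nucleotides to an annealed primer. It assumes the bulk kb/min figure applies from the very first incorporation, which is a simplification; the 10-nt figure and the `seconds_to_extend` name are hypothetical.

```python
def seconds_to_extend(n_nt: int, rate_kb_per_min: float) -> float:
    """Time (s) to add n_nt nucleotides at a constant bulk rate given in kb/min."""
    nt_per_second = rate_kb_per_min * 1000.0 / 60.0
    return n_nt / nt_per_second


for rate in (1.0, 3.0):
    print(f"{rate} kb/min: {seconds_to_extend(10, rate):.2f} s to add 10 nt")
```

At 1 kb/min that is about 17 nt/s, so 10 nt take 0.6 s; at 3 kb/min it is 50 nt/s, so 0.2 s. A threefold shorter window means a weakly bound primer has less time to fall off before it has been lengthened into a more stable duplex.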

**nano85** · 04-23-2018, 01:06 AM

Large insert size on NovaSeq

Originally posted by Brian Bushnell

I did a comparison of duplicate rates on HiSeq2500 and NovaSeq, using Illumina's public data on BaseSpace:

NovaSeq seems to have a problem, but it's not clear why. These are not normal optical/well duplicates; they are extremely remote. It looks like during colony formation, some reads break off and reattach to an empty well somewhere else. The farthest-right point (at 25000) is not for distance 25000 but for distance infinity, including inter-tile duplicates.

These libraries are PCR-free WGS and thus should not really have more than a tiny fraction of duplicates, as seen on the HiSeq. Does anyone have any idea what's causing this? Does my hypothesis sound reasonable? Previous Illumina platforms had a very obvious distance cutoff where the number of duplicates increases rapidly up to a point, then plateaus (which is true for this HiSeq data, at around dist=45, but you can't see it in this graph). That is not the case for NovaSeq - it just keeps ascending, and there is no clear cutoff. It gradually bends, so there is no clear inflection point like there is on other platforms.

For reference, the libraries are both human NA12878 runs. NovaSeq is 2x150 and HiSeq 2500 is 2x100. Pairs are considered duplicates when the distance between colony centers is at most the stated distance, and both R1 and R2 match with some number of substitutions allowed, to account for sequencing error (8 for 150bp reads and 5 for 100bp reads). The insert sizes are quite large on average (>500bp) which reduces the rate of coincidental duplicates. HS2500 is ~10x and NovaSeq is ~30x coverage so the coincidental duplicate rate should be extremely low in both cases.

P.S. This is an underestimate of the duplicate rate for both platforms, as it was generated in a way that is not robust to sequencing error. I will regenerate the data, but it won't change the discrepancy, just the magnitude.
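A minimal Python sketch of the duplicate definition in the quoted post: two pairs count as duplicates when R1 and R2 each match within a substitution budget (the post uses 8 for 150 bp reads and 5 for 100 bp reads) and their cluster centers lie within a distance cutoff, with "infinity" also counting pairs on different tiles. The `Pair` record, its fields, and the all-against-all comparison are assumptions for illustration; this is not the tool used to make the plot, and a real implementation would bucket reads by sequence before comparing coordinates.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Pair:
    """One read pair with its cluster position (hypothetical record layout)."""
    tile: int
    x: float
    y: float
    r1: str
    r2: str


def substitutions(a: str, b: str) -> int:
    """Mismatching positions, counting any length difference as mismatches."""
    return sum(c1 != c2 for c1, c2 in zip(a, b)) + abs(len(a) - len(b))


def is_duplicate(p: Pair, q: Pair, max_dist: Optional[float], max_subs: int) -> bool:
    # Both reads of the pair must match within the substitution budget.
    if substitutions(p.r1, q.r1) > max_subs or substitutions(p.r2, q.r2) > max_subs:
        return False
    if max_dist is None:
        return True  # "distance infinity": any position, including other tiles
    if p.tile != q.tile:
        return False
    return math.hypot(p.x - q.x, p.y - q.y) <= max_dist


def count_duplicates(pairs: List[Pair], max_dist: Optional[float], max_subs: int) -> int:
    """Count pairs that duplicate an earlier pair (k mutually matching copies add k - 1).

    All-against-all O(n^2) comparison: fine for a toy, far too slow for a real run.
    """
    flagged = set()
    for i in range(len(pairs)):
        for j in range(i + 1, len(pairs)):
            if j not in flagged and is_duplicate(pairs[i], pairs[j], max_dist, max_subs):
                flagged.add(j)
    return len(flagged)
```

Sweeping `max_dist` upward and recording the count at each cutoff produces the kind of curve described above: on the HiSeq it plateaus at a short distance, while on the NovaSeq it kept rising.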

Hi Brian,

This is a bit off-topic, but your comment was the only thing I could find online about sequencing libraries with larger-than-recommended library fragment sizes on the NovaSeq. I prepared a TruSeq Nano library with unusually large fragment sizes (Bioanalyzer trace is attached), which I've done before and sequenced without issue on the MiSeq (seems to be a quirk of my setup as I follow Illumina protocol as closely as I can). This time I'd like to sequence on the NovaSeq, and I noticed this in the Illumina bulletin on migrating libraries between different sequencing platforms:
"Some applications with 550 bp or greater insert sizes are compatible with the NovaSeq platform, but additional optimization steps may be required."

Are you familiar with the optimization steps that they refer to? I am also in contact with Illumina technical support here in the UK regarding this and they are warning me that I should re-prep the library because I will get overclustering and rubbish results, but I'd rather not of course, especially considering that I successfully sequenced a library with nearly identical out-of-spec fragment size on the MiSeq. I imagine you might know a thing or two about this?

Attached file: 11318_Bioanalyser_report.pdf (178.8 KB)
