NovaSeq from Illumina


  • nucacidhunter
    replied
    Maybe I am not using the correct terminology, but by processivity I meant the polymerisation speed. For instance, some brands will extend a primer at 1 kb/min while others can do 3 kb/min. Speedy polymerases, especially those with activity at suboptimal temperatures, tend to extend less complementary primers because the extension progresses before the weakly bound, unstable primers dissociate.



  • pmiguel
    replied
    Originally posted by nucacidhunter View Post
    Illumina’s white paper on index hopping https://www.illumina.com/content/dam...inkId=36607862 shows that index hopping increases as more adapters not used in library prep are spiked in.
    Yes.
    Originally posted by nucacidhunter View Post
    These adapters will be dissociated to single stranded oligos during denaturing.
    I agree.
    Originally posted by nucacidhunter View Post
    The oligos will be complementary to adapted library fragments over a maximum stretch of ~30 nt just before the adapter index sequences, which indicates that index hopping can occur when a relatively large overhang is present.
    Yes.
    Originally posted by nucacidhunter View Post
    I am not sure how many bases need to anneal for an extension event, but given the high processivity of ExAmp it might be a short stretch.
    "Processivity" isn't a measure of how short an annealed segment can be and still let a polymerase extend. It's a measure of how far a polymerase will extend before it dissociates from the template.
    I don't doubt that many polymerases can extend from an oligo annealed over just a handful of bases. But an oligo annealed via a very short area of complementarity will do so with little stability unless the hybridization conditions allow such an interaction. For example, high salt concentrations can shield the negative charges of the phosphate backbone and thereby dampen the force that tends to tear the strands apart from one another.
    Of course it is possible to lower the stringency of primer annealing in an amplification to allow just a few bases of homology to prime an extension event. But I can't think of any reason to do so during cluster formation -- it would allow various types of undesired mis-priming events. So I doubt that Illumina would use such conditions.

    --
    Phillip



  • nucacidhunter
    replied
    Illumina’s white paper on index hopping https://www.illumina.com/content/dam...inkId=36607862 shows that index hopping increases as more adapters not used in library prep are spiked in. These adapters will be dissociated to single stranded oligos during denaturing. The oligos will be complementary to adapted library fragments over a maximum stretch of ~30 nt just before the adapter index sequences, which indicates that index hopping can occur when a relatively large overhang is present. I am not sure how many bases need to anneal for an extension event, but given the high processivity of ExAmp it might be a short stretch.

    Chimerism will happen if the 3’ end of a fragment anneals to other fragments and is extended, so fragments with adapters at both ends, even with high similarity, will not cause fusion. Under my hypothesized mechanism, PCR-free libraries will be more prone to index hopping and chimerism. Indeed, Illumina data https://www.illumina.com/science/edu...x-hopping.html indicate higher index hopping for PCR-free libraries, but they have not investigated chimerism events.

    Index hopping can also happen on flow-cell-tethered fragments, but those would contribute only if they seed another well on the flow cell. Wells with chimeras and multiple indices will have low quality sequences and will more likely be filtered out in read processing steps.



  • pmiguel
    replied
    Originally posted by nucacidhunter View Post
    Index hopping is the result of annealed oligo extension by ExAmp. I do not know the details of ExAmp, but KAPA HiFi polymerase under stringent cycling conditions is able to extend primers as long as the 3’ base and 6 other bases in the 10-base region at the 3’ end are complementary, even though the rest of the oligo is not a match and just hangs off the template.

    Leftover adapter oligos, PCR primers, single-adapted and non-adapted fragments can act as oligos and result in index hopping, neutral and cluster-forming fusion fragments, respectively. So the presence of a high concentration of oligos acting as primers and longer incubation of the library pool will increase these artifacts. I would also expect to see more fusion with PCR-free libraries, as the proportion of fragments without adapters at both ends is higher in comparison to PCR-amplified libraries.
    I hope not! That would also tend to create massive amounts of chimerism due to repetitive elements in genomic DNA, for instance. Hopefully whoever designed ExAmp would not allow low-stringency interactions of the sort you describe for the KAPA "HiFi" polymerase to result in this sort of (undesired) recombination.

    I'm not really following why we need to posit either low-stringency annealing events or actual ligations (as the mechanism described in Brian's post would require) to explain index hopping. If there is any amplification occurring anywhere but tethered to the surface of the flowcell, then unincorporated adapter oligos could anneal and be extended, creating a "cross-over event" that would generate an index-hopped library molecule. If that molecule seeded a cluster, then we would have an index hop.

    --
    Phillip



  • pmiguel
    replied
    Originally posted by Brian Bushnell View Post
    We have several theories for what was driving this on HiSeq... the most plausible being something like, "library A had too many unincorporated adapters,
    Yes, that is likely to be an issue.
    Originally posted by Brian Bushnell View Post
    library B had too many adapter-free inserts, and after mixing them, library B adopted some of the free adapters from library A".
    No, adapter-free inserts will not be joined with unincorporated adapters without the intervention of a ligase.

    Remember, DNA can be converted back and forth from single-stranded to double-stranded without the intervention of any enzyme if the right temperature/salt/concentration is present. The hydrogen-bond-guided interactions between the bases of reverse-complementary strands of DNA are reversible under these conditions.

    The process of breaking the phosphodiester/ribose backbone requires much more energy. Joining DNA strands via their backbone pretty much requires an enzyme.
    Originally posted by Brian Bushnell View Post
    Which would indicate that it involves both the donor and recipient library. But I'm not sure if that mechanism is important for NovaSeq.
    Probably the same. Seems like the only major difference is that you don't have to add the Ex-Amp glop to your denatured sample when using the NovaSeq. That happens in the instrument.

    --
    Phillip



  • GW_OK
    replied
    There've been a few hypotheses that ExAmp is actually Recombinase Polymerase Amplification (RPA), developed by TwistDX.

    Here's a YouTube video describing it

    It makes sense to me, and it is semi-described in one of Illumina's patents that James Hadfield reviewed on his blog.



  • nucacidhunter
    replied
    Index hopping is the result of annealed oligo extension by ExAmp. I do not know the details of ExAmp, but KAPA HiFi polymerase under stringent cycling conditions is able to extend primers as long as the 3’ base and 6 other bases in the 10-base region at the 3’ end are complementary, even though the rest of the oligo is not a match and just hangs off the template.

    Leftover adapter oligos, PCR primers, single-adapted and non-adapted fragments can act as oligos and result in index hopping, neutral and cluster-forming fusion fragments, respectively. So the presence of a high concentration of oligos acting as primers and longer incubation of the library pool will increase these artifacts. I would also expect to see more fusion with PCR-free libraries, as the proportion of fragments without adapters at both ends is higher in comparison to PCR-amplified libraries.



  • cement_head
    replied
    Originally posted by Brian Bushnell View Post
    This is kind of tangential to NovaSeq, but...

    I've suggested that we keep everything on ice whenever possible prior to sequencing, due to the fact that low temperatures retard any kind of activity and thus should inhibit adapter-swapping (which is a huge problem as we run a lot of highly-amplified single cells). But my explanations were too vague to be taken seriously, since I don't know the specifics of the reactions. I would love to have a very clear (and preferably lengthy, rather than concise) explanation of exactly why and when keeping pools on ice should prevent crosstalk, that I can copy and paste (attributing credit, if desired) to the people in charge of making libraries.

    I think it is obvious that the longer you let a mixed batch of libraries sit around, and the higher the temperature, the more index-swapping will occur, regardless of the mechanism. But without citing a specific mechanism (and it does not really matter if it is the dominant one), nobody involved with library prep will pay attention to my concerns on the issue (meaning, no tests of ice vs no ice). All I really need is a real mechanism, which seems sufficiently important to cause a test to be run; once that occurs, I'll be satisfied, even if the results are negative and indicate that keeping pooled libraries at a high temperature for a long time seems to be optimal for preventing crosstalk. Not that I'll believe negative results unless I run the experiment myself, but at least I'll believe I did my best. I'll still report the results here.
    This whole problem is starting to make more and more sense to me now. Just enough sloppiness at each step probably contributes to a perfect storm of IH (Index Hopping). And given that the MiSeq/HiSeq2500 system wasn't as sensitive to these issues, it is believable that we've all picked up bad habits.



  • cement_head
    replied
    Originally posted by pmiguel View Post
    The recommended method to detect an index swap is to use "Unique Dual Indexes". With these you don't use the same i7 index in multiple pairs. A given i7 index always goes with a fixed i5 index for the run. Then if you detect an i7 index with any i5 index other than its pair, you know an index hop has occurred and the reads are discarded.

    This will remove all index hops that are the result of a single recombination event. It will also remove nearly all the double recombinations. So true index hops should be largely detectable.

    As to what causes index hopping, I don't think that Illumina is sure. They seem mainly to have a list of "best practices" to use to lower their frequency.

    I haven't looked in detail at the process of exclusion amplification either. But I presume that it involves some non-flowcell-tethered PCR amplification.

    --
    Phillip
    Okay, this is interesting and jibes with their basic premise. It also contradicts their NEXTERA i5/i7 design, wherein index codes are re-used multiple times.
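
    The unique-dual-index check described in the quoted post can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the index sequences, the EXPECTED_PAIRS table and the classify_read helper are hypothetical names made up for the example, not part of any Illumina software or real index kit.

```python
from collections import Counter

# Hypothetical unique-dual-index sheet: each i7 is paired with exactly one i5.
# The sequences are made up for illustration only.
EXPECTED_PAIRS = {
    "AAACCCGG": "TTTGGGCC",
    "ACACACAC": "GTGTGTGT",
    "CCGGAATT": "GGCCTTAA",
}
VALID_I5 = set(EXPECTED_PAIRS.values())


def classify_read(i7, i5):
    """Return 'valid', 'hopped' (both indexes known but mismatched) or 'unknown'."""
    if EXPECTED_PAIRS.get(i7) == i5:
        return "valid"
    if i7 in EXPECTED_PAIRS and i5 in VALID_I5:
        return "hopped"
    return "unknown"


def summarize(reads):
    """Tally the classification of an iterable of (i7, i5) tuples."""
    return Counter(classify_read(i7, i5) for i7, i5 in reads)


reads = [
    ("AAACCCGG", "TTTGGGCC"),  # expected pair
    ("AAACCCGG", "GTGTGTGT"),  # i7 of sample 1 with i5 of sample 2
    ("ACACACAC", "GTGTGTGT"),  # expected pair
    ("TTTTTTTT", "TTTGGGCC"),  # i7 not in the sheet at all
]
counts = summarize(reads)
print(dict(counts))
```

    With these four example reads the tally is two valid, one hopped and one unknown. A read whose i7 and i5 were both swapped to the same other sample would still count as valid, which is consistent with the quoted post saying that nearly all, rather than all, double recombinations are removed.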



  • nucacidhunter
    replied
    Originally posted by pmiguel View Post
    Yeah, sounds reasonable. But I guess there is still the question of whether the index hop derives from a characteristic of the donor library, the recipient library or both? Illumina is saying that the index donor library definitely plays a role when said library includes unincorporated adapters and/or adapter dimers.

    This seems like a really high rate of recombination, no? Do you detect an increase in chimeric inserts? Depending on the mechanism of recombination you stipulate, there might be recombination events at any stretch of similar sequence, not just in the adapters.
    Any oligo (PCR primer, adapter oligo, single-end adapted or non-adapted DNA fragment) that can pair with a library fragment at its 3’ end could be extended by the ExAmp polymerase, causing index hopping (pairing of an indexed adapter oligo) or chimera formation (single-end or non-adapted DNA fragment). I would expect to see more chimeras in PCR-free libraries because they contain a high proportion of single-end or non-adapted fragments, although non-adapted fragments have to go through at least 2 cycles to produce a cluster-forming fragment.

    I do not have any information about the length of matched region required for extension by the ExAmp mix polymerase, but with KAPA HiFi a 3’ base match plus 6 more matches at any position within the 10 bases at the primer 3’ end was enough for extension, even under stringent cycling conditions.
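
    The KAPA HiFi rule of thumb above can be written down as a small check. This is a minimal sketch of one reading of that rule (the 3' terminal base must pair, plus at least 6 of the other 9 bases in the last 10); the function name can_extend and its parameters are hypothetical, and nothing here is known to apply to the ExAmp polymerase.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def can_extend(primer, template_3to5, window=10, min_other_matches=6):
    """Apply the rule of thumb to a primer lying on a template.

    primer is written 5'->3'; template_3to5 is the stretch of template
    under it written 3'->5', so position i of one faces position i of
    the other. Both strings must have the same length.
    """
    if len(primer) != len(template_3to5):
        raise ValueError("primer and template stretch must be the same length")
    paired = [COMPLEMENT.get(p) == t for p, t in zip(primer, template_3to5)]
    tail = paired[-window:]        # the 3'-terminal window of the primer
    if not tail or not tail[-1]:   # the 3' base itself must pair
        return False
    return sum(tail[:-1]) >= min_other_matches


primer = "ACGTACGTACGTACGTACGT"
perfect = "".join(COMPLEMENT[b] for b in primer)

# 5' half of the primer hangs off the template ("N" never pairs),
# while the 3' half pairs perfectly.
overhang_ok = can_extend(primer, "N" * 10 + perfect[10:])
# Perfect pairing everywhere except the base facing the primer's 3' end.
three_prime_mismatch = can_extend(primer, perfect[:-1] + "C")
print(overhang_ok, three_prime_mismatch)
```

    Under this reading, an oligo whose 5' half hangs off the template still passes, while a single mismatch at the 3' terminal base fails. The window and match thresholds are parameters so that other assumptions can be tried.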



  • Brian Bushnell
    replied
    Originally posted by pmiguel View Post
    But I guess there is still the question of whether the index hop derives from a characteristic of the donor library, the recipient library or both?
    We have several theories for what was driving this on HiSeq... the most plausible being something like, "library A had too many unincorporated adapters, library B had too many adapter-free inserts, and after mixing them, library B adopted some of the free adapters from library A". Which would indicate that it involves both the donor and recipient library. But I'm not sure if that mechanism is important for NovaSeq.

    This seems like a really high rate of recombination, no?
    Well, it's higher than what I observed for single-index libraries on our NovaSeq, but not by a huge amount.

    Do you detect an increase in chimeric inserts? Depending on the mechanism of recombination you stipulate, there might be recombination events at any stretch of similar sequence, not just in the adapters.
    I have not examined this on the NovaSeq yet, but I saw a much higher rate (a several-fold increase) of chimeric pairs when examining problematic reads on HiSeq. I don't remember the exact details; it might have been that reads mapped as improper pairs had a much higher rate of invalid barcode combinations, or vice-versa.



  • pmiguel
    replied
    Originally posted by Brian Bushnell View Post
    You have 4.5 billion reads, and expect to detect contamination from 11% of the data (0.5B/4.5B)
    Yeah, sounds reasonable. But I guess there is still the question of whether the index hop derives from a characteristic of the donor library, the recipient library or both? Illumina is saying that the index donor library definitely plays a role when said library includes unincorporated adapters and/or adapter dimers.

    This seems like a really high rate of recombination, no? Do you detect an increase in chimeric inserts? Depending on the mechanism of recombination you stipulate, there might be recombination events at any stretch of similar sequence, not just in the adapters.

    Originally posted by Brian Bushnell View Post
    at a 90%-100% rate (alignment sensitivity) by observing 89% of data volume (4B/4.5B). So you should expect to detect .11*.89*(.9 to 1) = 8.8% to 9.8% of the total contamination. So, 2000 PPM observed would suggest 20400 PPM to 22700 PPM of actual cross-contamination, with a sufficiently high degree of multiplexing.

    Bear in mind, though, that mouse contamination can come from other sources, and different index pairs have different rates of cross-contamination.
    These were run as single indexes. But there may be different rates, yes.

    I checked the HiSeq run for these environmental samples and we detected 0/1000 reads mouse hits for all 21 of the data sets.

    --
    Phillip



  • pmiguel
    replied
    I forgot to mention -- IDT has Illumina Unique Dual Indexes -- a set of 96 adapters for sale. Once we have those we can split an S2 run 96 ways and be able to detect index swaps.

    What are the HiSeq 3000/4000 instrument users doing? Kind of horrifying if upwards of 2% of reads have been mis-assigned since that instrument started being used.

    --
    Phillip



  • Brian Bushnell
    replied
    You have 4.5 billion reads, and expect to detect contamination from 11% of the data (0.5B/4.5B) at a 90%-100% rate (alignment sensitivity) by observing 89% of data volume (4B/4.5B). So you should expect to detect .11*.89*(.9 to 1) = 8.8% to 9.8% of the total contamination. So, 2000 PPM observed would suggest 20400 PPM to 22700 PPM of actual cross-contamination, with a sufficiently high degree of multiplexing.

    Bear in mind, though, that mouse contamination can come from other sources, and different index pairs have different rates of cross-contamination.
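
    The arithmetic above can be reproduced with a short Python sketch. The function names detectable_fraction and projected_ppm are hypothetical; the inputs are the read counts and the 90%-100% alignment sensitivity quoted in the post.

```python
def detectable_fraction(donor_reads, total_reads, sensitivity):
    """Fraction of all cross-contamination that this assay can see."""
    donor_share = donor_reads / total_reads                    # about 0.11
    observed_share = (total_reads - donor_reads) / total_reads  # about 0.89
    return donor_share * observed_share * sensitivity


def projected_ppm(observed_ppm, donor_reads, total_reads, sensitivity):
    """Scale an observed contamination rate up to a projected total rate."""
    return observed_ppm / detectable_fraction(donor_reads, total_reads, sensitivity)


MOUSE_READS = 0.5e9   # donor (mouse) reads
TOTAL_READS = 4.5e9   # all reads in the run

low = projected_ppm(2000, MOUSE_READS, TOTAL_READS, sensitivity=1.0)
high = projected_ppm(2000, MOUSE_READS, TOTAL_READS, sensitivity=0.9)
print(round(low), round(high))
```

    Using unrounded shares (1/9 and 8/9) this gives roughly 20,250 to 22,500 PPM, slightly below the 20400 to 22700 PPM quoted above, which comes from rounding the shares to .11 and .89 first.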



  • pmiguel
    replied
    Hmm, we just finished processing our first (training) NovaSeq run and I am seeing evidence of index hops at about 2000PPM (0.2%). Or is it 1.6%?

    We ran 21 (non-mouse) fecal DNA environmental samples (no-PCR libraries, made using the 550 bp method with the TruSeq no amp kit) and 3 mouse RNAseq (Illumina TruSeq polyA+) libraries. All just using single indexes.

    The assay we used to detect index hops is imperfect -- 1000 reads from each sample were BLASTed against GenBank, and software attempts to determine the species of origin based on the BLAST search.

    Works better for some species than others. For mouse RNA, generally >90% of reads come back identified as "mus musculus". But for sorghum genomic DNA, only about 50% of the reads come back identified as sorghum.

    But, nevertheless, I expect that >90% of mouse reads hopping into a non-mouse sample bin would be detected. In the 21 DNA library files we detected a range of 0-6 reads called by the software as "mus musculus", and that averages to about 2 reads per 1000 (0.2%) across the 21 samples.

    Not sure how to scale this though. There were a total of 24 samples, 21 environmental, 3 mouse RNA. The run demultiplexed to 4 billion environmental clusters and 0.5 billion mouse RNA sample clusters. In the 4 billion environmental reads 0.2% are mouse. So is that 0.2% index hopping rate? Or because there were 1/8th the number of clustered mouse amplicons as environmental amplicons should I multiply that figure by 8?

    To get a mouse read in an environmental sample, it would be necessary for an index to be "donated" from a mouse sample to an environmental amplicon. In the end I only care to use the mouse sequence to identify the percentage of reads mis-assigned overall.

    Okay, generally one is cautioned to move to absolute numbers if percentages are misleading. 0.2% of 4 billion clusters is 8x10^6, or 8 million, mis-assigned clusters for the run. Those are the events I can detect. How many non-detected events would I project? Yeah, probably 1.6%.
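
    The two candidate answers (0.2% or 1.6%) can be laid out as a small Python sketch. The constant and function names are hypothetical; the numbers are the ones given in this post, and the multiply-by-8 step is only the rough scaling considered here, not an established correction.

```python
ENV_CLUSTERS = 4_000_000_000    # environmental clusters after demultiplexing
MOUSE_CLUSTERS = 500_000_000    # mouse RNA clusters after demultiplexing
OBSERVED_PPM = 2000             # mouse reads seen in environmental bins (0.2%)


def detected_clusters(observed_ppm, recipient_clusters):
    """Absolute number of mis-assigned clusters the assay can see."""
    return observed_ppm * recipient_clusters // 1_000_000


def scaled_ppm(observed_ppm, recipient_clusters, donor_clusters):
    """Scale the observed rate by the recipient-to-donor cluster ratio."""
    return observed_ppm * recipient_clusters // donor_clusters


detected = detected_clusters(OBSERVED_PPM, ENV_CLUSTERS)
scaled = scaled_ppm(OBSERVED_PPM, ENV_CLUSTERS, MOUSE_CLUSTERS)
print(detected, scaled)
```

    This gives 8 million detectable clusters and a scaled rate of 16,000 PPM (1.6%). Integer arithmetic in PPM is used so the figures come out exact.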

    These were made to run on the HiSeq (and they were).

    --
    Phillip

