  • pmiguel
    replied
    Hmm, we just finished processing our first (training) NovaSeq run and I am seeing evidence of index hops at about 2000PPM (0.2%). Or is it 1.6%?

    We ran 21 (non-mouse) fecal DNA environmental samples (no-PCR libraries, made using the 550 bp method with the TruSeq no amp kit) and 3 mouse RNAseq (Illumina TruSeq polyA+) libraries. All just using single indexes.

    The assay we used to detect index hops is imperfect -- 1000 reads from each sample were BLASTed against GenBank and software attempts to determine the species origin based on the BLAST search.

    Works better for some species than others. For mouse RNA, generally >90% of reads come back identified as "mus musculus". But for sorghum genomic DNA, only about 50% of the reads come back identified as sorghum.

    Nevertheless, I expect that >90% of mouse reads hopping into a non-mouse sample bin would be detected. In the 21 DNA library files we detected a range of 0-6 reads (out of 1000 per sample) called by the software as "mus musculus", and that averages to 0.2% across the 21 samples.

    Not sure how to scale this though. There were a total of 24 samples: 21 environmental, 3 mouse RNA. The run demultiplexed to 4 billion environmental clusters and 0.5 billion mouse RNA sample clusters. In the 4 billion environmental reads, 0.2% are mouse. So is that a 0.2% index hopping rate? Or, because there were 1/8th the number of clustered mouse amplicons as environmental amplicons, should I multiply that figure by 8?

    To get a mouse read in an environmental sample, it would be necessary for an index to be "donated" from a mouse sample to an environmental amplicon. In the end I only care to use the mouse sequence to identify the percentage of reads mis-assigned overall.

    Okay, generally one is cautioned to move into numbers if percentages are misleading. 0.2% of 4 billion clusters is 8x10^6, or 8 million, mis-assigned clusters for the run. Those are the events I can detect. How many non-detected events would I project? Yeah, probably 1.6%.
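
A minimal sketch of the arithmetic above, using only the figures quoted in the post. The variable names are illustrative, and the factor of 8 is the scaling the post proposes (environmental clusters outnumber mouse clusters 8 to 1), not an established correction.

```python
# Figures quoted in the post; the names are illustrative.
ENV_CLUSTERS = 4_000_000_000    # clusters demultiplexed to the 21 environmental samples
MOUSE_CLUSTERS = 500_000_000    # clusters demultiplexed to the 3 mouse RNA samples
OBSERVED_PPM = 2_000            # mouse reads seen in environmental bins (0.2%)

# Detectable mis-assigned clusters: 0.2% of the environmental clusters.
detected_clusters = ENV_CLUSTERS * OBSERVED_PPM // 1_000_000

# Proposed scaling (an assumption from the post): hops between environmental
# samples are invisible to this assay and may be about 8x as common.
scale = ENV_CLUSTERS // MOUSE_CLUSTERS
projected_ppm = OBSERVED_PPM * scale

print(f"detected mis-assigned clusters: {detected_clusters:,}")
print(f"projected overall rate: {projected_ppm} PPM ({projected_ppm / 10_000:.1f}%)")
```

Working in integer PPM avoids floating-point rounding and keeps the 0.2% and 1.6% figures directly comparable with the PPM values quoted elsewhere in the thread.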

    These were made to run on the HiSeq (and they were).

    --
    Phillip


  • pmiguel
    replied
    Originally posted by Brian Bushnell
    This is kind of tangential to NovaSeq, but...

    I've suggested that we keep everything on ice whenever possible prior to sequencing, due to the fact that low temperatures retard any kind of activity and thus should inhibit adapter-swapping (which is a huge problem as we run a lot of highly-amplified single cells). But my explanations were too vague to be taken seriously, since I don't know the specifics of the reactions. I would love to have a very clear (and preferably lengthy, rather than concise) explanation of exactly why and when keeping pools on ice should prevent crosstalk, that I can copy and paste (attributing credit, if desired) to the people in charge of making libraries.

    I think it is obvious that the longer you let a mixed batch of libraries sit around, and the higher the temperature, the more index-swapping will occur, regardless of the mechanism. But without citing a specific mechanism (and it does not really matter if it is the dominant one), nobody involved with library prep will pay attention to my concerns on the issue (meaning, no tests of ice vs no ice). All I really need is a real mechanism, which seems sufficiently important to cause a test to be run; once that occurs, I'll be satisfied, even if the results are negative and indicate that keeping pooled libraries at a high temperature for a long time seems to be optimal for preventing crosstalk. Not that I'll believe negative results unless I run the experiment myself, but at least I'll believe I did my best. I'll still report the results here.
    Yeah, I'm more of a bench scientist by background. And until I saw nucacidhunter's post above I hadn't seen any plausible mechanism as to how purified Illumina amplicon libraries would "swap indexes" due to sitting around mixed together. That is, under normal conditions DNA is very nearly inert and stable. It doesn't recombine without the help of enzyme(s).

    But I guess previous instantiations of ex-amp (HiSeq 4000/X) require the researcher to mix the "ex amp" reagent with the library pool prior to clustering on the cbot. If this reagent contains the polymerase and other reactants then it could indeed be responsible for the recommendation not to leave pools sitting around at room temp or at all.

    The NovaSeq does only on-board clustering and so adds the ex-amp reagents to the denatured library pool itself. So the prohibition on letting libraries sit around as a pool should not be an issue for it. If this is one of the mechanisms of index-hopping...

    --
    Phillip


  • Brian Bushnell
    replied
    This is kind of tangential to NovaSeq, but...

    I've suggested that we keep everything on ice whenever possible prior to sequencing, due to the fact that low temperatures retard any kind of activity and thus should inhibit adapter-swapping (which is a huge problem as we run a lot of highly-amplified single cells). But my explanations were too vague to be taken seriously, since I don't know the specifics of the reactions. I would love to have a very clear (and preferably lengthy, rather than concise) explanation of exactly why and when keeping pools on ice should prevent crosstalk, that I can copy and paste (attributing credit, if desired) to the people in charge of making libraries.

    I think it is obvious that the longer you let a mixed batch of libraries sit around, and the higher the temperature, the more index-swapping will occur, regardless of the mechanism. But without citing a specific mechanism (and it does not really matter if it is the dominant one), nobody involved with library prep will pay attention to my concerns on the issue (meaning, no tests of ice vs no ice). All I really need is a real mechanism, which seems sufficiently important to cause a test to be run; once that occurs, I'll be satisfied, even if the results are negative and indicate that keeping pooled libraries at a high temperature for a long time seems to be optimal for preventing crosstalk. Not that I'll believe negative results unless I run the experiment myself, but at least I'll believe I did my best. I'll still report the results here.
    Last edited by Brian Bushnell; 07-21-2017, 06:34 AM.


  • nucacidhunter
    replied
    My understanding is that index hopping can happen at any time in a pool that contains single-stranded library fragments, a partially complementary oligo (from PCR or adapter oligos) that can pair with a strand, and ExAmp reagents. Amplification is isothermal and is at its optimum at the temperature maintained during clustering, but as with most polymerases there should be some low-level activity at non-optimal temperatures as well. These are the reasons that preparing the pool just prior to loading and keeping it on ice is highly recommended.


  • pmiguel
    replied
    The recommended method to detect an index swap is to use "Unique Dual Indexes". With these you don't use the same i7 index in multiple pairs. A given i7 index always goes with a fixed i5 index for the run. Then if you detect an i7 index with any i5 index other than its pair, you know an index hop has occurred and the reads are discarded.

    This will remove all index hops that are the result of a single recombination event. It will also remove nearly all the double recombinations. So true index hops should be largely detectable.
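
The unique-dual-index check described above can be sketched as follows. This is an illustration only: the sample sheet, the index sequences, and the `classify` helper are hypothetical, and real demultiplexing software handles mismatches and orientation in ways not shown here.

```python
# Hypothetical sample sheet: each i7 index is paired with exactly one i5 index.
# The sequences are made up for illustration.
EXPECTED_PAIRS = {
    "ATTACTCG": "TATAGCCT",
    "TCCGGAGA": "ATAGAGGC",
    "CGCTCATT": "CCTATCCT",
}
KNOWN_I5 = set(EXPECTED_PAIRS.values())


def classify(i7: str, i5: str) -> str:
    """Classify one read's index pair, allowing zero mismatches."""
    if EXPECTED_PAIRS.get(i7) == i5:
        return "valid"      # the i7 was seen with its assigned i5
    if i7 in EXPECTED_PAIRS and i5 in KNOWN_I5:
        return "hopped"     # both indexes are in the run, but not as a pair
    return "unknown"        # at least one index is not in the sample sheet


print(classify("ATTACTCG", "TATAGCCT"))  # valid
print(classify("ATTACTCG", "ATAGAGGC"))  # hopped
```

A fragment that picks up both indexes of another sample would still classify as "valid", which is why the post says nearly all, rather than all, double recombinations are removed.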

    As to what causes index hopping, I don't think that Illumina is sure. They seem mainly to have a list of "best practices" to use to lower their frequency.

    I haven't looked in detail at the process of exclusion amplification either. But I presume that it involves some non-flowcell-tethered PCR amplification.

    --
    Phillip


  • cement_head
    replied
    Originally posted by nucacidhunter
    Exclusion Amplification (ExAmp) has been explained in the following video.
    https://www.youtube.com/watch?v=pfZp5Vgsbw0

    Following is the link for the patent:
    https://www.google.com.au/patents/WO2013188582A1?cl=en
    The video wasn't overly helpful, but if I understand the patent description, they're saying that they've essentially hyper-optimised the bridge amplification such that once a single seed molecule binds within a nanowell, after 14 rounds, it will dominate the signal during SBS? If true, then it must be within the first two rounds that the seed molecules drift from one nanowell to the next (this is the average transport vs average amplification rate that they constantly cite in the patent). Or, does the mispriming occur PRIOR to the initial hybridisation to the nanowell, and before the first round of bridge amplification?

    I now understand the need for (a) super-clean libraries and (b) size optimised libraries - to beat the "average" diffusion rate(s) on these HiSeq3000/4000/X/NovaSeq platforms.

    Here's the real question: how does one detect index swapped (hopped) reads? Do you have to have a reference? It would seem that the answer would be "yes", or as Illumina suggests in their white paper, one has to a priori have an idea of the expression levels/targets?
    Last edited by cement_head; 07-20-2017, 09:55 AM. Reason: clarity


  • nucacidhunter
    replied
    Exclusion Amplification (ExAmp) has been explained in the following video.
    https://www.youtube.com/watch?v=pfZp5Vgsbw0

    Following is the link for the patent:
    https://www.google.com.au/patents/WO2013188582A1?cl=en


  • cement_head
    replied
    Originally posted by GenoMax
    See Illumina's white paper on index hopping here.
    Still don't understand the ExAmp chemistry, and unless I missed something, this white paper doesn't explain it. Is it that it is proprietary and largely unknown? Would you happen to have a link to where it is explained? Thanks.

    Aside: Hard to believe this has been going on this long and Illumina has been largely silent about this - one would think they would have issued a protocol change for ONLY dual-index libraries on nanocell instruments.


  • GenoMax
    replied
    See Illumina's white paper on index hopping here.


  • cement_head
    replied
    Does INDEX swapping (hopping) occur because this (release and re-annealing) is the method for generating clusters within each nanocell, and the swapping is as the result of the DNA fragments (library frags) inadvertently jumping/hopping too far into the next nanocell?


  • Brian Bushnell
    replied
    Originally posted by pmiguel
    What went into that 8000 PPM (0.8%) calculation Brian? I mean, did you just count the number of swaps in a dual unique indexed run?

    Anyone checked that figure for a HiSeq 2500 run? I know no one is complaining about index hopping on that instrument or a MiSeq, but it would happen at some rate.

    --
    Phillip
    The 8000 PPM was single-indexed. This was not an ideal test, but there were a few E.coli isolate libraries multiplexed with various other things (a lot of Chlamy, and various bacterial single-cells). Also, some were dual indexed and some were single-indexed, in the same run, and for whatever reason demultiplexing was done with only 6bp of the barcode for the single-indexed libraries rather than all 8 (allowing zero mismatches). So I'm not really sure what the rates would be in an ideal test environment. That said, for the reads that came out as this particular E.coli library, I concatenated all references for everything being sequenced together and ran:

    Code:
    seal.sh in=reads.fq stats=stats.txt ambig=toss clearzone=10
    Everything hitting E.coli was considered correct, and everything hitting anything else was considered contamination. For the dual-indexed test I used a P.heparinus single-cell library with similar methodology.
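
As a rough sketch of how such a rate could be expressed in PPM, assuming the denominator is all reads that were assigned to some reference (ambiguous reads having been tossed). The function names and the read counts below are hypothetical and are not taken from the actual run or from Seal's output format.

```python
def contamination_ppm(correct_reads: int, contaminant_reads: int) -> float:
    """Contaminant reads per million assigned reads."""
    total = correct_reads + contaminant_reads
    if total == 0:
        raise ValueError("no assigned reads")
    return contaminant_reads * 1_000_000 / total


def ppm_to_percent(ppm: float) -> float:
    """Convert parts per million to percent (10,000 PPM = 1%)."""
    return ppm / 10_000


# Hypothetical counts chosen to land on the 8000 PPM figure quoted above.
ppm = contamination_ppm(correct_reads=992_000, contaminant_reads=8_000)
print(f"{ppm:.0f} PPM = {ppm_to_percent(ppm):.1f}%")
```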

    I also tested a HiSeq run of the same E.coli library and calculated a 7 PPM contamination rate, but that's not really credible since I don't know what else was present on the plate in that run, so I don't necessarily have the correct references (though there was definitely some Chlamy present). In the past I've seen various rates of cross contamination on the HiSeq 2500 (<1PPM to >1000PPM) and it's actually quite hard to consistently reproduce the same numbers on different runs. The cross contamination comes from various sources, including physical contamination, though I think we've eliminated physical cross contamination in our current processes. NextSeq has generally yielded lower rates of cross contamination compared to HiSeq 2500, so we use that for our multiplexed single cells even though the quality is lower than HiSeq.
    Last edited by Brian Bushnell; 07-14-2017, 09:34 AM.


  • pmiguel
    replied
    Originally posted by cement_head
    What is PPM?
    Parts Per Million.

    --
    Phillip


  • cement_head
    replied
    Originally posted by Brian Bushnell
    I calculated 8000 PPM of index swapping (cross-contamination) for our NovaSeq run with single indexes, and 120 PPM for dual indexes, when allowing zero barcode mismatches.
    What is PPM?


  • pmiguel
    replied
    Originally posted by GenoMax
    The important point is JGI probably made VERY GOOD quality libraries. With patterned FC's having clean libraries (with just the right sized inserts, zero primers and dimers) are critical to minimizing these issues. Since we are talking about "B"illions of reads losing some during dedupe should not cause a major loss. 2D barcoding seems essential (perhaps should be made mandatory).
    From what I'm hearing, the NovaSeq doesn't have the major issues with amplicon lengths that the HiSeq4000 and X do. The NovaSeq is spec'ed to run 550bp no PCR DNA libraries, unlike the HiSeq patterned flowcell instruments.

    --
    Phillip


  • pmiguel
    replied
    Originally posted by Brian Bushnell
    I calculated 8000 PPM of index swapping (cross-contamination) for our NovaSeq run with single indexes, and 120 PPM for dual indexes, when allowing zero barcode mismatches.
    What went into that 8000 PPM (0.8%) calculation Brian? I mean, did you just count the number of swaps in a dual unique indexed run?

    Anyone checked that figure for a HiSeq 2500 run? I know no one is complaining about index hopping on that instrument or a MiSeq, but it would happen at some rate.

    --
    Phillip
