MiSeq Carry-over contamination between runs


  • #16
    Just wanted to chime in and say that we've seen run-to-run contamination too on our MiSeq.

    After reading this post, I looked into it and found plenty of evidence.

    Sam

    Comment


    • #17
      Hi All

      I just read James' blog post on this:
      http://core-genomics.blogspot.ca/201...read-this.html
      I'm not certain that my original statement of the problem is entirely clear, and I want to make sure people don't get overly concerned based on our data.

      What we're seeing is 0.2% of reads carrying through from run to run. This means that if we saw 100k reads on BRAF V600E last run, we'd see 200 reads on BRAF V600E in the current run.

      My take is that the impact on sensitivity is read-number based, not abundance based. This would affect sensitivity where:
      1) The previous run had a very high mutation abundance, and similar total read number, or
      2) The previous run had a moderate mutation abundance, but much higher read coverage.

      In both cases, it should be possible to calculate bleed-through and account for the background in downstream analysis - by examining previous runs, we should be able to predict whether carry-over will be a problem for the current run. To take James' example: if the read number on BRAF were identical between runs and the previous run had 100% V600E, I'd expect to see 0.2% in the current run, and would flag that as a problem.
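The prediction Harlon describes is simple arithmetic; here is a minimal sketch, assuming the measured 0.2% carry-over rate from the post. All function and parameter names are illustrative, not from anyone's actual pipeline.

```python
# Hypothetical carry-over check based on the 0.2% rate reported above.
CARRYOVER_RATE = 0.002  # fraction of the previous run's reads that bleed through

def expected_carryover_reads(prev_variant_reads: int) -> float:
    """Reads expected to bleed through from the previous run for one variant."""
    return prev_variant_reads * CARRYOVER_RATE

def carryover_is_a_problem(prev_variant_reads: int,
                           current_locus_reads: int,
                           detection_threshold: float = 0.001) -> bool:
    """Flag the current run if the predicted bleed-through fraction exceeds the
    assay's detection threshold (here an assumed 0.1% variant-allele frequency)."""
    predicted_fraction = expected_carryover_reads(prev_variant_reads) / current_locus_reads
    return predicted_fraction >= detection_threshold

# James' example: previous run 100% V600E, identical BRAF coverage.
# 100,000 mutant reads last run -> ~200 bleed-through reads, i.e. 0.2%.
print(expected_carryover_reads(100_000))          # 200.0
print(carryover_is_a_problem(100_000, 100_000))   # True: 0.2% >= 0.1%
print(carryover_is_a_problem(1_000, 100_000))     # False: 0.002% is negligible
```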

      In any case, we're planning to start running a lot more barcodes, so that we can rotate them between runs. I'm hoping our carry-over will decay to near zero by the time we re-run a barcode set. That said, I really think we'll need to track this carefully - one could imagine that carry-over decays slowly...
      We won't have results on this for a while, but when we get them, we'll post them.

      Cheers
      harlon
      Last edited by Harlon; 04-17-2013, 09:39 AM. Reason: added weblink to James' blog post

      Comment


      • #18
        Originally posted by Harlon View Post
        Hi All

        I just read James' blog post on this:
        http://core-genomics.blogspot.ca/201...read-this.html
        I'm not certain that my original statement of the problem is entirely clear, and I want to make sure people don't get overly concerned based on our data.

        What we're seeing is 0.2% of reads carrying through from run to run. This means that if we saw 100k reads on BRAF V600E last run, we'd see 200 reads on BRAF V600E in the current run.

        My take is that the impact on sensitivity is read-number based, not abundance based. This would affect sensitivity where:
        1) The previous run had a very high mutation abundance, and similar total read number, or
        2) The previous run had a moderate mutation abundance, but much higher read coverage.

        In both cases, it should be possible to calculate bleed-through and account for the background in downstream analysis - by examining previous runs, we should be able to predict whether carry-over will be a problem for the current run. To take James' example: if the read number on BRAF were identical between runs and the previous run had 100% V600E, I'd expect to see 0.2% in the current run, and would flag that as a problem.

        In any case, we're planning to start running a lot more barcodes, so that we can rotate them between runs. I'm hoping our carry-over will decay to near zero by the time we re-run a barcode set. That said, I really think we'll need to track this carefully - one could imagine that carry-over decays slowly...
        We won't have results on this for a while, but when we get them, we'll post them.

        Cheers
        harlon
        Thank you for your response. Our lab has been running known-negative and known-positive controls, and for each patient sample performing two technical replicates by barcoding twice with two separate barcode reagents. Results must be concordant between the two technical replicates, as well as 5 SD above our LOD in the negative control (i.e. the analytical LOQ).

        -Tom
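Tom's acceptance rule can be sketched in a few lines. This is an assumption-laden illustration, not his lab's actual pipeline: the "5 SD above background" threshold is taken from the post, while the fold-difference definition of "concordant" and all names are hypothetical.

```python
import statistics

def loq_threshold(negative_control_fractions, n_sd: float = 5.0) -> float:
    """Analytical LOQ: mean of negative-control variant fractions plus n_sd
    standard deviations (the '5 SD above our LOD' rule from the post)."""
    mean = statistics.mean(negative_control_fractions)
    sd = statistics.stdev(negative_control_fractions)
    return mean + n_sd * sd

def call_variant(rep1_fraction: float, rep2_fraction: float,
                 negative_control_fractions, max_ratio: float = 2.0) -> bool:
    """Accept a call only if both technical replicates exceed the LOQ and agree
    within a max_ratio fold-difference (a stand-in for 'concordant')."""
    threshold = loq_threshold(negative_control_fractions)
    above = rep1_fraction > threshold and rep2_fraction > threshold
    lo, hi = sorted([rep1_fraction, rep2_fraction])
    concordant = lo > 0 and hi / lo <= max_ratio
    return above and concordant

# Illustrative numbers: background around 0.02-0.03% in negative controls.
negatives = [0.0002, 0.0003, 0.00025, 0.0002]
print(call_variant(0.002, 0.0025, negatives))   # True: both replicates well above LOQ
print(call_variant(0.0003, 0.002, negatives))   # False: rep1 below LOQ, replicates discordant
```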

        Comment


        • #19
          FYI, the GATK callers now include a fractional downsampler specifically designed to compensate for contamination (regardless of origin: lab prep, machine, etc.). It is activated by default for an estimated contamination rate of 5% (iirc), but the rate can be adjusted if you suspect a larger issue. So for most genotyping purposes, at least, this doesn't have to be a showstopper.

          Comment


          • #20
            We have started to rotate the use of barcodes. We were already duplicating data sets (although not with separate barcodes). And we will probably perform a maintenance wash between runs.
            All of this means the problem only remains an issue for those few groups looking for very low mutation rates in amplified samples (like some at CI). Those groups have lots of other issues to contend with as well, so this is just one more to add to the mix.
            I am sure we'll see an improved wash from Illumina soon and that should be the end of it for most users.

            Comment


            • #21
              I'm just wondering what studies/genotyping approaches people are using in the context of this discussion?
              We also see carry-over between amplicons... we have mainly attributed this to lab sample set-up contamination (as we also saw it in some previous 454 work). There may well be some MiSeq run carry-over too... however, this level of contamination is easily identified vs true loci due to the large depths we attain. I could see how it would become an issue when using low coverage to genotype, but plenty of studies recommend increasing depth to facilitate genotyping ease. This seems a little more hassle-free than investing in more barcodes...

              In addition, I guess this problem is more severe when the machine is being run multiple times in one week, as a wash is required every seven days.

              Comment


              • #22
                And when we start with the 2500, I think we'll use cBot clustering for samples where we want to keep contamination to an absolute minimum.

                The main problem from a cancer perspective is tumour heterogeneity and labs trying to understand if minor clones are important. The earlier we can detect them (by looking at mutant allele frequency) the better. And this is likely to need sub-1% detection levels.

                Comment


                • #23
                  Originally posted by james hadfield View Post
                  And when we start with the 2500, I think we'll use cBot clustering for samples where we want to keep contamination to an absolute minimum.

                  The main problem from a cancer perspective is tumour heterogeneity and labs trying to understand if minor clones are important. The earlier we can detect them (by looking at mutant allele frequency) the better. And this is likely to need sub-1% detection levels.
                  Rapid chemistry does not allow cBot clustering. Even with the "duo" kit all you get is first strand synthesis. The actual bridge PCR happens on the HiSeq. So I am not sure where that leaves you. Might solve the problem, or not.

                  But, just because this is a problem on the MiSeq does not mean it will be on the HiSeq. Enough of the right type of washes and the issue can be driven to arbitrarily low levels.

                  Has anyone done any back-to-back Rapid HiSeq runs that could be used to test for previous template contamination?
                  --
                  Phillip

                  Comment


                  • #24
                    The duo kit on the HiSeq might be enough to overcome the problem with carry-over since no template ever enters the HiSeq's fluid lines. I wonder if a couple of extra manual washes at just the template position would be enough to help resolve the problem on the MiSeq.

                    Comment


                    • #25
                      We've also seen carry-over of 0.01% to 0.1% between runs, depending on the washes that were done in between (just a post-run wash, or also a maintenance wash, and how many times).

                      Comment


                      • #26
                        I picked three adjacent runs with highly dissimilar samples:
                        09 Apr 13 with a mix of 3 well-annotated and sequenced bacterial genomes
                        10 Apr 13 with all non-bacteria stuff (broad mix of other stuff)
                        17 Apr 13 with Arabidopsis.

                        I just took the "undetermined" bins from the 10Apr and 17Apr runs and mapped to the bacterial genomes (I didn't check to try to sort out if indexes were repeated, etc - that could be an independent check but I'm just scouting at the moment).

                        10Apr shows 0.027% bacterial genome sequences in the Undetermined bin (% mapping divided by total reads from the run) and 17Apr shows 0.024% bacterial genome sequences. Now some of these may be phiX control sequences that happen to map to some of the bacteria.

                        So worst case it looks to me like <<1%. The previous posts report 0.2% using amplicons, so I did another test.

                        Run A was a full MiSeq run of an amplicon library; the following run, Run B, was again really diverse - lots of different stuff.

                        If I search the undetermined bin of Run B for the first 15 bp of the amplicon, it's present 2497 times out of 12,746,730 reads, so again about 0.02%. Orthogonally, searching for the amplicon's exact barcode gives 925 or about 0.01%.

                        Searching for those 15 bp start seqs of that amplicon in all other samples in that run yields:
                        9 sequences out of 1,200,789 of a small RNA lib set (0.0007%)
                        10 sequences out of 3,193,181 of a bacterial RNA-seq set (0.0003%)
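For the record, the percentages above are just matching reads over total reads in the bin; a quick sketch reproducing the post's figures (the helper name is mine):

```python
# Carry-over expressed as matching reads divided by total reads, as a percentage.
def carryover_percent(matching_reads: int, total_reads: int) -> float:
    return 100.0 * matching_reads / total_reads

# Figures quoted in the post:
print(round(carryover_percent(2497, 12_746_730), 2))  # 0.02  (15 bp amplicon start, Run B undetermined)
print(round(carryover_percent(925, 12_746_730), 2))   # 0.01  (exact barcode, Run B undetermined)
print(round(carryover_percent(9, 1_200_789), 4))      # 0.0007 (small RNA lib set)
print(round(carryover_percent(10, 3_193_181), 4))     # 0.0003 (bacterial RNA-seq set)
```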

                        Something we can keep an eye on... hopefully no one ever makes a decision based on a small number of reads! I've always taught that that's a bad idea.

                        For the record - we do 0.5% tween maint. washes between all runs.

                        Comment


                        • #27
                          It seems to me there are two predominant sources of error.

                          1) Library preparation (barcode reagents are cross-contaminated or PCR jumping, etc).

                          2) A platform specific source of cross-contamination from previous sequencing runs on the Miseq?

                          The first source I am quite familiar with, but the posts on this thread indicate that somehow the previous library's templates are making their way into subsequent sequencing reactions (i.e. the second source)? How? Other than the library preparation being contaminated by templates from a previous library preparation, non-aerosol-resistant tips being used, or someone not swapping tips during serial dilutions? I'm a bit confused on this.

                          -Tom

                          Comment


                          • #28
                            Originally posted by thomasblomquist View Post
                            It seems to me there are two predominant sources of error.

                            1) Library preparation (barcode reagents are cross-contaminated or PCR jumping, etc).

                            2) A platform specific source of cross-contamination from previous sequencing runs on the Miseq?

                            The first source I am quite familiar with, but the posts on this thread indicate that somehow the previous library's templates are making their way into subsequent sequencing reactions (i.e. the second source)? How? Other than the library preparation being contaminated by templates from a previous library preparation, non-aerosol-resistant tips being used, or someone not swapping tips during serial dilutions? I'm a bit confused on this.

                            -Tom
                            Hi Tom

                            We can say with certainty that we're getting cross-contamination from previous sequencing runs.

                            Barcode contamination would give reads on sequences represented within the current run, but assigned to unused barcodes (we do see that, at low representation). Cross-contamination from previous runs gives reads that should not be represented in the current run at all, but that are identical to reads in the previous run (right down to the barcodes used in that run).

                            As far as I can tell (and validated by discussions with Illumina), this issue is caused by incomplete washing of the MiSeq plumbing between runs - small amounts of DNA probably adsorb to the tubing etc., and then are released while setting up the next run, where they hybridize to the flow cell and are sequenced.

                            Cheers
                            h
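Harlon's diagnostic - reads carrying a barcode from the previous run point to run-to-run carry-over, while reads on unused barcodes matching only current-run inserts suggest barcode cross-contamination - can be sketched as a simple classification. All names and barcodes here are hypothetical.

```python
# Minimal sketch of the carry-over vs barcode-contamination diagnostic.
def classify_read(barcode: str, current_barcodes: set, previous_barcodes: set) -> str:
    if barcode in current_barcodes:
        return "expected"         # assigned to a sample on this run's sheet
    if barcode in previous_barcodes:
        return "run-carryover"    # identical to the previous run, down to the barcode
    return "unassigned"           # possible barcode contamination or sequencing error

current = {"ATCACG", "CGATGT"}    # barcodes on this run's sample sheet
previous = {"TTAGGC", "TGACCA"}   # barcodes used on the previous run only

print(classify_read("ATCACG", current, previous))  # expected
print(classify_read("TTAGGC", current, previous))  # run-carryover
print(classify_read("GGGGGG", current, previous))  # unassigned
```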

                            Comment


                            • #29
                              Originally posted by Harlon View Post
                              Hi Tom

                              We can say with certainty that we're getting cross-contamination from previous sequencing runs.

                              Barcode contamination would give reads on sequences represented within the current run, but assigned to unused barcodes (we do see that, at low representation). Cross-contamination from previous runs gives reads that should not be represented in the current run at all, but that are identical to reads in the previous run (right down to the barcodes used in that run).

                              As far as I can tell (and validated by discussions with Illumina), this issue is caused by incomplete washing of the MiSeq plumbing between runs - small amounts of DNA probably adsorb to the tubing etc., and then are released while setting up the next run, where they hybridize to the flow cell and are sequenced.

                              Cheers
                              h
                              Hmmm... why would there be a complete "circuit" where reagents that go into the flow cell are recycled back later? Seems disastrous. Disclaimer: I only prep the libraries in-house and send them off to genomics cores for sequencing.

                              -Tom

                              Comment


                              • #30
                                Though this may not be practical for all labs (most probably have access to a single MiSeq), as James pointed out in post #20, creative use of rotating barcodes should minimize this problem to a large extent (since the contaminants would end up in the "undetermined" pile and be eliminated automatically).

                                It appears that we have a consensus that this problem exists. It is not going to be eliminated without specific protocol changes (if feasible) to ensure a crossover-proof washing procedure, or a hardware redesign of (parts of) the MiSeq. I hope a path will exist to retrofit the solution to existing machines if it has to be the latter.

                                There may be some applications (a small subset) where this contamination may be unacceptable and for those MiSeq may have to be passed over. But for all remaining applications this may become just another item in technology limitations/footnotes.

                                Comment
