
  • HiSeq mystery: bad quality scores affecting mainly T and G in first four cycles

    My lab is having quality problems in the first few cycles of our runs, affecting mainly Gs and Ts.

    Our sequencing is performed at a core facility (we are the client) on a HiSeq2000. These are 50 bp single-end (SE) runs. We do the library prep ourselves. We multiplex 16 samples per lane using our own barcoded adapters. The barcodes are four bases long, followed by a T. We are careful to balance all four bases at each of the first four positions. The fifth position is always a T due to the T/A ligation used to ligate the adapters. We are sequencing yeast genomic DNA and our insert sizes are around 300 bp.

    Since the beginning of 2012, in some runs we have a surprisingly low number of bases passing a quality score of 30 (see attachment...I wanted to paste it into my post but am not sure how). Other runs have high scores across the board. In discussing this with the core, it appears the "good" runs were performed at much lower cluster density (around 100 million clusters) whereas the "bad" runs were somewhere in the range of 190 million clusters. The tables below include only reads passing the Illumina filter.

    As you can see, Gs and Ts are much more dramatically affected than Cs and As. Also, in some runs mainly the first two cycles are affected, whereas in other runs it's the third and/or fourth cycle.
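For anyone who wants to reproduce this kind of per-cycle, per-base table from their own reads, here is a minimal sketch. It assumes Phred+33 quality encoding (older Illumina pipelines used an offset of 64, so check your files), and the function name and toy reads are invented for illustration; they are not data from the runs described here.

```python
from collections import defaultdict

PHRED_OFFSET = 33  # assumption: Phred+33 encoding; older pipelines used 64


def q30_fraction_by_cycle_and_base(reads, n_cycles=4, min_q=30, offset=PHRED_OFFSET):
    """Return {(cycle, base): fraction of calls with quality >= min_q}.

    `reads` is an iterable of (sequence, quality_string) pairs.
    Cycles are numbered from 1.
    """
    total = defaultdict(int)
    passed = defaultdict(int)
    for seq, qual in reads:
        for i in range(min(n_cycles, len(seq), len(qual))):
            key = (i + 1, seq[i])
            total[key] += 1
            if ord(qual[i]) - offset >= min_q:
                passed[key] += 1
    return {key: passed[key] / total[key] for key in total}


# Toy input, invented for illustration only.
# Under Phred+33, '#' is Q2 and 'I' is Q40.
toy_reads = [
    ("TGCAT", "#IIII"),
    ("TGACT", "I#III"),
    ("CACAT", "IIIII"),
    ("ACGTT", "IIIII"),
]

fractions = q30_fraction_by_cycle_and_base(toy_reads)
for (cycle, base), frac in sorted(fractions.items()):
    print(f"cycle {cycle} base {base}: {frac:.2f} of calls >= Q30")
```

Splitting the tally by called base as well as by cycle is what makes a G/T-specific drop visible; a per-cycle average alone would hide it.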

    This is a serious problem for us because it affects our ability to de-multiplex the data using the barcode sequences. In the "bad" runs, we get millions of reads where the first four bases don't correspond to any of our expected barcodes. This reduces our coverage, but is actually not a huge problem because we can easily discard those reads. What's worse is that even though each barcode differs from every other barcode in at least two positions, it's clear after full analysis of the data that some reads are getting placed into the wrong file when we de-multiplex. In other words, the base-calling is so bad that we can't confidently assign reads to the individual samples based on the barcodes. The files are cross-contaminated and we get uninterpretable results.
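One possible stopgap, which does not address the root cause, is to make de-multiplexing stricter: keep a read only if its four barcode bases match exactly, each of those calls meets a quality threshold, and the fifth base is the expected T. The sketch below assumes Phred+33 encoding; the function name, the threshold of 30, and the small demo barcode set (a subset of the 16 listed later in this thread) are illustrative choices, not part of any existing pipeline.

```python
PHRED_OFFSET = 33  # assumption: Phred+33 encoding; older pipelines used 64
TAG_LEN = 4        # barcode length described in this thread


def assign_sample(seq, qual, barcodes, min_q=30, offset=PHRED_OFFSET):
    """Return the matching barcode, or None if the read should be discarded.

    A read is kept only if:
      * all four barcode calls have quality >= min_q,
      * the first four bases match a known barcode exactly,
      * the fifth base is the T left by the T/A ligation.
    """
    if len(seq) < TAG_LEN + 1 or len(qual) < TAG_LEN + 1:
        return None
    if any(ord(q) - offset < min_q for q in qual[:TAG_LEN]):
        return None
    tag = seq[:TAG_LEN]
    if tag not in barcodes or seq[TAG_LEN] != "T":
        return None
    return tag


# Demo subset of barcodes, for illustration only.
DEMO_BARCODES = {"TGCA", "TGAC", "CACA", "ACGT"}

print(assign_sample("TGCATAAAA", "IIIIIIIII", DEMO_BARCODES))  # confident match
print(assign_sample("TGCATAAAA", "I#IIIIIII", DEMO_BARCODES))  # low-quality barcode base
print(assign_sample("TTTTTAAAA", "IIIIIIIII", DEMO_BARCODES))  # unknown barcode
```

This trades coverage for confidence: more reads are thrown away, but a read whose barcode calls are all high quality is less likely to have been miscalled into a different valid barcode.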

    Has anyone seen this problem before...or do you have any guesses about what could be happening? Lowering the cluster density appears to alleviate the problem, but we're confused about the root cause. In the past we were able to get high quality data even at higher cluster density and with shorter barcodes (two bases). Any ideas about what could be happening (either on our end or in the sequencing core) would be welcome.

    I should also mention that we have ruled out adapter-dimer as a cause of the problem. We have quantified the number of adapter-dimer reads in a large number of experiments and have found no correlation between amount of adapter contamination and base quality problems. In fact, some of our highest quality data came from samples with the worst adapter-dimer contamination, and vice-versa.

  • #2
    A quick follow-up -- in our latest run, we got this problem even at low cluster density. So the cluster density per se doesn't seem to be the problem.

    Has anyone else seen this? (149 people have viewed this thread...but no responses yet!)



    • #3
      What exactly are your high and low cluster densities in cpmm^2 (clusters per mm^2), not total clusters? And are the total cluster numbers you give pre- or post-filtering?



      • #4
        Thanks for writing, GW. My cluster densities are around 694,000 cpmm^2 (high density runs) or 365,000 cpmm^2 (lower density runs). I confirmed with the core facility -- these are the raw cluster densities, not post-filtering.
        Last edited by sfbiologist; 09-07-2012, 03:21 PM.



        • #5
          These are the barcodes we're using for these experiments. I'm wondering if the fact that the first two bases are the same in 4 out of 16 barcodes could be a problem. Would this be considered a "low-diversity" library? We always use all 16 together in equal proportions.

          TGCA
          TGAC
          TGTG
          TGGT
          GTCA
          GTAC
          GTTG
          GTGT
          CACA
          CAAC
          CATG
          CAGT
          ACCA
          ACAC
          ACTG
          ACGT
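The properties claimed for this barcode set (balanced bases at each position, and at least two differences between any pair) can be checked directly. A short sketch using the 16 barcodes exactly as listed above; the helper name `hamming` is just an illustrative choice.

```python
from collections import Counter
from itertools import combinations

BARCODES = [
    "TGCA", "TGAC", "TGTG", "TGGT",
    "GTCA", "GTAC", "GTTG", "GTGT",
    "CACA", "CAAC", "CATG", "CAGT",
    "ACCA", "ACAC", "ACTG", "ACGT",
]


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


# Count barcode pairs by Hamming distance.
distance_counts = Counter(hamming(a, b) for a, b in combinations(BARCODES, 2))
min_distance = min(distance_counts)

# Base composition at each of the four barcode positions.
position_counts = [Counter(bc[i] for bc in BARCODES) for i in range(4)]

print("pairs by distance:", dict(sorted(distance_counts.items())))
print("minimum distance:", min_distance)
for i, counts in enumerate(position_counts, start=1):
    print(f"position {i}:", dict(sorted(counts.items())))
```

Of the 120 pairs, 48 differ at two positions and 72 at all four, and every base appears four times at every position. So the set is balanced, and a single miscall always produces a non-barcode that can be discarded. However, with a minimum distance of 2, two miscalls in the same read can turn one valid barcode into another, which would be consistent with the cross-contamination described in the first post when the early cycles are of poor quality.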



          • #6
            Those are decent cluster numbers. Looking at your tags, are you sure they're evenly loaded? You don't accidentally have a lot of one or a few of the tags, instead of everything evenly balanced?

            Nothing is really standing out for me other than the 5th position T.



            • #7
              I think the tags are pretty evenly loaded...when I look at the number of A,C,G, and T calls for the first four cycles, the numbers are pretty close to 25% for each. But I don't know enough about the sequencing run itself to understand whether the tech is adjusting parameters during the run to force those numbers to approach 25%. But also, I think if we're loading unevenly, the exact nature of the unevenness should be different from experiment to experiment...but instead the problem appears more or less the same in a dozen or more experiments.

              In the 5th cycle, we get at least 95% Ts, as expected. In cycles 6 and later (which is yeast genomic DNA), we get about 18-19% Gs and Cs, and about 30% As and Ts. This is the opposite of what I was expecting -- Saccharomyces supposedly has higher GC content than AT. But maybe something about the prep is favoring AT rich DNA?
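The per-cycle composition check described above can be sketched as follows. The function names and toy reads are invented for illustration. On the GC question: if the figures mean 18-19% each for G and C, that adds up to roughly 36-38% GC, which is close to the commonly cited value of about 38% GC for the S. cerevisiae reference genome, so the observed composition may simply reflect the genome rather than a prep bias.

```python
from collections import Counter


def base_fractions_by_cycle(seqs, n_cycles):
    """Return a list with one dict per cycle, mapping base -> fraction of calls."""
    fractions = []
    for i in range(n_cycles):
        counts = Counter(s[i] for s in seqs if len(s) > i)
        total = sum(counts.values())
        fractions.append({b: n / total for b, n in counts.items()} if total else {})
    return fractions


def gc_fraction(base_fractions):
    """Combined G + C fraction from a base -> fraction mapping."""
    return base_fractions.get("G", 0.0) + base_fractions.get("C", 0.0)


# Toy reads, invented for illustration: four barcodes followed by the ligation T.
toy_seqs = ["TGCAT", "GTACT", "CATGT", "ACGTT"]
per_cycle = base_fractions_by_cycle(toy_seqs, n_cycles=5)
for cycle, fr in enumerate(per_cycle, start=1):
    print(f"cycle {cycle}:", dict(sorted(fr.items())))

# Midpoint of the genomic-cycle percentages quoted above (18-19% each for G and C).
reported = {"A": 0.30, "C": 0.185, "G": 0.185, "T": 0.30}
print(f"GC fraction from reported values: {gc_fraction(reported):.2f}")
```

Note that near-25% composition in cycles 1 to 4 only shows the called bases are balanced; it cannot by itself show that the underlying tags were loaded evenly if some calls are wrong.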

              A question I have is whether the cluster identification is occurring as early as the 2nd cycle. Is it possible that, for example, if two adjacent clusters have different tags that start with the same two-letter sequence (say, TGCA and TGAC), they would initially be identified as a single cluster after cycle 2, but then in cycle 3 would appear mixed...leading to a low-confidence call and/or wrong call? (Again I don't know enough about what's happening during cluster identification and base-calling to understand if this is plausible. If someone more knowledgeable could fill me in, that would be great!)
