  • nieder
    Junior Member
    • May 2013
    • 3

    low frequency variants vs mapping quality

    I'm just starting to look at MiSeq data from pooled resequencing samples (a specific gene of parasitic field isolates) and am examining the frequency of specific rare variants in the gene. Using "GATK -T DepthOfCoverage", I've gotten base counts at the positions I care about and then calculated frequencies.

    From published testing, it seems that the MiSeq has an error rate of 0.1/100bp, which is right at the frequency I am seeing for my variants (e.g. 3 out of 3155 reads). When generating the reads, if I filter to "minBaseQuality 30", I lose the reads with a variant for many of the samples. Since I'm right at the edge of the error rate, what calculation should I use to determine whether or not to trust the frequencies that I am seeing?
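One simple calculation for this question is a one-sided binomial test: given the read depth and an assumed per-base error rate, how likely is it to see at least this many variant reads from sequencing error alone? The sketch below uses the numbers from the post (3 variant reads out of 3155, error rate 0.1%). It assumes every error at the site counts toward the variant base and that errors are independent across reads, both of which are simplifications; the function name `binomial_tail` is just an illustrative helper.

```python
from math import comb

def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    # Sum the probabilities of 0..k-1 successes and take the complement.
    below = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))
    return 1.0 - below

depth = 3155          # total reads at the position (from the post)
variant_reads = 3     # reads carrying the variant (from the post)
error_rate = 0.001    # assumed per-base error rate, 0.1 per 100 bp

expected_errors = depth * error_rate
p_value = binomial_tail(variant_reads, depth, error_rate)

print(f"expected erroneous reads: {expected_errors:.2f}")
print(f"P(>= {variant_reads} variant reads from error alone): {p_value:.2f}")
```

A large tail probability means the observed count is about what sequencing error alone would produce, so the variant cannot be distinguished from noise at this depth. A dedicated low-frequency caller that uses per-base qualities would be more rigorous than this fixed-rate approximation.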
  • me_myself_andI
    Member
    • Nov 2010
    • 30

    #2
    Shameless plug: one option is to simply not filter for mapping quality, and instead to use a low frequency variant caller that builds mapping quality into its model, e.g. LoFreq

    Andreas

    • JackieBadger
      Senior Member
      • Mar 2009
      • 385

      #3

      • JackieBadger
        Senior Member
        • Mar 2009
        • 385

        #4
        I would be very cautious about calling a variant at 0.09% of the total depth of a sample. In my experience, PCR-carryover contamination and repeatable sequence-specific base mismatch errors can appear at a higher percentage than this.
        If something is rare in your sample, but real, you should still see it at considerably higher depth than errors. NGS can quantify copy number of alleles, but getting 3 out of 3155 reads would lead me to believe that it is not "real".

        • nieder
          Junior Member
          • May 2013
          • 3

          #5
          Thank you both for your help. JackieBadger's thought about 3/3155 reads not being 'real' was indeed my worry, since it's such a low frequency. We are looking for any evidence that some rare parasitic gene variants exist in a specific local population of field isolates, and were thrilled to find a few of them in some of the samples. But then when I started doing some basic filtering, they disappeared, and so I was hoping there was some method to provide some level of statistical confirmation.

          Running LoFreq on the BAM files also showed that the variants we had found were not real. Thankfully, this particular case has a happy ending regardless of which way it turned out, since either answer provides solid information towards explaining the possible spread of this variant across the endemic area.

          • JackieBadger
            Senior Member
            • Mar 2009
            • 385

            #6
            So, parasitic gene variants... would it not be easier to barcode individuals? Is that possible, or are you doing some metagenomic/pooling type of analysis?
            Just because a variant is rare in a population, it still represents an allelic variant that should be seen at a decent depth.
            Have you confirmed the variant using cloning?

            • nieder
              Junior Member
              • May 2013
              • 3

              #7
              Each individual human from which parasite samples were collected is individually barcoded. Unfortunately, the parasitic DNA for each individual is a pooled sample of all the parasitic eggs collected from that person (pooling is a necessity from the way eggs are harvested), and the infection is known to not be clonal. One of the things we are trying to find is whether or not the known lab variants are actually found in the wild. We were thrilled to find them in some of our samples, but their low frequency and low quality seem to indicate that they might not be real. We're trying to see if there's a way to take the variant rates in other positions of the gene and draw conclusions about the variant rate at our positions of interest.

              Finally, at this point, we don't have the ability and/or capacity to do any cloning of our samples.
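One way to use the other positions of the gene, sketched here under stated assumptions, is to pool their non-reference counts into an empirical background rate and then ask how surprising the count at the position of interest is under that rate. The `background` counts below are hypothetical placeholders, not data from this thread; only the 3/3155 site count comes from the posts. The sketch also assumes the background positions are truly invariant in the sample and share a similar error profile with the site being tested.

```python
from math import comb

def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    below = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))
    return 1.0 - below

# Hypothetical (non_reference_reads, depth) pairs at other positions
# in the same gene that are assumed to carry no real variation.
background = [(2, 3100), (4, 3200), (3, 3150), (1, 3050), (5, 3180)]

bg_alt = sum(alt for alt, _ in background)
bg_depth = sum(depth for _, depth in background)
bg_rate = bg_alt / bg_depth  # pooled empirical background rate

# Position of interest: 3 variant reads out of 3155 (from the thread).
site_alt, site_depth = 3, 3155
p_value = binomial_tail(site_alt, site_depth, bg_rate)

print(f"background rate: {bg_rate:.5f}")
print(f"P(>= {site_alt} variant reads at background rate): {p_value:.2f}")
```

If the tail probability is not small, the site looks no different from the background positions. When several positions of interest are tested this way, the resulting p-values would need a multiple-testing correction.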

              • JackieBadger
                Senior Member
                • Mar 2009
                • 385

                #8
                Are you preparing/sequencing your lab samples and wild samples in the same room, with the same pipettes etc., or on the same sequencer?
                I would say that you are seeing low-level PCR carryover.

                I target the same amplicon in individuals, and then pool into the same library. Each amplicon may contain say 8 alleles. When we pool 1000 individuals, we still see rare alleles i.e. present in just one individual, at significant depths.

                What makes you think these rare variants are real anyway? Do you sequence them at high numbers in your lab samples? If they then turn up at low copy number in the wild samples, I would say it's low-level contamination. There was also a recent thread discussing the rate of contamination between MiSeq runs, based on carryover within the machine.
