
  • low frequency variants vs mapping quality

    I'm just starting to look at MiSeq data from pooled resequencing samples (a specific gene from parasitic field isolates) and am looking at the frequency of specific rare variants in the gene. Using "GATK -T DepthOfCoverage", I've gotten base counts at the positions I care about and then calculated frequencies.

    From published testing, it seems that the MiSeq has an error rate of about 0.1 per 100 bp (roughly 0.1% per base), which is right at the frequency I am seeing for my variants (e.g. 3 out of 3155 reads). When generating the counts, if I filter with "minBaseQuality 30", I lose the reads carrying a variant for many of the samples. Since I'm right at the edge of the error rate, what calculation should I use to determine whether or not to trust the frequencies that I am seeing?
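
One way to frame the calculation asked about above is a one-sided binomial test: if every read at the position were reference and only sequencing error produced mismatches, how likely would 3 or more variant reads out of 3155 be? The sketch below is only a rough illustration. It assumes a flat per-base error rate of 0.001 (the 0.1/100bp figure quoted above), independent reads, and, for the second number, that errors are split evenly across the three wrong bases. The function name `binom_tail` is made up for this example.

```python
from math import comb

def binom_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more error reads."""
    below = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - below

depth = 3155        # total reads at the position (from the post)
alt_reads = 3       # reads carrying the variant (from the post)
error_rate = 0.001  # 0.1 errors per 100 bp, as quoted in the post

# Probability of seeing >= 3 mismatching reads of any kind from error alone.
p_any = binom_tail(alt_reads, depth, error_rate)

# Assumption: an error lands on each of the 3 wrong bases equally often,
# so the rate for one specific wrong base is error_rate / 3.
p_specific = binom_tail(alt_reads, depth, error_rate / 3)

print(f"any mismatch:        P(>= {alt_reads} of {depth}) = {p_any:.3f}")
print(f"one specific base:   P(>= {alt_reads} of {depth}) = {p_specific:.3f}")
```

Under these assumptions both tail probabilities come out well above 0.05, so 3 of 3155 reads cannot be distinguished from sequencing error by this test alone. Real error rates vary with sequence context and base quality, so a flat rate is at best a first approximation.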

  • #2
    Shameless plug: one option is to simply not filter for mapping quality, and instead to use a low frequency variant caller that builds mapping quality into its model, e.g. LoFreq

    Andreas

      • #4
        I would be very cautious about calling a variant at 0.09% of the total depth of a sample. In my experience, PCR-carryover contamination and repeatable, sequence-specific base mismatch errors can appear at a higher percentage than this.
        If something is rare in your sample, but real, you should still see it at considerably higher depth than errors. NGS can quantify copy number of alleles, but getting 3 out of 3155 reads would lead me to believe that it is not "real".



        • #5
          Thank you both for your help. JackieBadger's thought about 3/3155 reads not being 'real' was indeed my worry, since it's such a low frequency. We are looking for any evidence that some rare parasitic gene variants exist in a specific local population of field isolates, and were thrilled to find a few of them in some of the samples. But then when I started doing some basic filtering, they disappeared, and so I was hoping there was some method to provide some level of statistical confirmation.

          Running LoFreq on the BAM files also showed that the variants we had found were not real. Thankfully, this particular case has a happy ending regardless of which way it turned out, since either answer provides solid information towards explaining the possible spread of this variant across the endemic area.



          • #6
            So, parasitic gene variants... would it not be easier to barcode individuals? Is that possible, or are you doing some metagenomic/pooling type of analysis?
            Even if a variant is rare in a population, it still represents an allelic variant that should be seen at a decent depth.
            Have you confirmed the variant using cloning?



            • #7
              Each individual human from which parasite samples were collected is individually barcoded. Unfortunately, the parasitic DNA for each individual is a pooled sample of all the parasitic eggs collected from that person (pooling is a necessity because of the way eggs are harvested), and the infection is known not to be clonal. One of the things we are trying to find out is whether or not the known lab variants are actually found in the wild. We were thrilled to find them in some of our samples, but their low frequency and low quality seem to indicate that they might not be real. We're trying to see if there's a way to take the variant rates at other positions of the gene and draw conclusions about the variant rate at our positions of interest.

              Finally, at this point, we don't have the ability and/or capacity to do any cloning of our samples.
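
The idea above of using variant rates at other positions to judge the positions of interest could be sketched as follows: pool the non-reference read counts at control positions (ones where no real variant is expected) into an empirical background rate, then ask how surprising the observed count is under that rate. This is only an illustration, not a validated method. The control counts below are made-up placeholders, and the names `binom_tail` and `background_rate` are hypothetical. Only the 3 of 3155 figure comes from the thread.

```python
from math import comb

def binom_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    below = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - below

def background_rate(controls: list[tuple[int, int]]) -> float:
    """Pooled non-reference rate from (alt_reads, depth) pairs at control positions."""
    total_alt = sum(alt for alt, _ in controls)
    total_depth = sum(depth for _, depth in controls)
    return total_alt / total_depth

# Hypothetical (alt_reads, depth) counts at positions assumed to be invariant.
controls = [(2, 3100), (4, 3200), (1, 2900), (3, 3050)]

rate = background_rate(controls)
# Observed at the position of interest: 3 variant reads out of 3155.
p_value = binom_tail(3, 3155, rate)

print(f"background rate = {rate:.5f}")
print(f"P(>= 3 of 3155 | background) = {p_value:.3f}")
```

A pooled rate hides the sequence-specific errors mentioned earlier in the thread, so comparing against control positions with a similar sequence context (or against the same position in samples known to lack the variant) would be a more conservative choice.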



              • #8
                Are you preparing/sequencing your lab samples and wild samples in the same room, with the same pipettes etc., or on the same sequencer?
                I would say that you are seeing low-level PCR carryover.

                I target the same amplicon in individuals, and then pool into the same library. Each amplicon may contain, say, 8 alleles. When we pool 1000 individuals, we still see rare alleles, i.e. those present in just one individual, at significant depths.

                What makes you think these rare variants are real anyway? Do you sequence them at high numbers in your lab samples? If they then turn up at low copy number in the wild samples, I would say it's low-level contamination. There was also a recent thread discussing the rate of contamination between MiSeq runs, based on carryover within the machine.
