  • Quality cutoffs

    'lo everyone,



    I've been working with entirely too many different quality scores over the last few weeks (Sanger/Solexa/Illumina) and am trying to get a bit of a handle on best practices. For a normal re-sequencing project there seem to be a fair number of steps where filtering can occur:

    * on the sequence level (number of Ns in the read, polynucleotide sequences etc)
    * on the sequence quality level (minimum average FASTQ score for a given read)
    * on the alignment level (quality of the alignment for a read)
    * and finally on the SNP call level (confidence in the call -- and I've yet to understand the difference between consensus quality, SNP quality and RMS mapping quality in SAMTools)
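The first two filter levels in the list above can be sketched as follows. This is a minimal illustration, not a recommended pipeline: the function names and the thresholds (`max_n`, `min_mean_q`) are placeholders made up for the example, and it assumes Sanger-style qualities (ASCII offset 33). Illumina 1.3 to 1.7 files use offset 64, and old Solexa scores are not Phred-scaled at all, so they would need converting first.

```python
def mean_phred(quality_string, offset=33):
    """Mean Phred score of a FASTQ quality string (offset 33 = Sanger encoding)."""
    if not quality_string:
        return 0.0
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(scores) / len(scores)


def passes_read_filters(sequence, quality_string, max_n=2, min_mean_q=20, offset=33):
    """Return True if a read survives the sequence-level and quality-level filters.

    max_n and min_mean_q are placeholder thresholds, not recommendations.
    """
    # Sequence level: reject reads with too many uncalled bases.
    if sequence.upper().count("N") > max_n:
        return False
    # Quality level: reject reads whose mean base quality is too low.
    return mean_phred(quality_string, offset) >= min_mean_q


print(passes_read_filters("ACGTNACGT", "IIIIIIIII"))  # one N, 'I' encodes Q40
print(passes_read_filters("ACNNNACGT", "IIIIIIIII"))  # three Ns
print(passes_read_filters("ACGTACGTA", "+++++++++"))  # '+' encodes Q10
```

Averaging Phred scores is the simple approach the post describes; it is not the same as averaging error probabilities, which weights bad bases more heavily.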

    Details are of course going to be project dependent, and I can come up with rough filter values by converting the Phred score to an error probability and deciding on the false-positive rate I'm willing to accept, but are there rough guidelines for any of these?
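Going from an acceptable error rate to a cutoff is just the Phred definition inverted, Q = -10 * log10(p). A small sketch, where the function name is invented for illustration and the example error rates are arbitrary:

```python
import math


def min_phred_for_error_rate(max_error_probability):
    """Smallest integer Phred score whose error probability is <= the target."""
    if not 0 < max_error_probability <= 1:
        raise ValueError("error probability must be in (0, 1]")
    exact = -10 * math.log10(max_error_probability)
    # Round away floating-point noise before taking the ceiling.
    return math.ceil(round(exact, 9))


for p in (0.1, 0.05, 0.01, 0.001):
    print(p, min_phred_for_error_rate(p))
```

This only turns a tolerated per-call error rate into a score; whether that rate suits a given project is still the judgment call the post is asking about.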

    For example, so far we've been using a lower bound of 15 for the alignment quality (as provided by the SAM/Pileup format), but that is more or less empirical. I haven't been able to find a discussion or review on these topics, but I probably just missed them.

    Cheers, Oliver

  • #2
    I believe the first three on your list should be dealt with by the aligner. We keep uniquely mapped reads and process the datasets with other tools, using the mapping quality score as one of the main alignment filters.

    I usually keep all single-locus alignments and create subsets filtered at a mapping quality >=10. It's also interesting to generate a distribution of all the mapping qualities to see how much data you're filtering out.
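A rough sketch of that distribution check on SAM text, using only the standard library. It relies on MAPQ being the fifth tab-separated column of a SAM alignment line; the function names are invented here and the records are made-up toy data that only exercise the code.

```python
from collections import Counter


def mapping_qualities(sam_lines):
    """Yield MAPQ (5th tab-separated SAM column) for each alignment line."""
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blanks and header lines
            continue
        yield int(line.split("\t")[4])


def fraction_retained(qualities, min_mapq):
    """Fraction of alignments with MAPQ >= min_mapq."""
    qualities = list(qualities)
    if not qualities:
        return 0.0
    return sum(q >= min_mapq for q in qualities) / len(qualities)


# Made-up toy records, not real data.
toy_sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t37\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchr1\t200\t0\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t16\tchr1\t300\t10\t4M\t*\t0\t0\tACGT\tIIII",
    "r4\t0\tchr1\t400\t5\t4M\t*\t0\t0\tACGT\tIIII",
]

quals = list(mapping_qualities(toy_sam))
histogram = Counter(quals)
for q in sorted(histogram):
    print(q, histogram[q])  # mapping quality, number of alignments
print(fraction_retained(quals, 10))  # share of alignments kept at MAPQ >= 10
```

For a real file, the same generator can be fed an open file handle instead of the list.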

    Many groups have their own preferred rules for SNP filtering. I'm not an expert in SNP detection, but I suppose it's best to filter these with fairly high stringency.



    • #3
      When using MAQ, how do I choose a mapping quality cutoff?

      I can't tell which reads are uniquely mapped.



      • #4
        I believe that with MAQ, reads with a mapping quality (q) of zero are ambiguously mapped, and one of the candidate positions is chosen at random. So it's best to filter out anything with a mapping quality of zero. I usually keep anything above q=10 or q=20.

        Originally posted by baohua100
        When using MAQ, how do I choose a mapping quality cutoff?

        I can't tell which reads are uniquely mapped.



        • #5
          Hi Oliver,
          I'm facing the very same issues...

          Originally posted by ohofmann
          * on the sequence level (number of Ns in the read, polynucleotide sequences etc)
          * on the sequence quality level (minimum average FASTQ score for a given read)
          * the alignment level (quality of the alignment for a read)
          * and finally on the SNP call level (confidence in the call -- and I've yet to understand the difference between consensus quality, SNP quality and RMS mapping quality in SAMTools)
          About the last one... I've worked with MAQ SNP calls and tried different cutoffs. In a first attempt I considered only SNPs with coverage at least equal to the expected coverage for our experiment (it was a yeast genome, and we were expecting ~40x coverage from a single lane). We had a test case with known mutations, and we saw that the coverage rule plus high base confidence let us identify the SNPs.
          In a second experiment we had much higher coverage (~90x), but we saw that ~30x - 40x coverage would be enough to find the SNPs.
          I'm dealing right now with the samtools pileup function and I really can't figure out what RMS mapping quality is. My guess is that consensus quality is related to the overall base call quality at that position, and that SNP quality is related to the qualities of the surrounding reads (i.e. if your SNP is in the middle of bad-quality reads you may be less confident of your result; MAQ, for example, gives you the base confidence for the nucleotides to the left and right of your SNP).
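On the RMS part: "RMS" is simply root mean square, so the RMS mapping quality at a position is the square root of the mean of the squared mapping qualities of the reads covering it. As far as I recall from the samtools pileup documentation, the consensus quality is the Phred-scaled quality of the consensus call and the SNP quality is the Phred-scaled probability that the consensus is identical to the reference; treat that reading as an assumption and check it against the manual. A small sketch of the RMS calculation, with an invented function name and made-up numbers:

```python
import math


def rms(values):
    """Root mean square: sqrt of the mean of the squared values."""
    values = list(values)
    if not values:
        raise ValueError("need at least one value")
    return math.sqrt(sum(v * v for v in values) / len(values))


# Made-up mapping qualities for the reads covering one position.
mapqs = [60, 60, 0, 0]
print(sum(mapqs) / len(mapqs))  # plain mean
print(rms(mapqs))               # RMS, pulled towards the larger values
```

Because the values are squared before averaging, a few well-mapped reads raise the RMS more than they raise the plain mean, so a site covered by a mix of good and ambiguous reads can still show a moderate RMS mapping quality.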



          • #6
            Originally posted by zee
            I believe that with MAQ, reads with a mapping quality (q) of zero are ambiguously mapped, and one of the candidate positions is chosen at random. So it's best to filter out anything with a mapping quality of zero. I usually keep anything above q=10 or q=20.
            For a mapping quality score of 10, what's the error probability?



            • #7
              Should be 0.1; see http://en.wikipedia.org/wiki/Phred_quality_score
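The arithmetic behind that answer is the Phred definition, p = 10^(-Q/10). For a mapping quality, p is the estimated probability that the read is placed at the wrong position. A minimal sketch (the function name is made up for the example):

```python
def phred_to_error_probability(q):
    """Error probability implied by a Phred-scaled score q: p = 10 ** (-q / 10)."""
    return 10 ** (-q / 10)


for q in (10, 15, 20, 30):
    print(q, phred_to_error_probability(q))
```

So the q=10 and q=20 cutoffs mentioned earlier in the thread correspond to roughly a 1 in 10 and a 1 in 100 chance of a wrong placement.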
