  • ohofmann
    Member
    • Jan 2009
    • 37

    Quality cutoffs

    'lo everyone,

    I've been working with entirely too many different quality scores in the last few weeks (Sanger/Solexa/Illumina) and am trying to get a bit of a handle on best practices. For a normal re-sequencing project there seem to be a fair number of steps where filtering can occur:

    * on the sequence level (number of Ns in the read, polynucleotide sequences etc)
    * on the sequence quality level (minimum average FASTQ score for a given read)
    * the alignment level (quality of the alignment for a read)
    * and finally on the SNP call level (confidence in the call -- and I've yet to understand the difference between consensus quality, SNP quality and RMS mapping quality in SAMTools)
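A minimal sketch of the first two filter levels in the list above (sequence content and mean read quality). It assumes Sanger-encoded qualities (ASCII offset 33); the function names and the thresholds `max_n`, `min_mean_q` and `max_single_base_fraction` are illustrative assumptions, not recommended cutoffs.

```python
def mean_phred(quality_string, offset=33):
    """Mean Phred score of a FASTQ quality string (offset 33 = Sanger encoding)."""
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(scores) / len(scores)


def passes_read_filters(sequence, quality_string, max_n=2, min_mean_q=20,
                        max_single_base_fraction=0.9, offset=33):
    """Return True if a read survives the sequence-level and quality-level filters.

    All thresholds are placeholders; tune them per project.
    """
    seq = sequence.upper()
    if not seq or len(seq) != len(quality_string):
        return False
    # Sequence level: too many uncalled bases.
    if seq.count("N") > max_n:
        return False
    # Sequence level: read dominated by a single nucleotide (e.g. poly-A).
    most_common = max(seq.count(base) for base in "ACGT")
    if most_common / len(seq) > max_single_base_fraction:
        return False
    # Quality level: minimum average Phred score over the read.
    return mean_phred(quality_string, offset) >= min_mean_q
```

Older Solexa/Illumina files may use a different ASCII offset (64), so the `offset` argument would need to match the pipeline that produced the reads.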

    Details are of course going to be project dependent, and I can come up with rough filter values by converting each Phred score to an error probability and deciding on the false-positive rate I'm willing to accept, but are there rough guidelines for any of these?

    For example, so far we've been using a lower boundary of 15 for the alignment quality (as provided by the SAM/Pileup format), but that is more or less empirical. I haven't been able to find a discussion or review on these topics, but I've probably just missed them.

    Cheers, Oliver
  • zee
    NGS specialist
    • Apr 2008
    • 249

    #2
    I believe the first three on your list should be dealt with by the aligner. We keep uniquely mapped reads and use other tools to process datasets by using the mapping quality score as one of the important alignment filters.

    I usually keep all single-locus alignments and create subsets filtered using a mapping quality >=10. It's also interesting to generate a distribution of all the mapping qualities to see how much data you're filtering off.
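The distribution mentioned above can be tabulated straight from SAM text, where the mapping quality is column 5 and bit 0x4 of the FLAG column marks unmapped reads. This is a rough sketch; the function names and the default `min_mapq=10` are assumptions for illustration.

```python
from collections import Counter


def mapq_distribution(sam_lines):
    """Count mapped alignments per mapping quality (SAM column 5)."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x4:
            continue  # read is unmapped, MAPQ is not meaningful
        counts[int(fields[4])] += 1
    return counts


def fraction_kept(counts, min_mapq=10):
    """Fraction of mapped alignments that would survive a MAPQ >= min_mapq filter."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    kept = sum(n for q, n in counts.items() if q >= min_mapq)
    return kept / total
```

Usage would look something like `fraction_kept(mapq_distribution(open("aln.sam")), min_mapq=10)`, where `aln.sam` is a hypothetical file name.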

    Many groups have their own preferred rules for SNP filtering. I'm not an expert in SNP detection, but I suppose it's best to use a fairly high stringency when filtering SNP calls.


    • baohua100
      Senior Member
      • Jun 2008
      • 103

      #3
      When using MAQ, how do I choose a mapping quality cutoff?

      I can't tell which reads are uniquely mapped.


      • zee
        NGS specialist
        • Apr 2008
        • 249

        #4
        I believe that with MAQ, reads with a mapping quality (q) of zero are ambiguously mapped and one of the candidate positions is chosen at random. So it's best to filter out anything with a mapping quality of zero, i.e. keep only reads with q greater than zero. I usually choose anything above q=10 or q=20.


        • dawe
          Senior Member
          • Apr 2009
          • 258

          #5
          Hi Oliver,
          I'm facing the very same issues...

          Originally posted by ohofmann:
          * on the sequence level (number of Ns in the read, polynucleotide sequences etc)
          * on the sequence quality level (minimum average FASTQ score for a given read)
          * the alignment level (quality of the alignment for a read)
          * and finally on the SNP call level (confidence in the call -- and I've yet to understand the difference between consensus quality, SNP quality and RMS mapping quality in SAMTools)

          About the last one... I've worked with maq SNP calls and tried different cutoffs. In a first attempt I considered only SNPs with coverage at least equal to the expected coverage for our experiment (it was a yeast genome, and we were expecting ~40x coverage from a single lane). We had a test case with known mutations, and we saw that the coverage rule plus high base confidence let us identify the SNPs.
          In a second experiment we had much higher coverage (~90x), but we saw that ~30x-40x coverage would have been enough to find the SNPs.
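The "coverage rule plus high base confidence" idea can be sketched as a simple two-condition filter. The record layout (dicts with `depth` and `quality` keys) and the default cutoffs are hypothetical; they are not the maq output format, and real calls would need to be parsed into this shape first.

```python
def filter_snp_calls(calls, min_depth=40, min_quality=30):
    """Keep SNP calls whose read depth and call quality both reach the cutoffs.

    `calls` is an iterable of dicts with hypothetical keys 'depth' and 'quality'.
    """
    return [
        call for call in calls
        if call["depth"] >= min_depth and call["quality"] >= min_quality
    ]
```

Setting `min_depth` to the expected coverage mirrors the first experiment described above; the second experiment suggests a lower depth cutoff may be enough when overall coverage is high.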
          I'm dealing right now with the samtools pileup function and I really can't figure out what RMS mapping quality is. My guess is that consensus quality is related to the overall base call quality at that position, and SNP quality is related to the qualities of the surrounding reads (i.e. if your SNP is in the middle of bad-quality reads you may be less confident in your result; Maq, for example, gives you the base confidence for the nucleotides to the left and right of your SNP).
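On the RMS question: RMS stands for root mean square, and in the pileup output it is, as far as I understand it, taken over the mapping qualities of the reads covering a position. A small sketch of that calculation (the function name is an assumption):

```python
import math


def rms_mapping_quality(mapqs):
    """Root mean square of the mapping qualities of the reads covering a site."""
    if not mapqs:
        raise ValueError("need at least one mapping quality")
    return math.sqrt(sum(q * q for q in mapqs) / len(mapqs))
```

Because the values are squared before averaging, high-quality reads weigh more than in a plain mean: for mapping qualities 0 and 60 the mean is 30 but the RMS is about 42.4.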


          • baohua100
            Senior Member
            • Jun 2008
            • 103

            #6
            Originally posted by zee:
            I believe that with MAQ, reads with a mapping quality (q) of zero are ambiguously mapped and one of the candidate positions is chosen at random. So it's best to filter out anything with a mapping quality of zero, i.e. keep only reads with q greater than zero. I usually choose anything above q=10 or q=20.

            For a mapping quality score of 10, what's the error probability?


            • ohofmann
              Member
              • Jan 2009
              • 37

              #7
              Should be 0.1; see http://en.wikipedia.org/wiki/Phred_quality_score
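The 0.1 follows from the Phred definition Q = -10 * log10(p), so p = 10^(-Q/10). A short sketch of both directions, which also covers picking a cutoff from an acceptable error rate; the function names are illustrative.

```python
import math


def phred_to_error_probability(q):
    """Error probability implied by a Phred-scaled score: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Phred-scaled score for an error probability: Q = -10 * log10(p)."""
    if not 0 < p <= 1:
        raise ValueError("probability must be in (0, 1]")
    return -10 * math.log10(p)
```

So a cutoff of 10 tolerates roughly 1 wrong placement in 10, 20 roughly 1 in 100, and 30 roughly 1 in 1000; going the other way, accepting a 1% error rate corresponds to a cutoff of 20.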
