Can someone explain how BWA does its trimming?

  • #1

    In the BWA manual, there is a -q option:

    Parameter for read trimming. BWA trims a read down to $\arg\max_x \sum_{i=x+1}^{l} (\mathrm{INT} - q_i)$ if $q_l < \mathrm{INT}$, where $l$ is the original read length.


    I am not really sure what this means. If I say -q 15, what does it really mean?
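A literal reading of the manual's formula, sketched in Python: the hypothetical function `trim_length` takes qualities already decoded to Phred integers and returns the length x the read is cut to. Base i (1-based, as in the formula) is `quals[i - 1]`, and ties are broken in favour of the larger x, which is an assumption; the actual bwa code may handle details (such as a minimum read length) that are not modelled here.

```python
def trim_length(quals, threshold):
    """Length x the read is trimmed to, per the manual's formula:
    argmax_x sum_{i=x+1}^{l} (threshold - q_i), applied only if q_l < threshold.
    Base i (1-based) has quality quals[i - 1]."""
    l = len(quals)
    if l == 0 or quals[-1] >= threshold:
        return l  # last base passes the threshold: no trimming
    best_x, best_sum = l, 0  # x = l keeps the whole read (empty sum is 0)
    for x in range(l - 1, -1, -1):
        # quals[x:] holds bases x+1 .. l in 1-based numbering
        s = sum(threshold - q for q in quals[x:])
        if s > best_sum:  # strict '>' keeps the larger x on ties
            best_x, best_sum = x, s
    return best_x


# Good bases, a short low-quality dip, more good bases, then a bad 3' tail.
quals = [30] * 4 + [10] * 4 + [30] * 4 + [10] * 2
print(trim_length(quals, 15))  # 12: only the two-base tail is removed
```

With -q 15, the interior dip is kept because the good bases after it outweigh it; only the low-quality tail is cut.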

  • #2
    Originally posted by foxyg
    In the BWA manual, there is a -q option:

    Parameter for read trimming. BWA trims a read down to $\arg\max_x \sum_{i=x+1}^{l} (\mathrm{INT} - q_i)$ if $q_l < \mathrm{INT}$, where $l$ is the original read length.


    I am not really sure what this means. If I say -q 15, what does it really mean?
    I believe it tries to find local maxima of the (INT - q_i) function and chooses the rightmost maximum. That means it trims where the quality starts to decrease monotonically below your threshold. This is pretty smart: suppose you have two or three bad quality values near the beginning of your read (say at positions 5 to 8). Hard trimming at the first base below the threshold would leave a read only 5 bp long, whereas the bwa method checks whether the qualities improve after that and trims later.
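The sum for a given x equals the sum for x + 1 plus (INT - q_{x+1}), so the same argmax can be found in a single pass from the 3' end with a running total. The sketch below (hypothetical function names, not bwa's source code) contrasts that with naive hard trimming at the first base below the threshold, on an invented 40 bp read with low qualities at positions 5 to 8, in the spirit of the example above.

```python
def trim_running_sum(quals, threshold):
    """One pass from the 3' end; returns the same x as the manual's argmax."""
    l = len(quals)
    if l == 0 or quals[-1] >= threshold:
        return l
    s, best_sum, best_x = 0, 0, l
    for x in range(l - 1, -1, -1):
        s += threshold - quals[x]  # now s == sum_{i=x+1}^{l} (threshold - q_i)
        if s > best_sum:
            best_sum, best_x = s, x
    return best_x


def trim_naive(quals, threshold):
    """Hard trim: cut at the first base (from the 5' end) below the threshold."""
    for i, q in enumerate(quals):
        if q < threshold:
            return i
    return len(quals)


# 40 bp read: low-quality dip at positions 5-8, good middle, weak 3' tail.
read = [35] * 4 + [8] * 4 + [35] * 28 + [10] * 4
print(trim_naive(read, 15))        # 4
print(trim_running_sum(read, 15))  # 36
```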


    • #3
      So does BWA start scanning from left or right?

      Also, what trim parameter is commonly used if I have FASTQ data with quality offset 64?


      • #4
        bwa borrowed its trimming method from phred.

        @foxyg

        Use Sanger fastq. That is the standard. Use -q15 or -q20. Usually the threshold does not matter too much.
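For data with offset 64 (Illumina pipeline 1.3+, Phred+64), getting to Sanger FASTQ (Phred+33) only requires shifting each quality character down by 31. A minimal sketch of that re-encoding, assuming the input really is Phred+64; the helper name is hypothetical and it handles only the quality line of each record.

```python
def illumina13_to_sanger(qual_string):
    """Re-encode a Phred+64 quality string as Phred+33 (Sanger)."""
    out = []
    for ch in qual_string:
        q = ord(ch) - 64  # decode Phred+64
        if q < 0:
            raise ValueError(f"character {ch!r} is below the Phred+64 range")
        out.append(chr(q + 33))  # re-encode as Phred+33
    return "".join(out)


print(illumina13_to_sanger("hhhhB"))  # IIII#
```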


        • #5
          Hi again!

          Originally posted by foxyg
          So does BWA start scanning from left or right?
          Well... from the right (the 3' end of the read). How else? :-)

          Originally posted by foxyg
          Also what is the common trim parameter here I should use if I have offset 64 FASTQ data?
          As pointed out by lh3, you should always have your scores in Sanger format; you can then apply a threshold of 15-20 (which corresponds to an error probability of roughly 0.03-0.01).
          BTW, if your FASTQ is in Illumina format (Pipeline 1.3+), you may try this patch I've written. It adds an '-I' option to bwa aln so that you can use Illumina reads and trim (and output) them as if they were on the Sanger scale.
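The error probabilities quoted for thresholds 15-20 follow from the Phred definition Q = -10 log10(p), i.e. p = 10^(-Q/10). A small sketch of both directions (the function names are illustrative):

```python
import math


def phred_to_error_prob(q):
    """Phred score Q -> probability that the base call is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Error probability p -> Phred score Q = -10 * log10(p)."""
    return -10 * math.log10(p)


for q in (15, 20):
    print(q, round(phred_to_error_prob(q), 4))
# 15 0.0316
# 20 0.01
```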


          • #6
            Originally posted by lh3
            bwa learned from phred.
            Use Sanger fastq. That is the standard. Use -q15 or -q20. Usually the threshold does not matter too much.
            Hi,
            I wonder when the threshold does matter much. And could anybody explain why it usually doesn't?
            Thanks!


            • #7
              Originally posted by dawe
              As pointed by lh3 you should always have your scores in Sanger format and then you may apply a filter to 15-20 (which corresponds to a ~0.03-0.01 probability).
              lh3 also pointed out that
              Originally posted by lh3
              Usually the threshold does not matter too much.
              So, in the same vein as ElMichael: why would the threshold not matter? I mean, if the quality threshold is higher, you would keep fewer bases, since more of them would fall below it, right?
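One way to explore the question is to trim the same read at several thresholds. The sketch below applies the manual's argmax formula (computed as a running sum from the 3' end; the function name is hypothetical) to two invented reads: one whose quality collapses abruptly at the 3' end and one whose quality declines gradually. With these made-up values, -q 15 and -q 20 cut the first read at the same place and the second one 2 bp apart; this is only an illustration, not evidence about real data.

```python
def trim_running_sum(quals, threshold):
    """Trimmed length per the manual's argmax formula (one pass from the 3' end)."""
    l = len(quals)
    if l == 0 or quals[-1] >= threshold:
        return l
    s, best_sum, best_x = 0, 0, l
    for x in range(l - 1, -1, -1):
        s += threshold - quals[x]  # sum over bases x+1 .. l (1-based)
        if s > best_sum:
            best_sum, best_x = s, x
    return best_x


# Invented reads: an abrupt 3' quality collapse vs. a gradual decline.
sharp = [35] * 30 + [2] * 6
gradual = [35] * 20 + [30, 28, 26, 24, 22, 20, 18, 16, 14, 12, 10, 8]

for name, read in (("sharp", sharp), ("gradual", gradual)):
    lengths = {q: trim_running_sum(read, q) for q in (15, 20)}
    print(name, lengths)
# sharp {15: 30, 20: 30}
# gradual {15: 28, 20: 26}
```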

              Dawe, on what did you base your choice of trimming threshold (15-20)? Is there a specific paper saying "this is a commonly used threshold value", like p = 0.05 for hypothesis testing? I just want some confirmation for the threshold choice.


              • #8
                I suppose if the probability of a base call being wrong is less than .01, you'd still want to keep it.
