  • BWA Soft Clipping

    Hi,

    When I run BWA without specifying a "q" value (which defaults to 0 as I understand it from the manual), I would not expect any trimming to occur.

    However, the resulting alignments have lots of soft-clipped bases at the read ends. Isn't that trimming?

    Thanks!

  • #2
    Which value have you specified? Why would you expect trimming not to occur?
    Also, if you specify a q value, you should see information about trimming while bwa is running.

    d


    • #3
      Hi, I didn't specify a "q" value, and the BWA manual implies that this means a default value of "0" is used.

      The official description of "q" is a bit cryptic for a non-mathematician, but I thought that the default value of "0" would lead to no trimming? If this isn't the case, how can I prevent trimming?

      Thanks.


      • #4
        Originally posted by Bio.X2Y:
        Hi, I didn't specify a "q" value, and the BWA manual implies that this means a default value of "0" is used.

        The official description of "q" is a bit cryptic for a non-mathematician, but I thought that the default value of "0" would lead to no trimming? If this isn't the case, how can I prevent trimming?

        Thanks.
        Whoops! Sorry for misreading your post.
        Can you post a soft-clipped entry? Could it be an effect of the Smith-Waterman (SW) alignment instead?

        d


        • #5
          Hi,
          Below is an example (both ends shown).

          I'm not sure what you mean by this being an artefact of SW alignment? I would have thought that trimming would either (a) be allowed or (b) not allowed.

          Thanks for your help!

          SRR018256.13099683 83 RN28S1|NR_003287.2 4925 29 51M 4550 -426 CCCCCCGTCACGCACCGCACGTTCGTGGGGAACCTGGCGCTAAACCATTCG #%#&&$($($&'%$,#&+%+'+&)((0,**.0++,+1)65.7C+II<@II. XT:A:U NM:i:2 SM:i:29 AM:i:29 X0:i:1 X1:i:0 XM:i:2 XO:i:0 XG:i:0 MD:Z:0T1G48
          SRR018256.13099683 163 RN28S1|NR_003287.2 4550 29 45M6S 4925 426 GTTAGTTTTACCCTACTGATGATGTGTTGTTGCCATAGTAATCCTNTNTAG I+I;-77I=,10>9/55I)*;%1+%*++%0+))&$%#'$&"'%))!#!$"% XT:A:M NM:i:1 SM:i:29 AM:i:29 XM:i:1 XO:i:0 XG:i:0 MD:Z:36G8
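
For anyone who wants to count soft-clipped bases in records like the two above, here is a minimal Python sketch. It only looks at the CIGAR column; the function names `soft_clips` and `query_length` are made up for this example and are not part of BWA or samtools.

```python
import re

# One CIGAR operation: a length followed by an operation letter (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def soft_clips(cigar):
    """Return (left, right) soft-clipped base counts for a CIGAR string."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) can sit outside soft clips, so ignore them here.
    core = [(n, op) for n, op in ops if op != "H"]
    left = core[0][0] if core and core[0][1] == "S" else 0
    right = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return left, right

def query_length(cigar):
    """Read length implied by the CIGAR (M, I, S, = and X consume read bases)."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MIS=X")

for cigar in ("51M", "45M6S", "4S26M"):
    print(cigar, soft_clips(cigar), query_length(cigar))
```

Note that the second record still accounts for all 51 bases (45M + 6S): the clipped bases stay in the SEQ column, they are just excluded from the alignment.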


          • #6
            Originally posted by Bio.X2Y:
            Hi,
            Below is an example (both ends shown).

            I'm not sure what you mean by this being an artefact of SW alignment? I would have thought that trimming would either (a) be allowed or (b) not allowed.
            I don't mean that it's an artifact. bwa extends your match by Smith-Waterman alignment. I guess the terminal part of a read may be soft-clipped if that gives a higher alignment score.
            Trimming is quite different: it is performed at alignment time, based on the read qualities.
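
To illustrate the "higher score" idea, here is a toy Python sketch of local-alignment end clipping. It is not BWA's actual code: the scores (+1 match, -3 mismatch) and the match pattern of the last six bases are assumptions chosen only for illustration. The idea is to extend base by base and keep the prefix with the best running score; whatever lies beyond it gets clipped.

```python
def best_extension(matches, match_score=1, mismatch_penalty=3):
    """Return (aligned_len, clipped_len, score) for the best-scoring prefix.

    matches: list of booleans, True where the read base equals the reference
    base, ordered in the direction of extension.
    """
    best_score, best_len, running = 0, 0, 0
    for i, is_match in enumerate(matches, start=1):
        running += match_score if is_match else -mismatch_penalty
        if running > best_score:
            best_score, best_len = running, i
    return best_len, len(matches) - best_len, best_score

# 36 matches, 1 mismatch, 8 matches (as in MD:Z:36G8), followed by a
# hypothetical noisy tail of 6 bases that mostly disagree with the reference.
pattern = [True] * 36 + [False] + [True] * 8 + [False, True, False, True, False, True]
print(best_extension(pattern))
```

With these assumed scores the best prefix is 45 bases long, so the remaining 6 are clipped (45M6S); forcing the alignment through the tail would give a lower score.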

            d


            • #7
              How do I know that the -q INT option I set has taken effect? I have 51-nt paired-end reads too. It seems to me that nothing has been trimmed, as the read lengths indicated by the CIGAR column are all 51. Is there any other column I can check to see whether quality trimming has occurred?
              A related question, the same as Bio.X2Y’s: the resulting alignments have lots of soft-clippings at the edges. Aren't these trimmings? Based on the description of the -q INT option in the BWA documentation, I would expect soft-clippings (due to trimming) to occur only at the right end of sequences, not at both ends. But I frequently see soft-clippings at both ends.
              Thanks a lot for any input! It would also be great if anyone could clarify the BWA quality trimming issue a bit, as quite a few people here have similar questions.


              • #8
                bwa may do Smith-Waterman alignment, which can produce soft clipping.


                • #9
                  What about the quality trimming? Does it actually happen, or does it produce soft-clippings too? Thanks!


                  • #10
                    Originally posted by pparg:
                    How do I know that the -q INT option I set has taken effect? I have 51-nt paired-end reads too. It seems to me that nothing has been trimmed, as the read lengths indicated by the CIGAR column are all 51. Is there any other column I can check to see whether quality trimming has occurred?
                    A related question, the same as Bio.X2Y’s: the resulting alignments have lots of soft-clippings at the edges. Aren't these trimmings? Based on the description of the -q INT option in the BWA documentation, I would expect soft-clippings (due to trimming) to occur only at the right end of sequences, not at both ends. But I frequently see soft-clippings at both ends.
                    Thanks a lot for any input! It would also be great if anyone could clarify the BWA quality trimming issue a bit, as quite a few people here have similar questions.

                    This may be a late answer.
                    As I understand it, you guys are confusing the "-q" option (quality trimming) with soft clipping.

                    The -q option trims the crappy ends of reads with very low Phred scores, i.e. bad quality, which can be due to sequencing errors. Such trimming serves as a pre-processing step before BWA aligns the read.

                    "Soft-clipped", on the other hand, refers to reads part of which cannot be aligned at that location, say split reads covering breakpoints. BWA still keeps the "unmapped" part in the record for downstream analysis, because it could be caused by, say, a translocation, a deletion and so on.

                    So basically you are talking about two different things.
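
Since the description of -q in the manual was called cryptic earlier in the thread, here is a small Python sketch of that formula: BWA trims a read down to argmax_x of the sum over i = x+1..l of (INT - q_i), if q_l < INT, where l is the original read length. This follows the documented formula only, not BWA's source, and the function name `trim_length` and the quality values are made up for the example.

```python
def trim_length(quals, threshold):
    """Number of bases kept after 3'-end trimming, per the -q formula.

    quals: Phred scores in read order (5' to 3').
    threshold: the INT given to -q.
    """
    read_len = len(quals)
    # No trimming if the option is off or the last base is good enough.
    if threshold < 1 or not quals or quals[-1] >= threshold:
        return read_len
    best_sum, keep, running = 0, read_len, 0
    # Walk from the 3' end; running is sum(threshold - q_i) over the bases
    # that would be removed if the read were cut down to length x.
    for x in range(read_len - 1, -1, -1):
        running += threshold - quals[x]
        if running > best_sum:
            best_sum, keep = running, x
    return keep

quals = [30] * 10 + [2, 2, 2]   # hypothetical read: good bases, then a bad tail
print(trim_length(quals, 20))   # tail of three low-quality bases is removed
print(trim_length(quals, 0))    # default: nothing is trimmed
```

With INT = 0 every term (0 - q_i) is zero or negative, so the sum can never become positive and the full read length wins. That is why the default of 0 means no quality trimming, and why trimming under this formula can only ever shorten the 3' end.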


                    • #11
                      Originally posted by CNVboy:
                      This may be a late answer.
                      As I understand it, you guys are confusing the "-q" option (quality trimming) with soft clipping.

                      The -q option trims the crappy ends of reads with very low Phred scores, i.e. bad quality, which can be due to sequencing errors. Such trimming serves as a pre-processing step before BWA aligns the read.

                      "Soft-clipped", on the other hand, refers to reads part of which cannot be aligned at that location, say split reads covering breakpoints. BWA still keeps the "unmapped" part in the record for downstream analysis, because it could be caused by, say, a translocation, a deletion and so on.

                      So basically you are talking about two different things.
                      Hi, I think we know that trimming and soft-clipping serve different purposes, but in the SAM file the CIGAR string shows only the clipping information (e.g. 4S26M), not the reason why the read was clipped.

                      The problem here is: why does bwa clip/trim reads when the -q option is not specified? Is soft-clipping just part of bwa's nature?

                      I have also noticed that lots of alignment tools do soft-clipping, even though it is not an option stated in the manual or parameters. On one hand, soft-clipping would generate more alignments, or maybe a 'higher' alignment rate, but what if we want alignment results with exactly 1 mismatch?

                      I think soft-clipping conflicts a bit with the mismatch option. For "4S26M", would the 4 clipped bases also count as 4 of the allowed mismatches?
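
One way to check this, at least for the records posted in #5, is to compare the MD tag with the CIGAR. The Python sketch below (helper names are made up for the example) counts how many reference bases the MD:Z string describes and how many substitutions it lists. It says nothing about how bwa applies its mismatch limit internally; it only shows what the reported tags cover.

```python
import re

def md_reference_span(md):
    """Number of reference bases described by an MD:Z value."""
    span = 0
    # Tokens: a run of matches, a ^deletion, or a single mismatched base.
    for number, deletion, mismatch in re.findall(r"(\d+)|\^([A-Za-z]+)|([A-Za-z])", md):
        if number:
            span += int(number)
        elif deletion:
            span += len(deletion)
        else:
            span += 1
    return span

def md_mismatches(md):
    """Count substituted bases (letters that are not part of a ^deletion)."""
    return len(re.findall(r"[A-Za-z]", re.sub(r"\^[A-Za-z]+", "", md)))

def aligned_reference_span(cigar):
    """Reference bases covered by the alignment (M, D, N, = and X)."""
    return sum(int(n) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar) if op in "MDN=X")

# Second record from #5: CIGAR 45M6S, NM:i:1, MD:Z:36G8
print(md_reference_span("36G8"), aligned_reference_span("45M6S"), md_mismatches("36G8"))
# First record from #5: CIGAR 51M, NM:i:2, MD:Z:0T1G48
print(md_reference_span("0T1G48"), aligned_reference_span("51M"), md_mismatches("0T1G48"))
```

For the 45M6S record the MD string covers exactly the 45 aligned bases and lists one substitution, matching NM:i:1, so in that record the 6 soft-clipped bases are not included in the reported edit distance.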


                      • #12
                        I don't know if this is the case for that specific read, since you didn't post the whole line, but the SAM specification requires clipping if a read goes off the end of a reference sequence.
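
A quick way to test for that case is to compare the alignment end with the reference length. Here is a minimal Python sketch for an ungapped read; the function name and the reference length of 5000 are hypothetical (the real length would come from the @SQ LN field in the SAM header).

```python
def clip_to_reference(pos, read_len, ref_len):
    """CIGAR for an ungapped forward alignment, clipped at the reference end.

    pos: 1-based leftmost position of the read on the reference.
    """
    if pos < 1 or pos > ref_len:
        raise ValueError("read does not start on the reference")
    end = pos + read_len - 1            # last reference position if unclipped
    overhang = max(0, end - ref_len)    # bases hanging past the reference end
    aligned = read_len - overhang
    return f"{aligned}M" if overhang == 0 else f"{aligned}M{overhang}S"

print(clip_to_reference(4980, 51, 5000))  # read runs off the end
print(clip_to_reference(4925, 51, 5000))  # read fits entirely
```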
