  • No soft-clipping in BWA 0.7.10 anymore?

    Hi all!

    After moving from bwa version 0.5.8c to 0.7.10 I discovered differences that look like errors in the newer version: it seems as if BWA no longer performs soft-clipping, resulting in false positive variant calls:



    Is that true? I could find parameters to disable soft-clipping, but no parameter to explicitly turn on soft-clipping.

    Both versions were used with default parameters (for both "bwa aln" and "bwa sampe").


    Any help would be appreciated!

    Thank you in advance,


    Sebastian

  • #2
    Please don't cross-post on here, biostars and the BWA mailing list.



    • #3
      Given you have already cross-posted, please at least take the time to add the URLs here (and there) for cross-referencing. People here don't like to waste their volunteered time repeating an answer you've already heard on another forum/platform.

      BioStars duplicate: https://www.biostars.org/p/129443/

      Mailing list duplicate: http://sourceforge.net/p/bio-bwa/mai...sage/33329232/ where Heng Li replied that it was a bug fixed in 0.7.12
      Last edited by maubp; 02-04-2015, 08:40 AM. Reason: Adding links



      • #4
        use bwa-mem



        • #5
          First, sorry for cross-posting!

          Second, I use reference genome GRCh37, so this bug should not make any difference for me. I think this is another problem.

          Third, bwa mem didn't work at all for our data (2x100bp); it resulted in obviously wrong read mappings, so I returned to bwa aln and bwa sampe.


          To be more clear: I do have soft-clipped alignments in my SAM file, but I would expect the reads in the screenshot to be soft-clipped, too. Or am I wrong?



          • #6
            Check your preferences for "alignments" in IGV (if the screenshot is from IGV): http://www.broadinstitute.org/igv/Pr...ces#Alignments

            You have likely selected "show" soft-clipped bases.
            Last edited by GenoMax; 02-04-2015, 09:20 AM.



            • #7
              No, soft-clipped alignments are not shown in IGV. Additionally, here is an entry of the SAM file:


              HISEQ:136:C5L2YANXX:3:1104:20703:26673 81 chr7 100682889 25 100M = 100682893 -96 GGGAACCTACAACTGCTGAAGGTACCAGCATGCGAATCTCAACTCCTAGTGATGGAAGTACTCCATTAACAAGTATACTTGTCAGCACCCTGCCAGTGGC FC0F>GGGGGGEF@BFGGGGGGGGGGGGGDGGFFGFCGEFF>E: DGGGGGFCGGEGGEF<F=GGFGGGGGGGFGGGGGGGGGGGGGGGGFCEGGFBBBBB X0:i:1 X1:i:0 MD:Z:0C0T0T0C0T95 XG:i:0 AM:i:25 NM:i:5 SM:i:25 XM:i:5 XO:i:0 XT:A:U


              The CIGAR string tells me that there was no soft-clipping, although 100M isn't correct either (?). Is it possible that it's due to the insert size (-96)? At least this is where the mismatched bases are coming from, as the DNA fragment was shorter than the read length (2x101bp)...
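
For anyone following along, the FLAG (81) and MD tag (MD:Z:0C0T0T0C0T95) of the record above can be decoded with a few lines of Python. This is only a minimal sketch: the function names `decode_flag` and `md_mismatch_offsets` are made up for illustration, and the bit names follow the SAM specification.

```python
import re

# SAM FLAG bits as defined in the SAM specification.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def md_mismatch_offsets(md):
    """Return 0-based offsets (relative to POS) of mismatched reference bases.

    Deleted reference bases (^ACGT) advance the offset but are not reported.
    """
    offsets = []
    offset = 0
    for number, deletion, base in re.findall(r"(\d+)|\^([A-Z]+)|([A-Z])", md):
        if number:
            offset += int(number)      # run of matching bases
        elif deletion:
            offset += len(deletion)    # deleted reference bases
        else:
            offsets.append(offset)     # a single mismatched reference base
            offset += 1
    return offsets


print(decode_flag(81))
print(md_mismatch_offsets("0C0T0T0C0T95"))
```

For this record the flag decodes to paired, reverse strand, first in pair, and the five mismatches sit at the five leftmost reference positions of the alignment. Because the read maps to the reverse strand, those positions correspond to the last bases of the read as it was sequenced.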



              • #8
                The sequence is 100 bases long, so 100M is correct, though I'm not sure how a tlen of 96 would be possible given that. This does seem a bit like a bug. If you can whittle this down to just a handful of reads that is sufficient to reproduce things, then consider filing a bug report on GitHub.
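
One possible reading of the -96 can be checked with simple arithmetic. The sketch below assumes that the template length is measured between the 5' ends of the two mates (an assumption, not verified against the BWA source) and that the mate is a forward-strand read, which is consistent with flag 81 not having the mate-reverse bit set. The function name is hypothetical.

```python
def fragment_length_from_5prime_ends(rev_pos, rev_ref_len, fwd_pos):
    """Outer distance between the 5' ends of a forward/reverse read pair.

    rev_pos      1-based leftmost position of the reverse-strand read
    rev_ref_len  number of reference bases covered by the reverse read
    fwd_pos      1-based leftmost position of the forward-strand mate
    """
    # The 5' end of a reverse-strand read is its rightmost aligned base.
    rev_5prime = rev_pos + rev_ref_len - 1
    return rev_5prime - fwd_pos + 1


# Values taken from the SAM record posted above (POS, 100M, PNEXT).
length = fragment_length_from_5prime_ends(100682889, 100, 100682893)
overhang = 100682893 - 100682889  # bases of this read left of the mate's start

print(length, overhang)
```

Under these assumptions the fragment would be 96 bp long, and this read would extend 4 bases past the start of its mate, which would fit a fragment that is shorter than the read length.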



                • #9
                  Originally posted by svos
                  To be more clear: I do have soft-clipped alignments in my SAM file, but I would expect the reads in the screenshot to be soft-clipped, too. Or am I wrong?
                  I guess 4 mismatches at the end of the read incur less penalty than clipping those 4 bases. What are the qualities of the 4 bases?



                  • #10
                    Originally posted by svos
                    The CIGAR string tells me that there was no soft-clipping, although 100M isn't correct either (?). Is it possible that it's due to the insert size (-96)? At least this is where the mismatched bases are coming from, as the DNA fragment was shorter than the read length (2x101bp)...
                    M stands for match or mismatch in the CIGAR.
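
To make the point about M concrete, here is a small sketch that sums up what each CIGAR operation consumes, following the operation table in the SAM specification. The function name `cigar_lengths` and the example string `5S95M` are made up for illustration; only `100M` comes from the record in this thread.

```python
import re

# Per the SAM specification: which operations consume read (query) bases
# and which consume reference bases.
CONSUMES_QUERY = set("MIS=X")
CONSUMES_REF = set("MDN=X")


def cigar_lengths(cigar):
    """Return (query_length, reference_length, soft_clipped_bases)."""
    query_len = ref_len = soft_clipped = 0
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(length)
        if op in CONSUMES_QUERY:
            query_len += n
        if op in CONSUMES_REF:
            ref_len += n
        if op == "S":
            soft_clipped += n
    return query_len, ref_len, soft_clipped


print(cigar_lengths("100M"))   # the record from this thread
print(cigar_lengths("5S95M"))  # hypothetical soft-clipped version
```

`100M` covers 100 read bases and 100 reference bases regardless of how many of them mismatch; the mismatches only show up in the MD and NM tags. A soft-clipped alignment such as `5S95M` keeps all 100 bases in the SEQ field but aligns only 95 of them to the reference.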



                    • #11
                      Originally posted by Zaag
                      M stands for match or mismatch in the CIGAR

                      Yes, you are right! M means alignment match, not sequence match... Quality values are good (>30).


                      Just to understand soft-clipping correctly: Every base at either end of a read that no longer matches the reference sequence should be soft-clipped, right?



                      • #12
                        Not every base.

                        BWA gives every possible alignment a score, and I can imagine that having 4 high-quality mismatches at the end of the read yields a higher score than clipping off the 4 bases;

                        if the quality is below 10 (or there are a lot of bases) it would really surprise me if they don't get clipped.
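
The trade-off described above can be illustrated with a toy scoring model. This is a sketch under stated assumptions, not BWA's actual algorithm: the function `best_end_clip` is hypothetical, the match score, mismatch penalty and clipping penalty are illustrative values and not parameters taken from bwa aln, and base qualities are not modeled at all.

```python
def best_end_clip(is_match, match=1, mismatch=4, clip_penalty=5):
    """Pick how many bases to clip from one read end under a toy model.

    is_match      list of booleans ordered from the read end inward;
                  True means the base matches the reference
    match         score per matching base (illustrative value)
    mismatch      penalty per mismatching base (illustrative value)
    clip_penalty  flat penalty for clipping at all (illustrative value)

    Returns (clipped_bases, score) for the highest-scoring option;
    ties are resolved in favor of clipping fewer bases.
    """
    def aligned_score(flags):
        return sum(match if m else -mismatch for m in flags)

    best = (0, aligned_score(is_match))  # option: no clipping
    for k in range(1, len(is_match) + 1):
        score = aligned_score(is_match[k:]) - clip_penalty
        if score > best[1]:
            best = (k, score)
    return best


four_bad = [False] * 4 + [True] * 96   # 4 mismatches at the read end
one_bad = [False] + [True] * 99        # 1 mismatch at the read end

print(best_end_clip(four_bad))                   # cheap clipping
print(best_end_clip(four_bad, clip_penalty=20))  # expensive clipping
print(best_end_clip(one_bad))
```

With the cheap clipping penalty the four terminal mismatches are clipped, with the expensive one they stay aligned as mismatches, and a single terminal mismatch stays aligned in either case. So whether end mismatches are clipped depends entirely on the scoring parameters, which is why "not every base" gets clipped.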



                        • #13
                          Thank you Zaag for explaining this. I will pay attention to these bases and filter them out by manual alignment inspection!

                          As some default parameter values changed between the versions, this might be due to that...

