Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • BWA samflag inconsistencies

    Dear bioinformatics community,

    I have used BWA to align paired-end reads to a mammalian genome. I extracted orphan reads from the alignment with this command:

    awk 'and ($2, 0x0004) && ! and ($2, 0x0008) && $3!~/Un/' alignment.sam > orphans.sam

    I have noticed some apparent errors in the output, for example:

    HWUSI-EAS091C_0049:8:10:13851:9893#0 167 27 39960072 29 11S64M 28 167 0 TGTGAATGTAGTGCCCATGAAGGTGGATGCAGAGATCCGGCAGTCGTTGCAGTGCACCCTGCTGTGTCCCCGTAT IIHIIIIIIIIGIIIIIIHIIIIFIIHIIIHHIGIIIIIFIIHHHIIIFHHIGIIIHHIGHGGEEHGGIHGGDEE XT:A:M NM:i:0 SM:i:29 AM:i:29 XM:i:0 XO:i:0 XG:i:0 MD:Z:64

    Using this tool to interpret the samflag http://picard.sourceforge.net/explain-flags.html, it seems that this read is:

    Paired
    Proper pair
    Read unmapped
    Mate reverse strand
    Second in pair

    -how can the read be in a proper pair, yet unmapped?
    -how can the read have a mapping location and a non-zero mapping quality if it is unmapped?

    Another example:

    HWUSI-EAS091C_0049:8:17:3987:3357#0 117 6 84719005 37 9M1D66M Un0684.1-12705 6795 0 ATGATGATGTGGTGATGATGATGATGATGATGATTATGATGATGATGATGATGATGATGGTGATGATGATGATGG IIDIIIGIIG<IHIHHHIEIIGIHGHIIIHDIHFIIIIIIGIIHIIHHIIDIIIIGIIIIIIGIHIIIIIIIIII XT:A:U NM:i:4 SM:i:37 AM:i:0 X0:i:1 X1:i:0 XM:i:3 XO:i:1 XG:i:1 MD:Z:9^A25G6G32A0

    Paired
    Unmapped
    Read reverse
    Mate reverse
    First in pair

    -how can it be ‘mapped to reverse strand’ yet unmapped?
    -how can it have a mapping location and a non-zero mapping quality if it unmapped?


    On a previous occasion with the same data, I encountered errors when adding RG headers with Picard:

    Exception in thread "main" net.sf.samtools.SAMFormatException: SAM validation error: ERROR: Record 2096393, Read name HWUSI-EAS091C_0049:1:29:16923:17016#0, MAPQ should be 0 for unmapped read.

    There were countless of these errors, and I was forced to add the flag "VALIDATION_STRINGENCY=LENIENT" to the command line (a helpful tip from this site) to complete task.

    Why is BWA giving unmapped reads non-zero mapping qualities?

    Any help here would be much appreciated, because other than this I am rather fond of the speed and accuracy of BWA compared to other aligners so it would be nice to be able to correct this issue.

    Thanks in advance,

    Tally

  • #2
    Originally posted by Tally View Post
    Using this tool to interpret the samflag http://picard.sourceforge.net/explain-flags.html, it seems that this read is:

    Paired
    Proper pair
    Read unmapped
    Mate reverse strand
    Second in pair

    -how can the read be in a proper pair, yet unmapped?
    -how can the read have a mapping location and a non-zero mapping quality if it is unmapped?
    Technically the specification does say "If 0x4 is set, no assumptions can be made about RNAME, POS, CIGAR, MAPQ, bits 0x2, 0x10 and 0x100 and the bit 0x20 of the next segment in the template." i.e. If the read is unmapped, those other flags should be ignored.

    Thus BWA is within the specification, but nevertheless the output is confusing and arguably a bug.

    Originally posted by Tally View Post
    On a previous occasion with the same data, I encountered errors when adding RG headers with Picard:

    Exception in thread "main" net.sf.samtools.SAMFormatException: SAM validation error: ERROR: Record 2096393, Read name HWUSI-EAS091C_0049:1:29:16923:17016#0, MAPQ should be 0 for unmapped read.
    Again same bit of the specification, "If 0x4 is set, no assumptions can be made about RNAME, POS, CIGAR, MAPQ, bits 0x2, 0x10 and 0x100 and the bit 0x20 of the next segment in the template." i.e. If the read is unmapped, then MAPQ should be ignored.

    Again, BWA is within the specification, but it would be cleaner to use MAPQ of zero which is what Picard checks for with its default stringency.

    So again, BWA is within the specification, but nevertheless the output is confusing and arguably a bug.
    Last edited by maubp; 11-25-2011, 03:35 AM. Reason: Added links to the spec

    Comment


    • #3
      On further reflection, you could argue Picard's http://picard.sourceforge.net/explain-flags.html is broken as it should ignore these bits when 0x4 is set. Perhaps it would be more helpful to put "(irrelevant)" or something next to them as that would be more useful.

      Comment


      • #4
        To clairfy a bit, bwa concatantes reference sequences, so if a read hangs off of one sequence onto another, it will be both given the right mapping coordinates, and give then 4 flag.

        Second, sam specs call for the unmapped mate of a mapped read to be given the mapping position of the mapped mate. This is so that the unmapped read will sort next to its mapped mate. But that's another source of reads that have the 4 flagged, and have mapping coordinates.

        So this is why you have reads like that.

        Comment


        • #5
          Originally posted by swbarnes2 View Post
          Second, sam specs call for the unmapped mate of a mapped read to be given the mapping position of the mapped mate. This is so that the unmapped read will sort next to its mapped mate. But that's another source of reads that have the 4 flagged, and have mapping coordinates
          Thanks - I missed that question in the original post.

          Comment


          • #6
            Thanks for the responses, that is very helpful. That certainly makes sense to give unmapped mates the mapped mate coordinates for sorting purposes, however it is confusing that BWA sets other fields when the read is unmapped. Thank you for explaining to ignore those fields, as I missed that in the SAM specification.

            Comment

            Latest Articles

            Collapse

            • seqadmin
              Latest Developments in Precision Medicine
              by seqadmin



              Technological advances have led to drastic improvements in the field of precision medicine, enabling more personalized approaches to treatment. This article explores four leading groups that are overcoming many of the challenges of genomic profiling and precision medicine through their innovative platforms and technologies.

              Somatic Genomics
              “We have such a tremendous amount of genetic diversity that exists within each of us, and not just between us as individuals,”...
              05-24-2024, 01:16 PM
            • seqadmin
              Recent Advances in Sequencing Analysis Tools
              by seqadmin


              The sequencing world is rapidly changing due to declining costs, enhanced accuracies, and the advent of newer, cutting-edge instruments. Equally important to these developments are improvements in sequencing analysis, a process that converts vast amounts of raw data into a comprehensible and meaningful form. This complex task requires expertise and the right analysis tools. In this article, we highlight the progress and innovation in sequencing analysis by reviewing several of the...
              05-06-2024, 07:48 AM

            ad_right_rmr

            Collapse

            News

            Collapse

            Topics Statistics Last Post
            Started by seqadmin, 05-24-2024, 07:15 AM
            0 responses
            13 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 05-23-2024, 10:28 AM
            0 responses
            17 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 05-23-2024, 07:35 AM
            0 responses
            20 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 05-22-2024, 02:06 PM
            0 responses
            10 views
            0 likes
            Last Post seqadmin  
            Working...
            X