Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • BWA samflag inconsistencies

    Dear bioinformatics community,

    I have used BWA to align paired-end reads to a mammalian genome. I extracted orphan reads from the alignment with this command:

    awk 'and ($2, 0x0004) && ! and ($2, 0x0008) && $3!~/Un/' alignment.sam > orphans.sam

    I have noticed some apparent errors in the output, for example:

    HWUSI-EAS091C_0049:8:10:13851:9893#0 167 27 39960072 29 11S64M 28 167 0 TGTGAATGTAGTGCCCATGAAGGTGGATGCAGAGATCCGGCAGTCGTTGCAGTGCACCCTGCTGTGTCCCCGTAT IIHIIIIIIIIGIIIIIIHIIIIFIIHIIIHHIGIIIIIFIIHHHIIIFHHIGIIIHHIGHGGEEHGGIHGGDEE XT:A:M NM:i:0 SM:i:29 AM:i:29 XM:i:0 XO:i:0 XG:i:0 MD:Z:64

    Using this tool to interpret the samflag http://picard.sourceforge.net/explain-flags.html, it seems that this read is:

    Paired
    Proper pair
    Read unmapped
    Mate reverse strand
    Second in pair

    -how can the read be in a proper pair, yet unmapped?
    -how can the read have a mapping location and a non-zero mapping quality if it is unmapped?

    Another example:

    HWUSI-EAS091C_0049:8:17:3987:3357#0 117 6 84719005 37 9M1D66M Un0684.1-12705 6795 0 ATGATGATGTGGTGATGATGATGATGATGATGATTATGATGATGATGATGATGATGATGGTGATGATGATGATGG IIDIIIGIIG<IHIHHHIEIIGIHGHIIIHDIHFIIIIIIGIIHIIHHIIDIIIIGIIIIIIGIHIIIIIIIIII XT:A:U NM:i:4 SM:i:37 AM:i:0 X0:i:1 X1:i:0 XM:i:3 XO:i:1 XG:i:1 MD:Z:9^A25G6G32A0

    Paired
    Unmapped
    Read reverse
    Mate reverse
    First in pair

    -how can it be ‘mapped to reverse strand’ yet unmapped?
    -how can it have a mapping location and a non-zero mapping quality if it unmapped?


    On a previous occasion with the same data, I encountered errors when adding RG headers with Picard:

    Exception in thread "main" net.sf.samtools.SAMFormatException: SAM validation error: ERROR: Record 2096393, Read name HWUSI-EAS091C_0049:1:29:16923:17016#0, MAPQ should be 0 for unmapped read.

    There were countless of these errors, and I was forced to add the flag "VALIDATION_STRINGENCY=LENIENT" to the command line (a helpful tip from this site) to complete task.

    Why is BWA giving unmapped reads non-zero mapping qualities?

    Any help here would be much appreciated, because other than this I am rather fond of the speed and accuracy of BWA compared to other aligners so it would be nice to be able to correct this issue.

    Thanks in advance,

    Tally

  • #2
    Originally posted by Tally View Post
    Using this tool to interpret the samflag http://picard.sourceforge.net/explain-flags.html, it seems that this read is:

    Paired
    Proper pair
    Read unmapped
    Mate reverse strand
    Second in pair

    -how can the read be in a proper pair, yet unmapped?
    -how can the read have a mapping location and a non-zero mapping quality if it is unmapped?
    Technically the specification does say "If 0x4 is set, no assumptions can be made about RNAME, POS, CIGAR, MAPQ, bits 0x2, 0x10 and 0x100 and the bit 0x20 of the next segment in the template." i.e. If the read is unmapped, those other flags should be ignored.

    Thus BWA is within the specification, but nevertheless the output is confusing and arguably a bug.

    Originally posted by Tally View Post
    On a previous occasion with the same data, I encountered errors when adding RG headers with Picard:

    Exception in thread "main" net.sf.samtools.SAMFormatException: SAM validation error: ERROR: Record 2096393, Read name HWUSI-EAS091C_0049:1:29:16923:17016#0, MAPQ should be 0 for unmapped read.
    Again same bit of the specification, "If 0x4 is set, no assumptions can be made about RNAME, POS, CIGAR, MAPQ, bits 0x2, 0x10 and 0x100 and the bit 0x20 of the next segment in the template." i.e. If the read is unmapped, then MAPQ should be ignored.

    Again, BWA is within the specification, but it would be cleaner to use MAPQ of zero which is what Picard checks for with its default stringency.

    So again, BWA is within the specification, but nevertheless the output is confusing and arguably a bug.
    Last edited by maubp; 11-25-2011, 03:35 AM. Reason: Added links to the spec

    Comment


    • #3
      On further reflection, you could argue Picard's http://picard.sourceforge.net/explain-flags.html is broken as it should ignore these bits when 0x4 is set. Perhaps it would be more helpful to put "(irrelevant)" or something next to them as that would be more useful.

      Comment


      • #4
        To clairfy a bit, bwa concatantes reference sequences, so if a read hangs off of one sequence onto another, it will be both given the right mapping coordinates, and give then 4 flag.

        Second, sam specs call for the unmapped mate of a mapped read to be given the mapping position of the mapped mate. This is so that the unmapped read will sort next to its mapped mate. But that's another source of reads that have the 4 flagged, and have mapping coordinates.

        So this is why you have reads like that.

        Comment


        • #5
          Originally posted by swbarnes2 View Post
          Second, sam specs call for the unmapped mate of a mapped read to be given the mapping position of the mapped mate. This is so that the unmapped read will sort next to its mapped mate. But that's another source of reads that have the 4 flagged, and have mapping coordinates
          Thanks - I missed that question in the original post.

          Comment


          • #6
            Thanks for the responses, that is very helpful. That certainly makes sense to give unmapped mates the mapped mate coordinates for sorting purposes, however it is confusing that BWA sets other fields when the read is unmapped. Thank you for explaining to ignore those fields, as I missed that in the SAM specification.

            Comment

            Latest Articles

            Collapse

            • seqadmin
              New Genomics Tools and Methods Shared at AGBT 2025
              by seqadmin


              This year’s Advances in Genome Biology and Technology (AGBT) General Meeting commemorated the 25th anniversary of the event at its original venue on Marco Island, Florida. While this year’s event didn’t include high-profile musical performances, the industry announcements and cutting-edge research still drew the attention of leading scientists.

              The Headliner
              The biggest announcement was Roche stepping back into the sequencing platform market. In the years since...
              03-03-2025, 01:39 PM
            • seqadmin
              Investigating the Gut Microbiome Through Diet and Spatial Biology
              by seqadmin




              The human gut contains trillions of microorganisms that impact digestion, immune functions, and overall health1. Despite major breakthroughs, we’re only beginning to understand the full extent of the microbiome’s influence on health and disease. Advances in next-generation sequencing and spatial biology have opened new windows into this complex environment, yet many questions remain. This article highlights two recent studies exploring how diet influences microbial...
              02-24-2025, 06:31 AM

            ad_right_rmr

            Collapse

            News

            Collapse

            Topics Statistics Last Post
            Started by seqadmin, 03-03-2025, 01:15 PM
            0 responses
            179 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 02-28-2025, 12:58 PM
            0 responses
            273 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 02-24-2025, 02:48 PM
            0 responses
            659 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 02-21-2025, 02:46 PM
            0 responses
            268 views
            0 likes
            Last Post seqadmin  
            Working...
            X