Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Something strange in flag and mapping quality from BWA (0.6.2-r126)

    Hi, I am running Picard/MarkDuplicates.jar on my BAM file (from bwa mapping for paired end reads). However, it reports some "SAM validation error". Although I have used "VALIDATION_STRINGENCY=LENIENT" to finish this analysis, I'm quite curious on these unmapped reads because I have used samtools view "-f 2" to filter the raw mapping results. And then I found something interesting... Hope someone here could help to clarify why.

    Here are SAM validation error message:
    Code:
    INFO    2012-08-20 20:51:58     MarkDuplicates  Read 11000000 records. Tracking 3 as yet unmatched pairs. 2 records in RAM.  Last sequence index: 15
    INFO    2012-08-20 20:52:15     MarkDuplicates  Read 12000000 records. Tracking 3 as yet unmatched pairs. 2 records in RAM.  Last sequence index: 17
    Ignoring SAM validation error: ERROR: Record 12953464, Read name DFCDZDN1:168:D0PYNACXX:2:1308:15143:25918, MAPQ should be 0 for unmapped read.
    Ignoring SAM validation error: ERROR: Record 12958985, Read name DFCDZDN1:168:D0PYNACXX:2:1101:2557:162974, MAPQ should be 0 for unmapped read.
    Ignoring SAM validation error: ERROR: Record 12958986, Read name DFCDZDN1:168:D0PYNACXX:2:1105:19211:157826, MAPQ should be 0 for unmapped read.
    When I pull out the reads from the bam file above, we could see the read details as follows (I just listed three examples):
    example1:
    Code:
    DFCDZDN1:168:D0PYNACXX:2:1308:15143:25918       99      chrX    166650111       29      101M    =       166650212       187     TTAGGGTTAGGGTTAGGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGGTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAG   @@@DDDD>ADHBCFHHHDIGBGIHHHG9FFE:BDFGC?F?EGC9B8=BG8(6CDE##############################################   RG:Z:hy XT:A:U  NM:i:1  SM:i:29 AM:i:29 X0:i:1  X1:i:0  XM:i:1  XO:i:0  XG:i:0  MD:Z:55T45
    DFCDZDN1:168:D0PYNACXX:2:1308:15143:25918       151     chrX    166650212       29      12S21M2D63M5S   =       166650111       -187    GGTTGGGGTTTGGGTTAGGGTGTGGGTGAGGGTGGGGGTGAGGGTTAGGGTGTGGGTTGGGGTTGGGGTTGGGATTGGGGTAAGTGTTAGGGTTAGGGTTA   #####################################################################################A2:?+<AA3A:+DB?8   RG:Z:hy XT:A:M  NM:i:15 SM:i:29 AM:i:29 XM:i:13 XO:i:1  XG:i:2  MD:Z:9T0A4T5^TA6T11T0A5A5A5A2G2A4T2G11
    example2:
    Code:
    DFCDZDN1:168:D0PYNACXX:2:1101:2557:162974       83      chrM    395     60      101M    chrY    15902553        0       TCCAACTTATATGTGAAAATTCATTGTTAGGACCTAAACTCAATAACGAAAGTAATTCTAGTCATTTATAATACACGACAGCTAAGACCCAAACTGGGATT   5CCDCA@DDC@AEEEBDCFDCEEHCEH@EGGDHF:F@)IGHGJIHFEHIJIHAB4BACF<ED9<GGGIEGGECF?IHGHBIIGIHHGIHFFDDFFFFF@@@   RG:Z:hy XT:A:U  NM:i:0  SM:i:37 AM:i:37 X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:101
    DFCDZDN1:168:D0PYNACXX:2:1101:2557:162974       167     chrY    15902553        60      101M    chrM    395     0       CAAGTTAATGTAGCTTAATAACAAAGCAAAGCACTGAAAATGCTTAGATGGATAATTGTATCCCATAAACACAAAGGTTTGGTCCTGGCCTTATAATTAAT   @@@FFFFFGHHHGGIIJDJIHCHHHE;CFHD@EGHEH>D>GBGEHIG99BDHHADGDEBBGGHHEH@FCHJJIGG@;7@CE;(;?77.6>C@;;>CC>>@;   RG:Z:hy XT:A:U  NM:i:2  XN:i:3  SM:i:37 AM:i:37 X0:i:1  X1:i:0  XM:i:2  XO:i:0  XG:i:0  MD:Z:0G0T99
    example3:
    Code:
    DFCDZDN1:168:D0PYNACXX:2:1105:19211:157826      103     chrY    15902553        60      101M    chrM    240     0       CAAGTTAATGTAGCTTAATAACAAAGCAAAGCACTGAAAATGCTTAGATGGATAATTGTATCCCATAAACACAAAGGTTTGGTCCTGGCCTTATAATTAAT   B@CFDDDFHHFGHJJIJJIJJJJJJJGJIIIIJJIJIJJJBHIIIJJGHIJIGGIGJIDFEGIJJIGIIHHIGGHIFCHEHH@BDDDDEEEEEDDCDACCD   RG:Z:hy XT:A:U  NM:i:2  XN:i:3  SM:i:37 AM:i:37 X0:i:1  X1:i:0  XM:i:2  XO:i:0  XG:i:0  MD:Z:0G0T99
    DFCDZDN1:168:D0PYNACXX:2:1105:19211:157826      147     chrM    240     60      101M    chrY    15902553        0       AGTGATAAATATTAAGCAATAAACGAAAGTTTGACTAAGTTATACCTCTTAGGGTTGGTAAATTTCGTGCCAGCCACCGCGGTCATACGATTAACCCAAAC   AACCCDAEEDEDDDDDDCCBDCADDDCDDCDDCCDDDDEFEA>6DFFFFHHHFGIHJIJJJIIHGIHCDJIGHED:IGHEIIJJIJHJHHHHHDFFFF@B@   RG:Z:hy XT:A:U  NM:i:0  SM:i:37 AM:i:37 X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:101
    In example1, the two reads seem properly paired mapped to chrX (both have Flag 0x0002 in the FLAG field), but the read2 also reports 0x0004, which makes Picard report the "SAM validation error". Question: why Flag 0x0002 and 0x0004 could exist simultaneously? why the mapping quality seems quite good but still have the 0x0004 flag? Is it a BWA bug?

    Also, I'm quite confused by both example2 and example3. According to the mapping result, two reads from paired end sequencing map to different chromosome (chrY and chrM in these two examples), but why BWA still gives us Ox0002 flag for these reads? 0x0002 should mean that "the read is mapped in a proper pair". Am I right? or there is something that I missed. Hope someone here could help to explain.

    Thanks in advance.
    Last edited by qiongyi; 08-20-2012, 03:32 PM.

  • #2
    Tip: Wrap SAM text in [ code ] and [ /code ] tags (available via the 'advanced' editor view or do it manually - without the spaces I just used) to avoid things like quality scores being shown as smilies

    Comment


    • #3
      Download picard for free. A set of tools for working with high-throughput sequencing data. A set of tools (in Java) for working with next generation sequencing data in the SAM/BAM format. Note that development has moved to GitHub at https://github.com/broadinstitute/picard and support is available on the GATK forum at http://gatkforums.broadinstitute.org/categories/ask-the-team

      Comment


      • #4
        Chr X is only about 166 Mb long, right?

        bwa concatenates all the reference sequences you give it. So if a read hangs off of one reference sequence and onto another, bwa will correctly give its mapping position, but it will also throw the 4 flag, as a sign that something is a little off. But picard will complain about that behavior.

        That concatenation is probably why it thinks the reads that span Chr Y and chrM are properly paired; they probably look that way when the two chromosomes are concatenated like that

        Comment


        • #5
          Originally posted by maubp View Post
          Tip: Wrap SAM text in [ code ] and [ /code ] tags (available via the 'advanced' editor view or do it manually - without the spaces I just used) to avoid things like quality scores being shown as smilies
          Thanks, maubp. Having changed.. It looks better

          Comment


          • #6
            Originally posted by swbarnes2 View Post
            Chr X is only about 166 Mb long, right?

            bwa concatenates all the reference sequences you give it. So if a read hangs off of one reference sequence and onto another, bwa will correctly give its mapping position, but it will also throw the 4 flag, as a sign that something is a little off. But picard will complain about that behavior.

            That concatenation is probably why it thinks the reads that span Chr Y and chrM are properly paired; they probably look that way when the two chromosomes are concatenated like that
            Bingo! The length of chrY is 15902555. So if BWA concatenates chrY and chrM, that's the reason to give these reads 2 flag. OMG, so we should definitely look at the existence of 4 flag as well as 2 flag. Thanks swbarnes2 and Rocketknight.

            Comment


            • #7
              Originally posted by qiongyi View Post
              Bingo! The length of chrY is 15902555. So if BWA concatenates chrY and chrM, that's the reason to give these reads 2 flag. OMG, so we should definitely look at the existence of 4 flag as well as 2 flag. Thanks swbarnes2 and Rocketknight.
              Exactly - you MUST check FLAG 0x4, quoting the SAM/BAM specification,
              Bit 0x4 is the only reliable place to tell whether the segment is unmapped. If 0x4 is set, no assumptions can be made about RNAME, POS, CIGAR, MAPQ, bits 0x2, 0x10 and 0x100 and the bit 0x20 of the next segment in the template.
              BWA is therefore within the specification, but it would cause much less confusion if it cleared the other bits of informations after noticing the potential mapping actually fell on the boundary between concatenated references.

              Comment

              Latest Articles

              Collapse

              • seqadmin
                Essential Discoveries and Tools in Epitranscriptomics
                by seqadmin




                The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
                04-22-2024, 07:01 AM
              • seqadmin
                Current Approaches to Protein Sequencing
                by seqadmin


                Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                04-04-2024, 04:25 PM

              ad_right_rmr

              Collapse

              News

              Collapse

              Topics Statistics Last Post
              Started by seqadmin, Today, 08:47 AM
              0 responses
              12 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 04-11-2024, 12:08 PM
              0 responses
              60 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 04-10-2024, 10:19 PM
              0 responses
              59 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 04-10-2024, 09:21 AM
              0 responses
              54 views
              0 likes
              Last Post seqadmin  
              Working...
              X