Hi, I am running Picard/MarkDuplicates.jar on my BAM file (from BWA mapping of paired-end reads), and it reports some "SAM validation error" messages. I used VALIDATION_STRINGENCY=LENIENT to get the analysis to finish, but I am curious about these unmapped reads, because I had already filtered the raw mapping results with samtools view "-f 2". Looking closer, I found something interesting, and I hope someone here can help clarify why.
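For reference, samtools view -f INT keeps only records that have all of the bits in INT set, while -F INT drops records that have any of those bits set. So -f 2 on its own requires the proper-pair bit but does not exclude records that also carry 0x4. The short Python sketch below mimics those two bit tests on the FLAG values of the six records in example1 to example3; the function name passes_filter is only an illustration and is not part of samtools.

```python
def passes_filter(flag: int, require: int = 0, exclude: int = 0) -> bool:
    """Mimic samtools view -f/-F: keep a record only if every `require` bit
    is set and none of the `exclude` bits is set."""
    return (flag & require) == require and (flag & exclude) == 0


# FLAG values of the six records in example1, example2 and example3.
flags = [99, 151, 83, 167, 103, 147]

kept_f2 = [f for f in flags if passes_filter(f, require=0x2)]
kept_f2_F4 = [f for f in flags if passes_filter(f, require=0x2, exclude=0x4)]

print("-f 2     :", kept_f2)
print("-f 2 -F 4:", kept_f2_F4)
```

With -f 2 alone every record survives, because all six have 0x2 set; adding the 0x4 exclusion removes the three records that are also flagged as unmapped.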
I pulled the reads named in these errors out of the BAM file; the error messages and three example read pairs (both mates of each) are listed further down. My questions about them are these:
In example1, the two reads appear to be properly paired and mapped to chrX (both have flag 0x0002 set in the FLAG field), but read2 also has flag 0x0004 (unmapped) set, which is what makes Picard report the SAM validation error. Why can flags 0x0002 and 0x0004 be set at the same time? And why does read2 carry the 0x0004 flag when its mapping quality looks quite good? Is this a BWA bug?
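To see exactly which bits are set, the FLAG values can be decoded bit by bit. The Python sketch below uses short, informal names (my own shorthand) for the bits defined in the SAM format specification and decodes the two FLAG values from example1, 99 and 151; it shows that 151 carries both 0x2 and 0x4.

```python
from __future__ import annotations

# Informal names for the FLAG bits defined in the SAM format specification.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper pair",
    0x4: "unmapped",
    0x8: "mate unmapped",
    0x10: "reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "secondary",
    0x200: "QC fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag: int) -> list[str]:
    """Return the names of all FLAG bits that are set, lowest bit first."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


# FLAG values of read1 and read2 in example1.
for flag in (99, 151):
    print(flag, hex(flag), decode_flag(flag))
```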
I am also confused by example2 and example3. According to the mapping results, the two reads of each pair map to different chromosomes (chrY and chrM in both examples), yet BWA still sets the 0x0002 flag on them. As I understand it, 0x0002 means "the read is mapped in a proper pair". Am I right, or is there something I have missed? I hope someone here can explain.
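A quick way to list such pairs is to compare RNAME with RNEXT (the mate's reference, where "=" means the same chromosome) for every record that has 0x2 set. The sketch below does that on the FLAG, RNAME and RNEXT fields copied by hand from the three examples; the helper name is hypothetical, and a real check would read these fields from the BAM instead of a hand-typed table.

```python
# (FLAG, RNAME, RNEXT) copied from the mandatory fields of the example records.
RECORDS = {
    "example1/read1": (99, "chrX", "="),
    "example1/read2": (151, "chrX", "="),
    "example2/read1": (83, "chrM", "chrY"),
    "example2/read2": (167, "chrY", "chrM"),
    "example3/read1": (103, "chrY", "chrM"),
    "example3/read2": (147, "chrM", "chrY"),
}


def proper_pair_across_chromosomes(flag: int, rname: str, rnext: str) -> bool:
    """True if 0x2 (proper pair) is set although the mate lies on another reference."""
    same_reference = rnext in ("=", rname)
    return bool(flag & 0x2) and not same_reference


flagged = [
    name
    for name, (flag, rname, rnext) in RECORDS.items()
    if proper_pair_across_chromosomes(flag, rname, rnext)
]

for name in flagged:
    flag, rname, rnext = RECORDS[name]
    note = " (0x4 also set)" if flag & 0x4 else ""
    print(f"{name}: FLAG {flag} on {rname}, mate on {rnext}{note}")
```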
Here are the SAM validation error messages:
Code:
INFO 2012-08-20 20:51:58 MarkDuplicates Read 11000000 records. Tracking 3 as yet unmatched pairs. 2 records in RAM. Last sequence index: 15
INFO 2012-08-20 20:52:15 MarkDuplicates Read 12000000 records. Tracking 3 as yet unmatched pairs. 2 records in RAM. Last sequence index: 17
Ignoring SAM validation error: ERROR: Record 12953464, Read name DFCDZDN1:168:D0PYNACXX:2:1308:15143:25918, MAPQ should be 0 for unmapped read.
Ignoring SAM validation error: ERROR: Record 12958985, Read name DFCDZDN1:168:D0PYNACXX:2:1101:2557:162974, MAPQ should be 0 for unmapped read.
Ignoring SAM validation error: ERROR: Record 12958986, Read name DFCDZDN1:168:D0PYNACXX:2:1105:19211:157826, MAPQ should be 0 for unmapped read.
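To pull the affected reads out of the BAM, the record numbers and read names can first be extracted from the log. This is a minimal Python sketch, assuming the log lines keep exactly the format shown above; the regular expression and function name are my own and are not part of Picard.

```python
import re

# Lines copied from the MarkDuplicates output above.
LOG = """\
INFO 2012-08-20 20:52:15 MarkDuplicates Read 12000000 records. Tracking 3 as yet unmatched pairs. 2 records in RAM. Last sequence index: 17
Ignoring SAM validation error: ERROR: Record 12953464, Read name DFCDZDN1:168:D0PYNACXX:2:1308:15143:25918, MAPQ should be 0 for unmapped read.
Ignoring SAM validation error: ERROR: Record 12958985, Read name DFCDZDN1:168:D0PYNACXX:2:1101:2557:162974, MAPQ should be 0 for unmapped read.
Ignoring SAM validation error: ERROR: Record 12958986, Read name DFCDZDN1:168:D0PYNACXX:2:1105:19211:157826, MAPQ should be 0 for unmapped read.
"""

# Record number, read name (up to the next comma), then the error text.
PATTERN = re.compile(r"ERROR: Record (\d+), Read name ([^,]+), (.+)$")


def parse_validation_errors(text: str) -> list:
    """Return (record number, read name, message) for each validation error line."""
    hits = []
    for line in text.splitlines():
        match = PATTERN.search(line)
        if match:  # INFO progress lines do not match and are skipped
            hits.append((int(match.group(1)), match.group(2), match.group(3)))
    return hits


for record, name, message in parse_validation_errors(LOG):
    print(record, name, message)
```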
example1:
Code:
DFCDZDN1:168:D0PYNACXX:2:1308:15143:25918 99 chrX 166650111 29 101M = 166650212 187 TTAGGGTTAGGGTTAGGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGGTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAG @@@DDDD>ADHBCFHHHDIGBGIHHHG9FFE:BDFGC?F?EGC9B8=BG8(6CDE############################################## RG:Z:hy XT:A:U NM:i:1 SM:i:29 AM:i:29 X0:i:1 X1:i:0 XM:i:1 XO:i:0 XG:i:0 MD:Z:55T45
DFCDZDN1:168:D0PYNACXX:2:1308:15143:25918 151 chrX 166650212 29 12S21M2D63M5S = 166650111 -187 GGTTGGGGTTTGGGTTAGGGTGTGGGTGAGGGTGGGGGTGAGGGTTAGGGTGTGGGTTGGGGTTGGGGTTGGGATTGGGGTAAGTGTTAGGGTTAGGGTTA #####################################################################################A2:?+<AA3A:+DB?8 RG:Z:hy XT:A:M NM:i:15 SM:i:29 AM:i:29 XM:i:13 XO:i:1 XG:i:2 MD:Z:9T0A4T5^TA6T11T0A5A5A5A2G2A4T2G11
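For read2 in example1, the reference span of the alignment can be worked out from POS and CIGAR: only the M, D, N, = and X operations consume reference bases, so 12S21M2D63M5S covers 21 + 2 + 63 = 86 bases and the last aligned base is at 166650212 + 86 - 1 = 166650297. That end coordinate also reproduces read1's TLEN of 187 (166650297 - 166650111 + 1). One thing that might be worth checking (an assumption on my part, since the chrX length is not shown here) is whether this end lies beyond the LN value given for chrX in the @SQ header of the BAM. A small Python sketch of the arithmetic:

```python
import re

# CIGAR operations that consume reference bases (SAM specification).
REF_CONSUMING_OPS = set("MDN=X")


def reference_span(cigar: str) -> int:
    """Number of reference bases covered by an alignment with this CIGAR."""
    return sum(
        int(length)
        for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
        if op in REF_CONSUMING_OPS
    )


def alignment_end(pos: int, cigar: str) -> int:
    """1-based coordinate of the last reference base covered by the alignment."""
    return pos + reference_span(cigar) - 1


# POS values and read2's CIGAR from example1.
read1_pos = 166650111
read2_pos, read2_cigar = 166650212, "12S21M2D63M5S"

read2_end = alignment_end(read2_pos, read2_cigar)
tlen = read2_end - read1_pos + 1  # template length from read1 start to read2 end
print(read2_end, tlen)
```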
example2:
Code:
DFCDZDN1:168:D0PYNACXX:2:1101:2557:162974 83 chrM 395 60 101M chrY 15902553 0 TCCAACTTATATGTGAAAATTCATTGTTAGGACCTAAACTCAATAACGAAAGTAATTCTAGTCATTTATAATACACGACAGCTAAGACCCAAACTGGGATT 5CCDCA@DDC@AEEEBDCFDCEEHCEH@EGGDHF:F@)IGHGJIHFEHIJIHAB4BACF<ED9<GGGIEGGECF?IHGHBIIGIHHGIHFFDDFFFFF@@@ RG:Z:hy XT:A:U NM:i:0 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:101
DFCDZDN1:168:D0PYNACXX:2:1101:2557:162974 167 chrY 15902553 60 101M chrM 395 0 CAAGTTAATGTAGCTTAATAACAAAGCAAAGCACTGAAAATGCTTAGATGGATAATTGTATCCCATAAACACAAAGGTTTGGTCCTGGCCTTATAATTAAT @@@FFFFFGHHHGGIIJDJIHCHHHE;CFHD@EGHEH>D>GBGEHIG99BDHHADGDEBBGGHHEH@FCHJJIGG@;7@CE;(;?77.6>C@;;>CC>>@; RG:Z:hy XT:A:U NM:i:2 XN:i:3 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:2 XO:i:0 XG:i:0 MD:Z:0G0T99
example3:
Code:
DFCDZDN1:168:D0PYNACXX:2:1105:19211:157826 103 chrY 15902553 60 101M chrM 240 0 CAAGTTAATGTAGCTTAATAACAAAGCAAAGCACTGAAAATGCTTAGATGGATAATTGTATCCCATAAACACAAAGGTTTGGTCCTGGCCTTATAATTAAT B@CFDDDFHHFGHJJIJJIJJJJJJJGJIIIIJJIJIJJJBHIIIJJGHIJIGGIGJIDFEGIJJIGIIHHIGGHIFCHEHH@BDDDDEEEEEDDCDACCD RG:Z:hy XT:A:U NM:i:2 XN:i:3 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:2 XO:i:0 XG:i:0 MD:Z:0G0T99
DFCDZDN1:168:D0PYNACXX:2:1105:19211:157826 147 chrM 240 60 101M chrY 15902553 0 AGTGATAAATATTAAGCAATAAACGAAAGTTTGACTAAGTTATACCTCTTAGGGTTGGTAAATTTCGTGCCAGCCACCGCGGTCATACGATTAACCCAAAC AACCCDAEEDEDDDDDDCCBDCADDDCDDCDDCCDDDDEFEA>6DFFFFHHHFGIHJIJJJIIHGIHCDJIGHED:IGHEIIJJIJHJHHHHHDFFFF@B@ RG:Z:hy XT:A:U NM:i:0 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:101
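Going by its wording, Picard's message flags records that have 0x4 set while MAPQ is non-zero. The sketch below parses the first nine mandatory fields of the six records above and lists the records that break that rule. SEQ, QUAL and the optional tags are left out for brevity, and the whitespace split is only there because the tabs were lost when pasting; real SAM text should be split on tab characters. The class and function names are hypothetical.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    qname: str
    flag: int
    rname: str
    pos: int
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int


def parse_sam_line(line: str) -> SamRecord:
    """Parse the first nine mandatory SAM fields (whitespace-separated here)."""
    f = line.split()
    return SamRecord(f[0], int(f[1]), f[2], int(f[3]), int(f[4]),
                     f[5], f[6], int(f[7]), int(f[8]))


# First nine fields of the records in example1, example2 and example3.
LINES = [
    "DFCDZDN1:168:D0PYNACXX:2:1308:15143:25918 99 chrX 166650111 29 101M = 166650212 187",
    "DFCDZDN1:168:D0PYNACXX:2:1308:15143:25918 151 chrX 166650212 29 12S21M2D63M5S = 166650111 -187",
    "DFCDZDN1:168:D0PYNACXX:2:1101:2557:162974 83 chrM 395 60 101M chrY 15902553 0",
    "DFCDZDN1:168:D0PYNACXX:2:1101:2557:162974 167 chrY 15902553 60 101M chrM 395 0",
    "DFCDZDN1:168:D0PYNACXX:2:1105:19211:157826 103 chrY 15902553 60 101M chrM 240 0",
    "DFCDZDN1:168:D0PYNACXX:2:1105:19211:157826 147 chrM 240 60 101M chrY 15902553 0",
]


def unmapped_with_nonzero_mapq(records):
    """Records flagged unmapped (0x4) that still carry a non-zero MAPQ."""
    return [r for r in records if r.flag & 0x4 and r.mapq != 0]


records = [parse_sam_line(line) for line in LINES]
for r in unmapped_with_nonzero_mapq(records):
    print(r.qname, r.flag, r.rname, r.pos, "MAPQ", r.mapq)
```

On these six records it reports exactly one mate per pair (FLAG 151, 167 and 103), and their read names are the three that appear in the Picard log.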
Thanks in advance.