Dear bioinformatics community,
I have used BWA to align paired-end reads to a mammalian genome. I extracted orphan reads from the alignment with this command:
awk 'and ($2, 0x0004) && ! and ($2, 0x0008) && $3!~/Un/' alignment.sam > orphans.sam
I have noticed some apparent errors in the output, for example:
HWUSI-EAS091C_0049:8:10:13851:9893#0 167 27 39960072 29 11S64M 28 167 0 TGTGAATGTAGTGCCCATGAAGGTGGATGCAGAGATCCGGCAGTCGTTGCAGTGCACCCTGCTGTGTCCCCGTAT IIHIIIIIIIIGIIIIIIHIIIIFIIHIIIHHIGIIIIIFIIHHHIIIFHHIGIIIHHIGHGGEEHGGIHGGDEE XT:A:M NM:i:0 SM:i:29 AM:i:29 XM:i:0 XO:i:0 XG:i:0 MD:Z:64
Using this tool to interpret the samflag http://picard.sourceforge.net/explain-flags.html, it seems that this read is:
Paired
Proper pair
Read unmapped
Mate reverse strand
Second in pair
-how can the read be in a proper pair, yet unmapped?
-how can the read have a mapping location and a non-zero mapping quality if it is unmapped?
Another example:
HWUSI-EAS091C_0049:8:17:3987:3357#0 117 6 84719005 37 9M1D66M Un0684.1-12705 6795 0 ATGATGATGTGGTGATGATGATGATGATGATGATTATGATGATGATGATGATGATGATGGTGATGATGATGATGG IIDIIIGIIG<IHIHHHIEIIGIHGHIIIHDIHFIIIIIIGIIHIIHHIIDIIIIGIIIIIIGIHIIIIIIIIII XT:A:U NM:i:4 SM:i:37 AM:i:0 X0:i:1 X1:i:0 XM:i:3 XO:i:1 XG:i:1 MD:Z:9^A25G6G32A0
Paired
Unmapped
Read reverse
Mate reverse
First in pair
-how can it be ‘mapped to reverse strand’ yet unmapped?
-how can it have a mapping location and a non-zero mapping quality if it unmapped?
On a previous occasion with the same data, I encountered errors when adding RG headers with Picard:
Exception in thread "main" net.sf.samtools.SAMFormatException: SAM validation error: ERROR: Record 2096393, Read name HWUSI-EAS091C_0049:1:29:16923:17016#0, MAPQ should be 0 for unmapped read.
There were countless of these errors, and I was forced to add the flag "VALIDATION_STRINGENCY=LENIENT" to the command line (a helpful tip from this site) to complete task.
Why is BWA giving unmapped reads non-zero mapping qualities?
Any help here would be much appreciated, because other than this I am rather fond of the speed and accuracy of BWA compared to other aligners so it would be nice to be able to correct this issue.
Thanks in advance,
Tally
I have used BWA to align paired-end reads to a mammalian genome. I extracted orphan reads from the alignment with this command:
awk 'and ($2, 0x0004) && ! and ($2, 0x0008) && $3!~/Un/' alignment.sam > orphans.sam
I have noticed some apparent errors in the output, for example:
HWUSI-EAS091C_0049:8:10:13851:9893#0 167 27 39960072 29 11S64M 28 167 0 TGTGAATGTAGTGCCCATGAAGGTGGATGCAGAGATCCGGCAGTCGTTGCAGTGCACCCTGCTGTGTCCCCGTAT IIHIIIIIIIIGIIIIIIHIIIIFIIHIIIHHIGIIIIIFIIHHHIIIFHHIGIIIHHIGHGGEEHGGIHGGDEE XT:A:M NM:i:0 SM:i:29 AM:i:29 XM:i:0 XO:i:0 XG:i:0 MD:Z:64
Using this tool to interpret the samflag http://picard.sourceforge.net/explain-flags.html, it seems that this read is:
Paired
Proper pair
Read unmapped
Mate reverse strand
Second in pair
-how can the read be in a proper pair, yet unmapped?
-how can the read have a mapping location and a non-zero mapping quality if it is unmapped?
Another example:
HWUSI-EAS091C_0049:8:17:3987:3357#0 117 6 84719005 37 9M1D66M Un0684.1-12705 6795 0 ATGATGATGTGGTGATGATGATGATGATGATGATTATGATGATGATGATGATGATGATGGTGATGATGATGATGG IIDIIIGIIG<IHIHHHIEIIGIHGHIIIHDIHFIIIIIIGIIHIIHHIIDIIIIGIIIIIIGIHIIIIIIIIII XT:A:U NM:i:4 SM:i:37 AM:i:0 X0:i:1 X1:i:0 XM:i:3 XO:i:1 XG:i:1 MD:Z:9^A25G6G32A0
Paired
Unmapped
Read reverse
Mate reverse
First in pair
-how can it be ‘mapped to reverse strand’ yet unmapped?
-how can it have a mapping location and a non-zero mapping quality if it unmapped?
On a previous occasion with the same data, I encountered errors when adding RG headers with Picard:
Exception in thread "main" net.sf.samtools.SAMFormatException: SAM validation error: ERROR: Record 2096393, Read name HWUSI-EAS091C_0049:1:29:16923:17016#0, MAPQ should be 0 for unmapped read.
There were countless of these errors, and I was forced to add the flag "VALIDATION_STRINGENCY=LENIENT" to the command line (a helpful tip from this site) to complete task.
Why is BWA giving unmapped reads non-zero mapping qualities?
Any help here would be much appreciated, because other than this I am rather fond of the speed and accuracy of BWA compared to other aligners so it would be nice to be able to correct this issue.
Thanks in advance,
Tally
Comment