When aligning paired-end reads, Bowtie2 is supposed to output the alignments on alternating lines: read1, read2; read1, read2; read1, read2; etc. For one particular type of alignment (note the similar SAM flags below) I'm getting them in the opposite order, with read 2 coming first. Some examples, copied directly from the output .sam file, are listed under "Code:" below.
This only happens when the first displayed read has SAM flag 153 or 137 and the second displayed read has flag 69 (less than 1% of all reads in this particular dataset). I think this means read 2 maps but read 1 doesn't, and for some reason when only read 2 maps it's being output first. Flags 153 and 137 both indicate that a read is the "second in pair", yet it is being output first, while flag 69 is "first in pair", yet it is being output second. Flags checked here: http://picard.sourceforge.net/explain-flags.html
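To double-check the flag reading without the web page, the flags can be decoded by hand. This is a minimal sketch: the bit values are the standard SAM flag bits, but the label strings and the `decode_flag` helper are my own names, not anything from Bowtie2 or Picard.

```python
# Standard SAM flag bits; the label strings are informal names chosen here.
SAM_FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "read_unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "read_reverse_strand"),
    (0x20, "mate_reverse_strand"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary_alignment"),
    (0x200, "fails_quality_checks"),
    (0x400, "duplicate"),
    (0x800, "supplementary_alignment"),
]


def decode_flag(flag):
    """Return the labels of every bit that is set in a SAM flag."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]


# The three flags seen in the problem records.
for flag in (153, 137, 69):
    print(flag, ", ".join(decode_flag(flag)))
```

Decoded this way, 153 and 137 both include "second_in_pair" and "mate_unmapped", and 69 includes "first_in_pair" and "read_unmapped", which matches the interpretation above.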
The other problem with these alignments is that the unmapped read is still given a chromosome and coordinates, but these are just the values from the corresponding aligned read. I think it would make more sense for these fields to be blank or to hold a * for the unmapped read.
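Whatever the reason for the borrowed coordinates, the unmapped state can be read from the flag (bit 0x4) and the `*` CIGAR instead of from the chromosome and position columns. The sketch below assumes whitespace-separated fields, as they appear when pasted here (a real .sam file is tab-delimited); `describe_record` is a hypothetical helper, and the example record is an abbreviated copy of one of my flag-69 lines with SEQ and QUAL replaced by `*` to keep it short.

```python
def describe_record(line):
    """Summarise mapping state from the first columns of one SAM record."""
    qname, flag, rname, pos, mapq, cigar = line.split()[:6]
    flag = int(flag)
    return {
        "qname": qname,
        # Bit 0x4 is the authoritative "this read is unmapped" marker.
        "unmapped": bool(flag & 0x4),
        # True when RNAME/POS are filled in, even if the read is unmapped.
        "has_coordinates": rname != "*" and pos != "0",
        "cigar": cigar,
    }


# Abbreviated record (SEQ and QUAL shortened to "*" for illustration).
record = (
    "HWI-ST724:196:C0LCGACXX:6:1101:12507:1939 69 chr14 109251865 "
    "0 * = 109251865 0 * * YT:Z:UP"
)

info = describe_record(record)
print(info)
```

For this record the helper reports the read as unmapped while still showing coordinates, so any downstream filtering should test the flag, not the chromosome column.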
This is my first time doing much with Bowtie2 or with .sam files, so maybe I'm misunderstanding something (I was previously using Bowtie1 and its default .map output format). Otherwise, are these normal behaviours for Bowtie2?
Code:
HWI-ST724:196:C0LCGACXX:6:1101:12507:1939 153 chr14 109251865 42 101M = 109251865 0 CAGACACTAACTCTGTAGTACACTATGGACAGAGATGGTCTAGCCCATCTTAAGCACCACCACCATTACACTACCACAACTACCAGGAGCAGCAACAGCAA DDEDCDDCCCECEEDEFFFEFFFFHFGHFFJJIIICJIGIGCGIFIGIIIGHHHF:HD@HF?GHGFF<F?HE?IHFIIHHFAIIGGIJHHHHHFFFFFCCC AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:101 YT:Z:UP
HWI-ST724:196:C0LCGACXX:6:1101:12507:1939 69 chr14 109251865 0 * = 109251865 0 NTTAGCTACTACAATAGATCCTGCTCATAGCCTTACCAGAAGTATCTCCTGCCTGCCCATTAGCTACTACAGACACTAACTCTGTAGTACACTATGGACAG #1=DDFDFHHHGFHHIHJJIJGIJJJJJJJJJIJJGIIGGIJCFGIJIIJJJGGIJIJIJJJBGHIIJEIJJJJJDHHHHHFFDFFFEDEEEDEEDDDDDB YT:Z:UP
HWI-ST724:196:C0LCGACXX:6:1101:13494:1989 137 chr16 84333673 42 101M = 84333673 0 TGTTACAGCAAATAAGCAAGACATAAATTAATTCAAGTGAGAAGGAGCCCAGTCTAATTTTCATAGCCTAAATGCCAGGGCCAGGAGGCAGGAGTGGGCAG CCBFFFFFHHGHHJJJJJIJIJIFJIIIJJIIJEIGI?FFHGIHHIJJJJJHHIJIHJIJJIJGIJJJIJJJJJJJJHHGFFDFDDCDDDDDDD<ABDD?B AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:101 YT:Z:UP
HWI-ST724:196:C0LCGACXX:6:1101:13494:1989 69 chr16 84333673 0 * = 84333673 0 NTTGTGCTGGTGCCCCCCCCCCCCAAAAACCCCCTTCCCCCCCCTTTTTTAGGGGGCTCTCCCCCCCCCCCCCCCCCCCCCCCCCCCCTGCCCCCGGCTTT #1=DDFFFHHCFFIIIIIIIIII############################################################################## YT:Z:UP
HWI-ST724:196:C0LCGACXX:6:1101:6762:2115 153 chr9 122691401 42 101M = 122691401 0 TGATTTGGTTTGTCTTGGGGCCAGGGGGTGTTTTACCGAGGTTGTTGGTTGCACAGTTAGTATGGAGCCATTATTCCTAGAAATTGTTTAATGTAGTTTCA C@>8ABDBCC??2@BB@ABA=;CDEB=FGHC@7E@HAF@B;C?CFEFBFD<BF@HFF?CC4BIGGIHEHCHGAEA?A>GIGIGGIGE?C?DD?>DDD=?<@ AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:101 YT:Z:UP
HWI-ST724:196:C0LCGACXX:6:1101:6762:2115 69 chr9 122691401 0 * = 122691401 0 GCTCCTGCCTTAAAAAAAAAAAAAAAAAAAAAAAAGGTGTATAAGCCGCAAAGTAAAAGGGCCCCAGAATTTGTGAAATAAGATTGTGGTTTTCTTGCGGG @@@1=?DDBD?HDBAFE:C<DGGGID6A?BBB##################################################################### YT:Z:UP
HWI-ST724:196:C0LCGACXX:6:1101:11774:2147 137 chr9 3032426 0 97M1D4M = 3032426 0 TCCTAAAGTGTGTATTTCTCATTGGACGTGATTTTCAGGTTTCTCGCCATATTCCAGGTCCTACAGTGTGAATTTCTCATTTTTCATGTTTTCCTATATTT @B@FDFFFHHBHDHIJJJJJEHIJIJJFGIGFHGID<FGGGHAGIIEIGIEGHGHIGJCHGHIIGJFHFGGIHJJGHHGHFHHFDFFFFFDDEEECCCDCF AS:i:-51 XN:i:0 XM:i:8 XO:i:1 XG:i:1 NM:i:9 MD:Z:23T0T0C44C15C5T1A0G1^G4 YT:Z:UP
HWI-ST724:196:C0LCGACXX:6:1101:11774:2147 69 chr9 3032426 0 * = 3032426 0 AGGTAGTGAAATATGAAGAGAAATATAGGAAAACATGAAAAATGAGAAATTCACACTGTAGGACCTGGAATATGGCGAGAAACCTGAAAATCACGTCCAAT @@@=DBDDHHHHDICDEEDHIIGHGGHIJJIIJIGIIIFIJIGHIIIIIGIJIIIIJIJIHJDHHIIIJJIJGIJEEFDEDDDECDCDDCDDCCDDDDDDD YT:Z:UP
HWI-ST724:196:C0LCGACXX:6:1101:10316:2079 153 chr1 177325404 40 101M = 177325404 0 GTGTCTGTGTGTGTGGTGAGTGTTTTGCCTGCTTGTATGAGTGTACACCATGTGCATGTATCTGTTGCCCATGAAGGCCAGAAGAGGACATCATATCCCTG CDDDDBBBDDDDDDEEDDEFFFFHHHHJJIIJJIJJIJIJJJIGGIHJIJIJJJJJIIJJJJJJJJIJJJJJJJJJJJIJJJJJJJJIFHGHHFFFFFCCC AS:i:-17 XN:i:0 XM:i:3 XO:i:0 XG:i:0 NM:i:3 MD:Z:4G42A29A23 YT:Z:UP
HWI-ST724:196:C0LCGACXX:6:1101:10316:2079 69 chr1 177325404 0 * = 177325404 0 TTTTCTACAAACCTTAAAGACTTCTATTTAGAAATGTTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTCTGTGTGTCTGTGTGTGTGGTGAGTGTTTTGCC CCCFFFFFHHHHHJJJJJJJJJJJJJJJJIIJIIIIJJJHIGGBFHHHIGHFFHIIHFHGHDHGHH<E)?EHHH;?);BDFCEAE(5=(;(9::@@##### YT:Z:UP
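To find every pair like the ones above, the file can be walked two records at a time and each pair checked for a "second in pair" record followed by a "first in pair" record. This is only a sketch under some assumptions: mates sit on adjacent lines with identical read names, there are no secondary or supplementary lines in between, and fields are whitespace-separated. `find_swapped_pairs` is a hypothetical helper, the first two example lines are abbreviated from my output (SEQ and QUAL replaced by `*`), and the `pairA` lines are made up to show a normally ordered pair.

```python
def find_swapped_pairs(lines):
    """Return read names whose second-in-pair record precedes the first."""
    swapped = []
    previous = None  # (qname, flag) of a record still waiting for its mate
    for line in lines:
        # Skip SAM header lines and blank lines.
        if line.startswith("@") or not line.strip():
            continue
        qname, flag = line.split()[:2]
        flag = int(flag)
        if previous is not None and previous[0] == qname:
            # Adjacent records share a name: treat them as one pair.
            if previous[1] & 0x80 and flag & 0x40:
                swapped.append(qname)
            previous = None
        else:
            previous = (qname, flag)
    return swapped


example = [
    "@HD VN:1.0 SO:unsorted",
    # Abbreviated from the real output: read 2 (153) listed before read 1 (69).
    "HWI-ST724:196:C0LCGACXX:6:1101:12507:1939 153 chr14 109251865 42 101M = 109251865 0 * *",
    "HWI-ST724:196:C0LCGACXX:6:1101:12507:1939 69 chr14 109251865 0 * = 109251865 0 * *",
    # Hypothetical pair in the expected order: read 1 (73) before read 2 (133).
    "pairA 73 chr1 100 42 101M = 100 0 * *",
    "pairA 133 chr1 100 0 * = 100 0 * *",
]

print(find_swapped_pairs(example))
```

On a real file the same function could be fed an open file handle, for example `find_swapped_pairs(open("output.sam"))`, where `output.sam` stands in for whatever the alignment file is called.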