Hi all,
I ran an RNA-seq alignment using TopHat (supplied with a GFF annotation file) and got quite a lot of reads with multiple alignments. I checked the alignment output and found a strange result for a randomly picked read pair. The following is from the TopHat accepted_hits.sam file:
SRR.372451 97 chr11 18416183 3 1M2182N49M = 18421086 6241 GATTCCTTTTGGTTCCAAGTCCAATATGGCAACTCTAAAGGATCAGCTGA HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:50 YT:Z:UU XS:A:+ NH:i:2 CC:Z:= CP:i:18418157 HI:i:0
SRR.372451 353 chr11 18418157 3 1M208N49M = 18421086 4267 GATTCCTTTTGGTTCCAAGTCCAATATGGCAACTCTAAAGGATCAGCTGA HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:50 YT:Z:UU XS:A:+ NH:i:2 HI:i:1
SRR.372451 145 chr11 18421086 3 10M1288N40M = 18416183 -6241 TCTGGCAAAGACTATAATGTAACTGCAAACTCCAAGCTGGTCATTATCAC EHHHHGHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:50 YT:Z:UU XS:A:+ NH:i:2 CC:Z:= CP:i:18421086 HI:i:0
SRR.372451 401 chr11 18421086 3 10M1288N40M = 18418157 -4267 TCTGGCAAAGACTATAATGTAACTGCAAACTCCAAGCTGGTCATTATCAC EHHHHGHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:50 YT:Z:UU XS:A:+ NH:i:2 HI:i:1
TopHat reported two alignments for the read pair, both on chromosome 11 at transcript NM_005566. Read 2 aligns identically in both. The alignment for read 1, however, is a bit strange.
The first alignment for read 1 is
18416183 3 1M2182N49M
where, according to the GFF file used here, the first base maps to coordinate 18416183, which happens to be the last base of the first exon of transcript NM_005566; after a 2182-base skipped region (the intron), the remaining 49 bases map to the second exon, from coordinate 18418366 (the first base of exon 2) to 18418414.
The secondary alignment for read 1 is
18418157 3 1M208N49M
where the first base maps to coordinate 18418157, which should be inside the intron; the remaining 49 bases map to exon 2, exactly as in the first alignment.
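To double-check the coordinates above, here is a minimal Python sketch that expands a SAM POS and CIGAR into reference blocks (1-based, inclusive), splitting at each N. The function name `aligned_blocks` is my own, not part of TopHat or samtools, and it assumes the standard SAM convention that M, =, X, D and N consume reference bases while I, S, H and P do not.

```python
import re

# One CIGAR operation: a length followed by an operator letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def aligned_blocks(pos, cigar):
    """Return 1-based inclusive (start, end) reference blocks, split at N.

    Hypothetical helper for illustration only.
    """
    blocks = []
    ref = pos            # next reference coordinate to be consumed
    block_start = None   # start of the block currently being built
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op in "M=XD":
            # These operations consume reference inside an aligned block.
            if block_start is None:
                block_start = ref
            ref += length
        elif op == "N":
            # A skipped region closes the current block and jumps ahead.
            if block_start is not None:
                blocks.append((block_start, ref - 1))
                block_start = None
            ref += length
        # I, S, H and P do not consume reference, so nothing to do.
    if block_start is not None:
        blocks.append((block_start, ref - 1))
    return blocks

for pos, cigar in [(18416183, "1M2182N49M"), (18418157, "1M208N49M")]:
    print(pos, cigar, aligned_blocks(pos, cigar))
```

For the two read 1 records this gives the same 49-base block at 18418366-18418414 in both cases; the alignments differ only in where the single leading base is placed (18416183 versus 18418157).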
The first alignment looks perfect, but the secondary alignment makes no sense to me.
I don't know how to explain this result. Has anybody encountered a similar situation?
P.S. I used TopHat v2.0.6.
Thanks,
Alex