I'm new to RNA-sequencing in general and Tophat in particular, and I've got a few embarrassingly basic questions about the output that I was hoping someone could help me with.
In the results file, I have a reported fusion event that looks like:
Gene A | Chr Q | coordinate | Gene B | Chr R | coordinate | 3 1 4 | <--- <---
Gene A is on the sense strand and has an overlapping antisense pseudogene. Gene B is on the antisense strand.
Searching for the flanking sequences manually in IGV, I find that they're both from the antisense strand of their respective genes. Taking the Tophat output reads literally, it looks like the product is:
Gene B-AS breakpoint Gene A-AS
<----- XXX <-----
My questions are:
-Are the reads shown in Tophat meant as the actual product, or is it just a representation of the fused region?
-How does Tophat determine the strands of the partners? In IGV, it appears that the few breakpoint-overlapping reads align in a near-even mix to the + and - strands for both chromosomes.
-Why does Tophat report Gene A-AS as Gene A, and not the AS pseudogene? It gleefully reported some 300 other pseudogenes to me.
The reason I'm asking these things is that if the fusion is real, Gene A-AS would not be coding even if the Gene B-AS part is. I'm wondering if there's any possibility that Gene A + is fused to Gene B - or even if I've just flat-out misunderstood the output.
Thanks!