Hi all -
I have had success so far with TopHat, and it is a great tool!
My RNA-seq workflow is to run TopHat and then Cufflinks. Recently, I noticed that some of the transcripts that are produced are "false positives" due to multi-mapped reads. In other words, the vast majority of the coverage (RPKM) in those transcripts is due to reads that map to multiple locations.
I could eliminate all of these false positives by setting TopHat's (-g/--max-multihits) option to '1', but I would rather not do this because it would sacrifice sensitivity in exchange for guaranteed uniqueness.
Ideally, I want to know how many of the reads in each transcript came from multi-mapping vs. uniquely mapping reads. Is there any information in the 'accepted_hits.sam' file about this? What about in any of the other temporary files that TopHat generates?
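To make the kind of tally I have in mind concrete, here is a rough Python sketch. It rests on an assumption I have not confirmed: that with -g greater than 1, each reported alignment of a multi-mapped read appears as its own record in 'accepted_hits.sam' under the same read name. The function names (`mate_id`, `multihit_stats`) are my own and not part of TopHat. It counts per reference sequence only, and it keeps every record in memory, so it is meant for a subset of the file, not a full run.

```python
from collections import Counter, defaultdict


def mate_id(flag):
    """Return 1 or 2 for paired reads (SAM flag bits 0x40 / 0x80), else 0."""
    if flag & 0x40:
        return 1
    if flag & 0x80:
        return 2
    return 0


def multihit_stats(sam_lines):
    """Tally unique vs. multi-mapped alignments per reference sequence.

    ASSUMPTION: a read that maps to several places shows up as several
    records sharing the same read name (and the same mate number).
    Returns {rname: {"unique": n, "multi": m}}.
    """
    records = []      # (read key, reference name) for every mapped record
    hits = Counter()  # number of alignment records seen per read key
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        if flag & 0x4:
            continue  # skip unmapped reads
        key = (qname, mate_id(flag))
        hits[key] += 1
        records.append((key, rname))

    stats = defaultdict(lambda: {"unique": 0, "multi": 0})
    for key, rname in records:
        kind = "unique" if hits[key] == 1 else "multi"
        stats[rname][kind] += 1
    return {rname: dict(counts) for rname, counts in stats.items()}
```

Passing an open file handle for 'accepted_hits.sam' as `sam_lines` should work, since the function only iterates over lines. Getting the same numbers per transcript instead of per reference would additionally need a lookup of each alignment position against the Cufflinks transcript coordinates, which this sketch does not attempt.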
How does everyone else handle multi-mapping reads and TopHat?
Also, in the SAM output file, I noticed that the mate pair reference is always set to '=' even if the two mates map to different references.
Finally, would it be possible to add the 'insert size' parameter to the accepted_hits.sam file?
Best regards, and thanks again!