There seem to be surprisingly few documented methods for detecting novel transcripts from mapped RNA-seq reads. (Let's leave de novo transcriptome assembly aside for the purposes of this question.) What I mean by that is essentially segmenting the profile of read counts along the reference genome into "transcribed regions" and "non-transcribed regions" and then filtering out regions that correspond to annotated exons. (I don't really care so much about the latter part; what I am interested in is the segmentation.)
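To make the kind of segmentation I mean concrete, here is a deliberately naive sketch: threshold the per-base coverage, merge regions separated by short gaps, and discard very short regions. The function name `segment_coverage` and the parameters `min_cov`, `max_gap` and `min_len` are made up for illustration and are not taken from any of the tools mentioned here. It also assumes the coverage for one chromosome is already available as a plain list of integers.

```python
from typing import List, Sequence, Tuple


def segment_coverage(
    coverage: Sequence[int],
    min_cov: int = 1,
    max_gap: int = 0,
    min_len: int = 1,
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals called as "transcribed".

    All three parameters are hypothetical knobs for this sketch:
      min_cov  minimum read depth for a base to count as covered
      max_gap  covered runs separated by at most this many bases are merged
      min_len  merged regions shorter than this are dropped
    """
    # Step 1: collect maximal runs of positions with depth >= min_cov.
    runs: List[Tuple[int, int]] = []
    start = None
    for pos, depth in enumerate(coverage):
        if depth >= min_cov:
            if start is None:
                start = pos
        elif start is not None:
            runs.append((start, pos))
            start = None
    if start is not None:
        runs.append((start, len(coverage)))

    # Step 2: merge runs whose separating gap is at most max_gap bases.
    merged: List[Tuple[int, int]] = []
    for s, e in runs:
        if merged and s - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))

    # Step 3: drop regions that are too short to be believable.
    return [(s, e) for s, e in merged if e - s >= min_len]


if __name__ == "__main__":
    depth = [0, 0, 3, 5, 4, 0, 0, 2, 2, 0, 0, 0, 0, 1, 0]
    print(segment_coverage(depth, min_cov=2, max_gap=2, min_len=3))
```

Even this toy version needs three thresholds that would have to be tuned per data set, which is presumably the same problem the NTR module ran into.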
The ABI SOLiD pipeline used to have an "NTR" (Novel Transcribed Regions) module, but they took it out, apparently because several parameters needed to be set differently depending on the data set. And TopHat, if I understand things correctly, uses the assemble subcommand of MAQ as the novel-transcript part of its workflow (and I haven't found a proper description of what maq assemble actually does). Apart from ABI and TopHat/MAQ, I haven't seen any other formalized or standard ways to detect novel transcripts.
There is a web service at http://galaxy.fml.mpg.de/ which has a Transcript Prediction subcategory, but it is marked "(v. 0.1, unstable)" so I'm a bit wary of using it. It appears to use an unpublished method, mTIM (the web page of which is under construction; http://www.fml.tuebingen.mpg.de/raetsch/suppl/mtim), which performs "Accurate RNA-seq Read Coverage Segmentation".
So, how do you guys approach this problem?