

(Reference guided) novel transcript detection from RNA-seq?




  • (Reference guided) novel transcript detection from RNA-seq?

    There seem to be surprisingly few documented methods for detecting novel transcripts from mapped RNA-seq reads. (Let's leave de novo transcriptome assembly aside for the purposes of this question.) What I mean by that is essentially segmenting the profile of read counts along the reference genome into "transcribed regions" and "non-transcribed regions" and then filtering out regions that correspond to annotated exons. (I don't really care so much about the latter part; what I am interested in is the segmentation.)
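    To make the segmentation idea concrete, here is a minimal sketch, assuming the mapped reads have already been reduced to a per-base coverage list for one chromosome. The function name `segment_coverage` and the parameters `min_depth`, `max_gap`, and `min_length` are hypothetical choices for illustration, not taken from any of the tools mentioned in this thread.

```python
from typing import List, Tuple


def segment_coverage(
    coverage: List[int],
    min_depth: int = 2,   # assumed threshold: depth needed to call a base "transcribed"
    max_gap: int = 1,     # assumed: merge regions separated by at most this many bases
    min_length: int = 3,  # assumed: discard regions shorter than this
) -> List[Tuple[int, int]]:
    """Return half-open [start, end) intervals of putatively transcribed bases."""
    # Pass 1: collect maximal runs of positions with depth >= min_depth.
    runs: List[Tuple[int, int]] = []
    start = None
    for pos, depth in enumerate(coverage):
        if depth >= min_depth:
            if start is None:
                start = pos
        elif start is not None:
            runs.append((start, pos))
            start = None
    if start is not None:
        runs.append((start, len(coverage)))

    # Pass 2: merge runs separated by short coverage gaps.
    merged: List[Tuple[int, int]] = []
    for s, e in runs:
        if merged and s - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))

    # Pass 3: drop regions that are too short to be credible.
    return [(s, e) for s, e in merged if e - s >= min_length]


if __name__ == "__main__":
    toy = [0, 0, 3, 4, 5, 0, 4, 4, 0, 0, 0, 2, 0, 0, 6, 6, 6, 6]
    print(segment_coverage(toy))
```

    The three thresholds are exactly the kind of data-set-dependent parameters that make a naive approach like this hard to standardize, so treat the defaults as placeholders.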

    The ABI SOLiD pipeline used to have an "NTR" (Novel Transcribed Regions) module, but they took it out, apparently because several parameters needed to be set differently depending on the data set. And TopHat, if I understand things correctly, uses the assemble subcommand of MAQ as the novel-transcript part of its workflow (and I haven't found a proper description of what maq assemble actually does). Apart from ABI and TopHat/MAQ, I haven't seen any other formalized or standard ways to detect transcripts.

    There is a web service at http://galaxy.fml.mpg.de/ which has a Transcript Prediction subcategory, but it is marked "(v. 0.1, unstable)" so I'm a bit wary of using it. It appears to use an unpublished method, mTIM (the web page of which is under construction; http://www.fml.tuebingen.mpg.de/raetsch/suppl/mtim), which performs "Accurate RNA-seq Read Coverage Segmentation".

    So, how do you guys approach this problem?
    Last edited by kopi-o; 04-04-2010, 10:36 PM. Reason: fixed some typos

  • #2
    Why remove annotated exons? Novel transcripts may contain many of them (through alternative splicing, or alternative initiation and/or termination of transcription). Also, technically speaking, introns are transcribed too.
    Maybe you could try Gmorse and compare the predicted transcripts/exons with the reference annotation to get new ones.


    • #3

      Good points. Actually I shouldn't have mentioned the annotated exons at all because it is really irrelevant to my core question, which is whether there is any standard or documented way to predict transcribed regions from mapped RNA-seq reads.

      Thanks for reminding me about Gmorse, which I had forgotten about. I'll give it a shot.


      • #4
        Thanks, hope it helps.
        You could also look for contigs of reads that fall outside annotated exons, or for peaks of high read density within a sliding window. This would at least suggest some new exons.
        Also, some gene finders that can take transcription data into account (ideally ESTs) may be able to process RNA-seq data if provided in a convenient way. Maybe this thread could help.
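        The sliding-window idea mentioned above could be sketched roughly as follows, assuming a per-base coverage list for one chromosome. The names `windowed_density` and `dense_windows`, and the `window` and `min_mean` defaults, are hypothetical and only for illustration.

```python
from typing import List, Tuple


def windowed_density(coverage: List[int], window: int) -> List[float]:
    """Mean depth of every full window; entry i covers positions [i, i + window)."""
    if window <= 0:
        raise ValueError("window must be positive")
    # Prefix sums let each window mean be computed in constant time.
    prefix = [0]
    for depth in coverage:
        prefix.append(prefix[-1] + depth)
    return [
        (prefix[i + window] - prefix[i]) / window
        for i in range(len(coverage) - window + 1)
    ]


def dense_windows(
    coverage: List[int],
    window: int = 4,        # assumed window size in bases
    min_mean: float = 3.0,  # assumed minimum mean depth for a "dense" window
) -> List[Tuple[int, int]]:
    """Merge overlapping or touching dense windows into half-open intervals."""
    regions: List[Tuple[int, int]] = []
    for i, mean in enumerate(windowed_density(coverage, window)):
        if mean < min_mean:
            continue
        start, end = i, i + window
        if regions and start <= regions[-1][1]:
            regions[-1] = (regions[-1][0], end)
        else:
            regions.append((start, end))
    return regions


if __name__ == "__main__":
    toy = [0, 0, 1, 5, 6, 7, 5, 1, 0, 0, 0, 0]
    print(dense_windows(toy))
```

        Because whole windows are reported, the resulting intervals overshoot the true boundaries by up to one window length, so this is better suited to proposing candidate regions than to locating exact exon edges.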


        • #5
          Cufflinks (http://cufflinks.cbcb.umd.edu/) can assemble transcripts from mapped RNA-Seq reads, given a reference genome, without the aid of a reference annotation. Using the included tool Cuffcompare, you can compare your transfrags to known, annotated isoforms to identify new transcripts.
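          As a rough illustration of the "compare against the annotation" step, here is a plain interval-overlap filter. This is not how Cuffcompare works internally; it is only a sketch that assumes predicted regions and annotated exons are given as half-open (start, end) intervals on the same chromosome and strand. The function name `novel_regions` is hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


def novel_regions(predicted: List[Interval], annotated: List[Interval]) -> List[Interval]:
    """Return predicted intervals that overlap no annotated interval."""
    novel: List[Interval] = []
    for start, end in sorted(predicted):
        # Two half-open intervals overlap when each starts before the other ends.
        overlaps = any(a_start < end and start < a_end for a_start, a_end in annotated)
        if not overlaps:
            novel.append((start, end))
    return novel


if __name__ == "__main__":
    predicted = [(100, 200), (300, 400), (500, 650)]
    annotated = [(150, 180), (600, 700)]
    print(novel_regions(predicted, annotated))
```

          The all-pairs check is quadratic and only meant for small examples. It also works at the region level, so a novel isoform built partly from annotated exons would not be flagged by it.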


          • #6
            Right, thanks.
            BTW kopi-o, you may be very interested in the RNA-seq Genome Annotation Assessment Project (RGASP). Several participants have been actively working on this precise issue.