
**dietmar13** · 04-06-2013, 09:05 PM

e.g. SpliceGrapher

**genomeHunter** · 04-06-2013, 09:10 PM

Thanks dietmar13. We have tried it, but it's slow and some of the results are very strange. We also tried Cufflinks with the --GTF-guide option, but it takes a lifetime to run and generates a ton of two-exon transcripts.

I am looking for a simple and reliable tool that just generates all possible isoforms from the reads.

GH
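Conceptually, "all possible isoforms" are the paths through a splicing graph whose nodes are exons and whose edges are read-supported splice junctions. The Python sketch below illustrates only that idea, under simplifying assumptions: exons are already defined and labelled (the labels E1 to E4 are hypothetical), and every edge points downstream, so the graph has no cycles. It is not how SpliceGrapher or Cufflinks work internally.

```python
from collections import defaultdict

def build_splice_graph(junctions):
    """Adjacency list: upstream exon -> list of downstream exons it splices to."""
    graph = defaultdict(list)
    for donor_exon, acceptor_exon in junctions:
        graph[donor_exon].append(acceptor_exon)
    return graph

def enumerate_isoforms(graph, first_exon, last_exon):
    """Depth-first enumeration of every exon path from first_exon to last_exon.
    Assumes the graph is acyclic (edges always point downstream)."""
    isoforms = []
    stack = [[first_exon]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == last_exon:
            isoforms.append(path)
            continue
        # .get() avoids inserting empty entries into the defaultdict
        for next_exon in graph.get(node, []):
            stack.append(path + [next_exon])
    return isoforms

# Hypothetical junctions: E2 is a cassette exon (E1 -> E3 skips it).
junctions = [("E1", "E2"), ("E2", "E3"), ("E1", "E3"), ("E3", "E4")]
for isoform in enumerate_isoforms(build_splice_graph(junctions), "E1", "E4"):
    print("-".join(isoform))
```

The number of paths can grow exponentially with the number of alternative exons, which is likely one reason real assemblers filter candidate isoforms instead of reporting every path.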

**shi** · 04-07-2013, 02:08 AM

Hi GH,

Not sure if this is useful to you, but you may try the Subjunc program included in the Subread package (http://subread.sourceforge.net). Subjunc finds all possible exon-exon junctions from RNA-seq reads. It uses a novel read mapping paradigm called 'seed-and-vote' to map reads and discover exon-exon junctions (http://nar.oxfordjournals.org/conten...kt214.abstract). It is an extremely fast junction detector.

Cheers
Wei
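The voting idea behind "seed-and-vote" can be shown with a toy Python sketch. Short seeds are cut from the read, each seed that matches the reference exactly votes for the read start it implies, and the start with the most votes wins, so a mismatch only costs the votes of the seed it falls in. This is a deliberately simplified illustration (a tiny exact-match k-mer index, non-overlapping seeds, made-up sequences), not Subread's actual implementation.

```python
from collections import Counter, defaultdict

def build_index(reference, k):
    """Toy index: every k-mer of the reference -> list of start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index

def seed_and_vote(read, index, k, step):
    """Cut the read into seeds every `step` bases; each exact seed hit votes
    for the read start it implies (hit position minus seed offset).
    Returns (best_start, votes), or (None, 0) if no seed matched."""
    votes = Counter()
    for offset in range(0, len(read) - k + 1, step):
        for hit in index.get(read[offset:offset + k], []):
            votes[hit - offset] += 1
    if not votes:
        return None, 0
    return votes.most_common(1)[0]

reference = "TTTTTACGTGCAGTCGATCATGCATAAAAA"
read = "ACGTGCAATCGATCATGCAT"  # reference[5:25] with one mismatch (G -> A)
print(seed_and_vote(read, build_index(reference, 5), k=5, step=5))
# prints (5, 3): three of the four seeds agree on start 5
```

In this toy, a read spanning an exon-exon junction would split its votes between two locations, which is the kind of signal a junction detector can exploit.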

**genomeHunter** · 04-07-2013, 05:38 AM

Thank you so much Wei! I saw the paper a while ago and I will definitely give it a try.

GH

**dietmar13** · 04-07-2013, 06:59 AM

RNA-Seq Unified Mapper (RUM)

Comparing different mappers on 3 million 70-base PE RNA-seq reads:

RUM (189 k) >> STAR (127 k) > tophat (124 k) > subread/subjunc (99 k) spliced reads mapped.

RUM provides several output files for these spliced reads / junctions...

**genomeHunter** · 04-07-2013, 07:53 AM

Very interesting. We have been using STAR because we found it to be much (~25-50X) faster than Bowtie2, while being more accurate, but we have not tried RUM yet.

Your stats indicate a nearly 50% improvement over STAR. Have you seen any other performance evaluations for RUM?

GH

**alexdobin** · 04-07-2013, 09:47 AM

Originally posted by dietmar13:

comparing different mapper with 3 mio reads 70 bases PE RNA-seq data:

RUM (189 k) >> STAR (127 k) > tophat (124 k) > subread/subjunc (99 k) spliced reads mapped.

RUM provides several output files for these spliced reads / junctions...

Hi Dietmar,

I was wondering if you could share the details of this evaluation. I compared RUM with STAR in our paper, and RUM showed similar or lower sensitivity to junctions on both simulated and real data. Did you use annotations for both RUM and STAR in this evaluation? If you ran STAR without annotations, you would see roughly 50% fewer spliced reads, which could explain this large difference.

Cheers
Alex

**dietmar13** · 04-07-2013, 01:24 PM

Hi Alex,

... you could share the detail ...

Of course. STAR was run without annotations but RUM with annotations, so perhaps you are right; I have to test the new STAR with annotations. Versions used:
RUM 1.10
STAR 2.0.0
(I know these are somewhat outdated, RUM perhaps even more so.)

Illumina 2 x 76 PE.

For STAR, I used the default parameter file (<parametersDefault>, typical for mapping 2 x 76 Illumina reads):

Code:

STAR --genomeDir <genome> --genomeLoad LoadAndKeep             \
   --outFilterMismatchNmax 4 --outFilterMismatchNoverLmax 0.1 \
   --outFilterMatchNmin 40 --readFilesIn                       \
   <sample#1_1.fastq> <sample#1_2.fastq>

Code:

perl RUM_runner.pl lib/rum.config_hg19 <sample#1_1.fastq>,,,<sample#1_2.fastq> \
$tmp 12 $name -limitBowtieNU

The RSeQC results are shown in the attached picture:

Attached file: comparison.png

**shi** · 04-07-2013, 02:48 PM

The comparisons should be made not only in terms of mapping percentage but, more importantly, in terms of accuracy. Our evaluation results show that Subjunc is much more accurate than competing methods on simulation data and SEQC data (Tables 6 and 7 in http://nar.oxfordjournals.org/conten...kt214.abstract).
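Measuring accuracy requires a known answer. As a hedged sketch (not necessarily the exact metrics used in the Subjunc paper), sensitivity and precision of junction calls can be computed against a truth set; the junction coordinates and the function name below are made up for illustration.

```python
def junction_accuracy(detected, truth):
    """Sensitivity and precision of detected junctions against a known truth set.
    Each junction is a hashable key such as (chrom, intron_start, intron_end)."""
    detected, truth = set(detected), set(truth)
    true_positives = len(detected & truth)
    sensitivity = true_positives / len(truth) if truth else 0.0
    precision = true_positives / len(detected) if detected else 0.0
    return sensitivity, precision

# Hypothetical example: 4 true junctions, 3 reported, 2 of them correct.
truth = {("chr1", 100, 200), ("chr1", 300, 400), ("chr1", 500, 600), ("chr1", 700, 800)}
detected = {("chr1", 100, 200), ("chr1", 300, 400), ("chr1", 500, 650)}
print(junction_accuracy(detected, truth))
```

A mapper that reports more spliced reads can score higher on sensitivity while scoring lower on precision, which is why the raw counts alone cannot rank the tools.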

**shi** · 04-07-2013, 03:04 PM

Originally posted by dietmar13:

comparing different mapper with 3 mio reads 70 bases PE RNA-seq data:

RUM (189 k) >> STAR (127 k) > tophat (124 k) > subread/subjunc (99 k) spliced reads mapped.

RUM provides several output files for these spliced reads / junctions...

Did you run subread or subjunc here? For mapping junction reads, you should run subjunc. Also, what was the version you used?

Wei

**dietmar13** · 04-07-2013, 03:42 PM

Alex, STAR won by about 2% (STAR with annotations: 192,899; RUM 1.10: 189,054. I should try RUM 2...).
But why do the statistics differ? RSeQC reports 192,899 spliced reads, while the STAR log file reports as many as 202,082.

Wei: I can't assess accuracy, because I don't know the true number of spliced reads...
subread (-J) -> subjunc (v.1.3.1)

STAR 2.3.0 with gencode.v14 annotation:

RSeQC:

Total Records: 2424478
QC failed: 0
Optical/PCR duplicate: 0
Non primary hits: 131750
Unmapped reads: 0
Multiple mapped reads: 110496

Uniquely mapped: 2182232
Read-1: 1091116
Read-2: 1091116
Reads map to '+': 1091116
Reads map to '-': 1091116
Non-splice reads: 1989333
Splice reads: 192899
Reads mapped in proper pairs: 2182232
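As a quick sanity check, the RSeQC figures above add up, assuming RSeQC's categories are mutually exclusive: the filtered categories sum to the total number of records, the spliced and non-spliced reads sum to the uniquely mapped reads, and about 8.8% of uniquely mapped reads are spliced. The dictionary keys and the helper name are hypothetical.

```python
# RSeQC figures quoted above (STAR 2.3.0 with gencode.v14 annotation).
stats = {
    "total": 2424478, "qc_failed": 0, "duplicates": 0, "non_primary": 131750,
    "unmapped": 0, "multi_mapped": 110496, "unique": 2182232,
    "read1": 1091116, "read2": 1091116, "non_splice": 1989333, "splice": 192899,
}

def check_bam_stat(s):
    """Check that the reported categories add up, assuming they are mutually exclusive."""
    filtered = (s["qc_failed"] + s["duplicates"] + s["non_primary"]
                + s["unmapped"] + s["multi_mapped"] + s["unique"])
    return {
        "records_add_up": filtered == s["total"],
        "splice_split_adds_up": s["non_splice"] + s["splice"] == s["unique"],
        "mate_split_adds_up": s["read1"] + s["read2"] == s["unique"],
        "spliced_fraction": s["splice"] / s["unique"],
    }

for name, value in check_bam_stat(stats).items():
    print(name, value)
```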

whereas the STAR log file says:

**shi** · 04-07-2013, 03:54 PM

Hi Dietmar,

To evaluate the junction detectors rigorously, you may need to create simulated data to test them. For example, you can generate exon-spanning reads from the human genome using the annotated exon information; this will let you assess both the sensitivity and the accuracy of the alternative methods. It would also be interesting to see the speed differences between these methods.
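A minimal sketch of this kind of simulation, assuming the two exon sequences are already in hand: every read position that crosses the junction with a minimum overhang on each side is enumerated, and a reproducible random subset is drawn. Because the junction position is known, every simulated read comes with its true answer. The function names and the min_overhang parameter are hypothetical, this is not the simulation code offered in this thread, and a realistic simulator would also add sequencing errors, paired mates, and expression-dependent read counts.

```python
import random

def junction_spanning_reads(exon_a, exon_b, read_len, min_overhang=8):
    """Every read of length read_len that crosses the exon_a|exon_b junction
    with at least min_overhang bases on each side. Returns (start, read)
    pairs; start is the offset in the spliced sequence exon_a + exon_b."""
    spliced = exon_a + exon_b
    junction = len(exon_a)  # offset of the first exon_b base
    first = max(0, junction + min_overhang - read_len)
    last = min(junction - min_overhang, len(spliced) - read_len)
    return [(s, spliced[s:s + read_len]) for s in range(first, last + 1)]

def sample_reads(reads, n, seed=1):
    """Reproducible random subset of the candidate reads."""
    rng = random.Random(seed)
    return rng.sample(reads, min(n, len(reads)))

# Toy exons; real ones would be cut from the genome using annotated coordinates.
exon_a, exon_b = "A" * 20, "C" * 20
reads = junction_spanning_reads(exon_a, exon_b, read_len=10, min_overhang=3)
for start, read in sample_reads(reads, 3):
    print(start, read)
```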

You may also consider using 100 bp reads instead of 75 bp reads, because state-of-the-art sequencers now typically generate ~100 bp reads, and different methods may behave differently with longer reads.

Cheers,
Wei

**shi** · 04-07-2013, 03:58 PM

We will be happy to share with you the simulation data and also the code for generating these data if you want to use them in your evaluation.

Cheers,
Wei

**alexdobin** · 04-07-2013, 06:21 PM

Originally posted by dietmar13:

alex, STAR won: + 2% (STAR with annotation: 192,899 - RUM 1.10: 189,054 (I should try RUM 2...))
why are there differences in statistics: RSeQC says 192,899 splice reads and the log file of STAR even 202,082 ?

In Log.final.out, STAR reports the total number of "splices"; you can reproduce it by counting the N operations in the CIGAR strings of all unique alignments. Since some spliced reads contain more than one splice, the number of splices is larger than the number of spliced reads, which is, I guess, what RSeQC reports.
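The distinction between spliced reads and splices can be sketched in a few lines of Python. The input is assumed to be CIGAR strings already restricted to unique alignments (for STAR output, for example, alignments tagged NH:i:1) and extracted from the BAM with a tool of your choice; the function name is hypothetical, and this is an illustration of the counting described above, not STAR's code.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def count_spliced(cigars):
    """Return (spliced reads, splices) for CIGAR strings of unique alignments.
    A read is spliced if it has at least one 'N' (skipped region) operation;
    every 'N' counts as one splice."""
    spliced_reads = splices = 0
    for cigar in cigars:
        n_ops = sum(1 for _length, op in CIGAR_OP.findall(cigar) if op == "N")
        if n_ops:
            spliced_reads += 1
            splices += n_ops
    return spliced_reads, splices

# Hypothetical alignments: one unspliced, one with one junction, one with two.
print(count_spliced(["76M", "30M500N46M", "20M100N30M200N26M", "5S71M"]))
# prints (2, 3)
```

The read with two N operations adds one spliced read but two splices, which is consistent with the STAR log (202,082) exceeding the RSeQC count (192,899).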



Generating splicing graph
