  • genomeHunter
    Member
    • Apr 2013
    • 26

    Generating splicing graph

    Hello everyone,

    I have a set of gapped aligned RNAseq reads and I want to generate the splicing graph. I was wondering if you could introduce a tool.

    Cheers,
    GH
  • dietmar13
    Senior Member
    • Mar 2010
    • 107

    #2
    e.g. SpliceGrapher

    Comment

    • genomeHunter
      Member
      • Apr 2013
      • 26

      #3
      Thanks dietmar13. We have tried it, but it's slow and some of the results are very strange. We also tried Cufflinks with the --GTF-guide option, but it takes a lifetime to run and generates a ton of two-exon transcripts.

      I am looking for a simple and reliable tool that just generates all possible isoforms from the reads.

      GH

      Comment

      • shi
        Wei Shi
        • Feb 2010
        • 236

        #4
        Hi GH,

        Not sure if this is useful to you, but you may try the Subjunc program included in the Subread package (http://subread.sourceforge.net). Subjunc finds all possible exon-exon junctions from RNA-seq reads. It uses a novel read mapping paradigm called 'seed-and-vote' to map reads and discover exon-exon junctions (http://nar.oxfordjournals.org/conten...kt214.abstract). It is an extremely fast junction detector.

        Cheers
        Wei

        Comment

        • genomeHunter
          Member
          • Apr 2013
          • 26

          #5
          Thank you so much Wei! I saw the paper a while ago and I will definitely give it a try.

          GH

          Comment

          • dietmar13
            Senior Member
            • Mar 2010
            • 107

            #6
            RNA-Seq Unified Mapper (RUM)

            Comparing different mappers on 3 million 70-base paired-end RNA-seq reads:

            RUM (189 k) >> STAR (127 k) > tophat (124 k) > subread/subjunc (99 k) spliced reads mapped.

            RUM provides several output files for these spliced reads / junctions...

            Comment

            • genomeHunter
              Member
              • Apr 2013
              • 26

              #7
              Very interesting. We have been using STAR because we found it to be much (~25-50X) faster than Bowtie2, while being more accurate, but we have not tried RUM yet.

              Your stats indicate a nearly 50% improvement over STAR. Have you seen any other performance evaluations for RUM?

              GH
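The "nearly 50%" figure can be checked directly from the counts in post #6. A minimal sketch, assuming the rounded values (in thousands of spliced reads) quoted there; the helper name `relative_gain` is only illustrative:

```python
# Spliced-read counts in thousands, copied from post #6.
spliced_reads_k = {"RUM": 189, "STAR": 127, "tophat": 124, "subread/subjunc": 99}


def relative_gain(count, baseline):
    """Percent by which `count` exceeds `baseline` (negative if lower)."""
    return 100.0 * (count - baseline) / baseline


star = spliced_reads_k["STAR"]
for tool, count in spliced_reads_k.items():
    print(f"{tool:16s} {count:4d} k  {relative_gain(count, star):+6.1f}% vs STAR")
```

With these inputs RUM comes out about 48.8% above STAR, which matches the "nearly 50%" reading. It says nothing about whether the extra spliced reads are correct.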

              Comment

              • alexdobin
                Senior Member
                • Feb 2009
                • 161

                #8
                Originally posted by dietmar13
                Comparing different mappers on 3 million 70-base paired-end RNA-seq reads:

                RUM (189 k) >> STAR (127 k) > tophat (124 k) > subread/subjunc (99 k) spliced reads mapped.

                RUM provides several output files for these spliced reads / junctions...
                Hi Dietmar,

                I was wondering if you could share the details of this evaluation. I compared RUM with STAR in our paper, and RUM showed similar or lower sensitivity to junctions on both simulated and real data. Are you using annotations for both RUM and STAR in this evaluation? If you used STAR without annotations, you would see approximately 50% fewer spliced reads, which could explain this large difference.

                Cheers
                Alex

                Comment

                • dietmar13
                  Senior Member
                  • Mar 2010
                  • 107

                  #9
                  Hi Alex,
                  ... you could share the detail ...
                  of course (STAR without annotations, but RUM with annotations) - perhaps you are right, I have to test the new STAR with annotations:
                  RUM 1.10
                  STAR 2.0.0
                  (I know, somewhat outdated, but perhaps more for RUM)

                  Illumina 2 x 76 PE.

                  For STAR default parameter file (<parametersDefault>, typical for mapping of 2 x 76 Illumina reads):
                  Code:
                  STAR --genomeDir <genome> --genomeLoad LoadAndKeep \
                     --outFilterMismatchNmax 4 --outFilterMismatchNoverLmax 0.1 \
                     --outFilterMatchNmin 40 \
                     --readFilesIn <sample#1_1.fastq> <sample#1_2.fastq>
                  Code:
                  perl RUM_runner.pl lib/rum.config_hg19 <sample#1_1.fastq>,,,<sample#1_2.fastq> \
                  $tmp 12 $name -limitBowtieNU
                  RSeQC results see picture:
                  Attached Files
                  Last edited by dietmar13; 04-07-2013, 01:35 PM.

                  Comment

                  • shi
                    Wei Shi
                    • Feb 2010
                    • 236

                    #10
                    The comparisons should be performed not only in terms of mapping percentage but, more importantly, in terms of accuracy. Our evaluation results show that Subjunc is much more accurate than competing methods on simulation data and SEQC data (Tables 6 and 7 in http://nar.oxfordjournals.org/conten...kt214.abstract).

                    Comment

                    • shi
                      Wei Shi
                      • Feb 2010
                      • 236

                      #11
                      Originally posted by dietmar13
                      Comparing different mappers on 3 million 70-base paired-end RNA-seq reads:

                      RUM (189 k) >> STAR (127 k) > tophat (124 k) > subread/subjunc (99 k) spliced reads mapped.

                      RUM provides several output files for these spliced reads / junctions...
                      Did you run subread or subjunc here? For mapping junction reads, you should run subjunc. Also, what was the version you used?

                      Wei

                      Comment

                      • dietmar13
                        Senior Member
                        • Mar 2010
                        • 107

                        #12
                        alex, STAR won: + 2% (STAR with annotation: 192,899 - RUM 1.10: 189,054 (I should try RUM 2...))
                        why are there differences in statistics: RSeQC says 192,899 splice reads and the log file of STAR even 202,082 ?

                        wei: I can't judge accuracy, because I don't know the true number of spliced reads ...
                        subread (-J) -> subjunc (v.1.3.1)

                        STAR 2.3.0 with gencode.v14 annotation:

                        RSeQC:
                        Total Records: 2424478
                        QC failed: 0
                        Optical/PCR duplicate: 0
                        Non Primary Hits 131750
                        Unmapped reads: 0
                        Multiple mapped reads: 110496

                        Uniquely mapped: 2182232
                        Read-1: 1091116
                        Read-2: 1091116
                        Reads map to '+': 1091116
                        Reads map to '-': 1091116
                        Non-splice reads: 1989333
                        Splice reads: 192899
                        Reads mapped in proper pairs: 2182232
                        whereas STAR-log file says:
                        Mapping speed, Million of reads per hour | 270.55

                        Number of input reads | 3081258
                        Average input read length | 152
                        UNIQUE READS:
                        Uniquely mapped reads number | 1091116
                        Uniquely mapped reads % | 35.41%
                        Average mapped length | 145.84
                        Number of splices: Total | 202082
                        Number of splices: Annotated (sjdb) | 193745
                        Number of splices: GT/AG | 199463
                        Number of splices: GC/AG | 1217
                        Number of splices: AT/AC | 164
                        Number of splices: Non-canonical | 1238
                        Mismatch rate per base, % | 1.97%
                        Deletion rate per base | 0.01%
                        Deletion average length | 1.48
                        Insertion rate per base | 0.01%
                        Insertion average length | 1.93
                        MULTI-MAPPING READS:
                        Number of reads mapped to multiple loci | 55248
                        % of reads mapped to multiple loci | 1.79%
                        Number of reads mapped to too many loci | 76
                        % of reads mapped to too many loci | 0.00%
                        UNMAPPED READS:
                        % of reads unmapped: too many mismatches | 0.00%
                        % of reads unmapped: too short | 62.79%
                        % of reads unmapped: other | 0.01%
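The two reports above can be cross-checked against each other. The sketch below uses only numbers copied from them. The reading that RSeQC counts each mate as a record while STAR counts a read pair as one read is an inference from the factor of two in the numbers (and from the average input read length of 152 for 2 x 76 data), not something stated in the thread.

```python
# Values copied from the RSeQC and STAR reports quoted in this post.
rseqc = {
    "total_records": 2424478,
    "non_primary_hits": 131750,
    "multiple_mapped": 110496,
    "uniquely_mapped": 2182232,
    "non_splice_reads": 1989333,
    "splice_reads": 192899,
}
star = {
    "input_reads": 3081258,
    "uniquely_mapped": 1091116,
    "multi_mapped": 55248,
    "splices_total": 202082,
    "splices_by_motif": [199463, 1217, 164, 1238],  # GT/AG, GC/AG, AT/AC, non-canonical
}

checks = {
    "RSeQC total = unique + multiple + non-primary":
        rseqc["total_records"] == rseqc["uniquely_mapped"]
        + rseqc["multiple_mapped"] + rseqc["non_primary_hits"],
    "RSeQC unique = splice + non-splice":
        rseqc["uniquely_mapped"] == rseqc["splice_reads"] + rseqc["non_splice_reads"],
    "RSeQC unique = 2 x STAR unique (mates vs pairs, assumed)":
        rseqc["uniquely_mapped"] == 2 * star["uniquely_mapped"],
    "RSeQC multiple = 2 x STAR multi (mates vs pairs, assumed)":
        rseqc["multiple_mapped"] == 2 * star["multi_mapped"],
    "STAR splice motifs sum to total":
        sum(star["splices_by_motif"]) == star["splices_total"],
}

unique_pct = round(100 * star["uniquely_mapped"] / star["input_reads"], 2)
splices_per_spliced_record = star["splices_total"] / rseqc["splice_reads"]

for label, ok in checks.items():
    print(f"{'OK ' if ok else 'FAIL'} {label}")
print(f"STAR uniquely mapped: {unique_pct}%")
print(f"splices per RSeQC spliced record: {splices_per_spliced_record:.4f}")
```

All five identities hold for the quoted numbers, so the remaining gap is only between 202,082 splices and 192,899 spliced records.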

                        Comment

                        • shi
                          Wei Shi
                          • Feb 2010
                          • 236

                          #13
                          Hi Dietmar,

                          To make a rigorous evaluation for the junction detectors, you may have to create some simulation data to test them. For example, you can create exon-spanning reads from the human genome using the annotated exon information and this will enable you to assess both sensitivity and accuracy of alternative methods. It will be interesting to see the speed differences between these methods as well.

                          You may consider using 100bp reads instead of 75bp reads, because state-of-the-art sequencers now typically generate ~100bp reads. Different methods may behave differently on longer reads.

                          Cheers,
                          Wei
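The simulation idea in this post can be sketched on toy data. The code below is a hypothetical illustration, not the simulation code Wei refers to. It builds every read that crosses one exon-exon boundary, then scores a set of called junctions against the known truth. The function names, the `min_overhang` parameter, the toy sequences and coordinates, and the use of precision as the accuracy measure are all assumptions made for the example.

```python
def junction_spanning_reads(upstream_exon, downstream_exon, read_len, min_overhang=1):
    """Return all reads of length read_len that cross the exon-exon boundary
    with at least min_overhang bases on each side of it."""
    transcript = upstream_exon + downstream_exon
    boundary = len(upstream_exon)  # index of the first downstream-exon base
    reads = []
    for start in range(len(transcript) - read_len + 1):
        end = start + read_len
        if start <= boundary - min_overhang and end >= boundary + min_overhang:
            reads.append(transcript[start:end])
    return reads


def evaluate_junctions(true_junctions, called_junctions):
    """Compare called junctions with the simulated truth.
    Junctions are (chrom, donor_pos, acceptor_pos) tuples."""
    truth, called = set(true_junctions), set(called_junctions)
    tp = len(truth & called)
    return {
        "tp": tp,
        "fp": len(called - truth),
        "fn": len(truth - called),
        "sensitivity": tp / len(truth) if truth else 0.0,
        "precision": tp / len(called) if called else 0.0,
    }


# Toy exons (made up for illustration only).
reads = junction_spanning_reads("ACGTACGTAC", "GGGTTTCCCA", read_len=8, min_overhang=3)
print(len(reads), reads)

# Toy truth and calls (coordinates are made up).
truth = [("chr1", 100, 500), ("chr1", 600, 900), ("chr2", 50, 300), ("chr2", 400, 800)]
calls = [("chr1", 100, 500), ("chr1", 600, 900), ("chr2", 50, 300), ("chr2", 410, 800)]
result = evaluate_junctions(truth, calls)
print(result)
```

Because the true junctions are known by construction, both missed junctions and false calls can be counted, which is not possible with the spliced-read totals alone.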

                          Comment

                          • shi
                            Wei Shi
                            • Feb 2010
                            • 236

                            #14
                            We will be happy to share with you the simulation data and also the code for generating these data if you want to use them in your evaluation.

                            Cheers,
                            Wei

                            Comment

                            • alexdobin
                              Senior Member
                              • Feb 2009
                              • 161

                              #15
                              Originally posted by dietmar13
                              alex, STAR won: + 2% (STAR with annotation: 192,899 - RUM 1.10: 189,054 (I should try RUM 2...))
                              why are there differences in statistics: RSeQC says 192,899 splice reads and the log file of STAR even 202,082 ?
                              In the Log.final.out, STAR counts the total number of "splices" - you can get it by counting the total number of N-operations in CIGARs of all unique alignments. Since some spliced reads can have more than one splice, the number of splices is bigger than the number of spliced reads, which is output by RSeQC, I guess.
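The distinction between "splices" and "spliced reads" can be made concrete by counting N operations in CIGAR strings. A minimal sketch on made-up CIGARs; it assumes the strings come from unique alignments only and does not read a BAM file. The function names are illustrative.

```python
import re

# One CIGAR element: a length followed by an operation code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def count_splices(cigar):
    """Number of N (skipped region) operations in one CIGAR string."""
    return sum(1 for _, op in CIGAR_OP.findall(cigar) if op == "N")


def summarize(cigars):
    """Total splices versus number of alignments with at least one splice."""
    per_alignment = [count_splices(c) for c in cigars]
    return {
        "splices": sum(per_alignment),
        "spliced_reads": sum(1 for n in per_alignment if n > 0),
    }


# Toy alignments: unspliced, one splice, two splices, soft-clipped and unspliced.
cigars = ["76M", "30M1200N46M", "20M500N30M800N26M", "10S66M"]
summary = summarize(cigars)
print(summary)
```

An alignment with two N operations adds two to the splice count but only one to the spliced-read count, so the first total is always at least as large as the second.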

                              Comment
