  • Generating splicing graph

    Hello everyone,

    I have a set of gapped-aligned RNA-seq reads and want to generate a splicing graph from them. Could you recommend a tool?

    Cheers,
    GH

  • #2
    e.g. SpliceGrapher

    • #3
      Thanks dietmar13. We have tried it, but it's slow and some of the results are very strange. We also tried Cufflinks with the --GTF-guide option, but it takes a lifetime to run and generates a ton of two-exon transcripts.

      I am looking for a simple and reliable tool that just generates all possible isoforms from the reads.

      GH

      • #4
        Hi GH,

        Not sure if this is useful to you, but you may try the Subjunc program included in the Subread package (http://subread.sourceforge.net). Subjunc finds all possible exon-exon junctions from RNA-seq reads. It uses a novel read mapping paradigm called 'seed-and-vote' to map reads and discover exon-exon junctions (http://nar.oxfordjournals.org/conten...kt214.abstract). It is an extremely fast junction detector.

        Cheers
        Wei

        • #5
          Thank you so much Wei! I saw the paper a while ago and I will definitely give it a try.

          GH

          • #6
            RNA-Seq Unified Mapper (RUM)

            Comparing different mappers on 3 million 70-base paired-end RNA-seq reads:

            RUM (189 k) >> STAR (127 k) > tophat (124 k) > subread/subjunc (99 k) spliced reads mapped.

            RUM provides several output files for these spliced reads / junctions...

            • #7
              Very interesting. We have been using STAR because we found it to be much (~25-50X) faster than Bowtie2, while being more accurate, but we have not tried RUM yet.

              Your stats indicate a nearly 50% improvement over STAR. Have you seen any other performance evaluations for RUM?

              GH
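
The "nearly 50%" figure can be checked from the rounded counts quoted above. A minimal sketch, assuming the improvement is measured relative to the STAR count; the function name `relative_gain` is illustrative and not part of any tool mentioned in the thread:

```python
def relative_gain(new: float, baseline: float) -> float:
    """Return the gain of `new` over `baseline` as a percentage of `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline * 100.0


# Spliced-read counts in thousands, as quoted in post #6 (rounded values).
spliced_k = {"RUM": 189, "STAR": 127, "tophat": 124, "subread/subjunc": 99}

for name, count in spliced_k.items():
    gain = relative_gain(count, spliced_k["STAR"])
    print(f"{name}: {gain:+.1f}% relative to STAR")
```

Because the inputs are rounded to the nearest thousand, the result is only approximate.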

              • #8
                Originally posted by dietmar13:
                Comparing different mappers on 3 million 70-base paired-end RNA-seq reads:

                RUM (189 k) >> STAR (127 k) > tophat (124 k) > subread/subjunc (99 k) spliced reads mapped.

                RUM provides several output files for these spliced reads / junctions...

                Hi Dietmar,

                I was wondering if you could share the details of this evaluation. I compared RUM with STAR in our paper, and RUM showed similar or lower sensitivity to junctions on both simulated and real data. Are you using annotations for both RUM and STAR in this evaluation? If you used STAR without annotations, you would see approximately 50% fewer spliced reads, which could explain this large difference.

                Cheers
                Alex

                • #9
                  Hi Alex,
                  "... you could share the details ..."
                  Of course. I ran STAR without annotations but RUM with annotations, so perhaps you are right; I have to test the new STAR with annotations. Versions used:
                  RUM 1.10
                  STAR 2.0.0
                  (I know, both are somewhat outdated, RUM perhaps more so)

                  Illumina 2 x 76 PE.

                  STAR was run with the default parameter file (<parametersDefault>, typical for mapping 2 x 76 Illumina reads):
                  Code:
                  STAR --genomeDir <genome> --genomeLoad LoadAndKeep \
                     --outFilterMismatchNmax 4 --outFilterMismatchNoverLmax 0.1 \
                     --outFilterMatchNmin 40 --readFilesIn \
                     <sample#1_1.fastq> <sample#1_2.fastq>
                  Code:
                  perl RUM_runner.pl lib/rum.config_hg19 <sample#1_1.fastq>,,,<sample#1_2.fastq> \
                  $tmp 12 $name -limitBowtieNU
                  For the RSeQC results, see the attached picture.
                  Last edited by dietmar13; 04-07-2013, 01:35 PM.

                  • #10
                    The comparison should be made not only in terms of mapping percentage but, more importantly, in terms of accuracy. Our evaluation results show that Subjunc is much more accurate than competing methods on both simulated data and SEQC data (Tables 6 and 7 in http://nar.oxfordjournals.org/conten...kt214.abstract).

                    • #11
                      Originally posted by dietmar13:
                      Comparing different mappers on 3 million 70-base paired-end RNA-seq reads:

                      RUM (189 k) >> STAR (127 k) > tophat (124 k) > subread/subjunc (99 k) spliced reads mapped.

                      RUM provides several output files for these spliced reads / junctions...

                      Did you run subread or subjunc here? For mapping junction reads, you should run subjunc. Also, which version did you use?

                      Wei

                      • #12
                        Alex, STAR won by about 2% (STAR with annotation: 192,899 vs. RUM 1.10: 189,054; I should try RUM 2...).
                        But why do the statistics differ? RSeQC reports 192,899 splice reads, while the STAR log file reports 202,082.

                        Wei: I can't judge accuracy, because I don't know the true number of spliced reads ...
                        subread (-J) -> subjunc (v.1.3.1)

                        STAR 2.3.0 with gencode.v14 annotation:

                        RSeQC:
                        Total Records: 2424478
                        QC failed: 0
                        Optical/PCR duplicate: 0
                        Non Primary Hits 131750
                        Unmapped reads: 0
                        Multiple mapped reads: 110496

                        Uniquely mapped: 2182232
                        Read-1: 1091116
                        Read-2: 1091116
                        Reads map to '+': 1091116
                        Reads map to '-': 1091116
                        Non-splice reads: 1989333
                        Splice reads: 192899
                        Reads mapped in proper pairs: 2182232
                        whereas the STAR log file says:
                        Mapping speed, Million of reads per hour | 270.55

                        Number of input reads | 3081258
                        Average input read length | 152
                        UNIQUE READS:
                        Uniquely mapped reads number | 1091116
                        Uniquely mapped reads % | 35.41%
                        Average mapped length | 145.84
                        Number of splices: Total | 202082
                        Number of splices: Annotated (sjdb) | 193745
                        Number of splices: GT/AG | 199463
                        Number of splices: GC/AG | 1217
                        Number of splices: AT/AC | 164
                        Number of splices: Non-canonical | 1238
                        Mismatch rate per base, % | 1.97%
                        Deletion rate per base | 0.01%
                        Deletion average length | 1.48
                        Insertion rate per base | 0.01%
                        Insertion average length | 1.93
                        MULTI-MAPPING READS:
                        Number of reads mapped to multiple loci | 55248
                        % of reads mapped to multiple loci | 1.79%
                        Number of reads mapped to too many loci | 76
                        % of reads mapped to too many loci | 0.00%
                        UNMAPPED READS:
                        % of reads unmapped: too many mismatches | 0.00%
                        % of reads unmapped: too short | 62.79%
                        % of reads unmapped: other | 0.01%
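
The two reports above can be cross-checked arithmetically. The sketch below copies the counts from both outputs and tests a few relationships between them. It rests on one assumption that the numbers support but the post does not state: STAR counts each read pair once (its average input read length of 152 is 2 x 76), while RSeQC counts each mate as a separate record. The dictionary keys and the function name are illustrative only.

```python
# Counts copied from the RSeQC and STAR outputs quoted in this post.
RSEQC = {
    "total_records": 2424478,
    "non_primary": 131750,
    "multi_mapped": 110496,
    "uniquely_mapped": 2182232,
    "non_splice_reads": 1989333,
    "splice_reads": 192899,
}
STAR = {
    "unique_pairs": 1091116,
    "multi_pairs": 55248,
    "splices_total": 202082,
    "splices_by_motif": {
        "GT/AG": 199463,
        "GC/AG": 1217,
        "AT/AC": 164,
        "Non-canonical": 1238,
    },
}


def consistency_checks() -> dict:
    """Return named boolean checks relating the two reports."""
    return {
        "motif classes sum to total splices":
            sum(STAR["splices_by_motif"].values()) == STAR["splices_total"],
        "unique records = 2 x unique pairs":
            RSEQC["uniquely_mapped"] == 2 * STAR["unique_pairs"],
        "multi-mapped records = 2 x multi-mapped pairs":
            RSEQC["multi_mapped"] == 2 * STAR["multi_pairs"],
        "unique + multi + non-primary = total records":
            RSEQC["uniquely_mapped"] + RSEQC["multi_mapped"]
            + RSEQC["non_primary"] == RSEQC["total_records"],
        "splice + non-splice = unique records":
            RSEQC["splice_reads"] + RSEQC["non_splice_reads"]
            == RSEQC["uniquely_mapped"],
    }


for name, ok in consistency_checks().items():
    print(f"{'OK  ' if ok else 'FAIL'} {name}")

# Average number of splices per spliced record, combining both reports.
ratio = STAR["splices_total"] / RSEQC["splice_reads"]
print(f"splices per spliced record: {ratio:.3f}")
```

If the pair-versus-record assumption holds, the two tools agree on the read counts and differ only in what they count as a splice event.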

                        • #13
                          Hi Dietmar,

                          To make a rigorous evaluation for the junction detectors, you may have to create some simulation data to test them. For example, you can create exon-spanning reads from the human genome using the annotated exon information and this will enable you to assess both sensitivity and accuracy of alternative methods. It will be interesting to see the speed differences between these methods as well.

                          You may consider using 100bp reads instead of 75bp reads, because state-of-the-art sequencers now typically generate ~100bp reads. Different methods may behave differently on longer reads.
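
The simulation idea described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the simulation code from the Subjunc paper: the exon sequences are random strings rather than annotated human exons, reads are error-free and single-end, and the names `simulate_junction_reads` and `score_junctions` are hypothetical. "Accuracy" is interpreted here as precision, which is only one possible reading.

```python
import random
from typing import List, Set, Tuple


def simulate_junction_reads(exon_a: str, exon_b: str, read_length: int,
                            n_reads: int, min_overhang: int = 8,
                            seed: int = 0) -> List[Tuple[str, int]]:
    """Sample error-free reads that span the exon_a/exon_b junction.

    Each result is (read_sequence, offset), where offset is the number of
    read bases that come from exon_a. Every read keeps at least
    `min_overhang` bases on each side of the junction.
    """
    transcript = exon_a + exon_b
    junction = len(exon_a)
    first_start = max(0, junction - (read_length - min_overhang))
    last_start = min(junction - min_overhang, len(transcript) - read_length)
    if last_start < first_start:
        raise ValueError("exons too short for this read length and overhang")
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        start = rng.randint(first_start, last_start)
        reads.append((transcript[start:start + read_length], junction - start))
    return reads


def score_junctions(true_junctions: Set[Tuple[int, int]],
                    called_junctions: Set[Tuple[int, int]]) -> Tuple[float, float]:
    """Return (sensitivity, precision) of called junctions against the truth."""
    true_positives = len(true_junctions & called_junctions)
    sensitivity = true_positives / len(true_junctions) if true_junctions else 0.0
    precision = true_positives / len(called_junctions) if called_junctions else 0.0
    return sensitivity, precision


# Toy exons: random sequences standing in for annotated exon sequences.
_rng = random.Random(1)
exon_a = "".join(_rng.choice("ACGT") for _ in range(150))
exon_b = "".join(_rng.choice("ACGT") for _ in range(150))

for read_length in (75, 100):
    reads = simulate_junction_reads(exon_a, exon_b, read_length, n_reads=5)
    print(read_length, [offset for _, offset in reads])
```

Because the true junction position is known for every simulated read, the junctions reported by each mapper can be passed to `score_junctions` as sets of (donor, acceptor) coordinates, and the same simulation can be repeated at 75 and 100 bases to compare read lengths.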

                          Cheers,
                          Wei

                          • #14
                            We will be happy to share with you the simulation data and also the code for generating these data if you want to use them in your evaluation.

                            Cheers,
                            Wei

                            • #15
                              Originally posted by dietmar13:
                              Alex, STAR won by about 2% (STAR with annotation: 192,899 vs. RUM 1.10: 189,054; I should try RUM 2...).
                              But why do the statistics differ? RSeQC reports 192,899 splice reads, while the STAR log file reports 202,082.

                              In Log.final.out, STAR counts the total number of "splices"; you can reproduce this number by counting the N operations in the CIGAR strings of all unique alignments. Since a spliced read can contain more than one splice, the number of splices is larger than the number of spliced reads, which is what RSeQC reports, I guess.
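
The difference between the two counts can be illustrated with a short sketch that tallies N operations in CIGAR strings. It takes the CIGAR strings as plain text (for example, column 6 of the SAM records for uniquely mapped alignments); the function name `count_splices` and the example CIGARs are made up for illustration and are not taken from STAR or RSeQC.

```python
import re
from typing import Iterable, Tuple

# One CIGAR operation: a length followed by an operation code from the SAM spec.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def count_splices(cigars: Iterable[str]) -> Tuple[int, int]:
    """Return (spliced_reads, total_splices) for a collection of CIGAR strings.

    A read is "spliced" if its CIGAR contains at least one N operation;
    every N operation counts as one splice.
    """
    spliced_reads = 0
    total_splices = 0
    for cigar in cigars:
        n_ops = sum(1 for _, op in _CIGAR_OP.findall(cigar) if op == "N")
        if n_ops:
            spliced_reads += 1
            total_splices += n_ops
    return spliced_reads, total_splices


# Made-up alignments: unspliced, one splice, two splices, soft-clipped.
example = ["76M", "30M1000N46M", "20M500N30M700N26M", "10S66M"]
print(count_splices(example))  # (2, 3)
```

In the example, two reads are spliced but three splices are counted, which is the same kind of gap as between the RSeQC and STAR figures.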
