  • Very low Sn/Sp outputs from Cufflinks?

    I have obtained an RNA-seq library from my collaborator with more than 100M reads in total, each 36 bp long, from three Illumina sequencing lanes.

    I tried to use TopHat + Cufflinks to discover novel splice isoforms in this library. I realize it would be ideal to use longer paired-end reads, such as 75 bp, but I wanted to see what I could get from Cufflinks. However, the output after running cuffcompare is a bit disappointing:

    #--------------------|   Sn |   Sp |  fSn |  fSp
    Base level:            59.0   17.8      -      -
    Exon level:             1.7    0.4   18.6    4.0
    Intron level:           7.5   47.0    7.6   47.3
    Intron chain level:     0.1    0.1    0.1    0.1
    Transcript level:       0.0    0.0    0.0    0.0
    Locus level:            0.1    0.0    0.2    0.0
    Missed exons:     66987/206780 ( 32.4%)
    Wrong exons:     813919/958710 ( 84.9%)
    Missed introns:  167142/185318 ( 90.2%)
    Wrong introns:     11318/29587 ( 38.3%)
    Missed loci:        5737/21602 ( 26.6%)
    Wrong loci:      782485/927668 ( 84.3%)

    At the transcript level, both Sn and Sp are zero! Does that mean Cufflinks is not supposed to be run on short single-end RNA-seq data? Is this typical, or did I do something wrong? Any input?

    - L
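For readers who want to check the summary lines in the post above, here is a minimal Python sketch that parses the "Missed"/"Wrong" lines and recomputes the percentages. The function name `parse_summary` is made up for this example. It assumes, as the totals suggest, that the "Missed" denominators count reference features and the "Wrong" denominators count predicted features; that reading is an interpretation of the output, not something stated in the post.

```python
import re

# Summary lines copied from the cuffcompare output quoted in the post.
SUMMARY = """\
Missed exons: 66987/206780 ( 32.4%)
Wrong exons: 813919/958710 ( 84.9%)
Missed introns: 167142/185318 ( 90.2%)
Wrong introns: 11318/29587 ( 38.3%)
Missed loci: 5737/21602 ( 26.6%)
Wrong loci: 782485/927668 ( 84.3%)
"""

# Matches e.g. "Missed exons: 66987/206780" and captures the four fields.
LINE_RE = re.compile(r"^\s*(Missed|Wrong) (\w+):\s*(\d+)/(\d+)")


def parse_summary(text):
    """Return {(kind, feature): (count, total, percent)} for each summary line."""
    rows = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            kind, feature = m.group(1), m.group(2)
            count, total = int(m.group(3)), int(m.group(4))
            rows[(kind, feature)] = (count, total, 100.0 * count / total)
    return rows


rows = parse_summary(SUMMARY)
for (kind, feature), (count, total, pct) in rows.items():
    print(f"{kind} {feature}: {count}/{total} = {pct:.1f}%")
```

Under that assumption, the assembly contains 927668 predicted loci against 21602 reference loci, which would be consistent with a large number of small, fragmented predictions.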

  • #2
    I'm trying to bump this post. It's strange that nobody has replied to it. Does that mean nobody knows the answer? ...

    • #3
      Reads of 36 bp are really short for a TopHat + Scripture/Cufflinks approach. The software will run, but you will take a performance hit in addition to getting less informative output. We have analyzed libraries of ~200 million paired 36-mers + 42-mers (20% and 80% of the data respectively) with TopHat + Scripture/Cufflinks. The output was interesting, but it did not work nearly as well as mapping reads directly to a database of junctions, transcripts and genomic sequences.

      This is not a failing of the TopHat + Cufflinks/Scripture approach; these methods are simply not optimized for such short reads. TopHat attempts to identify splice junctions by splitting the reads (an oversimplification), and any method that takes this type of approach will suffer when the reads are that short.

      If you want decent sensitivity/specificity for detecting junctions, you could try mapping to a database of known and predicted junction sequences of suitable length... Once you are analyzing libraries like paired 75-mers, you will find that Cufflinks and Scripture shine a lot brighter... Another option is to use Trans-ABySS or some other de novo assembly approach to build longer contigs out of your 36-mers and then align those instead...
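To make "junction sequences of suitable length" concrete, here is a minimal Python sketch of one common way to build such a sequence. It is not the pipeline described in the post: the function name `junction_sequence`, the `min_overhang` parameter and its default of 4, and the toy exon sequences are all assumptions for illustration. The idea is to take `read_len - min_overhang` bases from each side of the splice site, so that any read aligning end to end within the junction sequence must cover at least `min_overhang` bases on both sides.

```python
def junction_sequence(upstream_exon, downstream_exon, read_len, min_overhang=4):
    """Join the 3' end of one exon to the 5' start of the next.

    Each flank is read_len - min_overhang bases long. A read of length
    read_len that fits entirely inside the result therefore overlaps the
    splice site by at least min_overhang bases on each side.
    Returns (sequence, position of the splice site within the sequence).
    """
    if not 0 < min_overhang < read_len:
        raise ValueError("min_overhang must be between 1 and read_len - 1")
    flank = read_len - min_overhang
    left = upstream_exon[-flank:]    # whole exon if it is shorter than flank
    right = downstream_exon[:flank]
    return left + right, len(left)


# Toy exon sequences, made up for illustration only.
exon_a = "ACGT" * 10   # 40 bp
exon_b = "TTGCA" * 8   # 40 bp

junction, split = junction_sequence(exon_a, exon_b, read_len=36)
print(len(junction), split)

# A 36 bp read with 20 bp in exon_a and 16 bp in exon_b lies inside the junction.
read = exon_a[-20:] + exon_b[:16]
print(read in junction)
```

This sketch ignores exons shorter than the flank, where a real junction database would need to extend into the neighbouring exon, and it leaves the choice of aligner open.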

      • #4
        Yeah... that's what I thought. Our library is mainly good for gene expression profiling. Thanks for sharing your experience!
