  • dhir_kumar
    Postdoc
    • Oct 2013
    • 4

    Cuffcompare stats: high sensitivity and low specificity. What does it mean?

    Hi,

    I am using cuffcompare from the Cufflinks suite to compare the transcriptome assemblies from STAR/Cufflinks and TopHat/Cufflinks against the reference annotation. The assemblies from the two aligners look comparable in their numbers, but the specificities reported for both seem alarmingly low.

    Is it OK to have such low specificity? How good are these assemblies?

    The cuffcmp.stats is as follows
    ##########################################################

    #= Summary for dataset: SRR594419_STAR_filtered_transcripts.gtf :
    # Query mRNAs : 103797 in 85255 loci (40428 multi-exon transcripts)
    # (10592 multi-transcript loci, ~1.2 transcripts per locus)
    # Reference mRNAs : 29129 in 26270 loci (23160 multi-exon)
    # Corresponding super-loci: 24738
    #--------------------| Sn | Sp | fSn | fSp
    Base level: 99.9 35.6 - -
    Exon level: 99.3 66.6 100.0 68.5
    Intron level: 99.3 86.0 100.0 87.4
    Intron chain level: 95.3 54.6 100.0 63.6
    Transcript level: 90.0 25.3 89.9 25.2
    Locus level: 96.7 29.5 99.9 30.4

    Matching intron chains: 22068
    Matching loci: 25390

    Missed exons: 37/210468 ( 0.0%)
    Novel exons: 80952/313777 ( 25.8%)
    Missed introns: 1182/183787 ( 0.6%)
    Novel introns: 14048/212202 ( 6.6%)
    Missed loci: 0/26270 ( 0.0%)
    Novel loci: 46321/85255 ( 54.3%)

    #= Summary for dataset: SRR594419_tophat_transcripts.gtf :
    # Query mRNAs : 104746 in 87334 loci (38090 multi-exon transcripts)
    # (10015 multi-transcript loci, ~1.2 transcripts per locus)
    # Reference mRNAs : 29129 in 26270 loci (23160 multi-exon)
    # Corresponding super-loci: 25059
    #--------------------| Sn | Sp | fSn | fSp
    Base level: 99.9 36.1 - -
    Exon level: 99.3 68.3 100.0 69.0
    Intron level: 99.3 88.9 99.7 89.3
    Intron chain level: 95.4 58.0 100.0 66.0
    Transcript level: 89.3 24.8 89.0 24.8
    Locus level: 96.7 28.9 99.8 29.7

    Matching intron chains: 22098
    Matching loci: 25414

    Missed exons: 72/210468 ( 0.0%)
    Novel exons: 78071/306064 ( 25.5%)
    Missed introns: 1197/183787 ( 0.7%)
    Novel introns: 10768/205343 ( 5.2%)
    Missed loci: 19/26270 ( 0.1%)
    Novel loci: 48238/87334 ( 55.2%)

    Total union super-loci across all input datasets: 92143
    (11373 multi-transcript, ~1.5 transcripts per locus)
    ################################################################
  • sindrle
    Senior Member
    • Aug 2013
    • 266

    #2
    I have the same problem. Did you find an answer?


    • N00bSeq
      Member
      • Mar 2014
      • 12

      #3
      I am also curious about this. Running cuffcompare on my cuffmerge output results in these numbers:

      Code:
      #     Query mRNAs :  865356 in  787440 loci  (97791 multi-exon transcripts)
      #            (16955 multi-transcript loci, ~1.1 transcripts per locus)
      # Reference mRNAs :   95598 in   36914 loci  (82214 multi-exon)
      # Super-loci w/ reference transcripts:    33985
      #--------------------|   Sn   |  Sp   |  fSn |  fSp
              Base level:      99.6     8.5     -       -
              Exon level:     110.6    35.2   100.0    36.2
            Intron level:      99.2    96.9   100.0    98.8
      Intron chain level:      80.3    67.5   100.0   100.0
        Transcript level:      74.7     8.3    70.2     7.8
             Locus level:      99.1     4.6    99.6     4.6
      
           Matching intron chains:   66045
                    Matching loci:   36587
      
                Missed exons:    1293/351192  (  0.4%)
                 Novel exons:  755700/1102464 ( 68.5%)
              Missed introns:    1755/243253  (  0.7%)
               Novel introns:    1588/249173  (  0.6%)
                 Missed loci:     157/36914   (  0.4%)
                  Novel loci:  747780/787440  ( 95.0%)
      The reference used was the Ensembl mouse annotation from iGenomes. The options used for cuffcompare were the following:

      Code:
      ~/cufflinks-2.2.0.Linux_x86_64/cuffcompare -s ~/igenomes/Mus_musculus/Ensembl/NCBIM37/Sequence/Bowtie2Index/genome.fa -r ~/igenomes/Mus_musculus/Ensembl/NCBIM37/Annotation/Genes/genes.gtf -p Ensembl ~/cuffmerge/merged.gtf
      In addition, I got the following class codes:

      Code:
      grep -v "gene_name" Ensembl.combined.gtf | awk '{print $18}' | sort | uniq -c
      
       739578 "u";
      grep "gene_name" Ensembl.combined.gtf | awk '{print $22}' | sort | uniq -c
      
       555263 "=";
       380367 "j";
         2684 "o";
        12920 "x";
      739578 novel transfrags seems a bit much to me.
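One caveat on the counts above: the `awk` commands count GTF rows, so if the combined GTF has one row per exon, multi-exon transcripts are counted once per exon. A minimal sketch of a per-transcript tally follows. It assumes each row carries `transcript_id` and `class_code` attributes in the usual GTF `key "value";` form; the function name and the sample rows are made up for illustration.

```python
import re
from collections import Counter

# Attribute patterns for GTF column 9 (key "value"; pairs).
CLASS_CODE = re.compile(r'class_code "([^"]*)"')
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]*)"')


def count_class_codes(gtf_lines):
    """Count class codes once per transcript_id instead of once per row."""
    seen = set()
    counts = Counter()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip comment lines
        code = CLASS_CODE.search(line)
        tid = TRANSCRIPT_ID.search(line)
        if code is None or tid is None:
            continue  # row lacks one of the attributes; ignore it
        if tid.group(1) in seen:
            continue  # further exon rows of an already counted transcript
        seen.add(tid.group(1))
        counts[code.group(1)] += 1
    return counts


# Made-up rows: one two-exon transcript ("=") and one single-exon transcript ("u").
sample = [
    'chr1\tCufflinks\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; class_code "=";',
    'chr1\tCufflinks\texon\t300\t400\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; class_code "=";',
    'chr2\tCufflinks\texon\t500\t900\t.\t-\t.\tgene_id "G2"; transcript_id "T2"; class_code "u";',
]
print(dict(count_class_codes(sample)))
```

On a real file the lines could be read with `open("Ensembl.combined.gtf")`; matching attributes by name also avoids depending on fixed column positions such as `$18` and `$22`.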


      • dhir_kumar
        Postdoc
        • Oct 2013
        • 4

        #4
        Cuffcompare: Low specificity of transcript assembly

        Hi,
        The Cuffcompare manual page at http://cufflinks.cbcb.umd.edu/manual.html states the following:

        " Cuffcompare produces the following output files:
        1) <outprefix>.stats

        Cuffcompare reports various statistics related to the "accuracy" of the transcripts in each sample when compared to the reference annotation data. The typical gene finding measures of "sensitivity" and "specificity" (as defined in Burset, M., Guigó, R. : Evaluation of gene structure prediction programs (1996) Genomics, 34 (3), pp. 353-367. doi: 10.1006/geno.1996.0298) are calculated at various levels (nucleotide, exon, intron, transcript, gene) for each input file and reported in this file."

        As highlighted in Figure 1 of the cited 1996 paper (attached), it appears that assembled exons matching the annotation GTF are counted as true positives, while any novel transcript or exon is counted as a false positive when cuffcompare calculates sensitivity and specificity. This explains the low specificity values for a whole-transcriptome assembly, which may contain a large number of novel transcripts.
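As a rough check on this reading, the intron chain percentages in the first post can be reproduced from the posted counts if sensitivity is taken as matches divided by reference items and "specificity" as matches divided by query items (what is more commonly called precision). This is only a sketch: the function name is made up, and the choice of denominators (the multi-exon transcript counts) is an assumption inferred from the posted numbers, not taken from the cuffcompare source. Other levels may use different denominators.

```python
def sn_sp(matching, reference_total, query_total):
    """Return (sensitivity, specificity) in percent.

    sensitivity = matches / reference items  (TP / (TP + FN))
    specificity = matches / query items      (TP / (TP + FP))
    """
    sn = 100.0 * matching / reference_total
    sp = 100.0 * matching / query_total
    return sn, sp


# Counts copied from the cuffcmp.stats in the first post:
# matching intron chains, reference multi-exon mRNAs, query multi-exon transcripts.
star = sn_sp(22068, 23160, 40428)
tophat = sn_sp(22098, 23160, 38090)

print("STAR   intron chain: Sn %.1f  Sp %.1f" % star)    # Sn 95.3  Sp 54.6
print("TopHat intron chain: Sn %.1f  Sp %.1f" % tophat)  # Sn 95.4  Sp 58.0
```

Both results agree with the "Intron chain level" rows reported by cuffcompare, which is consistent with every assembled transcript that is absent from the annotation lowering the reported specificity.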

        It seems that we can ignore the specificity measure for assemblies from whole-RNA samples. However, FPKM filters might be effective for increasing specificity.
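One way such an FPKM filter might look is sketched below. It assumes the assembled GTF stores expression as an `FPKM "value";` attribute on each row, as Cufflinks output does; the function name, the sample rows, and the threshold of 1.0 are made up for illustration and are not recommendations.

```python
import re

# FPKM attribute in GTF column 9, e.g. FPKM "0.2500000000";
FPKM_ATTR = re.compile(r'FPKM "([^"]+)"')


def filter_by_fpkm(gtf_lines, min_fpkm):
    """Yield rows whose FPKM is at least min_fpkm.

    Rows without an FPKM attribute (e.g. comments) are passed through unchanged.
    """
    for line in gtf_lines:
        match = FPKM_ATTR.search(line)
        if match is None or float(match.group(1)) >= min_fpkm:
            yield line


# Made-up rows: one low-expression and one higher-expression transcript.
example = [
    'chr1\tCufflinks\ttranscript\t100\t900\t1000\t+\t.\tgene_id "G1"; transcript_id "T1"; FPKM "0.2500000000";',
    'chr1\tCufflinks\ttranscript\t2000\t5000\t1000\t+\t.\tgene_id "G2"; transcript_id "T2"; FPKM "12.7000000000";',
]
kept = list(filter_by_fpkm(example, min_fpkm=1.0))
print(len(kept))  # 1
```

Rerunning cuffcompare on the filtered GTF would then show whether the specificity values change; a suitable cutoff depends on the data set.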
        Attached Files
