Hi,
I am using cuffcompare from the Cufflinks suite to compare the transcriptome assemblies from STAR/Cufflinks and TopHat/Cufflinks against the reference annotation. The assemblies from the two aligners look quite comparable in numbers, but the specificities (Sp) reported for both seem alarmingly low: about 25% at the transcript level and under 30% at the locus level.
Is it OK to have such low specificity? How good are these assemblies?
The cuffcmp.stats file is as follows:
##########################################################
#= Summary for dataset: SRR594419_STAR_filtered_transcripts.gtf :
# Query mRNAs : 103797 in 85255 loci (40428 multi-exon transcripts)
# (10592 multi-transcript loci, ~1.2 transcripts per locus)
# Reference mRNAs : 29129 in 26270 loci (23160 multi-exon)
# Corresponding super-loci: 24738
#--------------------|   Sn  |   Sp  |  fSn  |  fSp
          Base level:   99.9    35.6      -       -
          Exon level:   99.3    66.6   100.0    68.5
        Intron level:   99.3    86.0   100.0    87.4
  Intron chain level:   95.3    54.6   100.0    63.6
    Transcript level:   90.0    25.3    89.9    25.2
         Locus level:   96.7    29.5    99.9    30.4
Matching intron chains: 22068
Matching loci: 25390
Missed exons: 37/210468 ( 0.0%)
Novel exons: 80952/313777 ( 25.8%)
Missed introns: 1182/183787 ( 0.6%)
Novel introns: 14048/212202 ( 6.6%)
Missed loci: 0/26270 ( 0.0%)
Novel loci: 46321/85255 ( 54.3%)
#= Summary for dataset: SRR594419_tophat_transcripts.gtf :
# Query mRNAs : 104746 in 87334 loci (38090 multi-exon transcripts)
# (10015 multi-transcript loci, ~1.2 transcripts per locus)
# Reference mRNAs : 29129 in 26270 loci (23160 multi-exon)
# Corresponding super-loci: 25059
#--------------------|   Sn  |   Sp  |  fSn  |  fSp
          Base level:   99.9    36.1      -       -
          Exon level:   99.3    68.3   100.0    69.0
        Intron level:   99.3    88.9    99.7    89.3
  Intron chain level:   95.4    58.0   100.0    66.0
    Transcript level:   89.3    24.8    89.0    24.8
         Locus level:   96.7    28.9    99.8    29.7
Matching intron chains: 22098
Matching loci: 25414
Missed exons: 72/210468 ( 0.0%)
Novel exons: 78071/306064 ( 25.5%)
Missed introns: 1197/183787 ( 0.7%)
Novel introns: 10768/205343 ( 5.2%)
Missed loci: 19/26270 ( 0.1%)
Novel loci: 48238/87334 ( 55.2%)
Total union super-loci across all input datasets: 92143
(11373 multi-transcript, ~1.5 transcripts per locus)
################################################################
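For context on where the low Sp values might come from, here is a rough Python sketch that recomputes the "novel" fractions from the counts in the stats above. The counts are typed in by hand from the output; the dictionary layout and the function names (`percent`, `novel_fractions`) are my own for illustration and are not part of cuffcompare. It assumes nothing beyond simple ratios, so treat it as a sanity check rather than a re-implementation of how cuffcompare scores things.

```python
# Counts copied by hand from the "Novel ..." lines of cuffcmp.stats above.
# Each tuple is (novel features, total query features).
COUNTS = {
    "STAR": {
        "loci": (46321, 85255),
        "exons": (80952, 313777),
        "introns": (14048, 212202),
    },
    "TopHat": {
        "loci": (48238, 87334),
        "exons": (78071, 306064),
        "introns": (10768, 205343),
    },
}


def percent(part, total):
    """Return part/total as a percentage rounded to one decimal place."""
    return round(100.0 * part / total, 1)


def novel_fractions(counts):
    """Map each aligner to the percentage of novel loci, exons and introns."""
    return {
        aligner: {feature: percent(*pair) for feature, pair in features.items()}
        for aligner, features in counts.items()
    }


if __name__ == "__main__":
    for aligner, features in novel_fractions(COUNTS).items():
        for feature, pct in features.items():
            print(f"{aligner:7s} novel {feature:8s} {pct:5.1f}%")
```

The recomputed values agree with the percentages in the stats file: more than half of the query loci in both assemblies have no counterpart in the reference, while only about 5-7% of introns are novel. Since specificity is lowered by every query feature that is absent from the reference, the low Sp at the base, transcript and locus levels seems to be driven mainly by these extra loci rather than by wrong splice junctions.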