I've run cuffcompare for the first time and am wondering whether my results are reasonable.
Here are the stats:
# Cuffcompare v1.3.0 | Command line was:
#cuffcompare -r /home/RNAseq_tests/HG19_files/genes.gtf -o ./hacat /home/RNAseq_tests/HACAT_GATK_out_5_10/hac_cuff_5_13_no_guide/transcripts.gtf
#
#= Summary for dataset: /home/RNAseq_tests/HACAT_GATK_out_5_10/hac_cuff_5_13_no_guide/transcripts.gtf :
# Query mRNAs : 43266 in 24345 loci (39462 multi-exon transcripts)
# (8452 multi-transcript loci, ~1.8 transcripts per locus)
# Reference mRNAs : 43294 in 24356 loci (39548 multi-exon)
# Corresponding super-loci: 23251
#--------------------| Sn | Sp | fSn | fSp
Base level: 100.0 100.0 - -
Exon level: 99.9 100.0 100.0 100.0
Intron level: 99.8 99.8 100.0 100.0
Intron chain level: 99.8 100.0 100.0 100.0
Transcript level: 99.8 99.8 100.0 100.0
Locus level: 100.0 100.0 100.0 100.0
Matching intron chains: 39471
Matching loci: 24345
Missed exons: 12/238147 ( 0.0%)
Novel exons: 0/238101 ( 0.0%)
Missed introns: 403/218193 ( 0.2%)
Novel introns: 0/218191 ( 0.0%)
Missed loci: 8/24356 ( 0.0%)
Novel loci: 0/24345 ( 0.0%)
Total union super-loci across all input datasets: 24345
And I've run my own statistics to find the percentages of the "=", "j", etc. class codes:
Of 108753 entries:
43069 match the complete intron chain (39.6%) [=]
8964 were identified as novel transcripts (8.2%) [j]
Of these, 3777 (42.1%) were identified as belonging to novel genes [j, FMI=100]
25358 fragments were found within known introns (23.3%) [i]
31337 were identified as fragments or other (28.8%) [all else]
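For anyone who wants to reproduce this kind of tally, here is a minimal Python sketch of how the class-code percentages could be computed. It assumes the input is a tab-separated table with a header row containing a `class_code` column, which is my understanding of cuffcompare's `.tmap` output; I have not confirmed that this is the file the counts above came from. The function name `class_code_shares` and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def class_code_shares(table_text, column="class_code"):
    """Count each class code and return {code: (count, percent_of_total)}.

    Assumes tab-separated text whose first line is a header that
    includes `column` (assumed to match cuffcompare's .tmap layout).
    """
    reader = csv.DictReader(
        io.StringIO(table_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    counts = Counter(row[column] for row in reader)
    total = sum(counts.values())
    return {code: (n, 100.0 * n / total) for code, n in counts.items()}


# Made-up sample rows, not real data.
sample = (
    "ref_gene_id\tref_id\tclass_code\tcuff_gene_id\tcuff_id\n"
    "GENE1\tTX1\t=\tCUFF.1\tCUFF.1.1\n"
    "GENE1\tTX1\t=\tCUFF.1\tCUFF.1.2\n"
    "GENE2\tTX2\tj\tCUFF.2\tCUFF.2.1\n"
    "GENE3\tTX3\ti\tCUFF.3\tCUFF.3.1\n"
)

shares = class_code_shares(sample)
for code, (count, percent) in sorted(shares.items()):
    print(f"[{code}] {count} entries ({percent:.1f}%)")
```

To run it on a real file, read the file's text and pass it in, e.g. `class_code_shares(open(path).read())`. Multiplying by 100 gives true percentages rather than fractions.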
I'm questioning my results, mostly because of the high number of intronic fragments. My hope is that someone with a more refined eye might be able to tell me if this looks "normal", "strange", etc.
Thanks!
-Jeremy