Dear Friends,
I am new to RNA-Seq. I have been trying to use Cufflinks' cuffdiff to compare two aligned datasets and find differentially expressed transcripts/exons ...
However, if I use:
cuffdiff -o cuffdiff Homo_sapiens.GRCh37.62.gtf ./read1/accepted_hits.bam ./read2/accepted_hits.bam
then every "ln(fold change)" value in "gene_exp.diff" is 0.
If I use the assembled "merged.gtf" instead of "Homo_sapiens.GRCh37.62.gtf" above, I get ln(fold change) and p values and everything looks fine, but I cannot get annotated gene symbols for each transcript.
Could someone tell me what I did wrong in the first analysis?
And how can I also obtain gene annotations for the second analysis, which uses "merged.gtf"?
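For the gene-symbol question, here is a minimal sketch under stated assumptions, not a Cufflinks feature: column 9 of a GTF holds key "value" pairs, and Ensembl GTFs usually carry a gene_name key next to transcript_id (worth confirming with a quick look at the first few lines of the file). The helper names below (parse_attributes, transcript_to_gene_name) are hypothetical. It only helps if the IDs in the cuffdiff output can be matched back to reference transcript_id values, which is itself an assumption here.

```python
import re

# Matches key "value" pairs in GTF column 9, e.g.  gene_id "G1"; gene_name "GENE_A";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(attr_field):
    """Return the key/value pairs of a GTF attribute column as a dict."""
    return dict(ATTR_RE.findall(attr_field))


def transcript_to_gene_name(gtf_lines):
    """Map transcript_id -> gene_name, skipping comments and records missing either key."""
    mapping = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # blank line or header/comment
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete 9-column GTF record
        attrs = parse_attributes(fields[8])
        tid, name = attrs.get("transcript_id"), attrs.get("gene_name")
        if tid and name:
            mapping[tid] = name
    return mapping


if __name__ == "__main__":
    # Hypothetical record; real use would pass open("Homo_sapiens.GRCh37.62.gtf").
    demo = ['1\tprotein_coding\texon\t100\t200\t.\t+\t.\t'
            'gene_id "G1"; transcript_id "T1"; gene_name "GENE_A";']
    print(transcript_to_gene_name(demo))  # prints {'T1': 'GENE_A'}
```

The lookup could then be joined against whichever ID column of the cuffdiff output corresponds to reference transcripts, if one exists.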
Something seems to be wrong. Did I use the right gene annotation GTF file? (see below)
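One thing worth checking, offered as a hedged sketch rather than a diagnosis: if the chromosome names in the GTF and in the BAM header differ (Ensembl GTFs use names such as "1" and "X", while UCSC-based references use "chr1" and "chrX"), reads and annotated features would not be matched to each other at all. Note also that GRCh37 corresponds to UCSC hg19, while the cuffcompare command below passes hg18.fa to -s, so the genome builds may be worth checking too. The helper names are hypothetical; the header lines could come from `samtools view -H ./read1/accepted_hits.bam`.

```python
def gtf_seqnames(gtf_lines):
    """Collect the chromosome names (column 1) used in a GTF file."""
    return {line.split("\t", 1)[0] for line in gtf_lines
            if line.strip() and not line.startswith("#")}


def sam_header_seqnames(header_lines):
    """Collect SN: values from @SQ lines of a SAM header."""
    names = set()
    for line in header_lines:
        if line.startswith("@SQ"):
            for tag in line.rstrip("\n").split("\t")[1:]:
                if tag.startswith("SN:"):
                    names.add(tag[3:])
    return names


def compare_seqnames(gtf_names, bam_names):
    """Return (shared, only_in_gtf, only_in_bam) as sorted lists."""
    return (sorted(gtf_names & bam_names),
            sorted(gtf_names - bam_names),
            sorted(bam_names - gtf_names))


if __name__ == "__main__":
    # Hypothetical inputs showing an Ensembl-style vs UCSC-style naming clash.
    gtf = ['1\tsrc\texon\t1\t2\t.\t+\t.\tgene_id "G1";',
           'X\tsrc\texon\t1\t2\t.\t+\t.\tgene_id "G2";']
    header = ["@HD\tVN:1.0", "@SQ\tSN:chr1\tLN:1000", "@SQ\tSN:chrX\tLN:1000"]
    shared, only_gtf, only_bam = compare_seqnames(gtf_seqnames(gtf), sam_header_seqnames(header))
    print(shared, only_gtf, only_bam)  # prints [] ['1', 'X'] ['chr1', 'chrX']
```

An empty "shared" list on real files would point to a naming mismatch rather than a problem with cuffdiff itself.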
Another question: which human annotation GTF is generally used with cuffdiff? I downloaded mine from Ensembl (see above); is this the right one? Is there an equivalent from the UCSC Genome Browser? I had expected the Cufflinks manual to point out where to download one, at least as an example, but unfortunately it has no such information. Perhaps this could be added in the future.
Thanks.
Thank you very much.
For your information, I also ran cuffcompare between the annotation I used and merged.gtf:
#==========================================
# Cuffcompare v1.0.3 | Command line was:
#cuffcompare -o ./cuffcompare -s ./hg18.fa -r ./Homo_sapiens.GRCh37.62.gtf ./merged.gtf
#
#= Summary for dataset: ./merge/merged.gtf :
# Query mRNAs : 65079 in 61573 loci (27950 multi-exon transcripts)
# (2938 multi-transcript loci, ~1.1 transcripts per locus)
# Reference mRNAs : 166159 in 50490 loci (145130 multi-exon)
# Corresponding super-loci: 0
#--------------------| Sn  | Sp  | fSn | fSp
Base level:            0.0   0.0   -     -
Exon level:            0.0   0.0   0.0   0.0
Intron level:          0.0   0.0   0.0   0.0
Intron chain level:    0.0   0.0   0.0   0.0
Transcript level:      0.0   0.0   0.0   0.0
Locus level:           0.0   0.0   0.0   0.0
Missed exons: 509677/509677 (100.0%)
Wrong exons: 126588/126588 (100.0%)
Missed introns: 330351/330351 (100.0%)
Wrong introns: 62961/62961 (100.0%)
Missed loci: 0/50490 ( 0.0%)
Wrong loci: 36503/61573 ( 59.3%)
Total union super-loci across all input datasets: 61573
#=================================================
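To read the summary, here is a small sketch assuming the usual definitions Sn = TP/(TP+FN) and Sp = TP/(TP+FP), with "Missed" counted as false negatives and "Wrong" as false positives (an assumption about how cuffcompare reports them). Plugging in the exon counts shows the 0.0 rows follow directly from the 100% missed/wrong lines: not a single query exon matched a reference exon.

```python
def pct(part, total):
    """Percentage rounded to one decimal place, as in the summary lines."""
    return round(100.0 * part / total, 1)


def sensitivity(tp, fn):
    """Sn = TP / (TP + FN): fraction of reference features recovered."""
    return tp / (tp + fn) if tp + fn else 0.0


def specificity(tp, fp):
    """Sp = TP / (TP + FP): fraction of query features matching the reference."""
    return tp / (tp + fp) if tp + fp else 0.0


if __name__ == "__main__":
    # Counts copied from the "Missed exons" and "Wrong exons" lines above.
    ref_exons, missed_exons = 509677, 509677
    query_exons, wrong_exons = 126588, 126588
    print(pct(missed_exons, ref_exons))                               # prints 100.0
    print(sensitivity(ref_exons - missed_exons, missed_exons))        # prints 0.0
    print(specificity(query_exons - wrong_exons, wrong_exons))        # prints 0.0
    print(pct(36503, 61573))                                          # prints 59.3
```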