Hi there,
If I understand correctly, there are two ways to measure gene expression from RNA-seq data. The simple way is to use the TopHat2 pipeline (TopHat2 followed by Cufflinks), which generates a file named "genes.fpkm_tracking". We can just extract the FPKM value for each gene from that file.
We can also use htseq-count to get read counts from an alignment file created by bwa, bowtie2, or some other aligner, and then convert the read counts into RPKM values with the formula: RPKM = 10^9 * counts / ("total counts" * "transcript length").
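To make the conversion concrete, here is a minimal Python sketch of the formula above. The function name `rpkm` and its argument names are only illustrative. It assumes "total counts" means the total number of counted reads in the sample and "transcript length" is given in base pairs.

```python
def rpkm(counts, total_counts, transcript_length):
    """Reads per kilobase of transcript per million counted reads.

    counts            -- reads assigned to this gene (e.g. from htseq-count)
    total_counts      -- total reads counted in the sample (assumed meaning)
    transcript_length -- transcript length in base pairs (assumed unit)
    """
    if total_counts <= 0 or transcript_length <= 0:
        raise ValueError("total_counts and transcript_length must be positive")
    # 10^9 = 10^3 (bases per kilobase) * 10^6 (reads per million)
    return 1e9 * counts / (total_counts * transcript_length)


# Example: 100 reads on a 1,000 bp transcript in a sample of 1,000,000 reads
example_value = rpkm(100, 1_000_000, 1_000)
```

With these example numbers the result is 100 RPKM, since the 1 kb length and the 1 million reads each contribute a factor of one after scaling.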
However, when comparing the results from the two methods above, I found that the FPKM and RPKM values were poorly correlated: the Spearman correlation coefficient was about 0.5 and the Pearson correlation coefficient was about 0.7.
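For reference, the two coefficients can be computed along these lines. This is a plain-Python sketch, not the exact script behind the numbers above. The helper names `pearson`, `average_ranks`, and `spearman` are illustrative, and the inputs are assumed to be two equal-length lists holding the FPKM and RPKM values for the same genes in the same order.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Spearman depends only on the ordering of the genes, while Pearson on untransformed values can be dominated by a few highly expressed genes, so the two numbers can differ noticeably on the same data.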
So, which method should we trust? I personally feel that the TopHat2 one might be better, since the author of HTSeq once mentioned that HTSeq is designed for testing differential expression, not for quantifying expression.
Any ideas? Is there another or better way to measure gene expression from RNA-seq data?
Thanks,