Hello,
I'm working with bacterial RNA-seq data. I would like to show absolute gene expression by calculating an RPKM value for every gene. Initially I used Cufflinks for that, but I don't like the way Cufflinks handles it, so I decided to calculate my own RPKM values manually.
The equation I want to use is:
RPKM = (10^9 * C)/(N * L), with
C = Number of reads mapped to a gene
N = Total mapped reads in the experiment
L = Length of the gene in base pairs
Now, my question is the following: I want to use the raw counts from the HTSeq count table as the number of reads that mapped to a gene (C). I think that should be okay. However, for the total mapped reads in the experiment (N), I thought about simply summing the counts of all genes in the HTSeq table.
For example:
If my gene length (L) is : 200 bp
Number of reads mapped (as from HTSeq-count) (C) : 400
Total mapped reads (sum for all genes from HTSeq-count) (N) : 10^8
RPKM = (10^9 * 400)/(10^8 * 200) = 20
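In code, the calculation above would look roughly like this. It is only a minimal Python sketch of the formula; the function name `rpkm` and its parameter names are my own labels, not part of HTSeq or Cufflinks.

```python
def rpkm(count, total_mapped, length_bp):
    """RPKM = (10^9 * C) / (N * L), as defined above.

    count        -- C, reads mapped to the gene
    total_mapped -- N, total mapped reads in the library
    length_bp    -- L, gene length in base pairs
    """
    if total_mapped <= 0 or length_bp <= 0:
        raise ValueError("N and L must both be positive")
    return (10**9 * count) / (total_mapped * length_bp)


# Same numbers as the example: C = 400, N = 10^8, L = 200 bp
example_value = rpkm(400, 10**8, 200)
print(example_value)  # 20.0
```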
Would that be the right way of calculating it? My aim is to do the same thing for each of my sequencing libraries and then simply compare.
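For a whole library, what I have in mind is roughly the following Python sketch. It assumes the count table is tab-separated (gene ID, then count), and that rows whose ID starts with `__` (such as `__no_feature` or `__ambiguous` in the htseq-count versions I know of) are summary counters rather than genes, so they are left out of N. The function names, gene IDs, and numbers are made up for illustration.

```python
def read_htseq_counts(lines):
    """Parse 'gene_id<TAB>count' lines, skipping summary rows such as '__no_feature'."""
    counts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        gene_id, value = line.split("\t")
        if gene_id.startswith("__"):
            # Assumed to be a summary counter, not a gene
            continue
        counts[gene_id] = int(value)
    return counts


def rpkm_table(counts, lengths_bp):
    """Compute RPKM per gene, with N taken as the sum of all gene counts."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads assigned to any gene")
    return {
        gene: (10**9 * c) / (total * lengths_bp[gene])
        for gene, c in counts.items()
    }


# Toy data, purely hypothetical
example_lines = [
    "geneA\t400",
    "geneB\t600",
    "__no_feature\t50",
    "__ambiguous\t5",
]
example_lengths = {"geneA": 200, "geneB": 1000}

counts = read_htseq_counts(example_lines)
values = rpkm_table(counts, example_lengths)
for gene, value in values.items():
    print(gene, value)
```

With N defined this way, reads that were mapped but not assigned to any gene do not contribute to the total, which is part of what I am unsure about.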
Thanks, guys,
TP