I am trying to calculate RPKM from TopHat output, but I have run into an issue that I believe could skew my results.
I gave TopHat ~49 million input reads, but it reports ~55 million mapped reads. I assume the mapped count exceeds the input because I set the "--max-multihits 15" option, so a read that aligns to several locations is counted once per alignment.
Now, for the RPKM calculation, I am not sure which number I should use for the total mapped reads:
1. Total reads mapped by Tophat including multireads
2. Total uniquely mapped reads
If I go with #2, then I think I should also exclude all multireads when counting the reads that map to my genes.
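To make the two options concrete, here is a minimal Python sketch of the RPKM formula as it is usually written (reads on the gene, scaled per kilobase of gene length and per million mapped reads). The function name `rpkm`, the gene read counts, the gene length, and the uniquely mapped total of 40 million are all made-up values for illustration only; only the ~55 million figure comes from my run.

```python
def rpkm(gene_reads, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene length per million mapped reads.

    RPKM = 1e9 * gene_reads / (gene_length_bp * total_mapped_reads)
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return gene_reads * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical gene: 2000 bp long (illustrative numbers, not real data).
GENE_LENGTH_BP = 2000

# Option 1: count every reported alignment, multireads included,
# and normalise by the total reported by TopHat (~55 million).
reads_with_multi = 1000          # hypothetical count including multireads
total_with_multi = 55_000_000    # figure reported in the post
rpkm_option_1 = rpkm(reads_with_multi, GENE_LENGTH_BP, total_with_multi)

# Option 2: count only uniquely mapped reads on the gene and normalise
# by the uniquely mapped total. Both numbers here are assumptions.
reads_unique_only = 800          # hypothetical count after dropping multireads
total_unique = 40_000_000        # hypothetical uniquely mapped total
rpkm_option_2 = rpkm(reads_unique_only, GENE_LENGTH_BP, total_unique)

print("Option 1 RPKM:", rpkm_option_1)
print("Option 2 RPKM:", rpkm_option_2)
```

The point of the sketch is only that the numerator and the denominator are filtered the same way in each option; mixing a multiread-inclusive gene count with a unique-only total (or the reverse) would change the scale of every value.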
Thanks!
-Abhi