
**TiborNagy** · 01-24-2014, 05:02 AM

The calculation looks OK.

**ThePresident** · 01-24-2014, 05:47 AM

Yeah, the more I think about it, the more it makes sense. HTSeq-count uses alignment files from bowtie and DESeq (used for DE analysis) uses HTSeq-count tables... Thus, there seems to be no reason why I couldn't use raw reads from HTSeq-count table for RPKM calculations.

**AdrianP** · 01-24-2014, 07:50 AM

Is your data paired end? In that case, you need FPKM counts, not RPKM.

**ThePresident** · 01-24-2014, 07:53 AM

Originally posted by AdrianP:

> Is your data paired end? In that case, you need FPKM counts, not RPKM.

Nope, single-end... That's why I don't like using Cufflinks, which gives FPKM values.

**AdrianP** · 01-24-2014, 08:04 AM

Originally posted by ThePresident:

> Nope, single-end... That's why I don't like using Cufflinks, which gives FPKM values.

For single-end reads, FPKM = RPKM. But check whether the values match; I am curious.

**ThePresident** · 01-24-2014, 08:06 AM

Nope, they do not match, for one obvious reason (and I'm again banging my head against the wall):

- I can't just sum all the reads from the HTSeq-count table: the number is way over the total number of reads in the library. I'm trying to see why... damn!

**AdrianP** · 01-24-2014, 08:14 AM

Originally posted by ThePresident:

> Nope, they do not match, for one obvious reason (and I'm again banging my head against the wall):
>
> - I can't just sum all the reads from the HTSeq-count table: the number is way over the total number of reads in the library. I'm trying to see why... damn!

Use the .fastq file with all of your reads. When you do

Code:

head file.fastq

what do you get?

**dpryan** · 01-24-2014, 08:14 AM

Are you counting multimappers? Also, Cufflinks-calculated FPKMs will almost never be the same as those calculated by hand (partly because Cufflinks uses a different gene length and partly because it uses fractional read counts).

**ThePresident** · 01-24-2014, 08:23 AM

@AdrianP: I'm at work and don't have access to my Linux station right now. But from the bowtie alignment I know how many reads I have in my libraries (fastq). For example, one of my libraries has 61,402,323 reads, and when counted with HTSeq-count (sum of all reads across all genes) I get 116,898,233, which is almost double.

@dpryan: My understanding of HTSeq-count is that it does not count multimappers. I used intersection-nonempty mode. Here is the final output:

no_feature 9034352
ambiguous 958299
too_low_aQual 0
not_aligned 2097930
alignment_not_unique 0

The fastq file contained 61,402,323 reads, and 96.58% of them were aligned by bowtie. So the SAM file has to contain around 59,302,363 aligned reads. If there are multimapped reads in my HTSeq-count table, how do I get rid of them?
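A quick way to check where the extra reads come from is to sum the gene rows and the special counters separately. Below is a minimal sketch, assuming a two-column HTSeq-count table (feature name, then count, whitespace-separated). The function name `summarize_counts` is hypothetical, and accepting a `__` prefix on the special counters is an assumption about newer HTSeq versions; the output posted above has no prefix.

```python
import io

# Special counters exactly as they appear in the output posted above.
# Assumption: a leading "__" (as in newer HTSeq versions) is stripped first.
SPECIAL = {
    "no_feature",
    "ambiguous",
    "too_low_aQual",
    "not_aligned",
    "alignment_not_unique",
}


def summarize_counts(handle):
    """Return (sum of per-gene counts, dict of special counters)."""
    gene_total = 0
    special = {}
    for line in handle:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, count = fields[0], int(fields[-1])
        key = name.lstrip("_")
        if key in SPECIAL:
            special[key] = count
        else:
            gene_total += count
    return gene_total, special


# Toy table (made-up numbers), just to show the bookkeeping.
demo = io.StringIO("geneA\t10\ngeneB\t5\nno_feature\t3\n__not_aligned\t2\n")
gene_total, special = summarize_counts(demo)
accounted = gene_total + sum(special.values())
print(gene_total, special, accounted)
```

If `accounted` comes out far above the number of reads in the fastq file, something other than one-count-per-read is going on (for example, several alignment records per read in the SAM file).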

**ThePresident** · 01-24-2014, 08:40 AM

On the other hand, the total number of mapped reads in the RPKM formula is somewhat arbitrary. It is the same for every gene within a single library, so as long as I keep that in mind, the formula still makes sense.

Then again, I wonder whether I could compare RPKM values calculated this way between replicate libraries. E.g., if I have three libraries of the same condition (biological replicates) but with different sequencing depths, could I calculate RPKM as explained above and then just average them...?
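For reference, here is a minimal sketch of the by-hand calculation being discussed, using the standard definition RPKM = reads × 10^9 / (gene length in bp × total mapped reads). The function name and all numbers are made up for illustration, and which total to use (all reads, aligned reads, or uniquely mapped reads) is exactly the open question in this thread, so treat `total_mapped_reads` as an assumption. The averaging step only shows the mechanics; it does not settle whether averaging replicates this way is appropriate.

```python
from statistics import mean


def rpkm(count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads."""
    return count * 1e9 / (gene_length_bp * total_mapped_reads)


# One hypothetical gene of 2,000 bp in three replicate libraries with
# different sequencing depths: (raw count, total mapped reads).
gene_length_bp = 2000
replicates = [
    (600, 10_000_000),
    (1000, 20_000_000),
    (200, 5_000_000),
]

# Each library is normalized by its own total, then the values are averaged.
per_library = [rpkm(c, gene_length_bp, total) for c, total in replicates]
print(per_library)
print(mean(per_library))
```

Because the library total only rescales every gene in that library by the same factor, changing it shifts all RPKM values of a library together but leaves the ranking of genes within the library unchanged.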

**Simon Anders** · 01-24-2014, 10:31 AM

[deleted] .

**ThePresident** · 01-24-2014, 10:45 AM

It might seem that I'm flooding this thread, but I just realized that I have already had the same problem. I think the right way of dealing with all this is to extract only the uniquely mapped reads from the bowtie-generated BAM files and use them for HTSeq-count, DESeq and the RPKM calculations.

My only concern is losing too many reads due to multimapping...

**AdrianP** · 01-24-2014, 11:21 AM

Originally posted by ThePresident:

> It might seem that I'm flooding this thread, but I just realized that I have already had the same problem. I think the right way of dealing with all this is to extract only the uniquely mapped reads from the bowtie-generated BAM files and use them for HTSeq-count, DESeq and the RPKM calculations.
>
> My only concern is losing too many reads due to multimapping...

If you use 61,402,323 as the total number of reads, what RPKM value do you get for any given gene, compared to the FPKM from Cufflinks?

**ThePresident** · 01-24-2014, 11:35 AM

Originally posted by AdrianP:

> If you use 61,402,323 as the total number of reads, what RPKM value do you get for any given gene, compared to the FPKM from Cufflinks?

It's somewhat close, but not exactly the same. I only checked a couple of values, but still: 8 vs 9, 146 vs 300, 13 vs 18.

Calculating RPKM value manually