**dpryan** · 08-29-2013, 05:18 AM

If overlapping genes are such an issue for whatever you're working on, just use a stranded library prep. The likely more common objection to HTSeq is that it "ignores" multimappers rather than trying to extract some meaning from them. Honestly, that particular objection has never really swayed me, since the regions of genes not giving rise to multimapping reads should suffice to provide enough reliable signal for differential expression.

Which method you choose will largely come down to how risk averse you are and what your downstream needs will be. If I'm going to use RNAseq results to generate a transgenic mouse or start some drug screens, I'm not going to spend time with RSEM data, the validity of which I'm nowhere near 100% certain of.

**MichalO** · 08-29-2013, 05:28 AM

Thanks dpryan! The stranded protocol is definitely a good point here. Still, it costs some $100 per sample, so thrifty biologists often skip it...

Originally posted by dpryan:

If I'm going to use RNAseq results to generate a transgenic mouse or start some drug screens, I'm not going to spend time with RSEM data, the validity of which I'm nowhere near 100% certain of.

Could you briefly write down your objections to RSEM? I have mine, like its heavy dependence on annotation, its uncertainty when a gene has many isoforms, etc. Thanks!

**jparsons** · 08-29-2013, 10:12 AM

So I pulled up HTSeq data and RSEM data from the same run, which I have because I've been trying to come up with a good metric to judge quantitation (both of genes and transcripts).

Generally, the HTS count and the RSEM expected counts are within a few percent of one another. However, there are some significant outliers, which from a cursory inspection appear to be almost exclusively mitochondrial genes, presumably ones which consist entirely of multi-mapped reads. HTS also assigns some low counts to some pseudogenes, which RSEM seems to avoid doing.

I usually advocate HTSeq for gene counting due to its simplicity, but I'd say that RSEM is on the right side of what we consider to be biological 'truth' in this comparison.
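A comparison like the one described above can be sketched in a few lines of Python. This is only an illustration: the gene names and counts are made up, and `flag_outliers`, the 5% tolerance, and the minimum-count cutoff are assumptions chosen for the example, not values from the run discussed in this thread.

```python
def flag_outliers(htseq_counts, rsem_counts, rel_tol=0.05, min_count=10):
    """Return genes whose two counts differ by more than rel_tol (relative)."""
    outliers = {}
    for gene in sorted(set(htseq_counts) | set(rsem_counts)):
        h = htseq_counts.get(gene, 0)
        r = rsem_counts.get(gene, 0.0)
        larger = max(h, r)
        if larger < min_count:
            continue  # too few reads in both tools for a meaningful ratio
        rel_diff = abs(h - r) / larger
        if rel_diff > rel_tol:
            outliers[gene] = round(rel_diff, 3)
    return outliers


# Hypothetical per-gene values: integer HTSeq counts, RSEM expected counts.
htseq = {"GENE_A": 1000, "GENE_B": 200, "MT_GENE": 5, "PSEUDO_X": 12}
rsem = {"GENE_A": 1012.4, "GENE_B": 203.0, "MT_GENE": 850.0, "PSEUDO_X": 0.0}

result = flag_outliers(htseq, rsem)
print(result)
```

With these invented numbers, only the gene dominated by multi-mapped reads and the pseudogene are flagged; the other two agree to within a couple of percent.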

**MichalO** · 08-29-2013, 10:50 AM

Thanks a lot too! That's what I suspected: some small artifacts on both sides, no big differences, at least at the gene level. Have to stop being lazy and try it myself.

What was the species? H. sapiens?

Originally posted by jparsons:

both of genes and transcripts

Did you run HTSeq at the transcript level? And was it indeed similar?

**jparsons** · 08-29-2013, 10:53 AM

It was a human sample. HTSeq claims not to work on the transcript level, so I used other programs there. I might just throw it at the wall anyway, but I don't have high expectations.

**chadn737** · 08-29-2013, 11:40 AM

Originally posted by jparsons:

So I pulled up HTSeq data and RSEM data from the same run, which I have because I've been trying to come up with a good metric to judge quantitation (both of genes and transcripts).

Generally, the HTS count and the RSEM expected counts are within a few percent of one another. However, there are some significant outliers, which from a cursory inspection appear to be almost exclusively mitochondrial genes, presumably ones which consist entirely of multi-mapped reads. HTS also assigns some low counts to some pseudogenes, which RSEM seems to avoid doing.

I usually advocate HTSeq for gene counting due to its simplicity, but I'd say that RSEM is on the right side of what we consider to be biological 'truth' in this comparison.

The "HTS also assigns some low counts to some pseudogenes which RSEM seems to avoid doing" does not make sense to me given how htseq-count works. Those reads assigned to pseudogenes would have to be uniquely aligned there in the first place by the aligner. Unless, of course, these are specifically pseudogenes overlapping other genes, and even then the read would have to come largely from the pseudogene not to be discarded by htseq-count's default settings.
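The counting logic being described can be sketched roughly as follows. This is a deliberately simplified illustration of a union-style overlap rule, not HTSeq's actual code: it ignores strand, chromosomes, and spliced reads, and `assign_read`, the gene coordinates, and the alignments are all hypothetical.

```python
from collections import Counter


def assign_read(read_start, read_end, n_hits, genes):
    """Classify one alignment; genes maps name -> (start, end), half-open."""
    if n_hits > 1:
        # The aligner reported several locations: the read is not counted.
        return "alignment_not_unique"
    overlapping = [name for name, (g_start, g_end) in genes.items()
                   if read_start < g_end and g_start < read_end]
    if not overlapping:
        return "no_feature"
    if len(overlapping) > 1:
        # The read touches more than one gene: not counted for either.
        return "ambiguous"
    return overlapping[0]


# Hypothetical annotation: a pseudogene that partly overlaps another gene.
genes = {"PARENT": (100, 500), "PSEUDO": (450, 700)}

# (start, end, number of reported alignments) for five made-up reads.
alignments = [(120, 170, 1), (600, 650, 1), (460, 510, 1),
              (300, 350, 3), (800, 850, 1)]

counts = Counter(assign_read(s, e, n, genes) for s, e, n in alignments)
print(counts)
```

In this toy setup the pseudogene only receives the read that aligns uniquely and lies outside the overlap with the other gene, which is the situation the post describes.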

**jparsons** · 08-29-2013, 12:02 PM

It didn't make sense to me either, but when I was looking for places where there were discrepancies, that's what popped. If I had to hypothesize, I would think that the pseudogene has unique sequence relative to the main gene, which by chance a sequencing error manages to catch. The alignment settings that RSEM uses were not identical to the ones I used for HTS, and may have been differently tolerant of mismatches, or maybe RSEM decided that a mm1 alignment to the main gene was more likely than a perfect match to the pseudogene.
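The last hypothesis can be made concrete with a toy calculation. This is not RSEM's actual model, only a minimal sketch of the general idea of weighting candidate alignments by abundance and mismatch probability. The read length, the per-base error rate, and the relative abundances are all assumed values, and `posterior_parent` is a hypothetical helper.

```python
def posterior_parent(abund_parent, abund_pseudo, error_rate=0.01, read_len=50):
    """Probability that a read came from the parent gene, given a
    one-mismatch alignment there and a perfect alignment to the pseudogene."""
    # One specific wrong base (error_rate / 3) and read_len - 1 correct bases.
    lik_parent = (error_rate / 3) * (1 - error_rate) ** (read_len - 1)
    # Every base read correctly.
    lik_pseudo = (1 - error_rate) ** read_len
    w_parent = abund_parent * lik_parent
    w_pseudo = abund_pseudo * lik_pseudo
    return w_parent / (w_parent + w_pseudo)


# Assumed relative abundances: the parent gene is ~1000x more expressed.
p = posterior_parent(abund_parent=0.999, abund_pseudo=0.001)
print(f"P(read from parent gene) = {p:.3f}")
```

Under these assumptions the mismatched alignment to the highly expressed gene comes out as more probable than the perfect match to the barely expressed pseudogene. Note that this only applies if the aligner reports both candidate alignments in the first place.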

**chadn737** · 08-29-2013, 12:07 PM

Originally posted by jparsons:

It didn't make sense to me either, but when I was looking for places where there were discrepancies, that's what popped. If I had to hypothesize, I would think that the pseudogene has unique sequence relative to the main gene, which by chance a sequencing error manages to catch. The alignment settings that RSEM uses were not identical to the ones I used for HTS, and may have been differently tolerant of mismatches, or maybe RSEM decided that a mm1 alignment to the main gene was more likely than a perfect match to the pseudogene.

Then that is a difference between aligners, not between htseq-count and RSEM. htseq-count does not align reads or determine their locations. That is done by whatever aligner is used prior to that. So an observed discrepancy in this instance will have occurred at earlier steps and is not a valid comparison of RSEM and htseq-count.

**Simon Anders** · 09-01-2013, 05:17 AM

I would like to add that RSEM and htseq-count are tools with different purposes. RSEM is designed to quantify expression strength; htseq-count is not! Rather, it is a tool for the express and sole purpose of forming the first step of an analysis for differential expression on the gene level. See my post #4 in this thread for an elaboration of why these two goals suggest different treatments of overlapping genes and multimapping reads.

**MichalO** · 09-04-2013, 04:03 AM

Thanks a lot Simon! Precise and to the point, as usual!!

**lpachter** · 09-05-2013, 12:20 AM

It's tempting to think that how one counts doesn't matter (for differential expression purposes), but here I argue that it does:

Magnitude of effect vs. statistical significance

http://liorpachter.wordpress.com/2013/08/26/magnitude-of-effect-vs-statistical-significance/

counting wars ;) HTSeq vs RSEM