
**mrawlins** · 06-16-2010, 09:07 AM

I used Picard (the Java implementation of SamTools) to create my own exon counter. It took a few hours, though I'm out of practice with Java. I know that BioScope from ABI has a known exon counter, but their pipeline is a bit frustrating. It's not particularly complicated to write one if you use SamTools, and the documentation for Picard is good, so I suspect other programs are available to do it, but I don't know what they are.

**Simon Anders** · 06-17-2010, 08:10 AM

Hi

Originally posted by PFS

I get a count of 3 for a gene that by visual inspection has many more reads mapped to it.

Possible explanations:

- htseq-count assumes your RNA-Seq data to be strand-specific, i.e., it will only count those reads which map to the strand that the feature is on. If you want it to count reads on both strands, use the '--stranded=no' option.

- If a read is not fully contained in the exon, it will not get counted. If you want it to be counted even if it only partially overlaps with the feature, use '--mode=intersection-nonempty'.

- If there are two overlapping exons from different genes, the reads overlapping both might be discarded and counted as "ambiguous".

If none of these explains your observations, you have found a bug. If so, please contact me by mail with more details. Thanks.

Cheers
Simon
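The three explanations above can be tried out on a toy example. The following is a minimal pure-Python sketch, not htseq-count's actual code: the annotation, the coordinates and the function names are made up for illustration, and the mode names follow the htseq-count documentation (which mode is the default may depend on your version). For each base of the read it collects the set of features covering that base, then combines the sets according to the mode.

```python
# Hypothetical toy annotation: (name, start, end, strand), half-open coordinates.
FEATURES = [
    ("geneA", 100, 200, "+"),
    ("geneB", 180, 300, "-"),
]


def feature_sets(read_start, read_end, read_strand, stranded):
    """Return, for each base of the read, the set of features covering it."""
    sets = []
    for pos in range(read_start, read_end):
        hits = {
            name
            for name, start, end, strand in FEATURES
            # In stranded mode only features on the read's strand are considered.
            if start <= pos < end and (not stranded or strand == read_strand)
        }
        sets.append(hits)
    return sets


def assign(read_start, read_end, read_strand, mode="union", stranded=True):
    """Assign a read to one feature, or to 'no_feature' / 'ambiguous'."""
    sets = feature_sets(read_start, read_end, read_strand, stranded)
    if not sets:
        return "no_feature"
    if mode == "union":
        candidates = set().union(*sets)
    elif mode == "intersection-strict":
        candidates = set.intersection(*sets)
    elif mode == "intersection-nonempty":
        nonempty = [s for s in sets if s]
        candidates = set.intersection(*nonempty) if nonempty else set()
    else:
        raise ValueError("unknown mode: " + mode)
    if not candidates:
        return "no_feature"
    if len(candidates) > 1:
        return "ambiguous"
    return next(iter(candidates))


# A read on the wrong strand is only counted when strandedness is switched off.
print(assign(120, 150, "-"))                  # no_feature
print(assign(120, 150, "-", stranded=False))  # geneA
# A read hanging over the feature boundary depends on the mode.
print(assign(90, 130, "+", mode="intersection-strict"))    # no_feature
print(assign(90, 130, "+", mode="intersection-nonempty"))  # geneA
# A read touching two genes is ambiguous in union mode.
print(assign(170, 210, "+", stranded=False))  # ambiguous
```

Running the same read through different settings like this is a quick way to see which of the three explanations applies to a gene with an unexpectedly low count.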

**jorgebm** · 11-02-2010, 04:11 AM

I have some discrepancies in the exon counts due to your third explanation: some exons are flagged as ambiguous because they belong to several splice variants of the same gene. For instance,

htseq-count -t exon -i Parent -s no -m intersection-nonempty reads.sam hg18_refFlat.gff3 > exoncounts.txt

On the other hand, counting by gene counts all the reads that fall anywhere within the gene region (not only in exons). This is not our goal.

For instance,

htseq-count -t gene -i ID --stranded=no reads.sam hg18_refFlat.gff3 > genecounts.txt

So, is there any way (using htseq-count) to count exon reads grouped by gene (the "Parent" attribute of the "mRNA" row of the refFlat file)?
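One way to approach this is to resolve each exon's "Parent" (a transcript) to that transcript's own "Parent" (the gene) before counting. The sketch below is only an illustration under assumptions: it assumes a GFF3 file in which "mRNA" rows carry "ID" and "Parent" attributes and "exon" rows carry a "Parent" that names one or more mRNA IDs; the function names and the example rows are hypothetical.

```python
from collections import defaultdict


def parse_attributes(field):
    """Parse a GFF3 attribute column ('key=value;key=value') into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def exons_by_gene(gff3_lines):
    """Map each gene ID to the sorted, de-duplicated exons of all its transcripts."""
    transcript_to_gene = {}
    exon_rows = []
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        chrom, _, ftype, start, end, _, strand, _, attr_field = cols
        attrs = parse_attributes(attr_field)
        if ftype == "mRNA" and "ID" in attrs and "Parent" in attrs:
            transcript_to_gene[attrs["ID"]] = attrs["Parent"]
        elif ftype == "exon" and "Parent" in attrs:
            # An exon may list several parent transcripts, separated by commas.
            parents = attrs["Parent"].split(",")
            exon_rows.append((chrom, int(start), int(end), strand, parents))

    genes = defaultdict(set)
    for chrom, start, end, strand, parents in exon_rows:
        for transcript in parents:
            gene = transcript_to_gene.get(transcript)
            if gene is not None:
                # The set drops exons shared by several isoforms of the gene.
                genes[gene].add((chrom, start, end, strand))
    return {gene: sorted(exons) for gene, exons in genes.items()}


# Hypothetical example: two isoforms of gene g1 share the exon 100-200.
example = [
    "chr1\tsrc\tgene\t100\t600\t.\t+\t.\tID=g1",
    "chr1\tsrc\tmRNA\t100\t400\t.\t+\t.\tID=t1;Parent=g1",
    "chr1\tsrc\tmRNA\t100\t600\t.\t+\t.\tID=t2;Parent=g1",
    "chr1\tsrc\texon\t100\t200\t.\t+\t.\tParent=t1",
    "chr1\tsrc\texon\t300\t400\t.\t+\t.\tParent=t1",
    "chr1\tsrc\texon\t100\t200\t.\t+\t.\tParent=t2",
    "chr1\tsrc\texon\t500\t600\t.\t+\t.\tParent=t2",
]
print(exons_by_gene(example))
```

With the exons keyed by gene, the annotation could be rewritten so that every exon row carries a gene-level attribute, which can then be passed to htseq-count with -i so that an exon shared by several isoforms of one gene should no longer look ambiguous.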

**kkamerath** · 10-13-2011, 10:31 AM

counting reads mapping to exons and HTseq question

Hi jorgebm,
Were you able to resolve this question? I am interested in accomplishing the same thing and was hoping HTSeq could accommodate.

Originally posted by jorgebm

So, is there any way (using htseq-count) to count exons grouping by gene ("Parent" attribute of "mRNA" row of refFlat file)?

**Simon Anders** · 10-13-2011, 10:48 AM

I have included Python scripts to do something like this with DEXSeq. Maybe have a look.

Apart from that: HTSeq is a Python package intended to facilitate programming such stuff yourself. htseq-count was originally only intended as a demonstration for HTSeq. So, if you know some Python, just do it yourself (and if you know Java, use Picard, which is somewhat similar to HTSeq). I fully agree with mrawlins: it is not that difficult.

**jorgebm** · 11-03-2011, 08:16 AM

Hi,

I'm sorry for the delay in my reply. I hope it isn't too late to help.

My goal was to count gene hits to test for differential expression. So in the end I dropped htseq-count and used the counting output of CASAVA (we have a GAIIx). CASAVA generates RPKM and a "raw count" (the sum of the coverages for each base within the feature). I then used the raw counts with DESeq for differential expression testing.

However, before that I also tried "intersectBed" (BEDTools) to overlap the alignments with a gene model (in my case RefSeq). If you don't run CASAVA, it's an option.

Hope it helps.

Regards
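To make the two quantities mentioned above concrete, here is a small sketch. It is not CASAVA's implementation: the "raw count" follows the description given in the post (per-base coverages summed over the feature), the RPKM function uses the standard definition (reads per kilobase of feature per million mapped reads), and the function names and numbers are made up for illustration.

```python
def coverage_sum(per_base_coverage, start, end):
    """Sum the per-base coverage over the half-open interval [start, end)."""
    return sum(per_base_coverage[start:end])


def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads."""
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


# Hypothetical coverage track: index = position, value = reads covering it.
coverage = [0, 2, 3, 3, 1, 0]
print(coverage_sum(coverage, 1, 5))  # 9

# 500 reads on a 2,000 bp feature in a library of 10 million mapped reads.
print(rpkm(500, 2000, 10_000_000))   # 25.0
```

Note that a coverage sum grows with read length as well as with the number of reads, so it is not the same thing as a per-read count.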

**emilyjia2000** · 03-21-2012, 01:26 PM

I have a somewhat different question. I tried to use htseq-count to count exons: I got GFF files for mm9 from NCBI and extracted the exon information to create exon.gff, but every exon in the output has a count of 0. Does anyone know what's wrong with my process?

Thanks in advance!

**carmeyeii** · 09-06-2012, 07:55 PM

I don't know if it might be due to a read mapping to different isoforms sharing the same exons... I have a GFF file of annotated transcripts, and there are of course several transcripts for most genes, some of which share some of the same exons.

For example, gene A may have isoforms 1 and 2, both of which contain exon y. When htseq tries to assign a feature to a read that mapped to exon y, will it discard it as ambiguous since it cannot decide upon assigning it to isoform 1 or 2? Even though both isoforms come from the same gene and have a column in the GFF stating so.

I have a feeling it won't actually be entangled in this sort of a problem, but it could be?

Carmen

**dpryan** · 09-07-2012, 02:21 AM

Normally when one uses htseq-count, one tells it to group reads by gene (-i gene_id or similar), so a read will be kept regardless of how many isoforms of a single gene it can be assigned to.

**Simon Anders** · 09-07-2012, 02:54 AM

Originally posted by emilyjia2000

I have a somewhat different question. I tried to use htseq-count to count exons: I got GFF files for mm9 from NCBI and extracted the exon information to create exon.gff, but every exon in the output has a count of 0. Does anyone know what's wrong with my process?

If you told us exactly how you extracted the exon information and how you called htseq-count, we might be able to help you without resorting to guessing. In any case, have a look at the Python scripts that we supply with DEXSeq. These are meant to count reads mapping to exons. Using htseq-count for this purpose is not at all straightforward.

**carmeyeii** · 09-07-2012, 06:24 AM

Originally posted by dpryan

Normally when one uses htseq-count, one tells it to group reads by gene (-i gene_id or similar), so a read will be kept regardless of how many isoforms of a single gene it can be assigned to.

Thanks!

**malirose** · 12-03-2013, 07:34 AM

Hi all,
do you have any idea how to separate reads (from a BAM file) according to the strand they came from, in paired-end, strand-specific RNA-Seq data? I tried both the TopHat flag and the samtools flag; the two results are quite different, and many reads are not assigned to the correct strand.

**dpryan** · 12-04-2013, 02:30 AM

Do you just want pairs that map to the + strand in one file and those that map to the - strand in another (alternatively, separating paired-end reads by the strand to which read1 aligns)? You can do that with gawk or Python (or pretty much any language for that matter). Having said that, if you're trying to count RNA-Seq reads, then just have htseq-count deal with this for you.

Also, please start a new thread next time. This one is over a year old.
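As an illustration of the Python route, here is a minimal sketch that splits SAM records by the strand of the original fragment, using only the FLAG column. The FLAG bits (0x4 unmapped, 0x10 reverse strand, 0x80 second in pair) come from the SAM specification; everything else is an assumption for the example: the function names are hypothetical, and whether read1 is sense or antisense to the transcript depends on your library protocol, so that is left as a parameter.

```python
UNMAPPED = 0x4   # read is unmapped
REVERSE = 0x10   # read aligned to the reverse strand
READ2 = 0x80     # read is the second mate of a pair


def fragment_strand(flag, read1_sense=True):
    """Infer the strand of the fragment from the SAM FLAG of one mate.

    read1_sense=True assumes read1 has the same orientation as the
    transcript; set it to False for protocols where read1 is antisense.
    """
    plus = not (flag & REVERSE)   # strand this mate aligned to
    if flag & READ2:
        plus = not plus           # mate 2 points the opposite way
    if not read1_sense:
        plus = not plus
    return "+" if plus else "-"


def split_by_strand(sam_lines, read1_sense=True):
    """Group SAM alignment lines by fragment strand; header lines are skipped."""
    out = {"+": [], "-": []}
    for line in sam_lines:
        if line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])
        if flag & UNMAPPED:
            continue
        out[fragment_strand(flag, read1_sense)].append(line)
    return out


# Hypothetical, truncated SAM records (only the first columns matter here).
records = [
    "@HD\tVN:1.0",
    "pairA\t99\tchr1\t100",    # read1 forward
    "pairA\t147\tchr1\t250",   # read2 reverse, same fragment as above
    "pairB\t83\tchr1\t900",    # read1 reverse
    "pairB\t163\tchr1\t700",   # read2 forward, same fragment as above
]
groups = split_by_strand(records)
print(len(groups["+"]), len(groups["-"]))  # 2 2
```

Both mates of a pair end up in the same group, which is usually what you want when the groups are written to separate files.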

**Dinesh Heisnam** · 12-10-2013, 01:42 AM

HTSeq count of single reads

Hi all,
As far as we can tell, htseq-count only counts paired reads. Even if we include single reads in addition to the paired reads in the mapping, the single reads seem to have no effect on the counts.
So my question is whether there is any option in htseq-count to take the single reads into account as well.
