DEXSeq - Counting with HT-seq at the exon level

Yohann

Junior Member

Join Date: Aug 2013

Posts: 7
#1


05-07-2014, 08:10 AM

Hi!

I'm currently looking for differentially expressed isoforms and got curious about the behaviour of HT-seq when counting exons. Basically, I wanted to check whether these two methods would give me very similar results when estimating gene expression:
- counting at the gene level with HT-seq (HTg)
- counting at the exon level, then summing all the exons per gene (HTe)

The correlation is not that great (see attached pdf) and there is a global trend of higher counts from my HTe method. Some of the highlighted genes show huge differences between the two methods:

Code:

ensembl_gene_id  value_HTg  value_HTe  ratio
ENSG00000205336  21         6806       0.003231967
ENSG00000165795  73         21996      0.003364095
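For readers who want to recompute the ratio column: the posted values are not exactly HTg/HTe, but they do match (HTg + 1) / (HTe + 1) to the printed precision. The pseudocount of 1 is an inference from the numbers, not something stated in the post, and the function name below is made up for illustration.

```python
# Counts copied from the table above.
rows = [
    ("ENSG00000205336", 21, 6806),
    ("ENSG00000165795", 73, 21996),
]

def count_ratio(htg, hte, pseudocount=1):
    """Ratio of gene-level to summed exon-level counts.

    The pseudocount of 1 is an assumption that happens to reproduce
    the posted ratios; it also avoids division by zero.
    """
    return (htg + pseudocount) / (hte + pseudocount)

for gene, htg, hte in rows:
    print(gene, htg, hte, f"{count_ratio(htg, hte):.9f}")
```

With `pseudocount=0` the same function gives the plain HTg/HTe ratio, which differs slightly from the posted column.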

When looking at ENSG00000205336 in a genome browser, I can count 21 mapping reads: it fits with HTg!
I believe that if a read maps across a splice junction, it will be counted twice when using HT-seq at the exon level, which may explain some of the differences.

In the first steps of the DEXSeq analysis, we have to process a GTF file (from Ensembl, for example) to obtain a GFF with "collapsed" exons from the different transcripts of the same gene. For my example gene, the script "dexseq_prepare_annotation.py" generates really small "exonic_part" entries, some with a length of only 1 bp!

GTF for ENSG00000205336
GFF for ENSG00000205336

Is this expected?
I think that each of those exonic parts will be treated as an exon in the DEXSeq analysis. Could that be a problem?

Thanks for your help!

EDIT:
I found that this thread is pretty similar to my question:

DEXSeq gene level counts - SEQanswers

http://seqanswers.com/forums/showthread.php?t=25003


It seems that I can't sum the different exonic parts to estimate the gene-level value, as a read can be counted multiple times.
Attached Files

Correlation_HT-seq_gene_and_exons_expression.pdf (943.9 KB, 102 views)

Last edited by Yohann; 05-07-2014, 09:01 AM. Reason: found related thread
Tags: dexseq, exons, gene expression, ht-seq
Wolfgang Huber

Senior Member

Join Date: Aug 2009

Posts: 109
#2

05-09-2014, 11:35 AM

Dear Yohann

thanks for the feedback. All of that behaviour is intended, and the rationale behind it is described in the DEXSeq paper. Briefly, reads that touch multiple counting bins provide evidence for the presence of each of the bins, therefore they are counted for each of the bins. The evidence is not independent, but since the testing in DEXSeq is marginal (bin by bin), the dependence is not a problem (in the same way that the dependence in expression of different genes is not a problem for gene-by-gene testing methods). Therefore the sum of bin counts is typically larger than the gene count.
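A toy illustration of this counting rule may help. It is only a sketch: the bin coordinates, bin names and reads are invented, intervals are half-open, and the logic is a simplified stand-in for what the counting script does, not its actual code.

```python
# Hypothetical counting bins of one gene, as half-open (start, end) intervals.
bins = {"E001": (100, 200), "E002": (200, 201), "E003": (500, 600)}

# Hypothetical reads; each read is a list of aligned blocks
# (a spliced read has one block per exon it covers).
reads = [
    [(120, 170)],              # falls inside E001 only
    [(180, 201), (500, 529)],  # spliced read touching E001, E002 and E003
    [(550, 600)],              # falls inside E003 only
]

def overlaps(a, b):
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]

def bins_hit(read):
    """Names of all bins touched by any aligned block of the read."""
    return [name for name, iv in bins.items()
            if any(overlaps(block, iv) for block in read)]

bin_counts = {name: 0 for name in bins}
gene_count = 0
for read in reads:
    hit = bins_hit(read)
    for name in hit:          # bin level: one count per touched bin
        bin_counts[name] += 1
    if hit:                   # gene level: one count per read
        gene_count += 1

print(bin_counts, sum(bin_counts.values()), gene_count)
```

Here three reads give a gene count of 3 but a summed bin count of 5, because the spliced read contributes to three bins. This is why summing bin counts overestimates the gene-level count.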

Second, exons are split up by the preparation script into multiple parts (bins) if the GTF file has different boundaries for them. This is not always pretty, and could probably be coarse-grained in some cases (and you are welcome to do your own manual or automated curation of counting bins to this end!).
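To see how very short bins can arise, here is a minimal sketch of splitting overlapping exons at every annotated boundary. The coordinates are invented, intervals are half-open, and `exonic_parts` is a hypothetical helper written for this illustration; it is not the implementation used in dexseq_prepare_annotation.py.

```python
def exonic_parts(exons):
    """Split possibly overlapping exons into disjoint parts.

    Every distinct exon start or end becomes a cut point; a segment
    between two consecutive cut points is kept if some exon covers it.
    """
    boundaries = sorted({pos for exon in exons for pos in exon})
    parts = []
    for start, end in zip(boundaries, boundaries[1:]):
        if any(s <= start and end <= e for s, e in exons):
            parts.append((start, end))
    return parts

# Three hypothetical transcripts annotate "the same" exon with slightly
# different boundaries; one of them ends a single base later.
exons = [(100, 200), (100, 201), (150, 200)]
parts = exonic_parts(exons)
for start, end in parts:
    print(start, end, "length", end - start)
```

With these inputs the exon is cut into three parts of length 50, 50 and 1: a one-base difference between two transcript annotations is enough to create a 1 bp bin.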

Hope this helps.

Kind regards
Wolfgang

Wolfgang Huber
EMBL