  • DEXSeq - Counting with HTSeq at the exon level

    Hi!

    I'm currently looking for differentially expressed isoforms and got curious about how HTSeq behaves when counting exons. Basically, I wanted to check whether these two methods give closely matching estimates of gene expression:
    • counting at the gene level with HTSeq (HTg)
    • counting at the exon level, then summing the counts of all exons per gene (HTe)


    The correlation is not great (see the attached PDF), and there is a global trend towards higher counts with the HTe method. Some of the highlighted genes show huge differences between the two methods:
    Code:
    ensembl_gene_id value_HTg value_HTe ratio
    ENSG00000205336 21 6806 0.003231967
    ENSG00000165795 73 21996 0.003364095
    When I look at ENSG00000205336 in a genome browser, I count 21 mapped reads, which matches HTg.
    I believe that a read mapping across a splice junction is counted twice when counting with HTSeq at the exon level, which may explain some of the differences.
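    The double-counting hypothesis can be illustrated with a toy overlap model. This is a minimal sketch, assuming each read is represented by its aligned blocks (a spliced read has one block per exon it touches) and each exonic bin is a half-open interval on one chromosome; the function names are hypothetical, and this is not HTSeq's actual counting code, which also applies overlap-resolution modes and strand rules.

```python
# Toy model (hypothetical names): intervals are half-open [start, end) on one chromosome.
from typing import Dict, List, Tuple

Interval = Tuple[int, int]

def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]

def count_bins(reads: Dict[str, List[Interval]], bins: List[Interval]) -> List[int]:
    """Exon-level counting: a read adds 1 to EVERY bin that any of its blocks touches."""
    counts = [0] * len(bins)
    for blocks in reads.values():
        for i, b in enumerate(bins):
            if any(overlaps(blk, b) for blk in blocks):
                counts[i] += 1
    return counts

def count_gene(reads: Dict[str, List[Interval]], bins: List[Interval]) -> int:
    """Gene-level counting: a read adds 1 to the gene if it touches any bin."""
    return sum(
        1 for blocks in reads.values()
        if any(overlaps(blk, b) for blk in blocks for b in bins)
    )

# Two exons of one gene; read "r2" is spliced across the exon1-exon2 junction.
exon_bins = [(100, 200), (300, 400)]
reads = {
    "r1": [(120, 170)],              # inside exon 1 only
    "r2": [(180, 200), (300, 330)],  # spans the junction
    "r3": [(350, 390)],              # inside exon 2 only
}

per_bin = count_bins(reads, exon_bins)
print(per_bin, sum(per_bin), count_gene(reads, exon_bins))  # [2, 2] 4 3
```

    The junction read contributes to both bins, so the bin sum (4) exceeds the gene-level count (3) even though only three reads exist.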

    In the first steps of the DEXSeq analysis, we process a GTF file (from Ensembl, for example) to obtain a GFF file with "collapsed" exons from the different transcripts of the same gene. For my example gene, the script "dexseq_prepare_annotation.py" generates really small "exonic_part" features, some only 1 bp long!

    GTF for ENSG00000205336
    GFF for ENSG00000205336
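    Why such tiny parts appear can be sketched with a simple flattening step. This is an illustration under stated assumptions (one gene, one strand, half-open coordinates, hypothetical function and transcript names), not the code of dexseq_prepare_annotation.py: every exon start and end from every transcript becomes a cut point, so two transcripts whose exon boundaries differ by a single base produce a 1 bp part.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end)

def flatten_exons(transcripts: Dict[str, List[Interval]]) -> List[Tuple[Interval, List[str]]]:
    """Split the overlapping exons of one gene into disjoint parts.

    Every exon start/end becomes a cut point, so each resulting part is
    covered by a fixed set of transcripts.
    """
    cuts = sorted({pos for exons in transcripts.values() for s, e in exons for pos in (s, e)})
    parts = []
    for left, right in zip(cuts, cuts[1:]):
        covering = sorted(
            tx for tx, exons in transcripts.items()
            if any(s <= left and right <= e for s, e in exons)
        )
        if covering:  # skip intronic gaps that no transcript covers
            parts.append(((left, right), covering))
    return parts

# Hypothetical transcripts whose first exons start 1 bp apart.
transcripts = {
    "tx1": [(100, 200), (300, 400)],
    "tx2": [(101, 200), (300, 450)],
}
for (start, end), txs in flatten_exons(transcripts):
    print(f"{start}-{end}\t{end - start} bp\t{','.join(txs)}")
```

    Here the first part, 100-101, is 1 bp long and belongs to tx1 only.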

    Is this expected?
    I think that each of these exonic parts will be treated as a separate exon in the DEXSeq analysis. Could this be a problem?

    Thanks for your help!

    EDIT:
    I found a related thread in the Bioinformatics forum that is quite similar to my question.


    It seems that I can't sum the exonic parts to estimate a gene-level count, because a single read can be counted in several parts.
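    If per-read assignments are available (the count files themselves only hold numbers, so this assumes you have extracted, for each bin, the names of the reads overlapping it), a gene-level count can be recovered by counting each read once. A minimal sketch with hypothetical bin IDs and read names, where read "r2" spans three adjacent parts:

```python
from typing import Dict, Set

def gene_count_from_bins(reads_per_bin: Dict[str, Set[str]]) -> int:
    """Count each read once per gene: size of the union of read names over all bins."""
    seen: Set[str] = set()
    for names in reads_per_bin.values():
        seen |= names
    return len(seen)

# Hypothetical bins of one gene; "r2" touches E001, E002 and E003.
reads_per_bin = {
    "GENE:E001": {"r1", "r2"},
    "GENE:E002": {"r1", "r2", "r3"},
    "GENE:E003": {"r2", "r4"},
}
naive_sum = sum(len(names) for names in reads_per_bin.values())
print(naive_sum, gene_count_from_bins(reads_per_bin))  # 7 4
```

    The naive sum counts "r1" twice and "r2" three times; the union counts each of the four reads once.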
    Last edited by Yohann; 05-07-2014, 09:01 AM. Reason: found related thread

  • #2
    Dear Yohann

    Thanks for the feedback. All of that behaviour is intended, and the rationale behind it is described in the DEXSeq paper. Briefly, reads that touch multiple counting bins provide evidence for the presence of each of these bins, and are therefore counted for each of them. The evidence is not independent, but since the testing in DEXSeq is marginal (bin by bin), the dependence is not a problem (in the same way that the dependence in expression of different genes is not a problem for gene-by-gene testing methods). Therefore the sum of bin counts is typically larger than the gene count.

    Second, exons are split up by the preparation script into multiple parts (bins) if different transcripts in the GTF file give them different boundaries. This is not always pretty, and could probably be coarse-grained in some cases (you are welcome to do your own manual or automated curation of counting bins to this end!).
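    One possible form of automated curation is to merge very short parts into a directly adjacent neighbour. This is a hypothetical sketch, not a DEXSeq feature: the function name and the `min_len` threshold are assumptions, it works on plain coordinates rather than on the GFF attributes a real curation step would have to rewrite, and it only joins parts that touch, so parts separated by an intron stay separate.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end)

def merge_short_parts(parts: List[Interval], min_len: int) -> List[Interval]:
    """Merge any part shorter than min_len into a directly adjacent neighbour.

    Only parts that touch (previous end == next start) are merged, so
    exonic parts separated by an intron are never joined.
    """
    merged: List[Interval] = []
    for start, end in sorted(parts):
        if merged and merged[-1][1] == start:
            prev_start, prev_end = merged[-1]
            if (end - start) < min_len or (prev_end - prev_start) < min_len:
                merged[-1] = (prev_start, end)  # absorb into the neighbour
                continue
        merged.append((start, end))
    return merged

parts = [(100, 101), (101, 200), (300, 400), (400, 450)]
print(merge_short_parts(parts, min_len=10))
```

    With `min_len=10`, the 1 bp part is absorbed into its neighbour, giving (100, 200). The trade-off is that DEXSeq can then no longer test usage of that boundary separately, so whether merging is appropriate depends on the question being asked.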

    Hope this helps.

    Kind regards
    Wolfgang
    Wolfgang Huber
    EMBL
