DEXSEQ dexseq_count.py counts

sszelinger

Junior Member

Join Date: Apr 2009

Posts: 3
- Share
- Tweet
#1

DEXSEQ dexseq_count.py counts

02-08-2013, 02:54 PM

Hi,
I am writing because am trying to figure out the concept behind the python script that counts reads per exon for DEXSEQ analysis.
Seems like the count output does not explain what I see in IGV and so the statistics can be off in DEXSEQ and render my analysis pointless.

This is what I have done:
1. created flattened gff from ensembl gtf
python ~/bin/DEXSeq/python_scripts/dexseq_prepare_annotation.py ensembl.63.genes.gtf ensembl.63.genes.gff

2. counted reads in query sorted bam file:
samtools view sorted.bam | python ~/bin/DEXSeq/python_scripts/dexseq_count.py --paired=yes -s no -a 10 ensembl.63.genes.gff - dexseq.counts.txt

3. created samples object for DEXSEQ in R.
I am using DEXSeq_1.4.0. in R Studio 0.97.248

samples = data.frame(
condition = factor(c("unaffected","affected")),
replicate = factor(c(1:1,1:1)),
row.names = factor(c("A","B")),
stringsAsFactors = TRUE,
check.names = FALSE
)

annotationfile = "ensembl.63.genes.gff"
fullFilenames<- list.files("/Users/sszelinger/Documents/PROJECTS/DEXSEQ",full.names=TRUE,pattern="counts.txt")

ecs<- read.HTSeqCounts(countfiles = fullFilenames,design = samples,flattenedfile = annotationfile)

Looking up counts for gene of interest
countTableForGene(ecs, "ENSG00000XXXXXX", normalized=FALSE, withDispersion=FALSE)

Exon count for sample A
E001 141
E002 66
E003 45
E004 5
E005 48
E006 37
E007 31
E008 36
E009 64
E010 35

Exon Count for sample B
E001 187
E002 95
E003 66
E004 2
E005 38
E006 29
E007 7
E008 32
E009 55
E010 27

You can see in "A" for exon 4 "E004" there is count=5 and for "B" for exon 4 count = 2. A is unaffected and B is affected.
When I pull up IGV, it seems like that "A" has a lot more reads than "B". A has an average coverage of 20 and B has about 3. In my view B is a great case for exon skipping between case and control, and the integrated DNA-RNA analysis suggest this but the counts do not pick it up. So I have to wonder what are the rules for counting the reads in dexseq? Is it based on HTSeq? If so is dexseq_count.py uses union, intersection_strict, or intersection_non_empty. From the IGV trace it looks like to me A should have at least 10 times the coverage as B at same position.

Can someone explain this or give me hints.
If the python script does not count split reads or those that do not map more than X bases within the exon than I understand, but should be documented with the python script so we can make more educated decisions.
I feel that there a real problem here in those situations where we want to detect exon skipping and the low count numbers based on the python script might be misleading. Especially if the counts biased in both case and control subjects into a direction where no significant difference might be conculded between them.
I appreciate any comment. Hope the IGV trace loads up fine..
Thanks very much,
Szabi
Attached Files

A_vs_B.jpg (48.4 KB, 63 views)
Tags: None
areyes

Senior Member

Join Date: Aug 2010

Posts: 165
- Share
- Tweet
#2

02-11-2013, 02:56 PM

Hi @sszelinger,

Thanks for your interest in DEXSeq!

It sounds strange. First of all, I would double check in the gtf that the E004 corresponds to the genomic region that you are observing in the browser. Note that the python script assigns new exon ID's to each unique non-overlapping exonic region based on all its annotated transcripts, so it might not correspond to the exon number 4 of the gene model from the browser.

My second guess would be regarding the quality of the alignments, the python script to count reads discards those reads with quality below 10 (this includes multiple hits). Maybe some of the alignments seen in the browser were filtered by the script.

Alejandro
Comment

Previous template Next

Recent Advances in Sequencing Analysis Tools

by seqadmin

The sequencing world is rapidly changing due to declining costs, enhanced accuracies, and the advent of newer, cutting-edge instruments. Equally important to these developments are improvements in sequencing analysis, a process that converts vast amounts of raw data into a comprehensible and meaningful form. This complex task requires expertise and the right analysis tools. In this article, we highlight the progress and innovation in sequencing analysis by reviewing several of the...
- Channel: Articles
05-06-2024, 07:48 AM
Essential Discoveries and Tools in Epitranscriptomics

by seqadmin

The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
- Channel: Articles
04-22-2024, 07:01 AM

Topics	Statistics	Last Post
Telomere Maintenance by PARP1: A New Perspective in Cancer Research by seqadmin Started by seqadmin, 05-07-2024, 06:57 AM	0 responses 12 views 0 likes	Last Post by seqadmin 05-07-2024, 06:57 AM
Enhanced Neoantigen Detection: Introducing NeoHunter by seqadmin Started by seqadmin, 05-06-2024, 07:17 AM	0 responses 16 views 0 likes	Last Post by seqadmin 05-06-2024, 07:17 AM
A Close Examination at Probiotic-Related Bacteremia by seqadmin Started by seqadmin, 05-02-2024, 08:06 AM	0 responses 22 views 0 likes	Last Post by seqadmin 05-02-2024, 08:06 AM
Expanded Genetic Insights into Blood Pressure Regulation by seqadmin Started by seqadmin, 04-30-2024, 12:17 PM	0 responses 24 views 0 likes	Last Post by seqadmin 04-30-2024, 12:17 PM

Seqanswers Leaderboard Ad

Announcement

DEXSEQ dexseq_count.py counts

Comment

Latest Articles

ad_right_rmr

News