Unconfigured Ad

**CK_** · 05-20-2012, 07:18 PM

Hi
your cufflinks.gtf most probably contains "." in the strand column. (for 'don't know strand')
cut -f7 cufflinks.gtf | sort | uniq -c
to see whether you got any "."

You can modify dexseq_prepare_annotation.py to work in un-stranded mode
changing
# Step 1: Store all exons with their gene and transcript ID
# in a GenomicArrayOfSets

exons = HTSeq.GenomicArrayOfSets( "auto", stranded=True )

to

exons = HTSeq.GenomicArrayOfSets( "auto", stranded=False )

I guess you can analyze stranded and un-stranded transcripts separately
or discard un-stranded transcripts if there are only few?
I wonder what DEXSeq developers reckon.
ck

**Simon Anders** · 05-21-2012, 12:31 AM

The portion of your GFF file that you posted: Does it contain the offending line 509?

**CK_** · 05-21-2012, 02:32 AM

Changing strand from "+" or "-" to "." causes the error:
#-------------------------------------------------------------------------------
my small_un_str.gtf:

chr1 Cufflinks exon 11869 12227 . . . gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "1"; gene_name "DDX11L1"; oId "ENST00000456328.2"; nearest_ref "ENST00000456328.2"; class_code "="; tss_id "TSS1";

python dexseq_prepare_annotation.py small_un_str.gtf small_un_str.gff

Traceback (most recent call last):
File "~/R/x86_64-unknown-linux-gnu-library/2.15/DEXSeq/python_scripts/dexseq_prepare_annotation.py", line 29, in <module>
exons[f.iv] += ( f.attr['gene_id'], f.attr['transcript_id'] )
File "_HTSeq.pyx", line 509, in HTSeq._HTSeq.GenomicArray.__getitem__ (src/_HTSeq.c:8702)
KeyError: 'Non-stranded index used for stranded GenomicArray.'
u
#-------------------------------------------------------------------------------
small.gtf
chr1 Cufflinks exon 11869 12227 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "1"; gene_name "DDX11L1"; oId "ENST00000456328.2"; nearest_ref "ENST00000456328.2"; class_code "="; tss_id "TSS1";

python dexseq_prepare_annotation.py small.gtf small.gff

cat small.gff

chr1 small.gtf aggregate_gene 11869 12227 . + . gene_id "XLOC_000001"
chr1 small.gtf exonic_part 11869 12227 . + . transcripts "TCONS_00000001"; exonic_part_number "001"; gene_id "XLOC_000001"
#-------------------------------------------------------------------------------
Doesn't line 509 refer to python code itself and not gtf?
thank you
ck

**Simon Anders** · 05-21-2012, 02:49 AM

Originally posted by CK_ View Post

Changing strand from "+" or "-" to "." causes the error:

Your suggestion to change "stranded=True" to "stranded=False" is correct. When programming it, I did not expect that people might have GFF files without strand information, so I did not expose this option.

Doesn't line 509 refer to python code itself and not gtf?

You are right, of course. Seems I wasn't fully awake yet when I wrote this earlier this today.

**CK_** · 05-21-2012, 08:15 PM

Hi Simon,

Originally posted by Simon Anders View Post

When programming it, I did not expect that people might have GFF files without strand information, so I did not expose this option.

I reckon (at least with cufflinks) all non-stranded entries are single exon transcripts and hence cufflinks don't know how to guess about strand from splice sites (when using not strand specific library for RNA-seq).
Here's one example output from cufflinks:
#--------------------------------------------
# print all unstranded lines and exclude first exon:
awk '{if($7=="."){print $0}}' merged.gtf | grep -v 'exon_number "1"' | wc -l
0
#--------------------------------------------
awk '{if($7=="+"){print $0}}' merged.gtf | grep -v 'exon_number "1"' | wc -l
859632
#--------------------------------------------

Given library is not strand specific,
dexseq_count.py will be called with '-s no' flag anyways;
Will changing stranded=True to stranded=False in dexseq_prepare_annotation.py affect results for transcripts with known strand in cufflinks.gtf?
Thanks a lot!
ck

**Simon Anders** · 05-22-2012, 01:02 AM

Originally posted by CK_ View Post

dexseq_count.py will be called with '-s no' flag anyways;
Will changing stranded=True to stranded=False in dexseq_prepare_annotation.py affect results for transcripts with known strand in cufflinks.gtf?

If you call the script with '-s no' anyway, this should not make a difference.

**delasa** · 06-04-2013, 06:45 AM

Hi Simon,
I have gone through all the steps for DEXSeq. I want to know which isoforms/ splice variants had significant DE. I was wondering how I can find from DEXSeq results, which genes have been alternatively spliced? what are their isoforms?

Thanks,

**Simon Anders** · 06-04-2013, 06:51 AM

Not sure I understand your question. You already have the list of exons that show significant differential usage, right? And each exon is annotated with a gene ID. So what exactly do you need now?

**delasa** · 06-04-2013, 07:12 AM

Yes, I have genes with exons that have been significantly changed. I have attached "plotDEXSeq" result for one of the genes that have two isoforms:GLYMA01G01805.1, GLYMA01G01805.2.
I want to match up between the isoforms that are shown in the picture with the real one. Can we have access to isoforms in DEXSeq? I have genes and exons that are significantly changed, but I want to find the isoforms. All I can see in the "res1 = DEUresultTable(ecs_DE)" file, is the genes and eons, with no information about the isoforms. Is that clear now?

Attached Files

GLYMA01G01805.pdf (8.8 KB, 39 views)

**Simon Anders** · 06-04-2013, 07:21 AM

In your plot, no exon is significant. I hope you have others.

Anyway, DEXSeq does not offer functionality to perform inference on the level of isoforms. It's DEXSeq's design philosophy to test of differential exon usgae not for differential isoform expression. Our paper gives a few examples why we feel that this is actually an advantage.

**delasa** · 06-04-2013, 07:32 AM

Thanks for your answer. I know that DEXSeq is exons based not isoform based.
I have attached another plot, I guess the pink area implies that there is a significant DE there, right?

For drawing this plot, you are using some information in ExonCountSet object returned by "testForDEU" to find the transcripts, the number of transcripts, right? I want to pull this information out of testForDEU results. To be clear, How I can find information about transcripts of a gene from an ExonCountSet returned by testForDEU function?

Attached Files

GLYMA01G39911.pdf (6.6 KB, 22 views)

**Simon Anders** · 06-04-2013, 07:43 AM

With "featureData(ecs)$transcripts" you get a character vectors with one entry for each exon. This entry is a list of transcript IDs, concatenated with semicolons, namely the IDs of all transcripts that contain the exon.

This information is retained by the dexseq_prepare_annotation.py script and put into the flattened GFF file. DEXSeq stores it in the "transcripts" slot of featureData but uses it only at a single point, namely to draw the transcripts under the plot.

You can try to do your own inference with this information, but in my experience, this does not work too well (though I haven't tried too hard either).

**delasa** · 06-04-2013, 08:04 AM

Thanks a lot for your quick response,
can the information about the "expression", "norCounts", and "splicing" -which are used for "plotDEXSeq" function- be accessed through "featureData(ecs)" as well? or they are stored in another objects.

**Simon Anders** · 06-04-2013, 08:46 AM

norCounts is simply the counts, divided by the size factors. You get this with: counts( ecs, normalized=TRUE" )

The function "estimatelog2FoldChanges" gives you the values for the coefficients. See the paper for the formulae to assemble the plotted values from these. There are also two internal function,
"fitAndArrangeCoefs" and "getEffectsForPlotting", which do this, but they might be a bit hard to understand.

Topics	Statistics	Last Post
Whole-Genome Sequencing Traces Faroe Islands Ancestry to a North Atlantic Founder Population by SEQadmin2 Started by SEQadmin2, Today, 06:09 AM	0 responses 15 views 0 reactions	Last Post by SEQadmin2 Today, 06:09 AM
Sequencing the Two-Toed Sloth Genome Reveals Jumping Genes Tied to Its Extreme Metabolism by SEQadmin2 Started by SEQadmin2, 06-09-2026, 11:58 AM	0 responses 34 views 0 reactions	Last Post by SEQadmin2 06-09-2026, 11:58 AM
A New Method Makes Hantavirus Genome Analysis Faster and More Accessible by SEQadmin2 Started by SEQadmin2, 06-05-2026, 10:09 AM	0 responses 39 views 0 reactions	Last Post by SEQadmin2 06-05-2026, 10:09 AM
A New Single-Cell Method Maps DNA-Protein Interactions by SEQadmin2 Started by SEQadmin2, 06-04-2026, 08:59 AM	0 responses 45 views 0 reactions	Last Post by SEQadmin2 06-04-2026, 08:59 AM

Unconfigured Ad

Cufflinks GTF for DEXSeq

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News