**Simon Anders** · 02-06-2013, 08:58 AM

Could you grep for 'Medtr7g090810' in your GTF file and post the lines containing the term?

**syintel87** · 02-06-2013, 09:25 AM

Originally posted by Simon Anders:

> Could you grep for 'Medtr7g090810' in your GTF file and post the lines containing the term?

The attached file shows the lines containing 'Medtr7g090810'.
Thank you for sparing your precious time.

Attached Files

Mt_grep.jpg (80.1 KB, 262 views)

**Simon Anders** · 02-06-2013, 09:27 AM

It seems HTSeq got confused because the same gene occurs on both the "+" and the "-" strand.
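To check whether other genes in an annotation have the same problem, a minimal sketch in plain Python (it is not part of HTSeq or DEXSeq) could look like this. It assumes a tab-separated GTF file with the strand in column 7 and a `gene_id "..."` attribute in column 9. The function name `genes_on_both_strands` and the example records (coordinates, source, second gene ID) are made up for illustration.

```python
import re
from collections import defaultdict

# Matches the gene_id attribute in the ninth GTF column.
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')

def genes_on_both_strands(gtf_lines):
    """Return {gene_id: set of strands} for gene IDs whose exons lie on more than one strand."""
    strands = defaultdict(set)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon lines are considered here
        match = GENE_ID_RE.search(fields[8])
        if match:
            strands[match.group(1)].add(fields[6])
    return {gene: s for gene, s in strands.items() if len(s) > 1}

# Hypothetical records, for illustration only.
example = [
    'chr7\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "Medtr7g090810"; transcript_id "Medtr7g090810.1";\n',
    'chr7\tsrc\texon\t500\t600\t.\t-\t.\tgene_id "Medtr7g090810"; transcript_id "Medtr7g090810.2";\n',
    'chr7\tsrc\texon\t900\t950\t.\t+\t.\tgene_id "GeneB"; transcript_id "GeneB.1";\n',
]
conflicts = genes_on_both_strands(example)
print(sorted(conflicts))
```

For a real file, the same function can be called on an open file handle, for example `genes_on_both_strands(open("annotation.gtf"))`, where the file name is a placeholder.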

**syintel87** · 02-06-2013, 09:52 AM

Originally posted by Simon Anders:

> It seems HTSeq got confused because the same gene occurs on both the "+" and the "-" strand.

1. So the annotation file has to be fixed?

2. When I made a small change to "dexseq_prepare_annotation.py", it worked:

    exons = HTSeq.GenomicArrayOfSets( "auto", stranded=True )
    for f in HTSeq.GFF_Reader( gtf_file ):
        if f.type != "exon":
            continue
        f.attr['transcript_id'] = f.attr['transcript_id'].replace( ":", "_" )
        exons[f.iv] += ( f.attr['transcript_id'], f.attr['transcript_id'] )

But since the original GTF file contains both exon and CDS lines, I am not sure whether this code is okay for my GTF file.

3. I have another GTF file. For this one I also made a small change, and it worked:

    exons = HTSeq.GenomicArrayOfSets( "auto", stranded=True )
    for f in HTSeq.GFF_Reader( gtf_file ):
        if f.type != "CDS":
            continue
        f.attr['transcript_id'] = f.attr['transcript_id'].replace( ":", "_" )
        exons[f.iv] += ( f.attr['transcript_id'], f.attr['transcript_id'] )

But, I am not sure whether this code is okay or not.

4. Do you think the code I have modified is okay?
The attached file contains the original GTF, dexseq_prepare_annotation.py, and the output GTF.

Thank you very much!

**syintel87** · 02-06-2013, 09:53 AM

Originally posted by Simon Anders:

> It seems HTSeq got confused because the same gene occurs on both the "+" and the "-" strand.

This is the attachment.

Attached Files

dexseq_edit.txt (5.9 KB, 199 views)

**Simon Anders** · 02-06-2013, 10:48 AM

No, you cannot change from gene ID to transcript ID, because there may be many genes with several overlapping transcripts, and they won't be handled correctly anymore.

You really should fix your GTF file: wherever the same gene ID is used for features on different strands, add something to the gene ID. If this is complicated, just add a "+" or "-" to all gene IDs.

BTW, dexseq_prepare only looks at "exon" lines and ignores "CDS" lines.

**syintel87** · 02-06-2013, 11:08 AM

Originally posted by Simon Anders:

> No, you cannot change from gene ID to transcript ID, because there may be many genes with several overlapping transcripts, and they won't be handled correctly anymore.
>
> You really should fix your GTF file: wherever the same gene ID is used for features on different strands, add something to the gene ID. If this is complicated, just add a "+" or "-" to all gene IDs.
>
> BTW, dexseq_prepare only looks at "exon" lines and ignores "CDS" lines.

1. So you mean I will have to replace each +/- with +?
(Or, alternatively, replace each +/- with -.)

2. My Mhapla.gtf file has only exon lines. So do I need to fix my GTF file by replacing CDS with exon?

**Simon Anders** · 02-06-2013, 11:19 AM

1. No, change the gene ID from, say, "Medtr7g090810" to "Medtr7g090810+" and "Medtr7g090810-", depending on strand. This is assuming that you know a scripting language. I wouldn't want to do that manually.

Where did you get this strange GTF file from, anyway? Having the same gene name on both strands is a bug.

2. No, why?
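The renaming described in point 1 can be scripted in a few lines. The following is a minimal sketch in plain Python under some assumptions: the GTF is tab-separated, the strand is in column 7, and each line carries one `gene_id "..."` attribute in column 9. The function name `add_strand_to_gene_id` is made up for illustration, and the example record uses invented coordinates.

```python
import re

# Captures the gene_id attribute so that only its value is changed.
GENE_ID_RE = re.compile(r'(gene_id ")([^"]+)(")')

def add_strand_to_gene_id(line):
    """Append the strand symbol ("+" or "-") to the gene_id of one GTF line."""
    if line.startswith("#"):
        return line  # leave comment lines alone
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9 or fields[6] not in ("+", "-"):
        return line  # not a stranded feature line; keep unchanged
    strand = fields[6]
    fields[8] = GENE_ID_RE.sub(
        lambda m: m.group(1) + m.group(2) + strand + m.group(3),
        fields[8],
        count=1,
    )
    return "\t".join(fields) + newline

# Hypothetical record, for illustration only.
record = 'chr7\tsrc\texon\t500\t600\t.\t-\t.\tgene_id "Medtr7g090810"; transcript_id "Medtr7g090810.2";\n'
fixed = add_strand_to_gene_id(record)
print(fixed, end="")
```

Applying the function to every line of the file and writing the results to a new file gives an annotation in which gene IDs on different strands no longer collide. If a literal "+" or "-" inside gene IDs turns out to be inconvenient downstream, a suffix such as `_plus` or `_minus` would serve the same purpose.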

**syintel87** · 02-06-2013, 01:40 PM

Originally posted by Simon Anders:

> 1. No, change the gene ID from, say, "Medtr7g090810" to "Medtr7g090810+" and "Medtr7g090810-", depending on strand. This is assuming that you know a scripting language. I wouldn't want to do that manually.
>
> Where did you get this strange GTF file from, anyway? Having the same gene name on both strands is a bug.
>
> 2. No, why?

1. Thank you. I'll try that. I obtained these GTF files from a member of my project group.

2. If Mhapla.gtf has only CDS lines, but dexseq_prepare.py only looks at "exon" lines and ignores "CDS" lines, the output might have no lines at all, I guess.

**Simon Anders** · 02-06-2013, 01:49 PM

2. Sure, if it's this way round, you need to change the CDS lines to exon lines.
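A minimal sketch of that conversion in plain Python is shown below. It assumes a tab-separated GTF with the feature type in column 3, and it is only appropriate when the file has no "exon" lines of its own, since otherwise the converted lines would duplicate them. The function name `cds_to_exon` and the example records are made up for illustration. Note that CDS features cover only the coding sequence, so any untranslated regions will be missing from the resulting "exons".

```python
def cds_to_exon(line):
    """Relabel a CDS feature line as an exon line; return other lines unchanged."""
    if line.startswith("#"):
        return line  # leave comment lines alone
    fields = line.split("\t")  # a trailing newline stays in the last field
    if len(fields) >= 9 and fields[2] == "CDS":
        fields[2] = "exon"
    return "\t".join(fields)

# Hypothetical records, for illustration only.
cds_line = 'contig1\tsrc\tCDS\t10\t90\t.\t+\t0\tgene_id "g1"; transcript_id "g1.t1";\n'
other_line = 'contig1\tsrc\tstart_codon\t10\t12\t.\t+\t0\tgene_id "g1"; transcript_id "g1.t1";\n'
converted = [cds_to_exon(l) for l in (cds_line, other_line)]
print("".join(converted), end="")
```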

**arkanion** · 11-06-2019, 11:38 PM

Originally posted by Simon Anders:

> 1. No, change the gene ID from, say, "Medtr7g090810" to "Medtr7g090810+" and "Medtr7g090810-", depending on strand. This is assuming that you know a scripting language. I wouldn't want to do that manually.
>
> Where did you get this strange GTF file from, anyway? Having the same gene name on both strands is a bug.
>
> 2. No, why?

I have the same problem. If you search the UCSC Genome Browser for a gene such as HIST2H3C, you will see two genes appear, one on the + strand and one on the - strand. DEXSeq cannot deal with this, but it does happen in practice for those who use a GTF file downloaded from a UCSC track.

[DEXSeq] prepare_annotation.py: exonic part starts too early!
