  • greener
    Member
    • Sep 2010
    • 17

    errors while using python dexseq_prepare_annotation.py

    Hi there, I am trying to use the dexseq_prepare_annotation.py tool in DEXSeq and have tried several different GTF files, but they all produce the same error (below). I'm not sure what could be causing this. Python version maybe? Any suggestions are appreciated. Thanks -Rich

    python dexseq_prepare_annotation.py test.gtf

    Traceback (most recent call last):
    File "dexseq_prepare_annotation.py", line 25, in ?
    for f in HTSeq.GFF_Reader( gtf_file ):
    File "/usr/lib64/python2.4/site-packages/HTSeq-0.5.3p3-py2.4-linux-x86_64.egg/HTSeq/__init__.py", line 204, in __iter__
    for line in FileOrSequence.__iter__( self ):
    File "/usr/lib64/python2.4/site-packages/HTSeq-0.5.3p3-py2.4-linux-x86_64.egg/HTSeq/__init__.py", line 42, in __iter__
    if self.fos.lower().endswith( ( ".gz" , ".gzip" ) ):
    TypeError: expected a character buffer object
  • areyes
    Senior Member
    • Aug 2010
    • 165

    #2
    Hi greener,

    I am not sure, but I think the problem is that you did not specify an output file:

    Code:
    Usage: python dexseq_prepare_annotation.py <in.gtf> <out.gff>
    Alejandro


    • greener
      Member
      • Sep 2010
      • 17

      #3
      Thanks areyes. Yes, I did try it with an output file, but I still get the same error.


      • areyes
        Senior Member
        • Aug 2010
        • 165

        #4
        Hmm, strange. Could you show the first lines of your GTF file?


        • greener
          Member
          • Sep 2010
          • 17

          #5
          Here is one:

          [greener@kojak bowtie]$ head -n 10 /vol01/genome/mouse/ucsc/annotation/mm9_ucsc_refGene.txt.gtf2.2
          chr1 ucsc exon 134212701 134213049 . + 0 gene_id "NM_028778:134212701"; transcript_id "NM_028778:134212701";
          chr1 ucsc exon 134221529 134221650 . + 0 gene_id "NM_028778:134212701"; transcript_id "NM_028778:134212701";
          chr1 ucsc exon 134224273 134224425 . + 1 gene_id "NM_028778:134212701"; transcript_id "NM_028778:134212701";
          chr1 ucsc exon 134224707 134224773 . + 0 gene_id "NM_028778:134212701"; transcript_id "NM_028778:134212701";
          chr1 ucsc exon 134226534 134226654 . + 0 gene_id "NM_028778:134212701"; transcript_id "NM_028778:134212701";
          chr1 ucsc exon 134227135 134227268 . + 0 gene_id "NM_028778:134212701"; transcript_id "NM_028778:134212701";
          chr1 ucsc exon 134227897 134230065 . + 1 gene_id "NM_028778:134212701"; transcript_id "NM_028778:134212701";
          chr1 ucsc exon 134212701 134213049 . + 0 gene_id "NM_001195025:134212701"; transcript_id "NM_001195025:134212701";
          chr1 ucsc exon 134221529 134221650 . + 0 gene_id "NM_001195025:134212701"; transcript_id "NM_001195025:134212701";
          chr1 ucsc exon 134222782 134222806 . + 1 gene_id "NM_001195025:134212701"; transcript_id "NM_001195025:134212701";


          • fadista
            Member
            • Sep 2008
            • 37

            #6
            I got the same error message here. Did you find any solutions?

            Thanks.


            • areyes
              Senior Member
              • Aug 2010
              • 165

              #7
              I just noticed we never answered this message; apologies for that.

              Could you try updating Python (to at least 2.5) and HTSeq to the most recent version?
              I was unable to reproduce the error, but I think this might solve it. Let me know if not.

              Alejandro
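
For context on why the Python version may matter: the traceback in post #1 fails on `self.fos.lower().endswith( ( ".gz" , ".gzip" ) )` under Python 2.4, and passing a tuple of suffixes to `str.endswith` is only supported from Python 2.5 onward. Below is a minimal sketch of a version guard and a tuple-free suffix test. The helper names `has_gzip_suffix` and `python_is_new_enough` are hypothetical and not part of HTSeq.

```python
import sys

GZIP_SUFFIXES = [".gz", ".gzip"]


def has_gzip_suffix(filename):
    """Return True if filename ends in a gzip suffix.

    Each suffix is tested on its own, because endswith() with a single
    string works on every Python version, while the tuple form needs
    Python 2.5 or later.
    """
    lowered = filename.lower()
    for suffix in GZIP_SUFFIXES:
        if lowered.endswith(suffix):
            return True
    return False


def python_is_new_enough(version_info=None, minimum=(2, 5)):
    """Return True if the (major, minor) version is at least minimum."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= minimum
```

Calling `python_is_new_enough()` with no arguments checks the running interpreter, which is a cheap way to confirm the environment before updating HTSeq.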


              • senkewiczs
                Junior Member
                • Jul 2012
                • 5

                #8
                Hi areyes,

                I'm also getting an error message when trying to use the dexseq_prepare_annotation.py. I'm trying to use it on the gtf file for drosophila from http://useast.ensembl.org/info/data/ftp/index.html.

                $ python dexseq_prepare_annotation.py Drosophila_melanogaster.BDGP5.67.gtf Drosophila_melanogaster.BDGP5.67.gff

                The error message I receive is:
                Traceback (most recent call last):
                File "dexseq_prepare_annotation.py", line 89, in <module>
                assert l[i].iv.end <= l[i+1].iv.start, str(l[i+1]) + " starts too early"
                AssertionError: <GenomicFeature: exonic_part 'FBgn0261841+FBgn0261840+FBgn0261837+FBgn0261843+FBgn0261845+FBgn0261844+FBgn0261838+FBgn0261839+FBgn0002781+FBgn0261842' at 3R: 17178958 -> 17178091 (strand '-')> starts too early

                We've used dexseq_prepare_annotation.py on gtf files from other species and it has always worked great. Seems strange since the pasilla package in R uses drosophila as the example dataset.

                Here is the head of the drosophila gtf file:

                3R protein_coding exon 380 509 . + . gene_id "FBgn0037213"; transcript_id "FBtr0078961"; exon_number "1"; gene_name "CG12581"; gene_biotype "protein_coding"; transcript_name "CG12581-RB";
                3R protein_coding exon 578 1913 . + . gene_id "FBgn0037213"; transcript_id "FBtr0078961"; exon_number "2"; gene_name "CG12581"; gene_biotype "protein_coding"; transcript_name "CG12581-RB";
                3R protein_coding CDS 1115 1913 . + 0 gene_id "FBgn0037213"; transcript_id "FBtr0078961"; exon_number "2"; gene_name "CG12581"; gene_biotype "protein_coding"; transcript_name "CG12581-RB"; protein_id "FBpp0078601";
                3R protein_coding start_codon 1115 1117 . + 0 gene_id "FBgn0037213"; transcript_id "FBtr0078961"; exon_number "2"; gene_name "CG12581"; gene_biotype "protein_coding"; transcript_name "CG12581-RB";
                3R protein_coding exon 7784 8649 . + . gene_id "FBgn0037213"; transcript_id "FBtr0078961"; exon_number "3"; gene_name "CG12581"; gene_biotype "protein_coding"; transcript_name "CG12581-RB";
                3R protein_coding CDS 7784 8649 . + 2 gene_id "FBgn0037213"; transcript_id "FBtr0078961"; exon_number "3"; gene_name "CG12581"; gene_biotype "protein_coding"; transcript_name "CG12581-RB"; protein_id "FBpp0078601";
                3R protein_coding exon 9439 10200 . + . gene_id "FBgn0037213"; transcript_id "FBtr0078961"; exon_number "4"; gene_name "CG12581"; gene_biotype "protein_coding"; transcript_name "CG12581-RB";
                3R protein_coding CDS 9439 9768 . + 0 gene_id "FBgn0037213"; transcript_id "FBtr0078961"; exon_number "4"; gene_name "CG12581"; gene_biotype "protein_coding"; transcript_name "CG12581-RB"; protein_id "FBpp0078601";
                3R protein_coding stop_codon 9769 9771 . + 0 gene_id "FBgn0037213"; transcript_id "FBtr0078961"; exon_number "4"; gene_name "CG12581"; gene_biotype "protein_coding"; transcript_name "CG12581-RB";
                3R protein_coding exon 380 1913 . + . gene_id "FBgn0037213"; transcript_id "FBtr0078962"; exon_number "1"; gene_name "CG12581"; gene_biotype "protein_coding"; transcript_name "CG12581-RA";


                Any thoughts? Thanks in advance
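
For readers trying to interpret the assertion: line 89 of the script, as quoted in the traceback, requires that each exonic part ends no later than the next one starts. Below is a minimal sketch of that check on plain `(start, end)` tuples. It assumes half-open intervals and does not use HTSeq objects; the function name `first_overlap` and the coordinates are made up for illustration.

```python
def first_overlap(parts):
    """Return the index i of the first pair (parts[i], parts[i + 1]) where
    parts[i] ends after parts[i + 1] starts, or None if no pair does."""
    for i in range(len(parts) - 1):
        _, end = parts[i]
        next_start, _ = parts[i + 1]
        if end > next_start:
            return i
    return None


# Made-up coordinates: the second part (ending at 1913) runs into the third.
parts = [(380, 509), (578, 1913), (1800, 2000)]
print(first_overlap(parts))  # prints 1
```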


                • areyes
                  Senior Member
                  • Aug 2010
                  • 165

                  #9
                  I had a look at that annotation file. The annotation of the gene mod(mdg4) seems to have some problems, so I just removed it:

                  Code:
                  grep -v "mod(mdg4)" Drosophila_melanogaster.BDGP5.67.gtf > Drosophila_melanogaster.BDGP5.67.filtered.gtf
                  And the DEXSeq error goes away!
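
A rough Python equivalent of the grep command above, in case grep is not available. It drops every line containing the literal text mod(mdg4). The function names `drop_lines_containing` and `filter_gtf` are hypothetical, and, like `grep -v`, the sketch matches the text anywhere in the line rather than parsing the gene_name attribute.

```python
def drop_lines_containing(lines, needle):
    """Yield only the lines that do not contain needle as a literal substring."""
    for line in lines:
        if needle not in line:
            yield line


def filter_gtf(in_path, out_path, needle="mod(mdg4)"):
    """Copy in_path to out_path, skipping every line that mentions needle."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(drop_lines_containing(src, needle))
```

For the file in this thread, the call would be `filter_gtf("Drosophila_melanogaster.BDGP5.67.gtf", "Drosophila_melanogaster.BDGP5.67.filtered.gtf")`.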


                  • senkewiczs
                    Junior Member
                    • Jul 2012
                    • 5

                    #10
                    Thanks areyes! Helpful as always!


                    • Ajayi Oyeyemi
                      Member
                      • Jul 2012
                      • 30

                      #11
                      Hi everyone,
                      I've been trying to use the dexseq_prepare_annotation.py script, but I keep getting an error. Please see the output below.

                      imumorin@ansci253135199:~/myrnaseqexp$ python dexseq_prepare_annotation.py Drosophila_melanogaster.BDGP5.70.gtf Drosophila_melanogaster.BDGP5.70.gff
                      File "dexseq_prepare_annotation.py", line 4
                      <!DOCTYPE html>
                      ^
                      SyntaxError: invalid syntax

                      I got the script from this page https://github.com/olgabot/rna-seq-d..._annotation.py and I used the wget command. Can someone please help me?


                      • Ajayi Oyeyemi
                        Member
                        • Jul 2012
                        • 30

                        #12
                        I suspected that it was installed incorrectly, since I searched the package but couldn't find the script. Here is the relevant part of my ls -al output:

                        -rw-rw-r-- 1 imumorin imumorin 55209 2013-02-22 15:24 dexseq_prepare_annotation.py

                        Any clue?


                        • Simon Anders
                          Senior Member
                          • Feb 2010
                          • 995

                          #13
                          Well, a Python file is not supposed to contain a Doctype tag. You downloaded the HTML source of the page displaying the code of the Python script, not the script itself.

                          Why did you download it separately from the DEXSeq package at all?

                          Use the R command system.file( package="DEXSeq" ) to see which directory R has installed DEXSeq in. There you will find a sub-directory python-scripts containing the correct file.
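
A quick way to confirm this diagnosis before running a downloaded script is to look at its first non-blank lines. The sketch below is only a heuristic, and the function name `looks_like_html` is hypothetical.

```python
def looks_like_html(text, max_lines=10):
    """Heuristically report whether text starts like an HTML page.

    Only the first max_lines non-blank lines are inspected, since the
    traceback above found the doctype on line 4 rather than line 1.
    """
    checked = 0
    for line in text.splitlines():
        stripped = line.strip().lower()
        if not stripped:
            continue
        if stripped.startswith("<!doctype html") or stripped.startswith("<html"):
            return True
        checked += 1
        if checked >= max_lines:
            break
    return False
```

For example, `looks_like_html(open("dexseq_prepare_annotation.py").read())` returning True would suggest that the saved file is a web page rather than the script itself.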


                          • Ajayi Oyeyemi
                            Member
                            • Jul 2012
                            • 30

                            #14
                            Thanks Simon. I followed your cue and it worked for version 2.15. I'm using Ubuntu Linux and tried to upgrade R to version 2.15.2, but I couldn't figure out how. Does anyone know how I can upgrade R from version 2.13 to 2.15.2 on Ubuntu Linux?
                            Any help will be greatly appreciated.

                            Thanks Simon once again...


                            • Simon Anders
                              Senior Member
                              • Feb 2010
                              • 995

                              #15
                              The R version included in Canonical's official Ubuntu package repository is always a bit old. If you add the package repository from CRAN to your package sources, you can always get the newest version.

                              See here for details: http://cran.r-project.org/bin/linux/ubuntu/README
