  • Problem with HTSeq dexseq_count.py script

    Hi all!

    Hi all!

    I used samtools to convert my BAM files to SAM. Then I sorted the SAM file by read name with the command: sort -s -k 1,1 07pos.sam > 07pos.sorted.sam . Now I am trying to run the script dexseq_count.py (which comes with the DEXSeq package), but I get the following error message:

    Code:
    $ python dexseq_count.py --paired=yes out.gff 07pos.sorted.sam 07poscounts.txt
    Traceback (most recent call last):
      File "dexseq_count.py", line 132, in <module>
        for af, ar in HTSeq.pair_SAM_alignments( HTSeq.SAM_Reader( sam_file ) ):
      File "/usr/lib/python2.6/site-packages/HTSeq/__init__.py", line 604, in pair_SAM_alignments
        for almnt in alignments:
      File "/usr/lib/python2.6/site-packages/HTSeq/__init__.py", line 543, in __iter__
        algnt = SAM_Alignment.from_SAM_line( line )
      File "_HTSeq.pyx", line 1249, in HTSeq._HTSeq.SAM_Alignment.from_SAM_line (src/_HTSeq.c:21848)
    UnboundLocalError: local variable 'cigarlist' referenced before assignment
    I could not find a solution by searching the forums and the Internet. Any ideas?

    Thanks in advance!
    Sander
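    For paired-end counting (--paired=yes), the pairing step in the traceback (HTSeq.pair_SAM_alignments) works through the SAM file expecting, as far as I know for HTSeq versions of this era, the two mates of a pair on consecutive lines, which is why the file is sorted by read name first. The sketch below is a rough sanity check of that ordering, not part of HTSeq or DEXSeq; the function name is hypothetical, and it simply flags read names whose records are not one adjacent block of exactly two lines (so multi-mapped or singleton reads will also be reported).

```python
from itertools import groupby


def unpaired_read_names(sam_lines):
    """Return read names (QNAME) that are not stored as one adjacent pair.

    Header lines starting with '@' and blank lines are skipped. A name is
    flagged if its consecutive block does not have exactly two records, or
    if it reappears later in the file (i.e. the file is not name-sorted).
    """
    names = (line.split("\t", 1)[0] for line in sam_lines
             if line.strip() and not line.startswith("@"))
    seen = set()
    problems = set()
    for name, group in groupby(names):
        count = sum(1 for _ in group)
        if count != 2 or name in seen:
            problems.add(name)
        seen.add(name)
    return sorted(problems)
```

    Usage (hypothetical file name): with open("07pos.sorted.sam") as fh: print(unpaired_read_names(fh)[:10])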

  • #2
    This looks like a bug that we accidentally introduced in version 0.5.3p5. We fixed this last week, so please try version 0.5.3p7.
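    Because the fix is tied to a specific release, it can help to confirm which version is installed before rerunning. Below is a minimal sketch that compares HTSeq version strings of the form used in this thread (0.5.3p5, 0.5.3p7, 0.5.3p9). The parsing rule (X.Y.Z with an optional pN suffix, treated as p0 when absent) is an assumption about the naming scheme, and the function names are hypothetical; the installed version string itself has to come from your own installation (for example the name of the tarball you built from).

```python
import re


def parse_htseq_version(version):
    """Turn a string such as '0.5.3p7' into a comparable tuple (0, 5, 3, 7).

    Assumes the 'X.Y.Z' or 'X.Y.ZpN' pattern seen in this thread; a missing
    'pN' suffix is treated as patch level 0.
    """
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)(?:p(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, micro, patch = match.groups()
    return (int(major), int(minor), int(micro), int(patch or 0))


def is_at_least(installed, required="0.5.3p7"):
    """True if `installed` is the same as or newer than `required`."""
    return parse_htseq_version(installed) >= parse_htseq_version(required)
```

    Tuples compare element by element, so 0.5.3p9 sorts after 0.5.3p7 and 0.5.4 after any 0.5.3pN.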

    • #3
      Hi!

      Thank you, Simon. Version 0.5.3p7 worked.

      Sander

      • #4
        dexseq_count.py error with HTSeq-0.5.3p9

        Hello,

        I am using the latest version of HTSeq (0.5.3p9) and am getting a similar error to SaunderEST's. I have searched the forums for a similar error with the latest version, but had no luck.

        I have used this version of HTSeq successfully for htseq-count as well as for dexseq_prepare_annotation.py, so I doubt there is a problem with the installation.

        Can anyone help me out here?

        Thanks!

        > ~/software/HTSeq-0.5.3p9/scripts/python_scripts$ python dexseq_count.py -p yes -s no G1_0h.sam /home/PhD_project/RNASeq_data/TopHat_Cufflinks/merged_asm/merged.gff exons_G1_0h.txt

        Traceback (most recent call last):
          File "dexseq_count.py", line 70, in <module>
            for f in HTSeq.GFF_Reader( gff_file ):
          File "/usr/local/lib/python2.7/dist-packages/HTSeq/__init__.py", line 214, in __iter__
            strand, frame, attributeStr ) = line.split( "\t", 8 )
        ValueError: need more than 3 values to unpack
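        The traceback shows the GFF reader unpacking line.split( "\t", 8 ) into nine columns and finding only three values. Note also that the command above passes G1_0h.sam before the GFF file, whereas in the first post the GFF comes first; a SAM header line such as "@HD<TAB>VN:1.0<TAB>SO:unsorted" has exactly three tab-separated fields, which would fit this error. That is an inference from the thread, not a confirmed diagnosis. The sketch below (hypothetical helper, standard library only) reports the first lines of a file that do not split into the nine GFF columns; blank lines and lines starting with "#" are skipped here by assumption.

```python
def bad_gff_lines(lines, max_report=5):
    """Return (line_number, n_fields, preview) for lines that are not 9-column GFF.

    Uses the same split as the failing line in the traceback:
    line.split("\t", 8), which yields 9 fields for a well-formed GFF line.
    """
    problems = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # blank and comment lines are ignored in this check
        n_fields = len(line.split("\t", 8))
        if n_fields != 9:
            problems.append((lineno, n_fields, line[:60]))
            if len(problems) >= max_report:
                break
    return problems
```

        Usage (file name taken from the command above): with open("G1_0h.sam") as fh: print(bad_gff_lines(fh)). An empty list means every checked line has nine columns.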

        • #5
          Dear @nat,

          Are you using the flattened GFF file produced by dexseq_prepare_annotation.py?
          The script dexseq_count.py expects this file as input.

          Alejandro
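          One quick way to tell a flattened annotation from, say, a Cufflinks merged GTF is to look at the feature types in column 3. The sketch below assumes the flattened file uses the feature types "aggregate_gene" (as in the example line later in this thread) and "exonic_part"; the helper names are hypothetical and the check is only a heuristic.

```python
from collections import Counter

# Feature types assumed to appear in the output of dexseq_prepare_annotation.py.
FLATTENED_TYPES = {"aggregate_gene", "exonic_part"}


def feature_type_counts(lines):
    """Count the values of column 3 (feature type) in a GFF/GTF file."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts


def looks_flattened(lines):
    """Heuristic: every feature type is one of the assumed flattened types."""
    counts = feature_type_counts(lines)
    return bool(counts) and set(counts) <= FLATTENED_TYPES
```

          A merged.gff from Cufflinks would typically contain types such as "exon", so looks_flattened would return False for it.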

          • #6
            Dear DEXSeq team, thanks for the package, and many thanks in advance.

            I am trying to work out why all the counts in my dexseq_count.py output files are zero (the .txt files contain rows like ENSG00000000003:001 0). I have 18 RNA-Seq samples and used dexseq_prepare_annotation.py to generate Homo_sapiens.GRCh37.70.gff. I used script.sh to call samtools and dexseq_count.py. The SAM files seem OK, and dexseq_count.py did generate 18 .txt files, but the counts are all zero.
            Thanks.
            Wenhong
            P.S. My reads are all paired-end human RNA-Seq reads from an Illumina HiSeq 2000/2500, 4x multiplexed.
            Here are the scripts I used:
            #!/bin/bash
            sample=(Sample_330-0 Sample_54-0 Sample_54-1 Sample_F60-0 Sample_F60-1 Sample_NLBM2 Sample_NLBM3 Sample_NLBM4 Sample_NLBM5 Sample_R012-0-CD34 Sample_R012-1-CD34 Sample_R1291-0-CD34 Sample_R1291-1-CD34 Sample_R400-0-CD34 Sample_R400-1-CD34 Sample_R400-blasts Sample_R400-MNCs Sample_Y60-0 Sample_Y60-1)

            for i in "${sample[@]}"
            do
                echo "working on $i ..."
                python dexseq_count.py -p yes Homo_sapiens.GRCh37.70.gff "${i}_sorted.sam" "${i}_fb.txt"
            done

            #!/bin/bash
            sample=(Sample_330-0 Sample_330-1 Sample_54-0 Sample_54-1 Sample_F60-0 Sample_F60-1 Sample_NLBM2 Sample_NLBM3 Sample_NLBM4 Sample_NLBM5 Sample_R012-0-CD34 Sample_R012-1-CD34 Sample_R1291-0-CD34 Sample_R1291-1-CD34 Sample_R400-0-CD34 Sample_R400-1-CD34 Sample_R400-blasts Sample_R400-MNCs Sample_Y60-0 Sample_Y60-1)

            for i in "${sample[@]}"
            do
                echo "working on $i ..."
                # the index is only needed for region queries; harmless here
                samtools index "$i.bam"
                # convert BAM to SAM (without -h, no header lines are written)
                samtools view "$i.bam" > "$i.sam"
                # sort by read name, then flag, so that mates are adjacent for -p yes
                sort -k1,1 -k2,2n "$i.sam" > "${i}_sorted.sam"
                python dexseq_count.py -p yes Homo_sapiens.GRCh37.70.gff "${i}_sorted.sam" "${i}_fb.txt"
            done

            The first line of my .gff file looks like this:
            1 Homo_sapiens.GRCh37.70.gtf aggregate_gene 11869 14412 . + . gene_id "ENSG00000223972"
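            To see whether every exonic part is zero or only most of them, a small summary of each output file can help. The sketch below assumes the row format shown above (an exonic-part ID, whitespace, then a count) and treats any row whose ID starts with an underscore as a summary counter that dexseq_count.py may append at the end; that underscore convention is an assumption to verify against your own files, and the helper name is hypothetical.

```python
def summarize_counts(lines):
    """Summarise dexseq_count.py output rows such as 'ENSG00000000003:001  0'."""
    summary = {"exonic_parts": 0, "zero": 0, "total_count": 0, "special": {}}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed rows
        name, count = fields[0], int(fields[-1])
        if name.startswith("_"):
            # assumed summary counter rather than an exonic part
            summary["special"][name] = count
            continue
        summary["exonic_parts"] += 1
        summary["total_count"] += count
        if count == 0:
            summary["zero"] += 1
    return summary
```

            Usage (file name follows the script above): with open("Sample_54-0_fb.txt") as fh: print(summarize_counts(fh)). If the summary counters are large while every exonic part is zero, reads are being processed but not assigned to any feature, which points to a mismatch between the alignments and the annotation.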

            • #7
              Hi @wfan,

              Could you check that the chromosome names are consistent between your BAM files and your annotation file? Zero counts often happen when, for example, one file uses "chr1" and the other uses "1". The chromosome names must match!

              Alejandro
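              The chromosome-name check can be scripted. The sketch below (hypothetical helper names, standard library only) collects sequence names from a SAM file, using @SQ SN: header tags when present and otherwise the RNAME column (useful because samtools view without -h writes no header), and from column 1 of the GFF, then reports which names are shared and which appear in only one file.

```python
def sam_reference_names(sam_lines):
    """Collect reference names from @SQ SN: tags and from the RNAME column."""
    names = set()
    for line in sam_lines:
        line = line.rstrip("\n")
        if line.startswith("@SQ"):
            for tag in line.split("\t")[1:]:
                if tag.startswith("SN:"):
                    names.add(tag[3:])
        elif line and not line.startswith("@"):
            fields = line.split("\t")
            if len(fields) > 2 and fields[2] != "*":  # '*' means unmapped
                names.add(fields[2])
    return names


def gff_seq_names(gff_lines):
    """Collect the sequence names in column 1 of a GFF/GTF file."""
    return {line.split("\t", 1)[0] for line in gff_lines
            if line.strip() and not line.startswith("#")}


def compare_names(sam_lines, gff_lines):
    sam, gff = sam_reference_names(sam_lines), gff_seq_names(gff_lines)
    return {"shared": sorted(sam & gff),
            "only_in_sam": sorted(sam - gff),
            "only_in_gff": sorted(gff - sam)}
```

              An empty "shared" list, with names like "chr1" on one side and "1" on the other, would match the situation described above.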

              • #8
                An update on my earlier post in this thread (Problem with HTSeq dexseq_count.py script):
                the zero counts were indeed caused by using a different version of the .gtf file in TopHat. I re-ran TopHat using the same .gtf file, and the problem was solved.
                Thanks
                Wenhong
