  • DEXSeq error with paired-end data

    Hi,

    I'm trying to run the Python script that counts the reads for a DEXSeq analysis,
    but I keep getting a strange error that I can't understand.

    Code:
    python ~/R/x86_64-pc-linux-gnu-library/3.1/DEXSeq/python_scripts/dexseq_count.py -p yes -s no MmusGRCm38.DEXSeq.gff -f bam -r pos ../STARmapping_allSamples/C23/C23.STAR.sorted.bam C23.DEXSeq.paired.unstranded.txt
    and the error is:
    Code:
    Traceback (most recent call last):
      File "/home/yeroslaviz/R/x86_64-pc-linux-gnu-library/3.1/DEXSeq/python_scripts/dexseq_count.py", line 236, in <module>
        for a in reader( sam_file ):
      File "/usr/local/lib/python2.7/dist-packages/HTSeq-0.6.1p1-py2.7-linux-x86_64.egg/HTSeq/__init__.py", line 946, in __iter__
        yield SAM_Alignment.from_pysam_AlignedRead( pa, sf )
      File "_HTSeq.pyx", line 1247, in HTSeq._HTSeq.SAM_Alignment.from_pysam_AlignedRead (src/_HTSeq.c:24235)
      File "csamtools.pyx", line 2308, in csamtools.AlignedRead.tags.__get__ (lib/pysam/csamtools.c:19977)
    OverflowError: unsigned byte integer is less than minimum
    The BAM files were created by the STAR aligner, and multi-mapping reads were allowed.

    Here is a sample from one of the BAM files. I have also attached its first 1000 lines as a test file.

    Code:
    samtools view ../STARmapping_allSamples/C23/C23.STAR.sorted.bam | head
    HISEQ:244:C492NACXX:2:1312:7663:90531   99      10      3138670 255     86M15S  =       3138670 86      
    HISEQ:244:C492NACXX:2:1312:7663:90531   147     10      3138670 255     15S86M  =       3138670 -86     
    HISEQ:244:C492NACXX:2:1314:12970:38730  99      10      3140537 255     90M11S  =       3140537 90      
    HISEQ:244:C492NACXX:2:1314:12970:38730  147     10      3140537 255     11S90M  =       3140537 -90     
    HISEQ:244:C492NACXX:2:2110:4692:21500   99      10      3147218 3       101M    =       3147234 117     
    HISEQ:244:C492NACXX:2:2110:4692:21500   147     10      3147234 3       101M    =       3147218 -117    
    HISEQ:244:C492NACXX:2:2110:14708:71286  99      10      3199864 3       101M    =       3199882 119     
    HISEQ:244:C492NACXX:2:2110:14708:71286  147     10      3199882 3       101M    =       3199864 -119    
    HISEQ:244:C492NACXX:2:1316:5341:98737   419     10      3238883 0       101M    =       3238903 121     
    HISEQ:244:C492NACXX:2:1316:5341:98737   339     10      3238903 0       101M    =       3238883 -121

  • #2
    My only guess is that things like jM:B:c,-1 aren't supported by htseq-count. Maybe you can remove them with awk?
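
A minimal sketch of the tag removal suggested above, written in Python rather than awk. It assumes the input is SAM text (for example from `samtools view -h`) and that the offending optional fields are STAR's `jM` and `jI` array tags; only `jM:B:c,-1` is named in the thread, so `jI` and the function name `strip_sam_tags` are assumptions made for illustration.

```python
def strip_sam_tags(line, tags=("jM", "jI")):
    """Return a SAM line with the named optional tags removed.

    The default tag names are an assumption: only jM is mentioned in the thread.
    """
    if line.startswith("@"):
        return line  # header lines pass through unchanged
    fields = line.rstrip("\n").split("\t")
    # The first 11 columns of a SAM record are mandatory; the rest are TAG:TYPE:VALUE.
    mandatory, optional = fields[:11], fields[11:]
    kept = [f for f in optional if f.split(":", 1)[0] not in tags]
    return "\t".join(mandatory + kept) + "\n"


if __name__ == "__main__":
    import sys
    # Filter SAM text from stdin to stdout.
    for sam_line in sys.stdin:
        sys.stdout.write(strip_sam_tags(sam_line))
```

Saved under a hypothetical name such as `strip_tags.py`, this could sit between `samtools view -h in.bam` and `samtools view -b -` in a pipe to produce a cleaned BAM file.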


    • #3
      Yes, I thought this would be a problem. I have removed the "unusual" flags and it works perfectly.

      Thanks again.


      • #4
        Dear all,

        I have a problem very similar to frymor's when using the dexseq_count.py script.

        However, the error I get is this:

        Code:
        Traceback (most recent call last):
          File "/home/ibis/kinga.balazs/RNA-Seq/STAR_outputs/BAM_new/sorted/dexseq_count.py", line 239, in <module>
            for a in reader( sam_file ):
          File "/home/ibis/kinga.balazs/.local/lib/python2.7/site-packages/HTSeq-0.6.1-py2.7-linux-x86_64.egg/HTSeq/__init__.py", line 946, in __iter__
            yield SAM_Alignment.from_pysam_AlignedRead( pa, sf )
          File "_HTSeq.pyx", line 1233, in HTSeq._HTSeq.SAM_Alignment.from_pysam_AlignedRead (src/_HTSeq.c:23869)
          File "pysam/calignedsegment.pyx", line 2077, in pysam.calignedsegment.AlignedSegment.qual.__get__ (pysam/calignedsegment.c:22616)
          File "pysam/cutils.pyx", line 43, in pysam.cutils.array_to_qualitystring (pysam/cutils.c:1845)
        OverflowError: unsigned byte integer is greater than maximum
        I should mention that I have 36 samples and get this error for only 4 of them, which is very strange. For the rest it works perfectly. These 4 are not the largest files, nor were they generated differently. I also checked whether my SAM files contain any of the flags mentioned by Devon Ryan, and they do not.

        Also, for these samples the script does start running; for one of them the error appears only after more than 24 million reads have been processed.

        I used STAR as the aligner, and my reads are also paired-end.

        I would be very thankful for any suggestions.
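
The traceback above ends in pysam's quality-string conversion, so one thing that may be worth ruling out is an unusual QUAL column in the affected files. This is only a guess; the thread never establishes the cause. The sketch below flags SAM records whose QUAL field is missing (`*`), differs in length from SEQ, or contains characters outside the Phred+33 range `!` to `~` defined by the SAM specification. The function name `suspicious_qual` is hypothetical.

```python
def suspicious_qual(line):
    """Return True if a SAM alignment line has a missing or malformed QUAL field."""
    if line.startswith("@"):
        return False  # header lines carry no qualities
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return True  # truncated record: SEQ/QUAL columns are absent
    seq, qual = fields[9], fields[10]
    if qual == "*":
        return True  # qualities not stored
    if seq != "*" and len(seq) != len(qual):
        return True  # SEQ and QUAL must have equal length
    # Phred+33 qualities are printable ASCII from '!' (33) to '~' (126).
    return any(not 33 <= ord(c) <= 126 for c in qual)


if __name__ == "__main__":
    import sys
    # Print the 1-based line number and read name of every flagged record.
    for number, sam_line in enumerate(sys.stdin, start=1):
        if suspicious_qual(sam_line):
            print(number, sam_line.split("\t", 1)[0])
```

For context, the SAM specification says that absent qualities are written as `*` in SAM but as a run of 0xFF bytes in BAM, so the two formats can be handled differently by a parser. Whether that is what happened with these four samples is not confirmed anywhere in the thread.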


        • #5
          What versions of pysam and Cython do you have installed?


          • #6
            I have pysam 0.9.1.4 and Python 2.7.5, and no Cython installed.


            • #7
              My only suggestion would be to try a different pysam version. Perhaps you'll get lucky with that.


              • #8
                After trying several things (installing an older pysam version did not solve the problem either), I tried using the SAM files instead of the BAM files, and it worked. I really don't know why this happens. As mentioned before, for 32 of the 36 samples the program works on both SAM and BAM files; for the remaining 4 it works only with the SAM files. I just wanted to let you know that this was the solution I found.
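
A rough sketch of the SAM workaround described above, for anyone who wants to script it. It only builds the two command lines: a `samtools view -h` conversion from BAM to SAM, and a `dexseq_count.py` call with `-f sam`. The `-p yes -s no -r pos` options are copied from the first post and are an assumption for other data sets; the function name and the default script path are hypothetical.

```python
def sam_workaround_commands(bam_path, gff_path, out_path, script="dexseq_count.py"):
    """Build (convert, count) argument lists for the BAM -> SAM workaround.

    Counting options are taken from the first post in the thread and may
    need adjusting for a different library type or sort order.
    """
    if bam_path.endswith(".bam"):
        sam_path = bam_path[:-4] + ".sam"
    else:
        sam_path = bam_path + ".sam"
    # -h keeps the header, which the SAM reader needs for reference names.
    convert = ["samtools", "view", "-h", "-o", sam_path, bam_path]
    count = ["python", script,
             "-p", "yes", "-s", "no", "-f", "sam", "-r", "pos",
             gff_path, sam_path, out_path]
    return convert, count


if __name__ == "__main__":
    import subprocess
    # File names reuse those from the first post, purely as an illustration.
    convert_cmd, count_cmd = sam_workaround_commands(
        "C23.STAR.sorted.bam", "MmusGRCm38.DEXSeq.gff",
        "C23.DEXSeq.paired.unstranded.txt")
    subprocess.check_call(convert_cmd)
    subprocess.check_call(count_cmd)
```

Keeping the commands as argument lists avoids shell quoting problems with paths that contain spaces. The intermediate SAM file can be large and can be deleted once counting has finished.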
