

DEXSeq error with paired-end data




  • DEXSeq error with paired-end data


    I'm trying to run the Python script that counts reads for a DEXSeq analysis,
    but I keep getting a strange error that I can't understand:

    python ~/R/x86_64-pc-linux-gnu-library/3.1/DEXSeq/python_scripts/dexseq_count.py -p yes -s no MmusGRCm38.DEXSeq.gff -f bam -r pos ../STARmapping_allSamples/C23/C23.STAR.sorted.bam C23.DEXSeq.paired.unstranded.txt
    The error is:
    Traceback (most recent call last):
      File "/home/yeroslaviz/R/x86_64-pc-linux-gnu-library/3.1/DEXSeq/python_scripts/dexseq_count.py", line 236, in <module>
        for a in reader( sam_file ):
      File "/usr/local/lib/python2.7/dist-packages/HTSeq-0.6.1p1-py2.7-linux-x86_64.egg/HTSeq/__init__.py", line 946, in __iter__
        yield SAM_Alignment.from_pysam_AlignedRead( pa, sf )
      File "_HTSeq.pyx", line 1247, in HTSeq._HTSeq.SAM_Alignment.from_pysam_AlignedRead (src/_HTSeq.c:24235)
      File "csamtools.pyx", line 2308, in csamtools.AlignedRead.tags.__get__ (lib/pysam/csamtools.c:19977)
    OverflowError: unsigned byte integer is less than minimum
    The BAM files were created with the STAR aligner, with multimapping reads allowed.

    Here is a sample from one of the BAM files. I attached the first 1000 lines of the BAM file for testing.

    samtools view ../STARmapping_allSamples/C23/C23.STAR.sorted.bam | head
    HISEQ:244:C492NACXX:2:1312:7663:90531   99      10      3138670 255     86M15S  =       3138670 86      
    HISEQ:244:C492NACXX:2:1312:7663:90531   147     10      3138670 255     15S86M  =       3138670 -86     
    HISEQ:244:C492NACXX:2:1314:12970:38730  99      10      3140537 255     90M11S  =       3140537 90      
    HISEQ:244:C492NACXX:2:1314:12970:38730  147     10      3140537 255     11S90M  =       3140537 -90     
    HISEQ:244:C492NACXX:2:2110:4692:21500   99      10      3147218 3       101M    =       3147234 117     
    HISEQ:244:C492NACXX:2:2110:4692:21500   147     10      3147234 3       101M    =       3147218 -117    
    HISEQ:244:C492NACXX:2:2110:14708:71286  99      10      3199864 3       101M    =       3199882 119     
    HISEQ:244:C492NACXX:2:2110:14708:71286  147     10      3199882 3       101M    =       3199864 -119    
    HISEQ:244:C492NACXX:2:1316:5341:98737   419     10      3238883 0       101M    =       3238903 121     
    HISEQ:244:C492NACXX:2:1316:5341:98737   339     10      3238903 0       101M    =       3238883 -121
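The excerpt mixes primary alignments (FLAG 99/147) with secondary ones (FLAG 419/339), which is what allowing multimapping in STAR produces. A small decoder for the FLAG column, following the bit definitions in the SAM format specification, makes this visible. This is a minimal sketch; the short names are labels chosen for this example, not official identifiers.

```python
# Bit meanings from the SAM format specification (FLAG, column 2).
# A list of pairs keeps the order stable on any Python version.
SAM_FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]


if __name__ == "__main__":
    # FLAG values taken from the samtools view excerpt above.
    for flag in (99, 147, 419, 339):
        print(flag, decode_flag(flag))
```

For example, 419 = 256 + 128 + 32 + 2 + 1, so that record is the secondary alignment of the second mate of a properly paired read.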

  • #2
    My only guess is that things like jM:B:c,-1 aren't supported by htseq-count. Maybe you can remove them with awk?
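The awk suggestion can also be written as a small Python filter over SAM text. This is a minimal sketch, assuming the problematic fields are STAR's jM and jI junction tags (B-array types such as jM:B:c,-1); the script name strip_star_tags.py and the file names in the usage note are hypothetical. In STAR, jM and jI are not part of the Standard set of --outSAMattributes, so realigning without requesting them is another way to avoid these fields.

```python
import sys

# Assumption: STAR's jM/jI tags are the ones HTSeq/pysam chokes on.
# Add other tag names to this set if needed.
TAGS_TO_DROP = {"jM", "jI"}


def strip_tags(line, tags=TAGS_TO_DROP):
    """Remove the given optional tags from one SAM text line.

    Header lines (starting with '@') are returned unchanged. Columns 1-11
    are the mandatory SAM fields; anything after them is an optional
    TAG:TYPE:VALUE field.
    """
    if line.startswith("@"):
        return line
    fields = line.rstrip("\n").split("\t")
    mandatory, optional = fields[:11], fields[11:]
    kept = [f for f in optional if f.split(":", 1)[0] not in tags]
    return "\t".join(mandatory + kept) + "\n"


def main(in_path, out_path):
    """Copy a SAM file, dropping the selected optional tags."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(strip_tags(line))


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Usage might look like `samtools view -h C23.STAR.sorted.bam > C23.sam`, then `python strip_star_tags.py C23.sam C23.nojunc.sam`, and finally running dexseq_count.py on the filtered file with `-f sam` instead of `-f bam`. Because the input BAM is position-sorted, the SAM output keeps that order, so `-r pos` should still apply.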


    • #3
      Yes, I thought this would be a problem. I removed the "unusual" tags and it works perfectly.

      Thanks again.


      • #4
        Dear all,

        I have a very similar problem as frymor, when using the dexseq_count.py script.

        However, the error I get is this:

        Traceback (most recent call last):
          File "/home/ibis/kinga.balazs/RNA-Seq/STAR_outputs/BAM_new/sorted/dexseq_count.py", line 239, in <module>
            for a in reader( sam_file ):
          File "/home/ibis/kinga.balazs/.local/lib/python2.7/site-packages/HTSeq-0.6.1-py2.7-linux-x86_64.egg/HTSeq/__init__.py", line 946, in __iter__
            yield SAM_Alignment.from_pysam_AlignedRead( pa, sf )
          File "_HTSeq.pyx", line 1233, in HTSeq._HTSeq.SAM_Alignment.from_pysam_AlignedRead (src/_HTSeq.c:23869)
          File "pysam/calignedsegment.pyx", line 2077, in pysam.calignedsegment.AlignedSegment.qual.__get__ (pysam/calignedsegment.c:22616)
          File "pysam/cutils.pyx", line 43, in pysam.cutils.array_to_qualitystring (pysam/cutils.c:1845)
        OverflowError: unsigned byte integer is greater than maximum
        I should mention that I have 36 samples and get this error for only 4 of them, which is very strange. For the rest it works perfectly. These are not the biggest files, nor were they generated differently. I also checked whether my SAM files contain any of the tags mentioned by Devon Ryan, and they don't.

        For these samples the script does start running; for one of them, the error appears only after more than 24 million reads have been processed.

        I used STAR as the aligner, and I also have paired-end reads.

        I would be very thankful for any suggestions.
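The traceback in this post fails inside pysam's quality-string conversion (`array_to_qualitystring`, called from the `qual` getter), which suggests, but does not prove, that some record carries an unusual quality value. One way to look for such records is to scan the SAM text for QUAL fields that do not match the SAM specification, which allows either `*` or printable ASCII from `!` to `~`. This is a diagnostic sketch only; the function names and the file name in the usage note are hypothetical, and finding nothing would point to a different cause.

```python
import re
import sys

# SAM spec: QUAL is '*' or a run of printable ASCII '!'..'~' (Phred+33).
VALID_QUAL = re.compile(r"[!-~]+")


def suspicious_qual(line):
    """Classify the QUAL column (11th field) of one SAM line.

    Returns None for header lines and well-formed qualities, "truncated"
    when the line has fewer than 11 columns, "missing" when QUAL is '*',
    and "out_of_range" when it contains characters outside '!'..'~'.
    """
    if line.startswith("@"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return "truncated"
    qual = fields[10]
    if qual == "*":
        return "missing"
    if not VALID_QUAL.fullmatch(qual):
        return "out_of_range"
    return None


def scan(path, limit=10):
    """Print line number, problem and read name for up to `limit` records."""
    shown = 0
    # latin-1 maps every byte to a character, so odd bytes cannot
    # raise a decode error while reading.
    with open(path, encoding="latin-1") as sam:
        for lineno, line in enumerate(sam, start=1):
            problem = suspicious_qual(line)
            if problem:
                print(lineno, problem, line.split("\t", 1)[0])
                shown += 1
                if shown >= limit:
                    break


if __name__ == "__main__" and len(sys.argv) == 2:
    scan(sys.argv[1])
```

Usage might be `samtools view sample.bam > sample.sam` followed by `python scan_qual.py sample.sam`. Note that a BAM record stored without qualities is printed as `*` by samtools, so the "missing" category is worth checking separately from "out_of_range".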


        • #5
          What versions of pysam and Cython do you have installed?


          • #6
            I have pysam, no Cython and Python 2.7.5


            • #7
              My only suggestion would be to try a different pysam version. Perhaps you'll get lucky with that.


              • #8
                After trying several things (installing an older pysam version did not solve the problem either), I tried using the SAM files instead of the BAMs, and it worked. I really don't know why this happens. As mentioned before, for 32 of the 36 samples the program works on both SAM and BAM files, and for the remaining 4 it works only on the SAM files. I just wanted to let you know that this was the solution I found.