  • DEXSeq error with paired-end data

    Hi,

    I'm trying to run the Python script that counts reads for a DEXSeq analysis.
    But I keep getting a strange error that I can't understand.

    Code:
    python ~/R/x86_64-pc-linux-gnu-library/3.1/DEXSeq/python_scripts/dexseq_count.py -p yes -s no MmusGRCm38.DEXSeq.gff -f bam -r pos ../STARmapping_allSamples/C23/C23.STAR.sorted.bam C23.DEXSeq.paired.unstranded.txt
    and the error is:
    Code:
    Traceback (most recent call last):
      File "/home/yeroslaviz/R/x86_64-pc-linux-gnu-library/3.1/DEXSeq/python_scripts/dexseq_count.py", line 236, in <module>
        for a in reader( sam_file ):
      File "/usr/local/lib/python2.7/dist-packages/HTSeq-0.6.1p1-py2.7-linux-x86_64.egg/HTSeq/__init__.py", line 946, in __iter__
        yield SAM_Alignment.from_pysam_AlignedRead( pa, sf )
      File "_HTSeq.pyx", line 1247, in HTSeq._HTSeq.SAM_Alignment.from_pysam_AlignedRead (src/_HTSeq.c:24235)
      File "csamtools.pyx", line 2308, in csamtools.AlignedRead.tags.__get__ (lib/pysam/csamtools.c:19977)
    OverflowError: unsigned byte integer is less than minimum
    The BAM files were created by the STAR aligner, with multi-mapping allowed.

    Here is a sample of one of the BAM files. I have attached the first 1000 lines of the BAM file as a test file.

    Code:
    samtools view ../STARmapping_allSamples/C23/C23.STAR.sorted.bam | head
    HISEQ:244:C492NACXX:2:1312:7663:90531   99      10      3138670 255     86M15S  =       3138670 86      
    HISEQ:244:C492NACXX:2:1312:7663:90531   147     10      3138670 255     15S86M  =       3138670 -86     
    HISEQ:244:C492NACXX:2:1314:12970:38730  99      10      3140537 255     90M11S  =       3140537 90      
    HISEQ:244:C492NACXX:2:1314:12970:38730  147     10      3140537 255     11S90M  =       3140537 -90     
    HISEQ:244:C492NACXX:2:2110:4692:21500   99      10      3147218 3       101M    =       3147234 117     
    HISEQ:244:C492NACXX:2:2110:4692:21500   147     10      3147234 3       101M    =       3147218 -117    
    HISEQ:244:C492NACXX:2:2110:14708:71286  99      10      3199864 3       101M    =       3199882 119     
    HISEQ:244:C492NACXX:2:2110:14708:71286  147     10      3199882 3       101M    =       3199864 -119    
    HISEQ:244:C492NACXX:2:1316:5341:98737   419     10      3238883 0       101M    =       3238903 121     
    HISEQ:244:C492NACXX:2:1316:5341:98737   339     10      3238903 0       101M    =       3238883 -121

  • #2
    My only guess is that tags like jM:B:c,-1 aren't supported by htseq-count. Maybe you can remove them with awk?
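A minimal sketch of the tag-stripping idea, written in Python rather than awk. It assumes the offending optional fields are STAR's jM and jI attributes (only jM:B:c,-1 is named in this thread; jI is an assumption), and the function names are hypothetical. It works on SAM text lines, leaves header lines alone, and keeps the 11 mandatory columns untouched.

```python
# Optional SAM tags to drop. jM appears in this thread; jI is assumed to be
# its companion attribute from STAR and may not be present in your files.
TAGS_TO_DROP = ("jM", "jI")


def strip_tags(sam_line, tags=TAGS_TO_DROP):
    """Return one SAM line without the optional fields named in `tags`."""
    if sam_line.startswith("@"):
        # Header lines are passed through unchanged.
        return sam_line
    newline = "\n" if sam_line.endswith("\n") else ""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 columns are mandatory; optional TAG:TYPE:VALUE fields follow.
    mandatory, optional = fields[:11], fields[11:]
    kept = [f for f in optional if f[:2] not in tags]
    return "\t".join(mandatory + kept) + newline


def filter_stream(infile, outfile, tags=TAGS_TO_DROP):
    """Copy SAM text from infile to outfile, stripping the given tags."""
    for line in infile:
        outfile.write(strip_tags(line, tags))
```

One way to use it would be to call `filter_stream(sys.stdin, sys.stdout)` from a small script and place that script between `samtools view -h` (BAM to SAM text) and a second `samtools view -b` (back to BAM); the exact samtools options depend on the installed version.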



    • #3
      Yes, I thought this might be the problem. I have removed the "unusual" tags and it works perfectly.

      Thanks again.



      • #4
        Dear all,

        I have a problem very similar to frymor's when using the dexseq_count.py script.

        However, the error I get is this:

        Code:
        Traceback (most recent call last):
          File "/home/ibis/kinga.balazs/RNA-Seq/STAR_outputs/BAM_new/sorted/dexseq_count.py", line 239, in <module>
            for a in reader( sam_file ):
          File "/home/ibis/kinga.balazs/.local/lib/python2.7/site-packages/HTSeq-0.6.1-py2.7-linux-x86_64.egg/HTSeq/__init__.py", line 946, in __iter__
            yield SAM_Alignment.from_pysam_AlignedRead( pa, sf )
          File "_HTSeq.pyx", line 1233, in HTSeq._HTSeq.SAM_Alignment.from_pysam_AlignedRead (src/_HTSeq.c:23869)
          File "pysam/calignedsegment.pyx", line 2077, in pysam.calignedsegment.AlignedSegment.qual.__get__ (pysam/calignedsegment.c:22616)
          File "pysam/cutils.pyx", line 43, in pysam.cutils.array_to_qualitystring (pysam/cutils.c:1845)
        OverflowError: unsigned byte integer is greater than maximum
        I should mention that I have 36 samples and get this error for only 4 of them, which is very strange. For the rest it works perfectly. These are not the biggest files, nor were they generated differently. I also checked whether my SAM files contain any of the tags mentioned by Devon Ryan, and they do not.

        For these samples the script does start running; for one of them, the error appears only after more than 24 million reads have been processed.

        I used STAR as the aligner, and I also have paired-end reads.

        I would be very thankful for any suggestions.



        • #5
          What versions of pysam and Cython do you have installed?



          • #6
            I have pysam 0.9.1.4, no Cython and Python 2.7.5



            • #7
              My only suggestion would be to try a different pysam version. Perhaps you'll get lucky with that.



              • #8
                After trying several things (installing an older pysam version did not solve the problem either), I tried using the SAM files instead of the BAM files, and it worked. I really don't know why this happens. As mentioned before, for 32 of the 36 samples the program works on both SAM and BAM files; for the remaining 4 it works only with the SAM files. I just wanted to let you know that this was the solution I found.
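A rough sketch of the SAM workaround as two commands, built in Python. The option values (-p yes -s no -r pos) are copied from the first post and may not match the settings used here; `-f sam` is assumed to be accepted in the same way as the `-f bam` shown there. All file names and helper names are hypothetical.

```python
import subprocess


def bam_to_sam_cmd(bam_path, sam_path):
    """Build a samtools command that writes the BAM as SAM text with header."""
    return ["samtools", "view", "-h", "-o", sam_path, bam_path]


def dexseq_count_cmd(script, gff, sam_path, out_path):
    """Build a dexseq_count.py command that reads SAM instead of BAM.

    Option values mirror the command in the first post of this thread;
    adjust them to your own library type and sort order.
    """
    return ["python", script,
            "-p", "yes", "-s", "no", "-f", "sam", "-r", "pos",
            gff, sam_path, out_path]


def count_via_sam(script, gff, bam_path, sam_path, out_path):
    """Convert one BAM to SAM, then count on the SAM file."""
    subprocess.check_call(bam_to_sam_cmd(bam_path, sam_path))
    subprocess.check_call(dexseq_count_cmd(script, gff, sam_path, out_path))
```

Calling `count_via_sam(...)` only for the samples that fail on BAM input keeps the extra disk usage from the uncompressed SAM files limited to those cases.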
