  • kamsen
    Junior Member
    • Mar 2012
    • 3

    HTSeq dealing with "*" qualities

    Hi everyone,

    I started using HTSeq a couple of days ago and have now run into a problem. Maybe someone knows a workaround.

    I am iterating over a SAM file and can't find a solution for this error:
    (also described here http://seqanswers.com/forums/showthread.php?t=12091)

    ValueError: 'seq' and 'qualstr' do not have the same length.
    The alignment is from Bowtie2 and lacks the quality string (the file contains only a "*" in that field, but the complete read sequence is there).

    Like: blaaaaa ACTACTATCTAC * blaaaaa


    Since I have a lot of files, I can't filter them beforehand, because I do not want to read those big files twice.

    Thanks in advance.

    EDIT:
    I am using the latest release of HTSeq.



    regards
    Last edited by kamsen; 04-04-2012, 07:06 AM.
  • maubp
    Peter (Biopython etc)
    • Jul 2009
    • 1544

    #2
    Sounds like a bug in HTSeq - as discussed in the linked thread, the SAM/BAM file format explicitly allows the sequencing qualities to be omitted (which in SAM is represented with the * character).

    Have you contacted the HTSeq authors?

    P.S. Saying you use the latest version isn't as helpful as stating the actual version number. People may read this thread later on.


    • Simon Anders
      Senior Member
      • Feb 2010
      • 995

      #3
      Yes, that's a limitation of HTSeq. Fixing this has been on my to-do list for a while; sorry that it's still not done.


      • kamsen
        Junior Member
        • Mar 2012
        • 3

        #4
        Just a few remarks to close this topic:

        1) I was talking about version 0.5.3p3
        2) I made a quick and dirty workaround in the code (__init__ module, line 537), which worked for me. If somebody encounters this problem, one could simply return the line from the .sam file and either create 0 qualities or read in the original ones. After that, the conversion to the Alignment format will work again.
        3) Thanks anyway for your nice package Simon!
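
The idea in point 2 can also be approximated outside HTSeq, without touching the big files twice, by rewriting records on the fly while streaming. The following is only a minimal sketch under stated assumptions, not the change made inside HTSeq: the function name `fill_missing_qualities` and the placeholder character are hypothetical, and "!" is used because it encodes quality 0 in Phred+33.

```python
def fill_missing_qualities(lines, placeholder="!"):
    """Yield SAM lines, replacing a '*' QUAL field with placeholder qualities.

    Assumption: fields are tab-separated, SEQ is column 10 and QUAL is
    column 11, as in the SAM format. '!' is quality 0 in Phred+33.
    """
    for line in lines:
        # Header lines start with '@' and are passed through unchanged.
        if line.startswith("@"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # Only touch records that have a sequence but no qualities.
        if len(fields) >= 11 and fields[10] == "*" and fields[9] != "*":
            fields[10] = placeholder * len(fields[9])
        yield "\t".join(fields) + "\n"


# Hypothetical usage as a stream filter:
#   import sys
#   sys.stdout.writelines(fill_missing_qualities(sys.stdin))
```

Used as a filter between `samtools view -h` and the downstream tool, this reads each file only once. Whether the downstream tool can read from standard input depends on the tool and version, so that part is an assumption to check.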

        regards


        • NicoBxl
          not just another member
          • Aug 2010
          • 264

          #5
          Originally posted by Simon Anders
          Yes, that's a limitation of HTSeq. Fixing this has been on my to-do list for a while; sorry that it's still not done.
          Hi Simon,

          Have you fixed this bug? I have the same problem with TopHat 2.0.0 BAM files.

          Code:
          samtools view -h -o out.sam in.bam
          htseq-count out.sam annotation.gtf > htseq_out.txt
          gives me

          Code:
          100000 GFF lines processed.
          200000 GFF lines processed.
          283699 GFF lines processed.
          Error occured in line 36 of file out.sam.
          Error: ("'seq' and 'qualstr' do not have the same length.", 'line 36 of file out.sam')
          [Exception type: ValueError, raised in _HTSeq.pyx:765]


          • Simon Anders
            Senior Member
            • Feb 2010
            • 995

            #6
            I've just fixed this. In HTSeq 0.5.3p4, SAM files with "*" in the quality field are accepted. Sorry that this took a while.


            • NicoBxl
              not just another member
              • Aug 2010
              • 264

              #7
              Thanks Simon, it worked great.


              • fishinabarrel
                Junior Member
                • Apr 2011
                • 6

                #8
                Dear Simon,

                You are my hero.
                Just ran into this problem yesterday. And by this morning a solution was already in place.
                I owe you a beer.


                • fishinabarrel
                  Junior Member
                  • Apr 2011
                  • 6

                  #9
                  I should also add that I ran into the quality problem with HTSeq-0.5.3p3, and after installing HTSeq-0.5.3p4 all was well.


                  • dharan
                    Junior Member
                    • Jan 2012
                    • 7

                    #10
                    Problem is still seen in HTSeq - v0.5.3p5

                    Dear Simon,

                    I installed the latest version of HTSeq (HTSeq-0.5.3p5.tar.gz) to solve the problem, but the error still persists for me.

                    I am still facing this error:
                    Error: ("'seq' and 'qualstr' do not have the same length.", 'line 2671032 of file ..)
                    [Exception type: ValueError, raised in _HTSeq.pyx:765]

                    Can you please help me out?

                    Thanks,
                    Dharanya


                    • maubp
                      Peter (Biopython etc)
                      • Jul 2009
                      • 1544

                      #11
                      It would be nice if the HTSeq error message included the two unmatched lengths - but can you show us what line 2671032 of your input file is? This may not be due to the * for missing qualities at all, but a real error in the data.
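
To tell a "*" placeholder apart from a real length error, the file can be scanned independently of HTSeq. This is a rough sketch, assuming a tab-separated SAM file with SEQ in column 10 and QUAL in column 11; the function name `find_length_mismatches` is hypothetical. It reports the two lengths that the error message leaves out.

```python
def find_length_mismatches(lines):
    """Return (line_number, seq_length, qual_length) for genuine mismatches.

    Records where SEQ or QUAL is '*' are skipped, because the SAM format
    allows either field to be omitted that way. Lines with fewer than
    11 fields are also skipped in this sketch.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        if line.startswith("@"):
            continue  # header line, no SEQ/QUAL columns
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue
        seq, qual = fields[9], fields[10]
        if seq == "*" or qual == "*":
            continue
        if len(seq) != len(qual):
            problems.append((number, len(seq), len(qual)))
    return problems


# Hypothetical usage:
#   with open("out.sam") as handle:
#       for number, n_seq, n_qual in find_length_mismatches(handle):
#           print(number, n_seq, n_qual)
```

If this reports nothing for the line HTSeq complains about, the data are probably fine and the "*" handling is the more likely cause.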


                      • dharan
                        Junior Member
                        • Jan 2012
                        • 7

                        #12
                        Hi,

                        Here is the line from that file:


                        HWI-ST790:1:1101:1261:140607#ACTTGA 329 contig_126150 342 3 100M * 0 0 GTCCAGGTTGGTGGACCTCTCAATCATGTTGTCACCCTCAAACCCAGAGATGGGGACGAAGGGAACCTTGTTAGGGTTGTAGCCGACCTTCTTCAGGTAG * AS:i:-7 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:3A67C28 YT:Z:UU NH:i:2 CC:Z:contig_223383 CP:i:208 HI:i:0

                        Cheers,
                        Dharanya


                        • maubp
                          Peter (Biopython etc)
                          • Jul 2009
                          • 1544

                          #13
                          Can you double check which HTSeq you are using? Perhaps an older copy is taking precedence in your PATH, or the update didn't install properly.
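
One way to check this from Python is to ask the interpreter which copy of the package it actually imports. This is a small sketch; the helper name `describe_module` is hypothetical, and it assumes the package exposes a `__version__` attribute (it falls back to "unknown" if it does not).

```python
import importlib


def describe_module(name):
    """Return (version, file path) of the module Python would import, or None."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None  # module is not importable from this interpreter
    version = getattr(module, "__version__", "unknown")
    path = getattr(module, "__file__", "unknown")
    return (version, path)


# Prints None if HTSeq is not importable from this interpreter.
print(describe_module("HTSeq"))
```

If the reported path points at an older installation, that copy is the one being used. Note that the `htseq-count` script may be run by a different Python interpreter than the one used for this check.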


                          • dharan
                            Junior Member
                            • Jan 2012
                            • 7

                            #14
                            Maybe there is a problem with the installation. I will go through it again and let you know if there are still any problems.
                            Thanks


                            • chadn737
                              Senior Member
                              • Jan 2009
                              • 392

                              #15
                              As an aside, the latest version of TopHat 2 no longer has the "*" qualities problem.
