Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • flashton
    Member
    • Feb 2011
    • 10

    Problem using HTSeq count with SAM file without quality score

    Hi everyone,

    I have a problem using HTSeq count to analyse my SAM file. I think I know the source of the error and I'm wondering whether there is a work around.

    The SAM files I'm dealing with don't have any quality scores as they were made by aligning fasta files to the reference using segemehl. They look like

    @HD VN:1.0
    @SQ SN:gi|153930785|ref|NC_009697.1| LN:3863450
    @PG ID:segemehl VN:0.1.3-$Rev: 335 $ ($Date: 2012-03-13 18:55:51 +0100 (Tue, 13 Mar 2012) $) CL:/usr/local/segemehl/segemehl.x -i cbot19397.idx -d Cbot19397.fasta -q /Volumes/DataRAID/Projects/Phil_RNA_SEQ/170112/fasta/14_trimmed.fa -t 6 -E 0.0001
    Noname 0 gi|153930785|ref|NC_009697.1| 1577456 255 25M1I11M * 0 0 TATAAAATATAATAATATATATATATATATATATATA * NM:i:3 MD:Z:19A7T8 NH:i:1 XA:Z:Q
    Noname 0 gi|153930785|ref|NC_009697.1| 3243930 255 49M1D1M * 0 0 ATAGTAACAACAATAATAACAATAACGACAATAACAACAATACTAAACCC * NM:i:1 MD:Z:49^T1 NH:i:1 XA:Z:Q
    Noname 0 gi|153930785|ref|NC_009697.1| 59594 255 17M1D4M1I28M * 0 0 CAAGATGAGATTTCCCAATCGCTAAGCTAGTAAGACTCCTGGAAGAACAC * NM:i:3 MD:Z:17^T1G30 NH:i:7 XA:Z:Q
    Noname 0 gi|153930785|ref|NC_009697.1| 458627 255 17M1I4M1D28M * 0 0 CAAGATGAGATTTCCCAATCGCTAAGCTAGTAAGACTCCTGGAAGAACAC * NM:i:3 MD:Z:18A2^A28 NH:i:7 XA:Z:Q
    Noname 0 gi|153930785|ref|NC_009697.1| 52967 255 17M1D4M1I28M * 0 0 CAAGATGAGATTTCCCAATCGCTAAGCTAGTAAGACTCCTGGAAGAACAC * NM:i:3 MD:Z:17^T1G30 NH:i:7 XA:Z:Q
    Noname 0 gi|153930785|ref|NC_009697.1| 67539 255 17M1D4M1I28M * 0 0 CAAGATGAGATTTCCCAATCGCTAAGCTAGTAAGACTCCTGGAAGAACAC * NM:i:3 MD:Z:17^T1G30 NH:i:7 XA:Z:Q
    Noname 0 gi|153930785|ref|NC_009697.1| 73821 255 17M1D4M1I28M * 0 0 CAAGATGAGATTTCCCAATCGCTAAGCTAGTAAGACTCCTGGAAGAACAC * NM:i:3 MD:Z:17^T1G30 NH:i:7 XA:Z:Q

    As you can see, there is a * instead of a qual string. When I try to use HTSeq count by inputting

    python -m HTSeq.scripts.count -t CDS -m intersection-nonempty -i Name -a 0 14_trimmed_fasta_v2.sam CP000726.gff

    It returns

    7465 GFF lines processed.
    Error occured when reading first line of sam file.
    Error: ("'seq' and 'qualstr' do not have the same length.", 'line 4 of file 14_trimmed_fasta_v2.sam')
    [Exception type: ValueError, raised in _HTSeq.pyx:765]

    I have tried setting the -a parameter to 0 and * with no positive effect. I think that HTSeq is expecting a quality string which my SAM file doesn't have.

    Does anyone know a way around this problem?

    Thanks,
  • maubp
    Peter (Biopython etc)
    • Jul 2009
    • 1544

    #2
    Duplicate thread - see http://seqanswers.com/forums/showthread.php?t=18952 where Simon Anders confirmed this was a known issue.

    Comment

    • flashton
      Member
      • Feb 2011
      • 10

      #3
      Thanks Peter, apologies for the duplicate.

      Comment

      Latest Articles

      Collapse

      ad_right_rmr

      Collapse

      News

      Collapse

      Topics Statistics Last Post
      Started by SEQadmin2, 06-05-2026, 10:09 AM
      0 responses
      14 views
      0 reactions
      Last Post SEQadmin2  
      Started by SEQadmin2, 06-04-2026, 08:59 AM
      0 responses
      24 views
      0 reactions
      Last Post SEQadmin2  
      Started by SEQadmin2, 06-02-2026, 12:03 PM
      0 responses
      29 views
      0 reactions
      Last Post SEQadmin2  
      Started by SEQadmin2, 06-02-2026, 11:40 AM
      0 responses
      23 views
      0 reactions
      Last Post SEQadmin2  
      Working...