Hi everyone,
I have a problem using HTSeq count to analyse my SAM file. I think I know the source of the error and I'm wondering whether there is a work around.
The SAM files I'm dealing with don't have any quality scores as they were made by aligning fasta files to the reference using segemehl. They look like
@HD VN:1.0
@SQ SN:gi|153930785|ref|NC_009697.1| LN:3863450
@PG ID:segemehl VN:0.1.3-$Rev: 335 $ ($Date: 2012-03-13 18:55:51 +0100 (Tue, 13 Mar 2012) $) CL:/usr/local/segemehl/segemehl.x -i cbot19397.idx -d Cbot19397.fasta -q /Volumes/DataRAID/Projects/Phil_RNA_SEQ/170112/fasta/14_trimmed.fa -t 6 -E 0.0001
Noname 0 gi|153930785|ref|NC_009697.1| 1577456 255 25M1I11M * 0 0 TATAAAATATAATAATATATATATATATATATATATA * NM:i:3 MD:Z:19A7T8 NH:i:1 XA:Z:Q
Noname 0 gi|153930785|ref|NC_009697.1| 3243930 255 49M1D1M * 0 0 ATAGTAACAACAATAATAACAATAACGACAATAACAACAATACTAAACCC * NM:i:1 MD:Z:49^T1 NH:i:1 XA:Z:Q
Noname 0 gi|153930785|ref|NC_009697.1| 59594 255 17M1D4M1I28M * 0 0 CAAGATGAGATTTCCCAATCGCTAAGCTAGTAAGACTCCTGGAAGAACAC * NM:i:3 MD:Z:17^T1G30 NH:i:7 XA:Z:Q
Noname 0 gi|153930785|ref|NC_009697.1| 458627 255 17M1I4M1D28M * 0 0 CAAGATGAGATTTCCCAATCGCTAAGCTAGTAAGACTCCTGGAAGAACAC * NM:i:3 MD:Z:18A2^A28 NH:i:7 XA:Z:Q
Noname 0 gi|153930785|ref|NC_009697.1| 52967 255 17M1D4M1I28M * 0 0 CAAGATGAGATTTCCCAATCGCTAAGCTAGTAAGACTCCTGGAAGAACAC * NM:i:3 MD:Z:17^T1G30 NH:i:7 XA:Z:Q
Noname 0 gi|153930785|ref|NC_009697.1| 67539 255 17M1D4M1I28M * 0 0 CAAGATGAGATTTCCCAATCGCTAAGCTAGTAAGACTCCTGGAAGAACAC * NM:i:3 MD:Z:17^T1G30 NH:i:7 XA:Z:Q
Noname 0 gi|153930785|ref|NC_009697.1| 73821 255 17M1D4M1I28M * 0 0 CAAGATGAGATTTCCCAATCGCTAAGCTAGTAAGACTCCTGGAAGAACAC * NM:i:3 MD:Z:17^T1G30 NH:i:7 XA:Z:Q
As you can see, there is a * instead of a qual string. When I try to use HTSeq count by inputting
python -m HTSeq.scripts.count -t CDS -m intersection-nonempty -i Name -a 0 14_trimmed_fasta_v2.sam CP000726.gff
It returns
7465 GFF lines processed.
Error occured when reading first line of sam file.
Error: ("'seq' and 'qualstr' do not have the same length.", 'line 4 of file 14_trimmed_fasta_v2.sam')
[Exception type: ValueError, raised in _HTSeq.pyx:765]
I have tried setting the -a parameter to 0 and * with no positive effect. I think that HTSeq is expecting a quality string which my SAM file doesn't have.
Does anyone know a way around this problem?
Thanks,
I have a problem using HTSeq count to analyse my SAM file. I think I know the source of the error and I'm wondering whether there is a work around.
The SAM files I'm dealing with don't have any quality scores as they were made by aligning fasta files to the reference using segemehl. They look like
@HD VN:1.0
@SQ SN:gi|153930785|ref|NC_009697.1| LN:3863450
@PG ID:segemehl VN:0.1.3-$Rev: 335 $ ($Date: 2012-03-13 18:55:51 +0100 (Tue, 13 Mar 2012) $) CL:/usr/local/segemehl/segemehl.x -i cbot19397.idx -d Cbot19397.fasta -q /Volumes/DataRAID/Projects/Phil_RNA_SEQ/170112/fasta/14_trimmed.fa -t 6 -E 0.0001
Noname 0 gi|153930785|ref|NC_009697.1| 1577456 255 25M1I11M * 0 0 TATAAAATATAATAATATATATATATATATATATATA * NM:i:3 MD:Z:19A7T8 NH:i:1 XA:Z:Q
Noname 0 gi|153930785|ref|NC_009697.1| 3243930 255 49M1D1M * 0 0 ATAGTAACAACAATAATAACAATAACGACAATAACAACAATACTAAACCC * NM:i:1 MD:Z:49^T1 NH:i:1 XA:Z:Q
Noname 0 gi|153930785|ref|NC_009697.1| 59594 255 17M1D4M1I28M * 0 0 CAAGATGAGATTTCCCAATCGCTAAGCTAGTAAGACTCCTGGAAGAACAC * NM:i:3 MD:Z:17^T1G30 NH:i:7 XA:Z:Q
Noname 0 gi|153930785|ref|NC_009697.1| 458627 255 17M1I4M1D28M * 0 0 CAAGATGAGATTTCCCAATCGCTAAGCTAGTAAGACTCCTGGAAGAACAC * NM:i:3 MD:Z:18A2^A28 NH:i:7 XA:Z:Q
Noname 0 gi|153930785|ref|NC_009697.1| 52967 255 17M1D4M1I28M * 0 0 CAAGATGAGATTTCCCAATCGCTAAGCTAGTAAGACTCCTGGAAGAACAC * NM:i:3 MD:Z:17^T1G30 NH:i:7 XA:Z:Q
Noname 0 gi|153930785|ref|NC_009697.1| 67539 255 17M1D4M1I28M * 0 0 CAAGATGAGATTTCCCAATCGCTAAGCTAGTAAGACTCCTGGAAGAACAC * NM:i:3 MD:Z:17^T1G30 NH:i:7 XA:Z:Q
Noname 0 gi|153930785|ref|NC_009697.1| 73821 255 17M1D4M1I28M * 0 0 CAAGATGAGATTTCCCAATCGCTAAGCTAGTAAGACTCCTGGAAGAACAC * NM:i:3 MD:Z:17^T1G30 NH:i:7 XA:Z:Q
As you can see, there is a * instead of a qual string. When I try to use HTSeq count by inputting
python -m HTSeq.scripts.count -t CDS -m intersection-nonempty -i Name -a 0 14_trimmed_fasta_v2.sam CP000726.gff
It returns
7465 GFF lines processed.
Error occured when reading first line of sam file.
Error: ("'seq' and 'qualstr' do not have the same length.", 'line 4 of file 14_trimmed_fasta_v2.sam')
[Exception type: ValueError, raised in _HTSeq.pyx:765]
I have tried setting the -a parameter to 0 and * with no positive effect. I think that HTSeq is expecting a quality string which my SAM file doesn't have.
Does anyone know a way around this problem?
Thanks,
Comment