I ran into the "sequence and quality are inconsistent" error when running samtools view -Sb on some SAM files produced with bwa aln and bwa sampe. In a previous thread (http://seqanswers.com/forums/showthread.php?t=17353), someone else with this problem traced it to the -I parameter used in the bwa aln step that writes the sai files. I also used -I because I have Illumina data and thought it was appropriate; maybe I was wrong. This is the only cause I have been able to find for this error, apart from file size problems, and my sai files match each other in size and are about the size I would expect. Below are the commands I used and the line from the sam file that triggered the error message. So, 2 questions:
1) Is this an ASCII quality score issue?
2) If this is in fact an ASCII quality score issue, is there a tool out there to convert the scores from within the sam file so I can avoid re-running all 10 of my samples (which run for over 6 hours each)?
Thanks in advance!!
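For question 1, one way to check might be to look at the range of quality characters in the trimmed fastq files. Below is a minimal Python sketch of that check, not a tested tool. The function names are hypothetical, and the thresholds assume the usual conventions: Phred+33 qualities from Illumina 1.8+ run from '!' (33) to 'J' (74), while the older +64 encodings do not go below ';' (59).

```python
from itertools import islice


def fastq_quality_lines(handle, max_reads=10000):
    """Yield the quality line (4th line) of the first max_reads FASTQ records.

    Assumes plain 4-line FASTQ records (no wrapped sequence lines).
    """
    for i, line in enumerate(islice(handle, 4 * max_reads)):
        if i % 4 == 3:
            yield line.rstrip("\n")


def guess_phred_offset(quality_lines):
    """Return 33, 64, or None (ambiguous or empty) for the given quality strings."""
    codes = [ord(c) for q in quality_lines for c in q]
    if not codes:
        return None
    if min(codes) < 59:
        # Characters this low cannot occur in a +64 encoding.
        return 33
    if max(codes) > 74:
        # Characters this high are outside the usual Phred+33 Illumina range.
        return 64
    return None
```

It could be used as `guess_phred_offset(fastq_quality_lines(open("sickle_Sample_TM_E_R1.fastq")))`. A result of 33 would suggest the reads were already Phred+33 and that -I was not needed.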
**Preprocessing on fastq files with cutadapt and sickle**
bwa aln -I -t 8 genome.fa sickle_Sample_TM_E_R1.fastq >
bwa aln -I -t 8 genome.fa sickle_Sample_TM_E_R2.fastq >
bwa sampe genome.fa TME-vs-XXX.1.sai TME-vs-XXX.2.sai sickle_Sample_TM_E_R1.fastq sickle_Sample_TM_E_R2.fastq > bwa-TME-XXX.sam
HWI-ST808:15527N9ACXX:8:1101:1177:2099 69 scaffold17748 58833 0 * = 58833 0 TCGCATGCCCGCCAGCGCCTGTCGGGGCTGTCGCGGCAGATTTGCCGCAGGGCACCGATCCCGAAGCGGATTCGCTGCGCATCANCAGCTCCTCACCCGNN $$$'''%&))'')(*+(+++****((*((******)'%% $%$$$## !##%%%%% !##%#%#%%%#%%%#!%" $%$$"#
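For question 2, if the cause really is the -I shift, the kind of in-place fix being asked about could look like the Python sketch below. It is only an illustration under a loud assumption: that -I lowered every quality character by 31 (64 - 33) on reads that were already Phred+33, so adding 31 back to column 11 restores them. The function name is hypothetical. Records whose shifted qualities landed on a tab or newline character are split or shortened in the sam file and cannot be repaired this way, so the sketch leaves any line where the sequence and quality lengths differ unchanged and flags it.

```python
def shift_quality_field(sam_line, shift=31):
    """Return (line, ok) with the SAM quality column raised by `shift`.

    ok is False when the line could not be repaired safely
    (too few columns, or SEQ and QUAL lengths differ).
    Assumption: the qualities were lowered by exactly `shift` beforehand.
    """
    if sam_line.startswith("@"):
        # Header lines carry no qualities.
        return sam_line, True
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return sam_line, False
    seq, qual = fields[9], fields[10]
    if qual == "*":
        # "*" means no qualities were stored for this read.
        return sam_line, True
    if len(seq) != len(qual):
        return sam_line, False
    fields[10] = "".join(chr(ord(c) + shift) for c in qual)
    return "\t".join(fields) + "\n", True
```

When reading the sam file for this, opening it with `open(path, newline="\n")` keeps Python from treating a stray carriage-return quality character as a line break. Lines that come back with ok set to False would still need the affected reads re-aligned.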