Hi,
I've downloaded some Illumina/Solexa short-read files from SRA, such as this one:
@CCFFFFEFHHGHJJJJJGIIEHGIIJJGIIEGIEBGIJEEFFFFCFFFCDEDDDDDDDDDDDDDDDDBA@CDEEDCDCDDDDCCDCDBDDDDDDCCCDC
@SRR350739.3 SOLEXA4_0073:2:1101:1220:2247 length=100
CGAGATATTCAAGATTTCTCTGCTGCTTGTCAGTTGAGAGCGTTGCGATTGGATTCCCGTTCCTCCCCTTGGCTTCGGCTCAGTTCGTCCTTTAACATCA
+SRR350739.3 SOLEXA4_0073:2:1101:1220:2247 length=100
CCCFFFFFHHHDHHHIFIJIJIJIIIIJGIJIIJJIEIIJJJJIIIJEHGEHHIGIGHJIIIIIJHHHGFFFFFEEDDBDB>CCDCDDBDDDD>@CCC>C
@SRR350739.4 SOLEXA4_0073:2:1101:1320:2116 length=100
CACAAGGTCCCGGAACACACCACGCGTTGCGGCAGCGGCCATGAACACGGACAAAATACTGCGAGCCAATGCTGGGACCAAATCGCTAATCGTTCGCGAA
+SRR350739.4 SOLEXA4_0073:2:1101:1320:2116 length=100
CCCFFFFFHHHHHJJJJJJJJIJJJJJJIJJJJIFHHFFDDDDDDDDDDDBDDDDDDDDDEDDBDDDDDDDDDDDDDDDBBDDDDDDDDDDDDDDDDDDD
When I try to use the file in maq with the easyrun command:
maq.pl easyrun -d outdir ref.fasta reads.fastq
I get these errors:
[seq_read_fastq] Inconsistent sequence name: CA>ACDADDDDCCB(<AC@@BBD-<99<?CD. Continue anyway.
[seq_read_fastq] Inconsistent sequence name: BDBDBDDCDDA9A?<BCCCC>B. Continue anyway.
[seq_read_fastq] Inconsistent sequence name: 5ACDDDDDCDDDDD. Continue anyway.
[seq_read_fastq] Inconsistent sequence name: BDCABDDDDDDDDDBDCCCBDDBDBA<. Continue anyway.
[seq_read_fastq] Inconsistent sequence name: CDC. Continue anyway
A quick search suggests that this error is probably caused by spaces in the sequence names. Normally I'd remove the spaces in vim, but these files are so large that vim can't even open them.
Does anyone know an easy way to remove the spaces from the sequence names, or do I have a different problem altogether?
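One way to strip the spaces without opening the file in an editor is to stream it and rewrite only the name lines. The Python sketch below assumes strict four-line records with no wrapped sequence or quality lines. It keeps only the first whitespace-delimited token of each @ line (e.g. @SRR350739.3) and reduces each + line to a bare +, which the FASTQ format permits. It picks header lines by position rather than by a leading @, because a quality line can also start with @, as the first line of the excerpt above does. clean_fastq and the script name are hypothetical.

```python
import sys
from typing import Iterable, Iterator


def clean_fastq(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTQ lines with read names cut at the first whitespace.

    Assumes strict four-line records; works line by line, so the whole
    file is never loaded into memory.
    """
    for i, line in enumerate(lines):
        pos = i % 4
        if pos == 0:
            # '@' name line: keep only the read ID, e.g. '@SRR350739.3'
            fields = line.split()
            yield (fields[0] if fields else line.rstrip("\n")) + "\n"
        elif pos == 2:
            # '+' line: the repeated name is optional, so drop it entirely
            yield "+\n"
        else:
            # sequence and quality lines pass through unchanged
            yield line


if __name__ == "__main__":
    # Usage (hypothetical script name): python clean_fastq.py reads.fastq > reads.clean.fastq
    with open(sys.argv[1]) as fh:
        sys.stdout.writelines(clean_fastq(fh))
```

Whether maq then accepts the cleaned file is untested here; it is worth running it on a few thousand reads first.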
Thanks in advance.