Hello everyone,
I'm trying to look at some old RNA-seq data that I was able to find on NCBI. The data is available as a bowtie output, and I'm trying to use tophat2 to get transcript data.
It originally looked something like this:
HWI-EAS283:1:1:4:1142#0/1 - chr2 70362272 TANTCNTTCCAAGGCTTCTAACATGATGATACTATTTCCTCG B9%<'%<B2;?ACA*B@/BB@;BCCCBBBC@BBCBBA=CB7B 2 36:G>N,39:C>N
HWI-EAS283:1:1:4:1142#0/1 - chr18 50187254 TANTCNTTCCAAGGCTTCTAACATGATGATACTATTTCCTCG B9%<'%<B2;?ACA*B@/BB@;BCCCBBBC@BBCBBA=CB7B 2 36:G>N,39:C>N
Tophat gave the following error:
Traceback (most recent call last):
File "/opt/local/bin/tophat", line 2346, in <module>
sys.exit(main())
File "/opt/local/bin/tophat", line 2251, in main
params.read_params = check_reads(params.read_params, reads_list)
File "/opt/local/bin/tophat", line 1063, in check_reads
if first_line[0] in "@>":
IndexError: string index out of range
So I figured it must be the lack of an '@' at the beginning of the name of the reads, so I used vim to add an @ to the beginning of every line:
@HWI-EAS283:1:1:4:1142#0/1 - chr2 70362272 TANTCNTTCCAAGGCTTCTAACATGATGATACTATTTCCTCG B9%<'%<B2;?ACA*B@/BB@;BCCCBBBC@BBCBBA=CB7B 2 36:G>N,39:C>N
@HWI-EAS283:1:1:4:1142#0/1 - chr18 50187254 TANTCNTTCCAAGGCTTCTAACATGATGATACTATTTCCTCG B9%<'%<B2;?ACA*B@/BB@;BCCCBBBC@BBCBBA=CB7B 2 36:G>N,39:C>N
Now when I run this, I get the following error ('###' stands in for the file path, which I wanted to keep private):
Error encountered parsing file /#############:
Premature end of file (missing quality values for HWI-EAS283:1:1:4:1142#0/1 - chr70362272 TANTCNTTCCAAGGCTTCTAACATGATGATACTATTTCCTCG B9%<'%<B2;?ACA*B@/BB@;BCCCBBBC@BBCBBA=CB7B 2 36:G>N,39:C>N)
This is the very first line, so it seems to hint at a format error...
I looked at the bowtie manual, and my output seems to differ from the manual's in one way: as I read it, column 5 ("read sequence") should be a '+' or '-' value rather than the quality score, meaning that between the read and the quality in my output there should be another column with a '+' or '-' value.
Am I missing something here? The bowtie output that I'm downloading looks "mostly" like a bowtie output, but it appears wrong... I tried to see if it was maybe an older format, but I can't find any info on that.
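For reference, here is a minimal sketch of one possible workaround, assuming the file really is default bowtie output with whitespace-separated columns in the order read name, strand, reference, offset, sequence, qualities. It rebuilds four-line FASTQ records, which is the kind of read input tophat checks for, instead of prefixing each alignment line with '@'. The function names and the choice to keep only the first alignment per read name are my own assumptions, not something from the bowtie or tophat documentation. Reads on the '-' strand are reverse-complemented (and their qualities reversed) on the assumption that bowtie stored them in reference orientation.

```python
import sys

# Complement table for reverse-complementing '-' strand reads.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def bowtie_line_to_fastq(line):
    """Turn one alignment line into (read_name, fastq_record).

    Assumes the first six whitespace-separated fields are:
    name, strand, reference, offset, sequence, qualities.
    """
    fields = line.split()
    if len(fields) < 6:
        raise ValueError("expected at least 6 columns, got %d" % len(fields))
    name, strand, _ref, _offset, seq, qual = fields[:6]
    if strand == "-":
        # Undo the orientation change applied to reverse-strand alignments.
        seq = seq.translate(COMPLEMENT)[::-1]
        qual = qual[::-1]
    return name, "@{}\n{}\n+\n{}\n".format(name, seq, qual)


def convert(lines):
    """Yield one FASTQ record per read name, skipping repeat alignments."""
    seen = set()
    for line in lines:
        if not line.strip():
            continue
        name, record = bowtie_line_to_fastq(line)
        if name in seen:
            continue  # same read reported at another location
        seen.add(name)
        yield record


if __name__ == "__main__":
    sys.stdout.writelines(convert(sys.stdin))
```

Saved under a hypothetical name such as bowtie_to_fastq.py, it could be run as `python bowtie_to_fastq.py < alignments.txt > reads.fastq`, where both file names are placeholders. One caveat: a file like this only contains reads that bowtie managed to align, so any reads that failed to align in the original run would still be missing from the rebuilt FASTQ.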
Can anybody help me out?
Thanks in advance!!
-worm_picker