Hello all,
I'm attempting to run TopHat on SOLiD data from an SRA file and am running into problems with the FASTQ file formatting.
After running fastq-dump on the SRA file, I get the following format:
@SRR1119927.1 solid309_20110721_FRAG_BC_yadegari_1_55_1170 length=50
T000002201013000130000000.01...20...2....2.....2...
+SRR1119927.1 solid309_20110721_FRAG_BC_yadegari_1_55_1170 length=50
!+,0,,/'*&/)&&)2%&+2.0%37!7%!!!1%!!!%!!!!5!!!!!5!!!
Executing TopHat like this:
tophat -C -o output --bowtie1 ColorIndex SRR.fastq
Results in the following error:
Error running bowtie:
Too few quality values for read: 2899T33
are you sure this is a FASTQ-int file?
I researched this error and found that the problem may be that I need to use the --quals option and provide a separate quality file. So I split the FASTQ file into two separate files:
Reads file:
@SRR1119927.1 solid309_20110721_FRAG_BC_yadegari_1_55_1170 length=50
T000002201013000130000000.01...20...2....2.....2...
Quality file:
+SRR1119927.1 solid309_20110721_FRAG_BC_yadegari_1_55_1170 length=50
!+,0,,/'*&/)&&)2%&+2.0%37!7%!!!1%!!!%!!!!5!!!!!5!!!
And ran:
tophat -C --quals -o output --bowtie1 ColorIndex SRR.fastq SRR_qual.fastq
That generates the following error:
Error encountered parsing file SRR.fastq:
Premature end of file (missing quality values for SRR1119927.1 solid309_20110721_FRAG_BC_yadegari_1_55_1170 length=50)
I can't find any information on how the base and quality files should be formatted when they are separated so that TopHat can read them. Is this my problem, or is it something else?
EDIT:
I reformatted the two split files as proper FASTA:
Reads file:
>SRR1119927.1 solid309_20110721_FRAG_BC_yadegari_1_55_1170 length=50
T000002201013000130000000.01...20...2....2.....2...
Quality file:
>SRR1119927.1 solid309_20110721_FRAG_BC_yadegari_1_55_1170 length=50
!+,0,,/'*&/)&&)2%&+2.0%37!7%!!!1%!!!%!!!!5!!!!!5!!!
But now I get the following error:
Error running 'prep_reads'
Error: beginning of quality values record not found! (!'/,<&.&&*'%1*%.2(%&20%'&!')!!!%&!!!1!!!!1!!!!!%!!!)
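The TopHat manual lists --integer-quals ("quality values are space-delimited integer values") as the default whenever -C/--color is given. That would match bowtie's "are you sure this is a FASTQ-int file?" message. It would also match prep_reads rejecting a quality file that still holds ASCII characters. If this is the cause, TopHat would want a CSFASTA reads file plus a QUAL file of integers.

Below is a minimal Python sketch of that conversion. It is untested against TopHat and not a confirmed recipe. It rests on three assumptions:
- fastq-dump wrote Phred+33 qualities. In the record above, the no-call '.' positions carry '!', which is 0 under Phred+33.
- The first quality character belongs to the primer base 'T'. Both lines are 51 characters long for 50 colours, so the sketch drops that first value by default.
- The function names and the drop_primer_qual parameter are hypothetical and my own.

```python
import io
import sys
from typing import Iterator, List, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from a FASTQ with 4 lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"{header}: {len(seq)} sequence chars, {len(qual)} qualities")
        yield header[1:], seq, qual


def phred33_to_ints(qual: str) -> List[int]:
    """Decode Phred+33 ASCII qualities ('!' = 0, 'I' = 40) to integers."""
    return [ord(c) - 33 for c in qual]


def fastq_to_csfasta_qual(fastq: TextIO, csfasta: TextIO, qual_out: TextIO,
                          drop_primer_qual: bool = True) -> int:
    """Write one CSFASTA and one integer QUAL record per read; return the count."""
    count = 0
    for name, seq, qual in read_fastq(fastq):
        scores = phred33_to_ints(qual)
        if drop_primer_qual:
            # Assumption: the first quality belongs to the primer base,
            # leaving one value per colour call.
            scores = scores[1:]
        csfasta.write(f">{name}\n{seq}\n")
        qual_out.write(f">{name}\n{' '.join(str(s) for s in scores)}\n")
        count += 1
    return count


if __name__ == "__main__":
    # The record from the post; for real data, open SRR.fastq instead.
    record = (
        "@SRR1119927.1 solid309_20110721_FRAG_BC_yadegari_1_55_1170 length=50\n"
        "T000002201013000130000000.01...20...2....2.....2...\n"
        "+SRR1119927.1 solid309_20110721_FRAG_BC_yadegari_1_55_1170 length=50\n"
        "!+,0,,/'*&/)&&)2%&+2.0%37!7%!!!1%!!!%!!!!5!!!!!5!!!\n"
    )
    fastq_to_csfasta_qual(io.StringIO(record), sys.stdout, sys.stdout)
```

To use it on the real data:
1. Open SRR.fastq and two output files. Names such as SRR.csfasta and SRR.qual are only examples.
2. Pass the three handles to fastq_to_csfasta_qual.
3. Give both outputs to tophat -C --quals in the same positions as in the command above.

Check first whether TopHat expects 50 or 51 values per read. Passing drop_primer_qual=False keeps all of them.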