OK. I have RNA-seq data from 3 mice. I want to align this data to a mouse reference: either the genome, which seems like overkill, or all the known mRNAs from Acembly, which I have in FASTA format.
The data came from 454 as FASTA (.fna) and .qual files.
I've also obtained the SFF files.
TopHat didn't seem to like the FASTQ files I generated from the FASTA and .qual files using BioPython's SeqIO functions.
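For anyone who wants to sanity-check the FASTA + .qual to FASTQ step independently of BioPython, here is a minimal standard-library sketch. It is not the conversion used above. It assumes the two files list reads in the same order, that the .qual file holds space-separated integer Phred scores, and that the target is Sanger-style FASTQ (Phred+33). The function names are made up for illustration.

```python
def parse_fasta_like(handle):
    """Yield (read_id, payload_lines) from a FASTA or .qual stream."""
    read_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if read_id is not None:
                yield read_id, chunks
            # Keep only the first token of the header as the read ID.
            read_id, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if read_id is not None:
        yield read_id, chunks


def fasta_qual_to_fastq(fasta_handle, qual_handle, out_handle):
    """Write Sanger FASTQ (Phred+33, assumed) and return the record count."""
    count = 0
    pairs = zip(parse_fasta_like(fasta_handle), parse_fasta_like(qual_handle))
    for (seq_id, seq_lines), (qual_id, qual_lines) in pairs:
        if seq_id != qual_id:
            raise ValueError(f"ID mismatch: {seq_id!r} vs {qual_id!r}")
        seq = "".join(seq_lines)
        scores = [int(s) for s in " ".join(qual_lines).split()]
        if len(seq) != len(scores):
            raise ValueError(f"{seq_id}: {len(seq)} bases, {len(scores)} scores")
        # Cap at 93 so every score maps to a printable ASCII character.
        quality = "".join(chr(min(q, 93) + 33) for q in scores)
        out_handle.write(f"@{seq_id}\n{seq}\n+\n{quality}\n")
        count += 1
    return count
```

Called with three open file handles (the .fna, the .qual, and an output file opened for writing), it raises an error on the first read whose sequence and quality lengths disagree, which is one way a converted file can end up malformed.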
The read lengths in this data vary widely, which also seems to upset a lot of the tools, since they were engineered for Illumina GA output.
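To put numbers on the read-length spread, and to try feeding a tool only reads within some window, something like the following sketch could help. It assumes plain four-line FASTQ records (no wrapped sequence lines); the function names and any length cutoffs are hypothetical and would need tuning for the aligner in question.

```python
import statistics


def iter_fastq(handle):
    """Yield (read_id, sequence, quality) from unwrapped 4-line FASTQ."""
    while True:
        header = handle.readline()
        if not header:
            return
        if not header.strip():
            continue  # tolerate stray blank lines between records
        seq = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header.strip()[1:], seq, qual


def length_summary(handle):
    """Return read count plus min, median and max read length."""
    lengths = [len(seq) for _, seq, _ in iter_fastq(handle)]
    return {
        "reads": len(lengths),
        "min": min(lengths),
        "median": statistics.median(lengths),
        "max": max(lengths),
    }


def filter_by_length(in_handle, out_handle, min_len, max_len):
    """Copy reads with min_len <= length <= max_len; return how many were kept."""
    kept = 0
    for read_id, seq, qual in iter_fastq(in_handle):
        if min_len <= len(seq) <= max_len:
            out_handle.write(f"@{read_id}\n{seq}\n+\n{qual}\n")
            kept += 1
    return kept
```

Running `length_summary` first shows how wide the range really is before picking `min_len` and `max_len` for `filter_by_length`.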
Thoughts? Advice? I'm on a Mac Pro, but have 64-bit Linux and Windows XP running via virtualization.
Thanks,
Anand