Hi,
I'm trying to map RNA-Seq reads to a Unigene (EST) database containing 30,000+ scaffolds. I've BLASTed each scaffold (gene) and annotated it with the BLAST results, so for example the first sequence is:
>gi|214105|gb|L09728.1|XELDLL4_Xenopus_laevis_putative_transcription_factor_DLL4_mRNA,_complete_cds
GGCGGTCGTGAGCGATTACTCCCCCTGAGCTTGTGTAGCGACCCAACCCACCAGCTGCGGAGAACATGCGTCCAGCGTCCTCCCACCGCCCGGCCCGTCGCTCCTGAT[...........]
I've tried sorting the TopHat output with samtools sort, as well as manually with sort -k 3,3 -k 4,4n hits.sam > hits.sam.sorted
However, Cufflinks rejects both files as not correctly sorted. I assume that Cufflinks needs a numerical identifier in the header, so I have rerun TopHat and Cufflinks with numerical-only headers (which works fine).
I want to do DE in cummeRbund, and seeing as I don't have an annotation file, it's important that I carry over the full headers rather than just numbers. Is there any way of doing this?
I assume that the lack of spaces (I've replaced them with _ so that the entire header is visible downstream) is one of the problems...
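One workaround I can think of, as a rough sketch only: since the numerical headers already work, I could keep a lookup table from each number back to the full header and use it to relabel results after the run. The function names (renumber_fasta, load_header_map) and the file names in the usage note are made up for illustration, and I haven't confirmed that this is the recommended way to get the annotation into the downstream tools.

```python
import csv


def renumber_fasta(fasta_in, fasta_out, map_out):
    """Replace each FASTA header with a running number and record the mapping.

    Writes a two-column, tab-separated table (number, original header) to
    map_out. Returns the number of sequences seen.
    """
    n = 0
    with open(fasta_in) as src, \
            open(fasta_out, "w") as dst, \
            open(map_out, "w", newline="") as tsv:
        writer = csv.writer(tsv, delimiter="\t")
        for line in src:
            if line.startswith(">"):
                n += 1
                # Keep the full original header (without the leading '>').
                writer.writerow([n, line[1:].rstrip("\n")])
                dst.write(f">{n}\n")
            else:
                # Sequence lines are copied through unchanged.
                dst.write(line)
    return n


def load_header_map(map_path):
    """Read the mapping table back as {number (as a string): original header}."""
    with open(map_path, newline="") as tsv:
        return {row[0]: row[1] for row in csv.reader(tsv, delimiter="\t")}
```

For example, renumber_fasta("unigene.fa", "unigene.numbered.fa", "header_map.tsv") would produce the numbered reference for mapping, and load_header_map("header_map.tsv") would give a dictionary for translating the numeric IDs in the results back to the annotated headers.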
Any suggestions would be very welcome!
Best,
Nick