I'm trying to identify splice sites in some RNA-seq data I got from the Sequence Read Archive (SRA, through NCBI; I extracted the .sra files to .fastq using their provided utility). Every dataset I run through either TopHat or SOAPsplice returns an empty junction file, though.
I've tried using datasets with longer read lengths or paired-end sequences, and the junction file always comes out empty. When I input a manually constructed list of known splice junctions, TopHat was able to identify them, but SOAPsplice still returned an empty file (and since the link to their sample data is broken, I can't even test that).
Does anyone know what could be going wrong? It seems unlikely that an entire Illumina run would capture not a single splicing event, and looking at the other files (e.g., SOAPsplice's .2seq output) it seems that spliced reads are present, but the programs aren't writing the junctions out properly. I could write a script to extract them, but then I'd lose some of the quality filtering.
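For what it's worth, the fallback script I have in mind would look roughly like the sketch below. It assumes the alignments are available as SAM text (for example TopHat's accepted_hits.bam converted with samtools view), not SOAPsplice's .2seq format, which I haven't worked out. The function name, the MAPQ cutoff of 20, and the toy records are all made up for illustration; the only filtering it does is on mapping quality, which is exactly the loss of filtering I'm worried about.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def junctions_from_sam(lines, min_mapq=20):
    """Count reads supporting each (chrom, intron_start, intron_end) junction.

    Coordinates are 1-based and inclusive. min_mapq is an arbitrary,
    hypothetical cutoff, not a value taken from TopHat or SOAPsplice.
    """
    counts = Counter()
    for line in lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        flag = int(fields[1])
        chrom = fields[2]
        pos = int(fields[3])
        mapq = int(fields[4])
        cigar = fields[5]
        if flag & 0x4 or cigar == "*" or mapq < min_mapq:
            continue                      # unmapped or low-confidence read
        ref = pos                         # current reference position
        for length, op in CIGAR_OP.findall(cigar):
            length = int(length)
            if op == "N":                 # skipped region, i.e. an intron
                counts[(chrom, ref, ref + length - 1)] += 1
            if op in "MDN=X":             # operations that consume reference
                ref += length
    return counts


if __name__ == "__main__":
    # Toy records invented for this example, not real data.
    toy = [
        "@HD\tVN:1.0",
        "r1\t0\tchr1\t100\t50\t10M50N10M\t*\t0\t0\t" + "A" * 20 + "\t" + "I" * 20,
        "r2\t0\tchr1\t300\t50\t20M\t*\t0\t0\t" + "A" * 20 + "\t" + "I" * 20,
    ]
    for (chrom, start, end), n in sorted(junctions_from_sam(toy).items()):
        print(f"{chrom}\t{start}\t{end}\t{n}")
```

On real data the same function could be fed the lines of a SAM file instead of the toy list.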
Any ideas/suggestions? Thanks in advance.
PS: I'm running SOAPsplice with the default options; TopHat is being handled through the iPlant Discovery Environment, so I assume the options are pretty close to default.