I have some RNA-Seq data from the fruit fly, in which we test differential gene expression across several knock-outs and developmental stages.
As I have heard a lot of good things about the STAR aligner, I tested it against bbmap and tophat2.
These are the commands I used for the three tools:
In all my runs (I tested 12 different samples), STAR mapped more reads than tophat2, while bbmap was far behind both in the number of mapped reads.
When I took a closer look at the mapped BAM files and compared tophat2 with STAR, I noticed that STAR has some problems with long exons. Even though it shows a higher mapping rate almost everywhere, in long exons it does not seem to map the reads correctly, while tophat2 maps quite a lot of reads there.
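To put a number on this difference rather than judging it by eye, reads overlapping a given exon could be counted in the output of each aligner. The following is only a minimal sketch under some assumptions: the input is SAM-formatted text (for example as printed by `samtools view`), the function names `aligned_blocks` and `count_reads_in_exon` are made up for illustration, and the chromosome name and coordinates in the usage example are hypothetical. Secondary and supplementary alignments are skipped so that each read is counted at most once, and a read that only splices over the exon (an `N` gap in the CIGAR) is not counted.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_blocks(pos, cigar):
    """Return the 1-based inclusive reference blocks covered by aligned bases."""
    blocks = []
    ref = pos
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "M=X":
            # Aligned bases: they cover the reference and advance the position.
            blocks.append((ref, ref + length - 1))
            ref += length
        elif op in "DN":
            # Deletions and introns advance the reference without covering it.
            ref += length
        # I, S, H and P do not consume the reference.
    return blocks


def count_reads_in_exon(sam_lines, chrom, start, end):
    """Count primary alignments with at least one aligned block in chrom:start-end."""
    count = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records.
        if flag & (0x4 | 0x100 | 0x800):
            continue
        if fields[2] != chrom:
            continue
        blocks = aligned_blocks(int(fields[3]), fields[5])
        if any(b_start <= end and b_end >= start for b_start, b_end in blocks):
            count += 1
    return count


if __name__ == "__main__":
    # Hypothetical records, only to show the expected input shape.
    example = [
        "r1\t0\t2L\t100\t255\t50M\t*\t0\t0\t*\t*",
        "r2\t0\t2L\t10\t255\t20M1000N30M\t*\t0\t0\t*\t*",
    ]
    print(count_reads_in_exon(example, "2L", 120, 500))
```

Running the same count on the STAR and tophat2 alignments for the same exon coordinates would show whether the gap is consistent across samples, and looking at the reads that only tophat2 places there (multi-mappers, soft-clipped reads, mismatch counts) might hint at which STAR filter is dropping them.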
What might be the reasons (if any) that STAR does not map reads in these regions?
Here are two examples of such exons.
Another difference I can see here is the behaviour around splice junctions: tophat2 can identify splice junctions spanning a much longer region. Some of them we know to be true, but STAR does not detect them. But this is another topic.
