Hi,
I have some RNA-Seq data from the fruit fly, with which we test differential gene expression across several knock-outs and developmental stages.
As I have heard a lot of good things about STAR, I compared it with bbmap and tophat2.
These are the commands I used for the three aligners:
Code:
~/software/STAR-STAR_2.4.1c/STAR --runThreadN 15 \
    --genomeDir genomes/Drosophila_melanogaster/STARindex/Dmel/ \
    --readFilesIn $file --readFilesCommand zcat \
    --sjdbGTFfile genes.gtf \
    --outFilterType BySJout --outFilterMultimapNmax 1 \
    --alignSJoverhangMin 8 \
    --outFileNamePrefix $NEW_FILE.STAR. \
    --outSAMtype BAM Unsorted --outReadsUnmapped Fastx \
    --outFilterMismatchNoverLmax 0.05 \
    --outFilterScoreMinOverLread 0 --outFilterMatchNminOverLread 0 \
    --alignIntronMax 1

bbmap.sh in=$file outu=$NEW_FILE.bbmap.unmapped.bam outm=$NEW_FILE.bbmap.bam \
    bamscript=$NEW_FILE.bbmap.sh minidentity=0.9 ambiguous=toss kfilter=85 \
    bhist=bhist.$NEW_FILE.bbmap.txt 2>$NEW_FILE.bbmap_stat

tophat2 -p 15 -g 1 -G genes.gtf -o $NEW_FILE.tophat.out \
    genomes/Drosophila_melanogaster/Ensembl/BDGP6.80/bowtie2index/genome $file

In all my runs (I tested 12 different samples), STAR performed better than tophat2, while bbmap was far behind in the number of mapped reads.
When I took a closer look at the mapped BAM files and compared tophat2 and STAR, I noticed that STAR has some problems with long exons. Even though it shows a higher mapping rate almost everywhere, in long exons it seems unable to map the reads correctly, while tophat2 maps quite a lot of them there.
What might be the reasons (if any) that STAR doesn't map these regions?
Here are two examples of such exons: [image 1] and [image 2]
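To put numbers on this difference rather than judging it by eye, something like the following could count the reads that start inside a given exon for each aligner. This is only a rough sketch: the function name count_region is made up for illustration, and it only checks the alignment start position (SAM column 4), so reads that begin upstream and extend into the exon are not counted.

```shell
# count_region CHR START END
# Hypothetical helper: reads SAM text on stdin and prints the number of
# records on chromosome CHR whose start position lies in [START, END].
# Header lines (starting with "@") are skipped.
count_region() {
    awk -F'\t' -v chr="$1" -v s="$2" -v e="$3" '
        $1 !~ /^@/ && $3 == chr && $4 >= s && $4 <= e { n++ }
        END { print n + 0 }
    '
}
```

It could be fed from samtools view, which also works on the unsorted BAM written by STAR because no index is needed. The chromosome and coordinates below are placeholders, not the exons from the examples:

samtools view $NEW_FILE.STAR.Aligned.out.bam | count_region 2L 100000 105000
samtools view $NEW_FILE.tophat.out/accepted_hits.bam | count_region 2L 100000 105000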
Another difference I can see here is the behaviour around splice junctions: tophat2 can identify splice junctions over a much longer region. Some of them we know to be true, but STAR can't see them. But this is another topic.