I'm mapping some RNA-seq data using TopHat with Bowtie 1 (latest versions of both, plus the latest version of SAMtools). I built the genome and transcriptome indexes from Ensembl GRCh37.72, using only the main chromosomes (1-22, X, Y, MT).
As a test, I made a FASTQ file containing just 2 reads, and also tried some other small FASTQ files from GEO. Somehow it takes 2 hours to map 2 reads: TopHat stays stuck at "Using pre-built transcriptome data.." for 1h30, then maps the reads very quickly. During that time, if I check the running processes, nothing is happening (CPU is flat and memory isn't increasing, so it's not loading anything).
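For anyone who wants to reproduce this, a 2-read test file can be made roughly like this. It's only a sketch: the file names are placeholders (not my actual paths), the toy reads are made up so the snippet runs on its own, and it assumes standard 4-line FASTQ records.

```shell
# Hypothetical file names; assumes 4-line FASTQ records (no wrapped sequences).
# Build a tiny stand-in input so the sketch runs on its own.
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n@r3\nGGCC\n+\nIIII\n' > sample.fastq

# Keep the first 2 reads (2 reads x 4 lines each = 8 lines).
head -n 8 sample.fastq > two_reads.fastq

# Sanity check: line count divided by 4 gives the number of reads kept.
echo $(( $(wc -l < two_reads.fastq) / 4 ))
```

With a real file, replace sample.fastq with the downloaded FASTQ and skip the printf line.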
Here's the command I used:
~/tophat-2.0.9.Linux_x86_64/tophat --bowtie1 -p 3 -g 1 --no-novel-juncs --no-novel-indels --transcriptome-index=/index/bowtie/human/transcriptome/GRCh37 -G /index/bowtie/human/Homo_sapiens-GRCh37-72-filtered.gtf -o test3 /index/bowtie/human/Homo_sapiens-GRCh37-72 SRR027888.SRR027890_chr10_1.fastq
Any ideas? I can't believe this is normal behavior. Everything is fast when mapping to the genome only, so the problem seems to be transcriptome-specific.