I think this has been touched upon before, but I haven't been able to find a definitive answer, so here I go. Apologies if previously addressed.
I'm trying to map single-end RNA-seq reads against the maize AGPv3 genome, which I've indexed with bowtie2-build. When I run tophat2 (which I've used plenty), I get the following:
$ tophat2 -p8 /mnt/data/AGPv3/AGPv3 lane7-index12_CTTGTA_L007_R1.chunk.fastq.gz
[2014-12-30 17:38:40] Beginning TopHat run (v2.0.13)
-----------------------------------------------
[2014-12-30 17:38:40] Checking for Bowtie
Bowtie version: 2.2.4.0
[2014-12-30 17:38:40] Checking for Bowtie index files (genome)..
[2014-12-30 17:38:40] Checking for reference FASTA file
[2014-12-30 17:38:40] Generating SAM header for /mnt/data/AGPv3/AGPv3
And that's it. For days. The process is consuming 100% of a single CPU, and I've tried it without -p8 as well (seemed to be a cure in another thread), no change.
Note that I'm NOT using an annotation GTF; I want to map directly to the DNA without reference to annotated features, and I'm particularly interested in repeats (and may need to use some options for that, but that's not a concern in this post).
Am I expecting too much for this to finish in five CPU days on a pretty decent Xeon processor? Is there something I can do to generate the SAM header separately? I haven't been able to find anything on this in the TopHat docs or by searching Google.
The issue has nothing to do with the size of the reads file: using a small fastq chunk makes no difference. TopHat seems to be saying it's building a SAM header from the genome files, and hasn't started on the provided fastq file yet.
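For reference, the reference-sequence part of a SAM header is just one @SQ line per sequence, giving its name (SN) and length (LN). Below is a minimal sketch of building those lines directly from the FASTA file, as a way to confirm the genome file parses quickly and contains the expected sequences. It is not what TopHat does internally, and nothing here shows that TopHat can be made to accept a pre-built header. The function names (open_text, sam_sq_lines) are made up for illustration.

```python
import gzip
import sys


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def sam_sq_lines(fasta_path):
    """Return one '@SQ' header line per FASTA record, in file order."""
    lines = []
    name = None
    length = 0
    with open_text(fasta_path) as handle:
        for raw in handle:
            line = raw.strip()
            if line.startswith(">"):
                # A new record starts: emit the line for the previous one.
                if name is not None:
                    lines.append("@SQ\tSN:%s\tLN:%d" % (name, length))
                # Use the header text up to the first whitespace as the name.
                fields = line[1:].split()
                name = fields[0] if fields else ""
                length = 0
            elif name is not None:
                length += len(line)
    # Emit the final record, if the file contained any.
    if name is not None:
        lines.append("@SQ\tSN:%s\tLN:%d" % (name, length))
    return lines


if __name__ == "__main__":
    # Usage (path is an example): python sq_header.py /mnt/data/AGPv3/AGPv3.fa
    for sq in sam_sq_lines(sys.argv[1]):
        print(sq)
```

The number of lines printed equals the number of sequences in the FASTA, so an unexpectedly large count (for example, many small scaffolds or contigs) would show up immediately.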
As far as size is concerned, here are the index and genome files:
-rw-rw-r--. 1 sam sam 657M Dec 18 20:29 AGPv3.1.bt2
-rw-rw-r--. 1 sam sam 488M Dec 18 20:29 AGPv3.2.bt2
-rw-rw-r--. 1 sam sam 1.1M Dec 18 18:50 AGPv3.3.bt2
-rw-rw-r--. 1 sam sam 488M Dec 18 18:50 AGPv3.4.bt2
-rw-rw-r--. 1 sam sam 2.0G Dec 27 11:04 AGPv3.fa
-rw-rw-r--. 1 sam sam 609M Dec 18 22:09 AGPv3.rev.1.bt2
-rw-rw-r--. 1 sam sam 456M Dec 18 22:09 AGPv3.rev.2.bt2
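A listing like this can also be checked programmatically. The sketch below verifies that all six files of a small Bowtie 2 index are present for a given prefix and reports whether the FASTA file was modified after the oldest index file, as the dates in the listing suggest (Dec 27 versus Dec 18). Whether that timestamp difference has anything to do with the hang is not established here; this is a sanity check only. The function name check_index and the ".fa" suffix default are assumptions for illustration.

```python
import os

# The six files bowtie2-build writes for a small index.
BT2_SUFFIXES = (
    ".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2", ".rev.1.bt2", ".rev.2.bt2",
)


def check_index(prefix, fasta_suffix=".fa"):
    """Report missing index files and whether the FASTA is newer than the index."""
    paths = [prefix + suffix for suffix in BT2_SUFFIXES]
    missing = [p for p in paths if not os.path.isfile(p)]
    present = [p for p in paths if os.path.isfile(p)]

    fasta = prefix + fasta_suffix
    fasta_newer = None  # None means the comparison could not be made.
    if os.path.isfile(fasta) and present:
        oldest_index = min(os.path.getmtime(p) for p in present)
        fasta_newer = os.path.getmtime(fasta) > oldest_index

    return {"missing": missing, "fasta_newer_than_index": fasta_newer}


if __name__ == "__main__":
    # The prefix is the same value passed to tophat2 on the command line.
    print(check_index("/mnt/data/AGPv3/AGPv3"))
```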
Thanks in advance for any pointers!