  • samhokin
    Member
    • Nov 2013
    • 20

    Tophat2 never completes Generating SAM header

    I think this has been touched upon before, but I haven't been able to find a definitive answer, so here I go. Apologies if previously addressed.

    I'm trying to map single-end RNA-seq reads against the maize AGPv3 genome, which I've indexed with bowtie2-build. When I run tophat2 (which I've used plenty), I get the following:

    $ tophat2 -p8 /mnt/data/AGPv3/AGPv3 lane7-index12_CTTGTA_L007_R1.chunk.fastq.gz

    [2014-12-30 17:38:40] Beginning TopHat run (v2.0.13)
    -----------------------------------------------
    [2014-12-30 17:38:40] Checking for Bowtie
    Bowtie version: 2.2.4.0
    [2014-12-30 17:38:40] Checking for Bowtie index files (genome)..
    [2014-12-30 17:38:40] Checking for reference FASTA file
    [2014-12-30 17:38:40] Generating SAM header for /mnt/data/AGPv3/AGPv3

    And that's it. For days. The process is consuming 100% of a single CPU. I've also tried it without -p8 (which seemed to be a cure in another thread), with no change.

    Note that I'm NOT using an annotation GTF; I want to map directly to the DNA without reference to annotated features, and I'm particularly interested in repeats (and may need to use some options for that, but that's not a concern in this post).

    Am I expecting too much for this to finish in five CPU days on a pretty decent Xeon processor? Is there something I can do to generate the SAM header separately? I haven't been able to find anything on this in the TopHat docs or by searching Google.

    The issue has nothing to do with the size of the reads file: using a small fastq chunk makes no difference. TopHat seems to be saying it's building a SAM header from the genome files, and isn't dealing with the provided fastq file yet.

    As far as size is concerned, here are the indexed genome files:

    -rw-rw-r--. 1 sam sam 657M Dec 18 20:29 AGPv3.1.bt2
    -rw-rw-r--. 1 sam sam 488M Dec 18 20:29 AGPv3.2.bt2
    -rw-rw-r--. 1 sam sam 1.1M Dec 18 18:50 AGPv3.3.bt2
    -rw-rw-r--. 1 sam sam 488M Dec 18 18:50 AGPv3.4.bt2
    -rw-rw-r--. 1 sam sam 2.0G Dec 27 11:04 AGPv3.fa
    -rw-rw-r--. 1 sam sam 609M Dec 18 22:09 AGPv3.rev.1.bt2
    -rw-rw-r--. 1 sam sam 456M Dec 18 22:09 AGPv3.rev.2.bt2

    Thanks in advance for any pointers!
    Sam Hokin
    Computational Scientist, Carnegie and NCGR
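
On the question of generating the SAM header separately: the `@SQ` lines of a SAM header carry only each reference sequence's name and length, so they can be derived from the FASTA file alone. The following is a minimal sketch under that assumption. The function name `fasta_to_sq` is hypothetical, and nothing here claims that TopHat can be told to use a header built this way; it only shows what information the step needs and gives a quick way to confirm the FASTA itself parses cleanly.

```shell
# Hypothetical helper: print one SAM @SQ line (name and length) per FASTA record.
fasta_to_sq() {
  awk '
    /^>/ {
      # A new record starts: emit the previous one, if any.
      if (name != "") printf "@SQ\tSN:%s\tLN:%d\n", name, len
      name = substr($1, 2)   # sequence name = first word after ">"
      len = 0
      next
    }
    { len += length($0) }    # accumulate bases on sequence lines
    END { if (name != "") printf "@SQ\tSN:%s\tLN:%d\n", name, len }
  ' "$1"
}
```

For the files listed above, this could be run as `fasta_to_sq /mnt/data/AGPv3/AGPv3.fa | head`. If it returns promptly with a plausible list of sequences, the FASTA is probably not the part that is stalling.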
  • samhokin
    Member
    • Nov 2013
    • 20

    #2
    Can anyone help me with this? I'm now getting the same result when I try to map reads from a single fastq file (multiplexed PE in this case) to the maize AGPv3 genes. It's a run-of-the-mill application of TopHat2. Here's the command:

    tophat --library-type=fr-unstranded --no-novel-juncs --transcriptome-only --transcriptome-index=tindex/AGPv3 -o SRR650377 /mnt/data/AGPv3/AGPv3 SRR650377.fastq.gz

    I've already indexed the transcriptome, so I'm not using the -G option. (I was planning on running this against seven different samples.)

    As before, this runs forever:

    sam 23656 23652 0 Dec29 ? 00:00:00 perl /usr/local/bin/bowtie2 -x /mnt/data/AGPv3/AGPv3 /dev/null
    sam 23658 23656 94 Dec29 ? 16:24:08 /usr/local/bin/../src/bowtie2/bowtie2-align-s --wrapper basic-0 -x /mnt/data/AGPv3/AGPv3 /dev/null
    Sam Hokin
    Computational Scientist, Carnegie and NCGR
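
The process listing above suggests the stalled step is a plain bowtie2 call against the index with `/dev/null` as the reads file. One way to check whether that call hangs on its own, independent of TopHat, is to run it by hand under a time limit. This is a minimal sketch, assuming GNU coreutils `timeout` is available; the function name `run_with_limit` is hypothetical.

```shell
# Hypothetical helper: run a command under a time limit (in seconds) and
# report whether it finished, failed, or was still running at the limit.
run_with_limit() {
  limit=$1; shift
  # Output is discarded; only the exit status matters here.
  timeout "$limit" "$@" >/dev/null 2>&1
  status=$?
  if [ "$status" -eq 124 ]; then
    # GNU timeout exits with 124 when it had to stop the command.
    echo "still running after ${limit}s"
  elif [ "$status" -eq 0 ]; then
    echo "finished"
  else
    echo "failed with exit status $status"
  fi
}
```

With the command from the listing, that would be `run_with_limit 600 bowtie2 -x /mnt/data/AGPv3/AGPv3 /dev/null`. If this also reports "still running", the problem lies with bowtie2 and the index rather than with the TopHat options.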
