  • bassu
    Junior Member
    • Jun 2010
    • 5

    Tophat Run time

    Dear all,

    I am currently running TopHat to align Illumina paired-end data (two files of 17 GB each) against the human reference genome (hg19 Bowtie index). It has been running for almost 24 hours and has not finished. How much longer is it likely to take? Concurrently, I am also running Maq on 75 bp single-end reads (15 GB) against a reference of 30 MB (binary human genome file).

    My system has 64 GB of RAM and an 8-core processor, which I feel is among the best configurations available in industry. Do I need to upgrade it for NGS data analysis? If so, please suggest a configuration.

    I would also like to know how much processing time it would take if I ran TopHat and Maq separately.

    Hoping for a speedy reply.

    Thanks
  • Rao
    Member
    • Oct 2008
    • 36

    #2
    Try Bowtie...
    Is it RNA-seq data?

    • lmf_bill
      Member
      • Jul 2008
      • 36

      #3
      TopHat takes a long time, especially with paired-end reads. In my opinion, your configuration is sufficient.
      BTW, TopHat will produce huge temporary files.

      • john_mu
        Member
        • May 2010
        • 88

        #4
        Did you run TopHat with multiple threads?

        If you are running it with only one thread, 17 GB of reads will take several days...

        What is your read length? Long reads also take much longer than short ones (for the same amount of data).

        Running TopHat and Maq at the same time should not cause many problems (unless you ran TopHat with 8 threads).

        Regarding your system configuration, 64 GB should be plenty of RAM for your amount of data.

        • bassu
          Junior Member
          • Jun 2010
          • 5

          #5
          Thanks all,
          @Rao: Yes, I am using RNA-seq data.

          @lmf_bill: Thanks for your valuable comment. I was wondering whether my system configuration was adequate. As for the huge temporary files TopHat produces, they will be deleted automatically, right?

          @john_mu: Thanks, John. I'm currently running my reads in a single thread, and my read length is 50 bp.

          • DineshCyanam
            Compendia Bio
            • Oct 2010
            • 35

            #6
            @bassu: So how long did the run finally take to finish? Were you able to reduce the run time in any way? I am having the same problem here...

            • mrawlins
              Member
              • Apr 2010
              • 63

              #7
              Running with n-1 threads on an n-core machine (e.g. 7 threads on a machine with 8 cores) should speed things up. Bowtie has a --shmem option for using shared memory for all threads, so that shouldn't increase the memory footprint by much to use that many threads. I've observed roughly a linear speed up with the number of cores dedicated to bowtie; I suspect similar results for tophat.
              I will sometimes run with as many threads as cores, but only if I don't intend to use the computer for anything else while the program runs (i.e. a compute node on our analysis cluster).
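
The n-1 rule above can be sketched as a small helper that builds a TopHat command line. This is only an illustration: `-p` (threads), `-o` (output directory) and `-G` (annotation) are TopHat options, but the function name, the default output directory and the file names are assumptions made up for the example, and the command is printed rather than run.

```python
import os


def tophat_command(index, reads, out_dir="tophat_out", threads=None, gtf=None):
    """Build a TopHat argument list that uses n-1 threads by default.

    index, reads, out_dir and gtf are placeholder values supplied by the caller.
    """
    if threads is None:
        # Leave one core free, as suggested above; never go below 1 thread.
        threads = max(1, (os.cpu_count() or 1) - 1)
    cmd = ["tophat", "-p", str(threads), "-o", out_dir]
    if gtf is not None:
        cmd += ["-G", gtf]  # optional annotation file
    cmd.append(index)
    cmd.extend(reads)  # one file for single-end, two for paired-end
    return cmd


if __name__ == "__main__":
    # Hypothetical file names; print the command instead of executing it.
    print(" ".join(tophat_command("hg19", ["reads_1.fq", "reads_2.fq"])))
```

On a dedicated compute node, passing `threads=os.cpu_count()` explicitly would use every core instead.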

              • crazyhottommy
                Senior Member
                • Apr 2012
                • 187

                #8
                I was running TopHat to map a 24 GB single-end RNA-seq FASTQ file to hg19 with the GTF from GENCODE.
                I ran it on a cluster with 1 node, 8 processors, ram=3gb.
                It took 45 hours to finish.

                Is there any way to speed it up? As a regular user of the cluster, the above setting is the maximum resource I can have.

                • dpryan
                  Devon Ryan
                  • Jul 2011
                  • 3478

                  #9
                  Split the fastq file and use multiple nodes. If you had more RAM, you could run STAR.
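
A minimal sketch of the splitting step, assuming an uncompressed FASTQ file with exactly four lines per record (no wrapped sequence lines). The function name, the chunk naming scheme and the `records_per_chunk` parameter are made up for illustration. Each chunk could then be submitted as its own job; merging the per-chunk alignments afterwards is not shown.

```python
def split_fastq(path, records_per_chunk, prefix="chunk"):
    """Split a 4-line-per-record FASTQ file into numbered chunk files.

    Returns the list of chunk file names in the order they were written.
    """
    lines_per_chunk = 4 * records_per_chunk
    written = []
    dst = None
    with open(path) as src:
        for n, line in enumerate(src):
            if n % lines_per_chunk == 0:
                # Start a new chunk on every record boundary that fills one.
                if dst is not None:
                    dst.close()
                name = f"{prefix}_{len(written):04d}.fastq"
                dst = open(name, "w")
                written.append(name)
            dst.write(line)
    if dst is not None:
        dst.close()
    return written
```

For paired-end data, splitting both mate files with the same `records_per_chunk` keeps the mates in matching chunks, provided the two files list the reads in the same order.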

                  • shi
                    Wei Shi
                    • Feb 2010
                    • 236

                    #10
                    You may try Subread, which is >10 times faster.

                    • crazyhottommy
                      Senior Member
                      • Apr 2012
                      • 187

                      #11
                      I will give it a shot. Thanks

                      • adamyao
                        Member
                        • Feb 2011
                        • 18

                        #12
                        20.8 Gbases (104M reads, paired-end) on 1 node (16 cores, AMD 2.4 GHz, 96 GB RAM): 28 hours.
                        According to our tests, TopHat 2 runs best with 16 cores (single node); otherwise it takes longer.
                        STAR runs much faster (less than 40 minutes) but needs a lot more memory (64 cores, 128 GB RAM).
