  • Optimizing tophat running time

    Hi friends,

    I'm trying to align 50bp paired-end Illumina reads to the mm10 genome/transcriptome with tophat 2.0.8. I've done a few runs on our local desktop Mac Pros to get an idea of how the software works, and now I'm starting to migrate this onto our local high performance computing cluster in the hope of running the alignments faster (or at least in parallel rather than sequentially), as these are large data sets. Does anyone have advice on how many cores and how much memory I should request per data set to get them to run quickly without hogging the system, given that not every step in the tophat pipeline is multi-threaded?

    From my pilot alignments and from reading these forums, I understand that the segment_juncs step is single-threaded and time-consuming -- will this step run more quickly if more memory is available to it? (i.e., Is it "fair" to request more cores from the system just to have their memory? Does the speed of this step scale at all? In my pilots, the run time has been quite variable, and I haven't been able to correlate it with anything obvious.)

    Empirically, I've also seen that -p needs to be set below the actual number of cores available, to avoid problems when the tophat script invokes samtools or other processes during the alignment and output steps, but it's not clear to me what a good value would be. Is there a good rule of thumb here, like p = "number of cores available" - "some particular constant that I don't know"?

    Thanks in advance for any advice you can provide. The computational side of this is pretty intimidating to a bench biologist, and I've tried to RTFM as best I can, I swear!

  • #2
    If you want speed, try STAR. It's much faster than tophat (and the results seem even better). The only drawback is that STAR uses much more RAM than tophat.

    If you want to stick with tophat, don't use too many threads. I think a maximum of 10 is OK.
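
A rough sketch of that idea in Python, for what it's worth: it picks a `-p` value by holding back some cores for helper processes such as samtools and capping the result at 10 threads, then assembles a tophat command line. The reserve of one core, the helper names `pick_threads` and `tophat_command`, and the index/file names are all illustrative assumptions, not tested recommendations; only `-p` (threads) and `-o` (output directory) are taken from tophat's own options.

```python
import os
import shlex


def pick_threads(cores_available, reserve=1, cap=10):
    """Return a value for tophat's -p option.

    Holds back `reserve` cores for helper processes, never exceeds `cap`,
    and never goes below 1. Both defaults are assumptions for illustration.
    """
    return max(1, min(cap, cores_available - reserve))


def tophat_command(index, reads1, reads2, out_dir, cores_available):
    """Build a paired-end tophat command as an argument list."""
    threads = pick_threads(cores_available)
    return [
        "tophat",
        "-p", str(threads),   # number of alignment threads
        "-o", out_dir,        # output directory
        index, reads1, reads2,
    ]


if __name__ == "__main__":
    # On a cluster, pass the number of cores actually requested from the
    # scheduler; os.cpu_count() reports every core on the node.
    cores = os.cpu_count() or 1
    cmd = tophat_command(
        "mm10", "sample_R1.fastq", "sample_R2.fastq", "tophat_out", cores
    )
    # Print the command rather than running it, so it can be inspected first.
    print(shlex.join(cmd))
```

With 8 cores requested this gives `-p 7`; with 16 or more it stays at the cap of 10. Adjust the reserve and cap to whatever works on your own system.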
