
  • Optimizing tophat running time

    Hi friends,

    I'm trying to align 50 bp paired-end Illumina reads to the mm10 genome/transcriptome with tophat 2.0.8. I've done a few runs on our local desktop Mac Pros to get an idea of how the software works, and now I'm starting to migrate this onto our local high-performance computing cluster in the hope of running the alignments faster (or at least in parallel rather than sequentially), as these are large data sets. Does anyone have advice on how many computational resources I should request per data set to get them to run quickly without hogging the system, given that not every step in the tophat pipeline is multi-threaded?

    From my pilot alignments and from reading these forums, I understand that the segment_juncs step is single-threaded and time-consuming -- will this step run more quickly if more memory is available to it? (i.e., Is it "fair" to request more cores from the system just to have their memory? Does the speed of this step scale at all? In my pilots, the run time has been quite variable, and I haven't been able to correlate it with anything obvious.)

    Empirically, I've also seen that -p needs to be set lower than the actual number of cores available, to avoid problems when the tophat script tries to invoke samtools or other processes during the alignment and output steps, but it is not clear to me what a good value would be. Is there a good rule of thumb here, like p = "number of cores available" - "some particular constant that I don't know"?

    Thanks in advance for any advice you can provide. The computational side of this is pretty intimidating to a bench biologist. I've tried to RTFM as best I can understand it, I swear I have!

  • #2
    If you want speed, try STAR. It's much faster than tophat (and the results seem even better). The only catch is that STAR uses much more RAM than tophat.

    If you want to stick with tophat, don't use too many threads (-p). I think a maximum of 10 is OK.
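
To make the rule of thumb discussed in this thread concrete, here is a minimal Python sketch. It assumes one core is held back for helper processes such as samtools and caps -p at 10 as suggested in the reply above; both numbers are illustrative defaults, not values documented by tophat. The helper names, the index prefix, and the read file names are hypothetical. Only the -p (threads) and -o (output directory) options are used.

```python
import os


def tophat_threads(cores_available, reserve=1, cap=10):
    """Suggest a -p value: cores minus a reserve, capped, and at least 1.

    `reserve` and `cap` are assumptions for illustration, not tophat defaults.
    """
    return max(1, min(cap, cores_available - reserve))


def build_tophat_command(threads, out_dir, index_prefix, reads_1, reads_2):
    """Assemble a paired-end tophat command line as an argument list."""
    return [
        "tophat",
        "-p", str(threads),   # number of alignment threads
        "-o", out_dir,        # output directory
        index_prefix,         # Bowtie index prefix (hypothetical path)
        reads_1,
        reads_2,
    ]


if __name__ == "__main__":
    # On a cluster, pass the core count granted by the scheduler instead;
    # os.cpu_count() reports the whole node, not your allocation.
    cores = os.cpu_count() or 1
    threads = tophat_threads(cores)
    cmd = build_tophat_command(
        threads, "tophat_out", "mm10", "sample_R1.fastq", "sample_R2.fastq"
    )
    print(" ".join(cmd))
```

The script only prints the command rather than running it, so the thread count can be checked against the job request before submitting. Since segment_juncs is reported in this thread to be single-threaded, a larger -p would not be expected to shorten that step.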
