If you want speed, try STAR. It's much faster than tophat (and the results seem even better). The only catch is that STAR uses much more RAM than tophat.
If you want to use tophat, don't use too many processes. I think a maximum of 10 is OK.
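In case it helps, here is a rough Python sketch of what I mean by capping the thread count. It only builds the tophat command line; the ceiling of 10 is just my suggestion above, not a measured optimum, and the index basename, file names, and the helper name `tophat_command` are made up for illustration.

```python
# Sketch only: builds the command line, does not run anything.
MAX_TOPHAT_THREADS = 10  # ceiling suggested in this reply, not a measured optimum


def tophat_command(index, reads1, reads2, out_dir, threads):
    """Return a tophat argv list with -p capped at MAX_TOPHAT_THREADS."""
    p = max(1, min(threads, MAX_TOPHAT_THREADS))
    # tophat [options] <bowtie_index> <reads_1> <reads_2>
    return ["tophat", "-p", str(p), "-o", out_dir, index, reads1, reads2]


# Hypothetical index basename and file names, for illustration only.
cmd = tophat_command("mm10", "sample_R1.fastq", "sample_R2.fastq", "tophat_out", 32)
print(" ".join(cmd))
```

The list can be handed to `subprocess.run(cmd, check=True)` once the paths point at real files.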
Optimizing tophat running time
Hi friends,
I'm trying to align 50bp paired-end Illumina reads to the mm10 genome/transcriptome with tophat 2.0.8. I've done a few runs on our local desktop Mac Pros to get an idea of how the software is working, and now I'm starting to migrate this onto our local high performance computing cluster in the hopes of running the alignments faster (or at least running them in parallel rather than sequentially) as these are large data sets. I'm wondering if anyone has advice on how much computational resources I should request per data set to get them to run quickly, without hogging the system, given that not every step in the tophat pipeline is multi-threaded?
From my pilot alignments and from reading these forums, I understand that the segment_juncs step is single-threaded and time-consuming -- will this step run more quickly if more memory is available to it? (i.e., Is it "fair" to request more cores from the system just to have their memory? Does the speed of this step scale at all? In my pilots, the run time has been quite variable, and I haven't been able to correlate it with anything obvious.)
Empirically, I've also seen that the -p value has to be set lower than the actual number of cores available, to avoid problems when the tophat script tries to invoke samtools or other processes during the alignment and output steps. It is not clear to me what a good value would be, though. Is there a rule of thumb here, like p = "number of cores available" - "some particular constant that I don't know"?
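To make the question concrete, this is the kind of rule I have in mind, sketched in Python. The reserve of 2 cores is a placeholder guess on my part (it is exactly the constant I'm asking about), and the function names are made up. Whether `os.sched_getaffinity` reflects the scheduler's allocation depends on how the cluster is configured, so treat that part as an assumption too.

```python
import os


def allocated_cores():
    """Cores this process may use; falls back to the machine total."""
    if hasattr(os, "sched_getaffinity"):  # available on Linux only
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def pick_threads(cores, reserved=2):
    """-p value = cores minus a reserve, never below 1.

    reserved=2 is a placeholder guess, not a recommended value.
    """
    return max(1, cores - reserved)


print(pick_threads(allocated_cores()))
```

With this, an 8-core allocation would give -p 6, leaving two cores free for samtools and the other helper processes.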
Thanks in advance for any advice you can provide. The computational side of this is pretty intimidating to a bench biologist, and I've tried to RTFM as best I can understand it, I swear I have!