**mascano** · 08-21-2013, 07:44 AM

A bit of an update

Running 16 threads, instead of 24, allowed tophat to complete the run:

Code:

tophat2 -p 16 -G /Users/mascano/Sequence_Analyses/Reference/Homo_sapiens_NCBI_build37.2/Homo_sapiens/NCBI/build37.2/Annotation/Genes/genes.gtf -o PMA_0hr_TotalRNA /Users/mascano/Sequence_Analyses/Reference/Homo_sapiens_NCBI_build37.2/Homo_sapiens/NCBI/build37.2/Sequence/Bowtie2index/genome /Users/mascano/Sequence_Analyses/DATA/THP1_timecourse/Act_1_ATCACG_L002_R1_001.fastq

I have 2 x 2.4 GHz 6-core Xeons, so that's 12 physical cores plus 12 virtual with hyperthreading, which theoretically means I can assign '-p 24'.

My guess is that memory usage is the problem, but that's not entirely clear. I have 64 GB RAM (which is the maximum allowed until OS X Mavericks comes out).

Any advice on how to assign 24 threads without TopHat2 failing? Would calling the '-mm' argument work?

**GenoMax** · 08-21-2013, 08:18 AM

Why is using all 24 threads so important?

Have you considered the possibility that the storage subsystem on this machine is the bottleneck? Look in Activity Monitor to see if you are maxing out the throughput.

So rather than having 24 cores in some sort of iowait round-robin state, it may be better to start with a smaller number of cores and experiment to find the optimal performance balance.

**mascano** · 08-21-2013, 09:28 AM

Thank you for the reply and suggestion. I had not considered that HDD I/O may be the bottleneck; I imagine an SSD may improve it. However, looking at the disk activity during a successful run (using 16 threads), I haven't seen it peak anywhere near 6 Gb/sec (or 768 MB/sec), which should be the bandwidth of my HDD (using the ICH10 bridge).

I doubt it will skyrocket to that throughput ceiling with all 24 threads, no?

**GenoMax** · 08-21-2013, 09:56 AM

There is theoretical throughput and there is real-life performance. Since the tophat suite is developed on Mac, it should be optimized for OS X.

If you are interested you could look at specific application level stats by following the suggestions in this post: http://blog.yerkanian.com/2011/10/17...io-on-macos-x/

Use the following in a terminal window to see CPU-level performance for various processes (adjust parameters as needed by looking at the man entry for top).

Code:

$ top -n10 -u

**mascano** · 08-21-2013, 11:13 AM

Under -p 16 conditions:

Memory usage peaked at 4 GB for the long_spanning process, and HDD I/O did not exceed 20 MB/sec (read or write) in total. I used

Code:

sudo iotop -C 5

as well as viewing memory usage and disk activity via Activity Monitor.

I was monitoring during these log events, which is where the run would fail with -p 24:

Code:

[2013-08-21 14:59:08] Mapping left_kept_reads.m2g_um_seg4 to genome segment_juncs with Bowtie2 (4/4)
[2013-08-21 15:01:19] Joining segment hits
[2013-08-21 15:05:52] Reporting output tracks

**GenoMax** · 08-21-2013, 11:22 AM

So we know that 16 threads work but 24 do not. OS X may need some cores to keep essential parts of the OS running. The next thing to try would be to step the thread count up from 16 towards 24 and see at what point the process fails.
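
That stepping experiment could be scripted roughly as follows. This is only a sketch: `first_failing_threads` and `fake_run` are hypothetical names, and `fake_run` is a stand-in that pretends anything above 22 threads fails, so the loop can be tried without launching a real alignment.

```shell
#!/usr/bin/env bash
# Step a thread count from START to END and report the first value at which
# the given command exits non-zero. The command receives the thread count
# as its last argument.

first_failing_threads() {
    local start=$1 end=$2 step=$3
    shift 3
    local p
    for ((p = start; p <= end; p += step)); do
        if ! "$@" "$p"; then
            echo "$p"
            return 0
        fi
    done
    echo "none"
}

# Hypothetical stand-in for a real run: succeeds up to 22 threads.
fake_run() { [ "$1" -le 22 ]; }

first_failing_threads 16 24 2 fake_run   # prints 24
```

For a real test, `fake_run` would be replaced by a small wrapper function that calls tophat2 with `-p` set to its argument and a separate `-o` directory per attempt.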

**mascano** · 08-22-2013, 08:40 AM

So I can move to 22 cores, either as a single run or as parallel runs that total 22. That said, I'm not being shy about using the computer simultaneously for other applications (MS Office, Chrome, Mail, etc.), so, at least for my configuration, 22 threads is more than satisfactory.
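
One way to avoid hard-coding that number is to derive it from the logical CPU count and hold a few cores back for the OS and desktop applications. A minimal sketch, assuming a reserve of 2 (which is simply what worked on this machine); `pick_threads` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Suggest a -p value: logical CPUs minus a reserve, never less than 1.
pick_threads() {
    local total=$1 reserve=$2
    local n=$(( total - reserve ))
    if [ "$n" -lt 1 ]; then n=1; fi
    echo "$n"
}

# Logical CPU count; falls back to 1 if getconf cannot report it.
total=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
echo "logical CPUs: $total, suggested -p: $(pick_threads "$total" 2)"

pick_threads 24 2   # prints 22
```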

**GenoMax** · 08-22-2013, 08:56 AM

Since you did the experiment...

How much time (if any) is saved by going from 16 to 22 cores for the same job? A rough estimate is fine if you did not time the runs.

**mascano** · 08-22-2013, 09:15 AM

Not using the exact same fastq, but ones of similar size (~30 million):
I went from ~2 hrs at 16 cores, to 1.5 hrs at 22 cores, to 2.5 hrs at 11 cores. However, when I run two processes in separate shells at 11 cores each, one of them consistently takes around 2.5 hrs and the other around 3.5 hrs. I think I'll have to just wing this and figure out the best balance between the number of parallel processes and the number of cores per process.
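
To make that trade-off concrete, the rough timings above can be plugged into a quick comparison of how long two samples take under each strategy. This is only a back-of-the-envelope sketch: the runs used different fastq files, and `two_sample_hours` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hours to finish two samples:
#   serial   = two back-to-back runs at 22 threads
#   parallel = two concurrent 11-thread runs, done when the slower one ends
two_sample_hours() {
    awk -v one="$1" -v a="$2" -v b="$3" 'BEGIN {
        serial = 2 * one
        parallel = (a > b) ? a : b
        printf "serial=%.1f parallel=%.1f\n", serial, parallel
    }'
}

two_sample_hours 1.5 2.5 3.5   # prints serial=3.0 parallel=3.5
```

On these numbers, running the samples one after another at 22 threads finishes slightly sooner than running two 11-thread jobs side by side.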

Tophat2 long_spanning_reads error -- cannot open xx.bam for reading
