Can't get Ray working - SEQanswers


lednakashim

Junior Member

Join Date: May 2012

Posts: 5
#1

Can't get Ray working

06-25-2012, 10:47 AM

I recently got access to a very powerful machine for the purpose of de novo sequencing.

I am using Ray-v2.0.0-rc8

Unfortunately, I have been unable to get Ray working on even the smallest test cases with three different compilers (Intel, GCC, and another compiler that comes with the system).

Most of my runs end like this:

Rank 49: assembler memory usage: 3813964 KiB
Rank 75 reached 400 vertices from seed 91, flow 1
Speed RAY_SLAVE_MODE_EXTENSION 2141 units/second
Rank 75: assembler memory usage: 3816056 KiB

Stack walkback for Rank 0 starting:
[email protected]:113
__libc_start_main@0x2aaaaeeecc35
main@0x40518a
Machine::start()@0x40746f
ComputeCore::runVanilla()@0x500957
MessageProcessor::call_RAY_MPI_TAG_ASK_IS_ASSEMBLED(Message*)@0x45ec8e
Vertex::isAssembled()@0x4db3a0
Stack walkback for Rank 0 done
Process died with signal 11: 'Segmentation fault'
Forcing core dumps of ranks 0, 5, 36, 64, 8, 29, 56, 10, 12, 109, 13, 52, 66, 110
View application merged backtrace tree file with: statview atpMergedBT.dot
_pmiu_daemon(SIGCHLD): [NID 00736] [c1-0c0s0n2] [Sun Jun 24 06:03:13 2012] PE RANK 1 exit signal Killed
_pmiu_daemon(SIGCHLD): [NID 00767] [c1-0c0s0n1] [Sun Jun 24 06:03:13 2012] PE RANK 98 exit signal Killed
[NID 00736] 2012-06-24 06:03:13 Apid 6339046: initiated application termination
Application 6339046 exit codes: 137
Application 6339046 exit signals: Killed
Application 6339046 resources: utime ~39431s, stime ~74s

With input scripts such as

num="124"
aprun -n $124 ./fancierRayDEBUG/Ray -o Assembly$num -k 31 \
-p \
Sample$num/ERR011117_1.fastq.gz \
Sample$num/ERR011117_2.fastq.gz \
-p \
Sample$num/ERR011118_1.fastq.gz \
Sample$num/ERR011118_2.fastq.gz \
-p \
Sample$num/ERR011119_1.fastq.gz \
Sample$num/ERR011119_2.fastq.gz \
-p \
Sample$num/ERR011120_1.fastq.gz \
Sample$num/ERR011120_2.fastq.gz \
-p \
Sample$num/ERR011121_1.fastq.gz \
Sample$num/ERR011121_2.fastq.gz \
-p \
Sample$num/ERR011122_1.fastq.gz \
Sample$num/ERR011122_2.fastq.gz \
-p \
Sample$num/ERR011123_1.fastq.gz \
Sample$num/ERR011123_2.fastq.gz >& myOutput$num.out

I get similar output when trying a smaller E. coli dataset with an intimidating number of PEs. I tried a similar run with only 64 PEs, but the failure was the same.

Rank 1: assembler memory usage: 3331872 KiB
Rank 1091: assembler memory usage: 3330848 KiB
Rank 1135: assembler memory usage: 3330848 KiB
Application 6339025 resources: utime ~68335618s, stime ~49747s

(gave neither error nor yield; was built without debug symbols)

aprun -n4096 -N16 -d2 ./fancierRay/Ray --show-memory-usage -o secoliAssembly$num -k 23 \
-p secoliSample$num\SRR001665_1.fastq.gz \
secoliSample$num\SRR001665_2.fastq.gz \
-p secoliSample$num\SRR001666_1.fastq.gz \
secoliSampel$num\SRR001666_2.fastq.gz >& secolimyOutput$num.out

1. I am trying to figure out how much ram I need per PE
2. Does anybody have a minimal input output example
3. I haven't done much de-novo assembly and was wondering if there are better programs for eukaryote genome assembly
4. Has anybody tried SRR034, sequences SRR034939-34975?

Last edited by lednakashim; 06-25-2012, 10:50 AM. Reason: aesthetics
Tags: ray seb567
krobison

Senior Member

Join Date: Nov 2007

Posts: 743
#2

06-25-2012, 12:35 PM

Is the -n 124 argument the number of MPI processes (ranks)? If so, does your machine have 124 cores? The advice I was given was to match ranks to cores.

If I didn't have my whole cluster smoking on Ray jobs, I'd run your test dataset :-) For ~10 Mb bacterial genomes, on a cluster with 32 GB of RAM, I am able to assemble 1 Gb of MiSeq data with k=31 (advisable for Illumina data) on even just 8 cores.
seb567

Senior Member

Join Date: Jul 2008

Posts: 260
#3

06-25-2012, 01:09 PM

Hello,

Originally posted by lednakashim

I recently got access to a very powerful machine for the purpose of de novo sequencing.

I am using Ray-v2.0.0-rc8

Unfortunately, I have been unable to get Ray working on even the smallest test cases with three different compilers (Intel, GCC, and another compiler that comes with the system).

Changing the compiler will not change much.

Originally posted by lednakashim

Most of my runs end like this:

With input scripts such as

I don't know much about aprun. Is it like mpiexec or mpirun, but for a specific supercomputer?

What is aprun, and what is $124?

What does the command "aprun -n4096 -N16 -d2" mean?

I have previously seen segmentation faults due to message corruption caused by
QLogic Performance Scaled Messaging (PSM) from Intel, Inc.

Maybe this is a similar issue caused by the middleware.

Originally posted by lednakashim

Similar output when trying smaller ecoli file with an intimidating number of PEs. I tried a similar simulation with only 64 PEs but the failure was the same.

(gave neither error nor yield, was build without debug symbols)

1. I am trying to figure out how much ram I need per PE

How much memory do you have?

Is your system running out of memory?
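
One way to check this from an existing run is to pull the peak value out of the "assembler memory usage" lines that Ray already prints. This is a minimal sketch, assuming the log line format quoted in this thread; the function name summarize_ray_memory is made up for illustration.

```shell
# Report the largest "assembler memory usage" value found in a Ray log.
# Assumed line format (copied from the log quoted in this thread):
#   Rank 49: assembler memory usage: 3813964 KiB
summarize_ray_memory() {
    awk '/assembler memory usage:/ {
             kib = $(NF - 1) + 0          # number just before the "KiB" unit
             if (kib > max) { max = kib; rank = $2 }
         }
         END {
             sub(/:$/, "", rank)          # "49:" -> "49"
             if (max > 0)
                 printf "peak %d KiB (%.2f GiB) on rank %s\n", max, max / 1048576, rank
         }' "$@"
}
```

It reads the files given as arguments, or standard input if none are given, for example `summarize_ray_memory myOutput124.out`. Comparing the peak against the memory each PE can actually use on a node shows how close the job is to the limit.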

Originally posted by lednakashim

2. Does anybody have a minimal input output example

Try this sample (it is E. coli):

ftp://ftp.ddbj.nig.ac.jp/ddbj_databa...65_1.fastq.bz2
ftp://ftp.ddbj.nig.ac.jp/ddbj_databa...65_2.fastq.bz2
ftp://ftp.ddbj.nig.ac.jp/ddbj_databa...66_1.fastq.bz2
ftp://ftp.ddbj.nig.ac.jp/ddbj_databa...66_2.fastq.bz2

You can give these files directly to Ray if it is compiled with HAVE_LIBBZ2=y.

Originally posted by lednakashim

3. I haven't done much de-novo assembly and was wondering if there are better programs for eukaryote genome assembly

There is a list on Wikipedia.

Originally posted by lednakashim

4. Has anybody tried SRR034, sequences SRR034939-34975?

I have not. Is there anything special about it?

Sébastien
seb567

Senior Member

Join Date: Jul 2008

Posts: 260
#4

06-25-2012, 01:37 PM

Hi,

According to this page, aprun is the application launcher in
the Cray Linux Environment (CLE).

The option to specify the number of processor cores is -n (like in mpiexec).

In your first command, you used aprun -n $124.

In most shells, $1 is the first argument given to a shell program, so $124 is read as $1 followed by the literal text 24. With no first argument, $1 is empty and $124 resolves to 24.

Example:

seb@fault:~/odin1/cloud$ echo $124
24

aprun -n 124 will run your job on 124 processing cores on your system.
Running on only 24 cores will make each of those 24 cores consume a lot of memory.
From your log, it is 3.8 GB per core.
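
The expansion is easy to reproduce in any POSIX shell. Here is a small sketch that uses the num variable from the script in the first post; the names unbraced and intended are only for illustration.

```shell
num="124"
set --                 # clear positional parameters so that $1 is empty

unbraced="$124"        # parsed as $1 followed by the literal text 24
intended="$num"        # what the script presumably meant to pass to -n

echo "unbraced: $unbraced"
echo "intended: $intended"
```

With an empty $1, the first line printed ends in 24 and the second in 124.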

In your second command, you used aprun -n4096 -N16 -d2.

-d2 makes no sense for Ray because it specifies the number of processor cores for each processing element. This should be 1 for Ray (the default in aprun).

-N16 sets the number of processing elements per node. I am pretty sure you should not touch that. The scheduler (possibly from Cray, Inc.) should be able to figure that out by itself.

-n4096 is a lot of processing elements for just a small bacterial genome.
And with that many processing cores, you will likely need to enable message routing in Ray.

This job likely crashed outside of Ray because of a lack of resources.

I hope my comments will be helpful for you.

First, you should test your system with Ray using the bacterial genome you already downloaded (SRA001125 - E. coli) with something like 2 or 3 nodes.
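
Such a first test could be prepared as a dry run that only prints the launch command instead of executing it. This is a sketch under assumptions: the function name, the cores-per-node value and the output directory smallEcoliTest are hypothetical, the Ray path and read files are the ones from the first post, and k=31 follows the advice given earlier in this thread.

```shell
# Print (do not run) an aprun command for a small Ray test.
# Arguments: number of nodes, cores per node (site-specific; an assumption here).
print_small_ray_test() {
    nodes=$1
    cores_per_node=$2
    ranks=$((nodes * cores_per_node))   # one MPI rank per core
    echo "aprun -n $ranks ./fancierRay/Ray -k 31 -o smallEcoliTest" \
         "-p SRR001665_1.fastq.gz SRR001665_2.fastq.gz"
}

print_small_ray_test 2 16
```

Once the printed command looks right for the system, it can be pasted into the job script.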

Sébastien
lednakashim

Junior Member

Join Date: May 2012

Posts: 5
#5

06-25-2012, 02:11 PM

Wow, thanks for the prompt reply!

The $124 is a typo that was made while I was posting. The post should say $num. Sorry about that :-)

If -N16 is not given, the scheduler uses the default, which in our case is -N32. Choosing -N16 doubles the memory available. Choosing -N16 with -d2 doubles the memory available for each process at the expense of CPU board utilization. A CSC article comments on this (http://www.csc.fi/english/pages/louh...commands/aprun). Additionally, many of the aprun instructions that can be found on the web are specific to the systems that host them. My understanding is that the defaults vary among deployed systems.
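
To make the arithmetic concrete, here is a small sketch of how memory per PE scales with the -N value when a node's memory is shared evenly. The 64 GiB node size is a made-up figure for illustration only, and mem_per_pe_gib is just a helper name.

```shell
# Memory available to each PE when a node's memory is shared evenly.
# Arguments: node memory in GiB, PEs placed on each node (the aprun -N value).
mem_per_pe_gib() {
    awk -v mem="$1" -v pes="$2" 'BEGIN { printf "%.1f\n", mem / pes }'
}

mem_per_pe_gib 64 32   # default placement in this example (-N32)
mem_per_pe_gib 64 16   # half as many PEs per node (-N16)
```

Halving the number of PEs per node doubles the share each PE gets, whatever the real node size is.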

I'm going to rerun the assemblies with 16 cores. I have tried toggling the enable message routing flag, but I get similar failures. I will follow up this post with those results when the computer I use becomes available.

I am trying to understand what kind of debug output you would find useful. Perhaps core dumps? I get inconsistent failures for the same kind of setup; the same program will fail with different errors at different stages.

Last edited by lednakashim; 06-25-2012, 03:09 PM.
waterboy

Member

Join Date: Oct 2010

Posts: 14
#6

07-20-2012, 11:49 PM

Hello everyone...
I am using Ray to assemble 109 million HiSeq 2000 PE reads. I have given it 12 cores for the ranks, with 144 GB of RAM. Can anyone tell me the estimated time it will take and the steps it goes through?
At present Ray is calculating the vertices.
Any help will be highly appreciated.
lednakashim

Junior Member

Join Date: May 2012

Posts: 5
#7

07-21-2012, 12:02 PM

Not without additional information!

1. What kind of sample is it (RNA, bacteria, higher eukaryote)?
2. What is your network? Are you using an IBM cluster or a Cray cluster? Did you link together a bunch of Dells?
seb567

Senior Member

Join Date: Jul 2008

Posts: 260
#8

07-23-2012, 05:52 AM

The required time will also depend on the size of what you are assembling.

Another important thing, if you use interconnected computers, is to have low latency. You can check this in the file NetworkTest.txt, which is written to your Ray output directory.

Originally posted by waterboy

Hello everyone...
I am using Ray to assemble 109 million HiSeq 2000 PE reads. I have given it 12 cores for the ranks, with 144 GB of RAM. Can anyone tell me the estimated time it will take and the steps it goes through?
At present Ray is calculating the vertices.
Any help will be highly appreciated.