Originally posted by talioto
Originally posted by sheepyuan
The same to u!
sheepyuan: what do you mean by "The same to u!"?
If you are referring to talioto's post, I don't think it is a good idea to disable shared memory, as it is the fastest way to do message passing between processes on the same machine.
Also, Open MPI 1.4.2 is very old. The current stable release of Open MPI is 1.6.1, and a lot of improvements have been added since 1.4.2!
And gcc 4.1.2 is very old too, although I don't think this will change much.
Last edited by seb567; 09-25-2012, 03:48 AM.
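To make the shared-memory point concrete: in Open MPI 1.4/1.6, the transports are chosen through the `btl` MCA parameter, and `sm` is the shared-memory component. A command-line sketch, not taken from talioto's post; the process count of 16 and the elided Ray arguments are placeholders:

```shell
#!/bin/bash
# Open MPI 1.4/1.6 byte transfer layer (BTL) selection, for illustration only.
# "self" = a process sending to itself, "sm" = shared memory between
# processes on the same node, "tcp" = traffic between nodes.

# Keep shared memory in the list (Open MPI also uses it by default when available):
mpiexec --mca btl self,sm,tcp -n 16 Ray ...

# Excluding shared memory with "^" forces same-node messages through
# another transport such as TCP loopback, which is normally slower:
mpiexec --mca btl ^sm -n 16 Ray ...
```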
-
You need to install the openmpi package. For example, if you are using Fedora, run 'yum install openmpi openmpi-devel'. If the packages are already installed, make sure that their binaries are in your PATH (you can add them in your .bash_profile). If you are trying to run Ray from a remote 'screen' session, make sure you source your .bash_profile too.
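A "Command not found" error for mpicxx or mpic++ (make exit status 127) means the shell could not find the MPI C++ compiler wrapper on PATH. A minimal bash sketch to check which wrapper is reachable; the function name `find_mpi_cxx` is hypothetical:

```shell
#!/bin/bash
# Report the first MPI C++ compiler wrapper found on PATH.
# Checks mpicxx first, then mpic++ (the two names seen in the build logs).
find_mpi_cxx() {
  local wrapper
  for wrapper in mpicxx mpic++; do
    if command -v "$wrapper" >/dev/null 2>&1; then
      command -v "$wrapper"   # print the full path of the wrapper
      return 0
    fi
  done
  echo "no MPI C++ wrapper (mpicxx or mpic++) found on PATH" >&2
  return 1
}
```

Usage: `find_mpi_cxx || echo "install openmpi-devel or fix PATH"`. On Fedora, openmpi-devel typically installs the wrappers under /usr/lib64/openmpi/bin and exposes them through an environment module (`module load mpi/openmpi-x86_64`); treat that path and module name as assumptions and confirm with `rpm -ql openmpi-devel`. If Ray's Makefile takes the wrapper from an MPICXX variable (an assumption; the two build logs in this thread invoke different wrapper names), a full path could also be passed directly, e.g. `make PREFIX=ray-build MPICXX=/path/to/mpicxx`, where /path/to is a placeholder.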
-
Guys,
I examined this thread from the very beginning but could not find an answer to my problem. Sorry for the silly question. I tried to install Ray 2.0.0 and failed on two machines, one running SciLinux 5.5 and the other RHEL 5.5, which are essentially the same. Here is the output:
[Code]
[yaximik@SciLinux55 Ray-v2.0.0]$ make PREFIX=ray-build
make[1]: Entering directory `/home/yaximik/Bioinformatics/Ray-v2.0.0/RayPlatform'
mpic++ -Wall -ansi -O3 -D MAXKMERLENGTH=32 -D RAY_VERSION=\"2.0.0\" -D RAYPLATFORM_VERSION=\"1.0.3\" -I. -c -o memory/ReusableMemoryStore.o memory/ReusableMemoryStore.cpp
make[1]: mpic++: Command not found
make[1]: *** [memory/ReusableMemoryStore.o] Error 127
make[1]: Leaving directory `/home/yaximik/Bioinformatics/Ray-v2.0.0/RayPlatform'
make[1]: Entering directory `/home/yaximik/Bioinformatics/Ray-v2.0.0/code'
mpic++ -Wall -ansi -O3 -D MAXKMERLENGTH=32 -D RAY_VERSION=\"2.0.0\" -I ../RayPlatform -I. -c -o application_core/ray_main.o application_core/ray_main.cpp
make[1]: mpic++: Command not found
make[1]: *** [application_core/ray_main.o] Error 127
make[1]: Leaving directory `/home/yaximik/Bioinformatics/Ray-v2.0.0/code'
mpic++ code/TheRayGenomeAssembler.a RayPlatform/libRayPlatform.a -o Ray
make: mpic++: Command not found
make: *** [Ray] Error 127
[yaximik@SciLinux55 Ray-v2.0.0]$
[/Code]
Here is the output from RHEL 5.5:
[code]
[yaximik@G5NNJN1 Ray-v2.0.0]$ make PREFIX=ray-build
make[1]: Entering directory `/home/yaximik/Bioinformatics/Ray-v2.0.0/RayPlatform'
mpicxx -Wall -ansi -O3 -D MAXKMERLENGTH=32 -D RAY_VERSION=\"2.0.0\" -D RAYPLATFORM_VERSION=\"1.0.3\" -I. -c -o memory/ReusableMemoryStore.o memory/ReusableMemoryStore.cpp
make[1]: mpicxx: Command not found
make[1]: *** [memory/ReusableMemoryStore.o] Error 127
make[1]: Leaving directory `/home/yaximik/Bioinformatics/Ray-v2.0.0/RayPlatform'
make[1]: Entering directory `/home/yaximik/Bioinformatics/Ray-v2.0.0/code'
mpicxx -Wall -ansi -O3 -D MAXKMERLENGTH=32 -D RAY_VERSION=\"2.0.0\" -I ../RayPlatform -I. -c -o application_core/ray_main.o application_core/ray_main.cpp
make[1]: mpicxx: Command not found
make[1]: *** [application_core/ray_main.o] Error 127
make[1]: Leaving directory `/home/yaximik/Bioinformatics/Ray-v2.0.0/code'
mpicxx code/TheRayGenomeAssembler.a RayPlatform/libRayPlatform.a -o Ray
make: mpicxx: Command not found
make: *** [Ray] Error 127
[yaximik@G5NNJN1 Ray-v2.0.0]$
[/code]
Essentially the same. I have these packages installed:
openmpiwrappers-openmpi-1-4.el5.x86_64
openmpi-1.4-4.el5.x86_64
openmpi-devel-1.4-4.el5.x86_64
openmpi-libs-1.4-4.el5.x86_64
Both machines are 64-bit; one has 2 processors and 8 GB RAM, the other has 16 processors and 96 GB RAM. Please help, as I'd like to try Ray 2.0.0 on my project.
-
Ray runs well when I use a single node, but when utilizing more than this I get an MPI exit code like this:
Ray:25109 terminated with signal 11 at PC=5718e0 SP=7fff9eb8a838. Backtrace:
/home/bstamps/Ray/Ray-v2.0.0/Ray(_ZNK14ReadAnnotation7getRankEv+0x0)[0x5718e0]
/home/bstamps/Ray/Ray-v2.0.0/Ray(_ZN40Adapter_RAY_MPI_TAG_REQUEST_VERTEX_READS4$
/home/bstamps/Ray/Ray-v2.0.0/Ray(_ZN18MessageTagExecutor11callHandlerEiP7Messag$
/home/bstamps/Ray/Ray-v2.0.0/Ray(_ZN11ComputeCore3runEv+0x3cc)[0x5985ec]
/home/bstamps/Ray/Ray-v2.0.0/Ray(_ZN7Machine5startEv+0x1d8d)[0x46906d]
/home/bstamps/Ray/Ray-v2.0.0/Ray(main+0x73)[0x464d73]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x2b3fbc934cdd]
/home/bstamps/Ray/Ray-v2.0.0/Ray[0x464c39]
--------------------------------------------------------------------------
mpirun has exited due to process rank 4 with PID 25094 on
node c310 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
Thoughts?
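The mangled C++ names in the backtrace can be decoded with `c++filt` from GNU binutils to see which Ray functions were on the stack. A minimal sketch using only the symbols that were not cut off by the terminal (the lines ending in `$` are truncated, so their symbols are incomplete); the function name is hypothetical:

```shell
#!/bin/bash
# Demangle the complete C++ symbols from the backtrace, one per line.
# Truncated symbols (lines ending in '$') are deliberately left out.
demangle_backtrace_symbols() {
  c++filt _ZNK14ReadAnnotation7getRankEv \
          _ZN11ComputeCore3runEv \
          _ZN7Machine5startEv
}
```

Running `demangle_backtrace_symbols` prints `ReadAnnotation::getRank() const`, `ComputeCore::run()` and `Machine::start()`. Read together with the partially visible frames, the segmentation fault (signal 11) was raised inside `ReadAnnotation::getRank() const`, reached from a handler for RAY_MPI_TAG_REQUEST_VERTEX_READS messages dispatched by `MessageTagExecutor::callHandler` within `ComputeCore::run()`.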
-
Originally posted by bstamps
Ray runs well when I use a single node, but when utilizing more than this I get an MPI exit code like this
...
--------------------------------------------------------------------------
mpirun has exited due to process rank 4 with PID 25094 on
node c310 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
I am not a big Ray user, but I sometimes get the above problem, and when I re-run the job the problem goes away. I think it has to do with my cluster's setup. I suggest trying a small run with one job per node, just to make sure that everything works.
Not much help, I know, but the general idea is that the problem may be with your hardware setup and not with Ray.
-
It appears that setting my ptile below the maximum per node (16) has solved the problem... I'll have to go bug my computing center about why 15 is kosher and 16 causes MPI to die. Either way, I'm very happy with Ray's performance: being able to span my job across 4500 cores has sped up assembly quite a bit...
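"ptile" matches the `span[ptile=N]` resource requirement of IBM Platform LSF, which sets how many job slots are placed on each host. A sketch of an LSF job-script header under the assumption that the cluster runs LSF; the output file name and the elided Ray arguments are placeholders:

```shell
#!/bin/bash
# Hypothetical LSF job script header (assumes the cluster uses LSF).
#BSUB -n 4500                 # total number of MPI processes (job slots)
#BSUB -R "span[ptile=15]"     # 15 slots per host, leaving 1 core idle on a 16-core node
#BSUB -o ray.%J.out           # %J expands to the LSF job ID
# Assumes Open MPI was built with LSF support, so mpirun reads the allocation.
mpirun Ray ...
```

With 4500 slots and 15 per host, LSF would need 300 hosts for this job.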
-
I spoke a little too soon: Ray appears to throw segmentation faults at random points in the assembly process, on random nodes. Adding "route-messages" seems to have helped, but my jobs still fail every so often. The computing center seems to think it's an issue with Ray, but I'm curious what the community thinks.
-
Originally posted by bstamps
Ray runs well when I use a single node, but when utilizing more than this I get an MPI exit code like this
...
Thoughts?
Hi,
Ray v2.1.0 was released today. It has a lot of bug fixes, including fixes for 2 bugs that could lead to segmentation faults.
-
Originally posted by westerman
I am not a big Ray user, but I sometimes get the above problem, and when I re-run the job the problem goes away. I think it has to do with my cluster's setup. I suggest trying a small run with one job per node, just to make sure that everything works.
Not much help, I know, but the general idea is that the problem may be with your hardware setup and not with Ray.
It sounds like a race condition. The bug may be in Ray, who knows.
-
Originally posted by bstamps
It appears that setting my ptile below the maximum per node (16) has solved the problem... I'll have to go bug my computing center about why 15 is kosher and 16 causes MPI to die. Either way, I'm very happy with Ray's performance: being able to span my job across 4500 cores has sped up assembly quite a bit...
What is "ptile"? Are you using a fancy architecture (a Cray XE6 or Blue Gene/Q, for instance)?
Originally posted by bstamps
across 4500 cores
I guess you are playing with fancy hardware, right?
-
Originally posted by bstamps
I spoke a little too soon: Ray appears to throw segmentation faults at random points in the assembly process, on random nodes. Adding "route-messages" seems to have helped, but my jobs still fail every so often. The computing center seems to think it's an issue with Ray, but I'm curious what the community thinks.
It could be a bug in Ray; every piece of software has bugs. Can you try the new Ray v2.1.0 to see if its numerous bug fixes alleviate your problem?
Can you send an email to the mailing list with your hardware and your Ray command?
Pure MPI applications may not be the answer for very large clusters; hybrid programming models are likely better.
We have work in progress on a new hybrid programming model. At the moment, Ray (v2.1.0, for instance) only uses MPI. So when you run on 8 nodes * 24 cores/node = 192 cores, Ray is launched as 192 processes, with 24 processes per node.
We have devised a new programming model called "mini-ranks". If you Google "mini-ranks", you will mostly find hits about Lego blocks, because "mini-ranks" is new in parallel programming; I believe we invented it ourselves!
Our implementation of the mini-ranks model can use 1 MPI process per node, with 23 POSIX threads per process plus an additional communication thread for each node. The mini-ranks run inside the POSIX threads, and the MPI rank itself does not do much.
Ray has already been ported to that model (mini-ranks implemented with MPI + POSIX threads) in the git source tree.
Instead of launching like this:
mpiexec -n 192 Ray ...
You launch it like this:
mpiexec -n 8 -bynode Ray -mini-ranks-per-rank 23 ...
Note that our "mini-ranks" implementation needs 1 thread for communication for each node.
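The process layouts described above follow from simple arithmetic. A minimal bash sketch, assuming identical nodes and exactly one communication thread per node; the function name `ray_launch_lines` is hypothetical, and the `...` stands for the remaining Ray arguments, as in the commands above:

```shell
#!/bin/bash
# Print the pure-MPI and mini-ranks launch lines for a cluster of
# identical nodes. Assumes one communication thread per node, so each
# MPI rank hosts (cores_per_node - 1) mini-ranks.
ray_launch_lines() {
  local nodes=$1 cores_per_node=$2
  local pure_mpi=$(( nodes * cores_per_node ))     # one MPI process per core
  local mini_per_rank=$(( cores_per_node - 1 ))    # one core left for communication
  echo "mpiexec -n $pure_mpi Ray ..."
  echo "mpiexec -n $nodes -bynode Ray -mini-ranks-per-rank $mini_per_rank ..."
  echo "total mini-ranks: $(( nodes * mini_per_rank ))"
}
```

For the 8-node, 24-core example, `ray_launch_lines 8 24` prints `mpiexec -n 192 Ray ...`, `mpiexec -n 8 -bynode Ray -mini-ranks-per-rank 23 ...` and `total mini-ranks: 184`: the mini-ranks layout gives up one core per node to the communication thread.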
Although this is experimental, you may be interested in testing it on your hardware.
The branch is called minirank-model, should you want to check it out.
Sébastien Boisvert
Ray maintainer