**seb567** · 10-30-2012, 03:22 PM

Release of Ray v2.1.0 (mostly bug fixes)

Hello,

Ray v2.0.0 was released on 2012-06-22. It is time to release Ray v2.1.0 !

It is available directly at

http://sourceforge.net/projects/denovoassembler/files/Ray-v2.1.0.tar.bz2

Documentation was added for the metagenomics solutions called 'Ray Méta',
'Ray Communities', and 'Ray Ontologies' that are implemented in Ray plugins.

Changes in bioinformatics algorithm implementations:

Changes include a new data reliability option, options to control the maximum (or
minimum) accepted k-mer coverage, a fix for a race condition in the plugin that colors
the graph, new options for the storage engine, faster network tests, fixes for input files
compressed with bzip2, the ability to disable scaffolding, various portability fixes, patches
for twin k-mers (efficient storage), and faster building of the distributed graph.
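
The twin of a k-mer is its reverse complement, and storing each pair only once saves memory. The sketch below shows one common way to do this. It packs 2 bits per nucleotide (A=0, C=1, G=2, T=3) and keeps the numerically smaller of the two encodings as the canonical representative. The encoding, the helper names, and the "smaller wins" rule are assumptions chosen for illustration. They are not necessarily what Ray v2.1.0 does internally.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: pack a k-mer (k <= 32) into one 64-bit word,
// 2 bits per nucleotide: A=0, C=1, G=2, T=3. Non-ACGT input is not handled.
uint64_t encodeKmer(const std::string& kmer) {
    uint64_t value = 0;
    for (char c : kmer) {
        uint64_t code = 0;
        switch (c) {
            case 'C': code = 1; break;
            case 'G': code = 2; break;
            case 'T': code = 3; break;
            default:  code = 0; break;  // 'A'
        }
        value = (value << 2) | code;
    }
    return value;
}

// The twin of a k-mer is its reverse complement. With this encoding the
// complement of a nucleotide is 3 - code (A<->T, C<->G), and consuming the
// input from its last nucleotide reverses the order.
uint64_t reverseComplement(uint64_t value, int k) {
    uint64_t twin = 0;
    for (int i = 0; i < k; ++i) {
        twin = (twin << 2) | (3 - (value & 3));
        value >>= 2;
    }
    return twin;
}

// Store only one of the two strands: here, the numerically smaller encoding.
uint64_t canonical(uint64_t value, int k) {
    uint64_t twin = reverseComplement(value, k);
    return value < twin ? value : twin;
}
```

For example, ACGTTG and its twin CAACGT map to the same canonical value (446). A single table entry can therefore hold the data for both strands.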

Changes in the runtime engine:

The distributed storage backend was optimized. Other changes: hardware acceleration with popcount
when available, a new registration system for plugins, bug fixes in the hash table, MPI_Iprobe / MPI_ANY_SOURCE
as the new default communication model, new routines for dirty buffer management, and a polytope
communication graph.
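
Popcount acceleration matters most for sparse hash tables. The changelog below mentions 1048576 buckets in groups of 64, which gives 1048576 / 64 = 16384 groups. Here is a minimal sketch of such a group. A 64-bit bitmap marks the occupied buckets, and only occupied buckets are stored, so the position of a bucket in the dense array is the popcount of the bitmap bits below it. The class name `SparseGroup` and its layout are hypothetical and are not Ray's actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sparse group of 64 buckets (not Ray's actual class).
// A 64-bit bitmap records which buckets are occupied; only occupied buckets
// are stored, packed in bucket order in a dense vector.
template <typename T>
class SparseGroup {
public:
    bool has(int bucket) const { return (m_bitmap >> bucket) & 1ULL; }

    // Position of `bucket` in m_values = number of occupied buckets before it.
    // __builtin_popcountll (GCC/Clang) compiles to the POPCNT instruction
    // when the target supports it (for example with -mpopcnt).
    int rank(int bucket) const {
        uint64_t below = m_bitmap & ((1ULL << bucket) - 1);
        return __builtin_popcountll(below);
    }

    void set(int bucket, const T& value) {
        int i = rank(bucket);
        if (has(bucket)) {
            m_values[i] = value;  // overwrite an existing entry
            return;
        }
        m_values.insert(m_values.begin() + i, value);
        m_bitmap |= (1ULL << bucket);
    }

    const T* get(int bucket) const {
        return has(bucket) ? &m_values[rank(bucket)] : nullptr;
    }

    int occupied() const { return static_cast<int>(m_values.size()); }

private:
    uint64_t m_bitmap = 0;    // bit i set <=> bucket i is occupied
    std::vector<T> m_values;  // one slot per occupied bucket
};
```

An empty group costs only its bitmap and an empty vector. This is why a large number of mostly empty groups uses almost no memory.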

Full list:

---
Changes between Ray v2.0.0 and Ray v2.1.0:

100 files changed, 4294 insertions(+), 2398 deletions(-)

Pier-Luc Plante (3):
Scaffolder is not required when using unpaired reads.
Patch Koala: Added an option (-use-maximum-seed-coverage) so that highly-covered seeds can be ignored.
Corrected the test that determines the quality control results. There were too many false negatives. The returned value is more reliable now.

Sébastien Boisvert (142):
The copyright was updated to add 2012.
When there are 508 reads and 32 MPI ranks, the number of reads per rank is 508/32 = 15 (integer division). Therefore, assuming a perfect division, read number 495 would be on MPI rank 33 (495/15 = 33), but only ranks 0 to 31 exist. This makes Ray crash. This change set corrects this.
A list of releases was added.
The codename of the next release will be "Ancient Granularity of Epochs".
An assertion was added for the performance scaled messaging related bug.
Two assertions were added to detect possible message corruption.
The help page was updated to add the data reliability option.
The peak finder was modified to pass new tests.
I edited the guide to submit changes.
The manual now includes the new option for overly-covered seeds.
An error was fixed in the file that says how to submit changes.
The return statement was misplaced in a recent patch.
I added the names 'Ray Méta', 'Ray Communities', and 'Ray Ontologies'.
An assertion was added to make sure that data is not overwritten.
Searcher: added verbose statements
Searcher: fixed a race condition
Searcher: added a missing value.
SeedExtender: moved system calls inside this plugin
SeedExtender: modified the code for hot skipping
SeedExtender: implemented hot skipping
Parameters: 4 options were added to change distributed storage behavior.
Documentation: Ray can be run with a single configuration file containing options.
The default load factor threshold was changed to 0.75.
The methods setKey() and getKey() were added to KmerCandidate and Vertex classes for compatibility with MyHashTable.
If the hash table is verbose, ask it to display its status.
NetworkTest: added the option -skip-network-test to skip the network test.
Added a new option to enable genome neighbourhood calculation. The option is -find-neighbourhoods
I added some code to detect 32-bit and 64-bit Windows.
More parameters for compilation can be provided with EXTRA=...
Porting Ray to the new RayPlatform: removed macro calls in .h files.
Porting Ray to the new RayPlatform: removed remaining code in .h files.
Porting Ray to the new RayPlatform: removed token 'generated_automatically'.
Porting Ray to the new RayPlatform: added CreatePlugin and BindPlugin instructions.
Porting Ray to the new RayPlatform: updated the macro names in C++ plugin files.
Porting Ray to the new RayPlatform: removed adapter from plugin class definitions.
Porting Ray to the new RayPlatform: remove calls to setObject.
Porting Ray to the new RayPlatform: Ray compiles with the simplified RayPlatform adapters now.
I removed handlers from the cmake file.
Updating the manual.
SeedExtender: changed the verbosity period.
Removed some output from the computation of seeds.
The manual was updated to include pointers to documentation.
If you run Ray with a configuration file (mpiexec -n 4 Ray Ray.conf) you can start comments with the '#' symbol like in python.
Information to compile Ray with gcc was added.
The default number of buckets is now 1048576. The default number of buckets per group is still 64, so that is only 16384 groups with almost no memory usage because it is sparse.
This fixes an input/output bug for the Ray configuration file.
The code that randomizes the arguments was removed because it can lead to bugs. This also simplifies checkpointing.
The edge purging should be done in a massively parallel way unless the option -write-kmers was provided.
Merge branch 'master' of https://github.com/plpla/ray into pl
I added a script to build Ray with link time optimization.
The EXTRA commands are also given to the linking command.
I added -fwhole-program for better optimization.
I added compilation flags for compression.
I added instructions to build Ray with link time optimization.
NetworkTest: the number of test messages is now constant regardless of the number of MPI ranks in the communicator.
application_core: added a call to obtain a string configuration token.
KmerAcademyBuilder: option -bloom-filter-bits sets the number of bits.
KmerAcademyBuilder: Bloom filter has 64 M bits by default.
Merge branch 'master' of github.com:sebhtml/ray
Merge branch 'master' of github.com:sebhtml/ray
SequencesLoader: added a 'please wait' before counting entries in a file.
SequencesLoader: a bz2 file can contain many compressed streams. Each of them needs to be opened, read (until BZ_STREAM_END), and closed.
application_core: bugs were fixed in the configuration routines.
GeneOntology: removed the use of argv
Merge branch 'master' of github.com:sebhtml/ray
Merge branch 'master' of github.com:sebhtml/ray
Fixed an integer overflow in the distributed storage engine.
A path with 0 k-mers has 0 nucleotides, not 0-k+1.
Merge branch 'master' of github.com:sebhtml/ray
A new routing graph is available: the hypercube.
Documentation: documented the hypercube features of Ray.
core: the default number of buckets is now 268435456 per rank.
scaffolder: it can be disabled with -disable-scaffolder
normalized option names with -enable-* and -disable-*
documentation: moved assembly options up
core: added documentation for class Parameters.
SeedingData: -use-minimum-seed-coverage changes the minimum
documentation: added missing operands in the manual and -help page
core: Ray -version provides more compile flags like popcnt and sse
SeedingData: seeds cannot contain k-mers with too low coverage
build: the C++ standard is C++ 1998. gcc -ansi provides that
Searcher: large integer constants need ULL for portability
SeedExtender: added additional information for an error
MessageProcessor: k-mer data messages should never be discarded
VerticesExtractor: don't flush while waiting for messages
KmerAcademyBuilder: only send the forward k-mer, not the lower
VerticesExtractor: improved the code quality for easier reading
MessageProcessor: don't discard k-mers while receiving messages
VerticesExtractor: store twin edges in a single source
EdgePurger: an edge is removed only if one of its ends is not in the graph
MessageProcessor: removed a call to a private attribute
Documentation: added a document about profiling Ray
Documentation: added information about elapsed time
BuildSystem: added a strip command to reduce the memory footprint
BuildSystem: replaced -ansi with -std=c++98 for more verbosity
Documentation: updated the author file
KmerAcademyBuilder: removed the k-mer academy
VerticesExtractor: this module extracts vertices to add edges
Merge branch 'kill-kmer-academy'
MessageProcessor: new text to show when the Bloom filter is created
KmerAcademyBuilder: added the number of set bits in the Bloom filter
MessageProcessor: added a warning when the oracle is half full
KmerAcademyBuilder: the Bloom filter can have any number of bits
Merge branch 'bloom-features'
MessageProcessor: coverage depth starts at 1 with Bloom filters
MessageProcessor: the threshold is 50.0 (50.0%), not 0.5
KmerAcademyBuilder: added the number of filtered k-mers
Merge branch 'bug-hunting'
application_core: added routing with a convex regular polytope
NetworkTest: the number of exchanges can be changed with -exchanges
Documentation: added options for a 64-rank polytope
Documentation: updated the taxonomy documentation
NetworkTest: added average round trip latency
scripts: initial version of a script to create NCBI taxonomy
scripts: download NCBI bacterial genomes too
Merge branch 'master' of github.com:sebhtml/ray
Documentation: added documentation for NCBI taxonomy
Documentation: simplified the usage of the tool to pull NCBI data
scripts: the script that pulls NCBI data is almost ready
scripts: the script that pulls NCBI stuff is ready
Documentation: added information about XML files
Partitioner: also create a file FilePartition.txt
MachineHelper: don't run the AMOS code path if not necessary
Parameters: throw a warning when distances are invalid
Merge branch 'for-seb-September-2012'
Searcher: fixed a race condition where a message was lost
Calls to deprecated methods were eliminated.
This is Ray v2.1.0-rc0 "Ancient Granularity of Epochs"
Searcher: browsing the distributed colored de Bruijn subgraph
Searcher: find or create a virtual color from physical colors
Searcher: added physical color in SequenceAbundances.xml
Searcher: fixed assertion code
scripts: don't ship the example and only ship the bz2 distribution
SequencesLoader: fixed the scope of a buffer
Searcher: removed debug messages from stable release
Documentation: added more documentation for gene ontology.
Searcher: fixed buffer overflow
Searcher: fixed compilation warnings
Searcher: GraphBrowsing.xml needs -one-color-per-file
This is the branch for Ray v2.1.0-rc1
Related git repositories were added in the README.
Ray v2.1.0
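
The read-partitioning entry above (508 reads over 32 MPI ranks) can be reproduced with a few lines of arithmetic. The sketch below contrasts the truncating mapping described in that entry with one way to spread the remainder. The function names are hypothetical, and the changelog does not say which fix Ray actually applied.

```cpp
#include <cassert>
#include <cstdint>

// Buggy mapping: readsPerRank = total / ranks is truncated (508 / 32 = 15),
// so the last reads map past the last rank (495 / 15 = 33 with 32 ranks).
int rankOfReadNaive(uint64_t read, uint64_t totalReads, int ranks) {
    uint64_t readsPerRank = totalReads / ranks;
    return static_cast<int>(read / readsPerRank);
}

// One possible fix: the first (totalReads % ranks) ranks hold one extra read,
// so every read maps to a rank in [0, ranks - 1].
// If totalReads < ranks, base is 0 but every read falls below `boundary`,
// so the division by base is never reached.
int rankOfRead(uint64_t read, uint64_t totalReads, int ranks) {
    uint64_t base = totalReads / ranks;
    uint64_t extra = totalReads % ranks;
    uint64_t boundary = extra * (base + 1);  // reads held by the larger ranks
    if (read < boundary)
        return static_cast<int>(read / (base + 1));
    return static_cast<int>(extra + (read - boundary) / base);
}
```

With 508 reads and 32 ranks, this gives ranks 0 to 27 sixteen reads each and ranks 28 to 31 fifteen reads each. Read 495 lands on rank 31.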

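Several entries above concern the Bloom filter: -bloom-filter-bits, 64 M bits by default, any number of bits, and reporting the number of set bits. The following is a generic Bloom filter sketch with an arbitrary bit count. The class name, the number of hash functions, the MurmurHash3 finalizer used as a mixer, and the double-hashing probe scheme are all assumptions, not Ray's implementation. At the default size of 64 M bits, the bit array takes about 8 MB.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical Bloom filter with an arbitrary number of bits (not Ray's class).
class BloomFilter {
public:
    BloomFilter(uint64_t bits, int hashes)
        : m_bits(bits), m_hashes(hashes), m_words((bits + 63) / 64, 0) {}

    void insert(uint64_t key) {
        for (int i = 0; i < m_hashes; ++i) {
            uint64_t b = position(key, i);
            m_words[b / 64] |= (1ULL << (b % 64));
        }
    }

    // Never false for an inserted key; may be true for a key never inserted
    // (false positive).
    bool mayContain(uint64_t key) const {
        for (int i = 0; i < m_hashes; ++i) {
            uint64_t b = position(key, i);
            if (!((m_words[b / 64] >> (b % 64)) & 1ULL)) return false;
        }
        return true;
    }

    // Number of set bits, a rough indicator of how full the filter is.
    uint64_t setBits() const {
        uint64_t n = 0;
        for (uint64_t w : m_words) n += __builtin_popcountll(w);
        return n;
    }

private:
    // 64-bit finalizer (fmix64) from MurmurHash3; any good mixer would do.
    static uint64_t mix(uint64_t x) {
        x ^= x >> 33; x *= 0xff51afd7ed558ccdULL;
        x ^= x >> 33; x *= 0xc4ceb9fe1a85ec53ULL;
        x ^= x >> 33;
        return x;
    }

    // i-th bit position as h1 + i * h2 (double hashing), reduced mod m_bits.
    uint64_t position(uint64_t key, int i) const {
        uint64_t h1 = mix(key);
        uint64_t h2 = mix(key ^ 0x9e3779b97f4a7c15ULL) | 1ULL;
        return (h1 + static_cast<uint64_t>(i) * h2) % m_bits;
    }

    uint64_t m_bits;
    int m_hashes;
    std::vector<uint64_t> m_words;
};
```

Because the bit index is reduced modulo m_bits rather than masked, the filter size does not have to be a power of two.
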
---
Changes between RayPlatform v1.0.3 and RayPlatform v1.1.0:

52 files changed, 3215 insertions(+), 1244 deletions(-)

Sébastien Boisvert (58):
A release list was added.
Message checksums are calculated by default for any non-empty message by RayPlatform.
The option -verify-message-integrity must be provided to enable message integrity verification in RayPlatform. By default, the checksum is calculated by the software.
An integer comparison was fixed.
I implemented a system of annotation for buffers. With this, RayPlatform knows which buffer is dirty (possibly available, but maybe not) and which buffer is available.
I fixed a typographical error in the documentation.
I added a comment for dirty buffers. Because MPI_Request objects are usually "completed" before the message is actually on the destination, I don't think the RayPlatform virtual machine is going to run out of non-dirty buffers.
The latency on an IBM iDataPlex (guillimin at McGill) for a Ray job of 36 cores was reduced from 23 to 17 microseconds (back and forth).
I cleaned the persistent communication code.
Merge branch 'master' of github.com:sebhtml/RayPlatform
The three communication models were documented in the source code. The three models are:
The constructor of the hash table now takes the number of buckets, the number of buckets per group, and load factor threshold as well as the verbosity.
structures: increased portability of the hash table code.
The class for hash table groups was moved to its own file.
This fixes a bug introduced while working on the portability.
The table prints its status after completion of the resizing, when in verbose mode.
I added David Weese of Free University of Berlin in the code as he reviewed the hash table code.
structure: using compiler builtins for some processing in the hash table.
The specific code was moved inside one portable method.
I added some comments in the ring allocator.
Status is not printed if verbosity is not enabled.
The registration system for plugins was changed. Now it uses function pointers instead of virtual methods, which can be slow as they cannot be inlined.
I added MessageWarden in the README.
I added some documentation for handlers.
Some more documentation was added.
This fixes a bug in the insert() operation of the hash table during incremental resizing.
h1 must return something between 0 and M-1 whereas h2 must return something odd between 1 and M-1. This was fixed in the code.
The hash table also prints memory allocation information when printing its status.
communication: switched the model to MPI_ANY_SOURCE.
Added routines to clean dirty buffers when they are all dirty.
A new routing graph is available: it is the hypercube.
The hypercube prints its status before the end.
routing: added status code for hypercube.
communication: improved the last step in routing.
routing: started to implement a round-robin policy for hypercube routing.
routing: the round-robin hypercube is available in the code.
routing: the hypercube can be modified to be a pseudo-hypercube
communication: increased the number of buffers for messaging
communication: removed a useless line in the code
Updated the code name for the upcoming release.
communication: registration of dirty buffers is more efficient.
communication: errors related to dirty buffers are more verbose
cryptography: now using __SSE4_2__ provided by gcc -march=native
Documentation: updated the author file
structures/MyHashTable: added missing headers
communication: show a warning when at least 64 buffers are dirty
routing: added routing with a convex regular polytope
MessageRouter: store the routing information in the buffer
routing: don't write routes for the polytope surface (called hypercube)
core: fixed a buffer allocation bug in the core
communication: the real-time sweeper is better configured
the upper bound for the number of sent messages is not m_size
This is RayPlatform (the engine) v1.1.0-rc0 "Chariot of Complexity"
ComputeCore: routed messages must be purged
communication: introducing the CONFIG_COMM_IRECV_TESTANY model
communication: non-blocking communication is bad on Blue Gene/Q
This is the branch development version for RayPlatform v1.1.0-rc1
RayPlatform v1.1.0
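
The hash table entry above requires h1 to return a value in [0, M-1] and h2 an odd value in [1, M-1]. Below is a sketch of why, assuming the table size M is a power of two (as with the default bucket counts 1048576 and 268435456). An odd step is coprime with M, so the probe sequence visits every slot before repeating. The multiplicative hash constants are assumptions, not RayPlatform's hash functions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Double hashing in a table of M slots, M a power of two and M >= 2
// (a sketch with assumed hash functions, not RayPlatform's code).
// h1 returns a slot in [0, M-1]; h2 returns an odd step in [1, M-1].
uint64_t h1(uint64_t key, uint64_t M) {
    return ((key * 0x9e3779b97f4a7c15ULL) >> 17) & (M - 1);
}

uint64_t h2(uint64_t key, uint64_t M) {
    return (((key * 0xc2b2ae3d27d4eb4fULL) >> 23) & (M - 1)) | 1ULL;  // force odd
}

// i-th probe for `key`. Since the step is odd and M is a power of two,
// gcd(step, M) = 1, so probes i = 0..M-1 visit every slot exactly once.
uint64_t probe(uint64_t key, uint64_t i, uint64_t M) {
    return (h1(key, M) + i * h2(key, M)) & (M - 1);
}
```

By contrast, an even step such as 2 in a table of 16 slots would only ever reach 8 of them. An insertion could then fail even though free slots remain.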

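Several entries above add hypercube routing. In a hypercube of 2^d ranks, each rank exchanges messages only with its d neighbours, the ranks whose number differs in exactly one bit. For 64 ranks that is 6 neighbours instead of 63 peers. The sketch below uses plain dimension-order routing and flips the lowest differing bit at each hop. The function names are hypothetical, and the round-robin and pseudo-hypercube variants mentioned in the changelog are not modelled here.

```cpp
#include <cassert>
#include <vector>

// Next hop from `current` toward `destination` in a hypercube of 2^d ranks:
// flip the lowest bit in which the two rank numbers differ.
int nextHop(int current, int destination) {
    int diff = current ^ destination;
    if (diff == 0) return current;
    return current ^ (diff & -diff);  // diff & -diff isolates the lowest set bit
}

// Full route, including source and destination. Each hop changes one bit,
// so the route has popcount(source ^ destination) hops.
std::vector<int> route(int source, int destination) {
    std::vector<int> path(1, source);
    while (path.back() != destination)
        path.push_back(nextHop(path.back(), destination));
    return path;
}
```

For example, route(0, 7) goes 0 -> 1 -> 3 -> 7. The price for needing fewer connections is extra hops for some messages.
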
**bstamps** · 11-01-2012, 05:21 AM

Seb- the update has fixed my segfault issues. Thank you! 28 minutes for one 2x150 and one 2x250 MiSeq library to assemble a single fungal genome (160 cores), I am impressed!

**bstamps** · 11-01-2012, 05:26 AM

Originally posted by seb567

What is "ptile" ? Are you using a fancy architecture (Cray XE6 or Blue Gene /Q for instance) ?

I guess you are playing with fancy hardware, right ?

Just saw this one, apologies for the double reply; we're running on an Intel Sandy Bridge cluster. http://www.oscer.ou.edu/hardsoft_del...dge_boomer.php

Not huge, but it certainly gets the job done. Ptile refers to my LSF batch handling (BSUB). You have to specify the number of MPI processes (p) and how many processes per node (ptile). We also have a hybrid MPI/OpenMP (or POSIX threads) system in place to run hybrid jobs with an MPI ptile of 1 and 16 threads per node.

**seb567** · 11-01-2012, 07:29 AM

Originally posted by bstamps

Seb- the update has fixed my segfault issues. Thank you! 28 minutes for one 2x150 and one 2x250 MiSeq library to assemble a single fungal genome (160 cores), I am impressed!

That's one bug less to deal with then !

In fact, one of the patches included in v2.1.0 guarantees the coherency of DNA strands in the de Bruijn graph. In v2.0.0 and earlier, this coherency was not guaranteed; it depended on various factors. A few random bugs occurred because of incoherency in the distributed storage engine.

Sébastien

**seb567** · 11-01-2012, 07:54 AM

Hello,

Originally posted by bstamps

Just saw this one, apologies for the double reply; we're running on an Intel Sandy Bridge cluster. http://www.oscer.ou.edu/hardsoft_del...dge_boomer.php

Not huge, but it certainly gets the job done. Ptile refers to my LSF batch handling (BSUB). You have to specify the number of MPI processes (p) and how many processes per node (ptile).

That's a nice machine.

Originally posted by bstamps

We also have a hybrid MPI/OpenMP(or POSIX) system in place to do hybrid jobs with an MPI ptile of 1, and 16 threads per node.

As you may know, Ray ships with a library called RayPlatform, which abstracts all the parallel stuff from the programmer. In Ray v2.0.0 and v2.1.0, the associated RayPlatform library (versions 1.0.3 and 1.1.0, respectively) only utilizes MPI.

Pure MPI applications work well on some machines and not so well on others, usually because the Host Channel Adapter (HCA) is being used by too many MPI processes on each node. That's where hybrids come in.

Hybrids are truly the future. I visited Argonne National Laboratory recently and discussed hybrid programming models with Professor Rick Stevens. Rick Stevens, Fangfang Xia, and I devised something called the "mini-ranks" hybrid programming model.

The next release of Ray (likely something like 2.1.1) will run on RayPlatform 7.0.0, which will include support for our newly introduced "mini-ranks" hybrid programming model.

So on your hybrid machine, you will be able to run Ray like this (assuming 8 nodes and 16 hardware threads per node):

mpiexec -n 8 -bynode \
Ray -mini-ranks-per-rank 15 \
-k 31 -o MiniRanksAreCool \
-p joe1.fastq.bz2 joe2.fastq.bz2 \
-p thor1.fastq.gz thor2.fastq.gz

This will launch 1 MPI process per node. Each MPI process will have exactly 15 mini-ranks. Each mini-rank will run in 1 IEEE POSIX thread and an additional thread
(the origin control thread of the process) will do MPI calls.

If you feel this is interesting for your laboratory, there is a preliminary implementation of this available for testing.

You need to do this to install (copy and paste in a terminal):

mkdir Ray-mini-ranks-MPI+pthread
cd Ray-mini-ranks-MPI+pthread

git clone git://github.com/sebhtml/RayPlatform.git
cd RayPlatform
git checkout minirank-model
cd ..

git clone git://github.com/sebhtml/ray.git
cd ray
git checkout minirank-model
cd ..

make

mpiexec -n 2 ./Ray -mini-ranks-per-rank 2 -o Test -test-network-only &> /dev/null

Sébastien

Sent from my IBM Blue Gene/Q

**cwzkevin** · 11-04-2012, 02:28 PM

Hi Sebastien,
I am going to try Ray. Make and install succeed when I don't turn on LIBZ or LIBBZ2.

However, make fails if I turn on "HAVE_LIBZ = y" and/or "HAVE_LIBBZ2 = y".

The system here has both libraries installed:

Code:

$ ll /usr/lib64/libz*
-rwxr-xr-x 1 root root 108628 Mar 16  2011 /usr/lib64/libz.a*
lrwxrwxrwx 1 root root     19 Sep 16  2011 /usr/lib64/libz.so -> ../../lib64/libz.so*
lrwxrwxrwx 1 root root     21 Sep 16  2011 /usr/lib64/libz.so.1 -> ../../lib64/libz.so.1*
lrwxrwxrwx 1 root root     25 Sep 16  2011 /usr/lib64/libz.so.1.2.3 -> ../../lib64/libz.so.1.2.3*
ll /usr/lib64/libbz2*
-rwxr-xr-x 1 root root 77606 Sep 20  2010 /usr/lib64/libbz2.a*
lrwxrwxrwx 1 root root    11 Dec 10  2010 /usr/lib64/libbz2.so -> libbz2.so.1*
lrwxrwxrwx 1 root root    15 Dec 10  2010 /usr/lib64/libbz2.so.1 -> libbz2.so.1.0.3*
-rwxr-xr-x 1 root root 67792 Sep 20  2010 /usr/lib64/libbz2.so.1.0.3*

I am not sure, but I think the problem might be that /usr/lib64 is not in my search path. I have the line below in my ~/.bashrc:

Code:

export LD_RUN_PATH=$MYSOFT/openmpi-1.6.3/lib:/usr/lib64

So, if the problem comes from the library search path, how do I edit the Makefile to get it to work? If the problem is not the library search path, what is it? Thank you.
Here are the error messages:

Code:

mpicxx  -lz -lbz2  code/TheRayGenomeAssembler.a RayPlatform/libRayPlatform.a -o Ray
code/TheRayGenomeAssembler.a(Loader.o): In function `Loader::load(std::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool)':
Loader.cpp:(.text+0x8a3): undefined reference to `FastqBz2Loader::open(std::basic_string<char, std::char_traits<char>, std::allocator<char> >, int)'
Loader.cpp:(.text+0x8c3): undefined reference to `FastqBz2Loader::getSize()'
Loader.cpp:(.text+0xb3b): undefined reference to `FastqGzLoader::open(std::basic_string<char, std::char_traits<char>, std::allocator<char> >, int)'
Loader.cpp:(.text+0xb5b): undefined reference to `FastqGzLoader::getSize()'
Loader.cpp:(.text+0xc5e): undefined reference to `FastqGzLoader::open(std::basic_string<char, std::char_traits<char>, std::allocator<char> >, int)'
Loader.cpp:(.text+0xd0c): undefined reference to `FastqBz2Loader::open(std::basic_string<char, std::char_traits<char>, std::allocator<char> >, int)'
code/TheRayGenomeAssembler.a(Loader.o): In function `Loader::loadSequences()':
Loader.cpp:(.text+0x1ac): undefined reference to `FastqGzLoader::load(int, ArrayOfReads*, MyAllocator*, int)'
Loader.cpp:(.text+0x20c): undefined reference to `FastqBz2Loader::load(int, ArrayOfReads*, MyAllocator*, int)'
collect2: ld returned 1 exit status
make: *** [Ray] Error 1

**seb567** · 11-04-2012, 03:41 PM

You don't need to edit the Makefile.

Are you compiling with this:
make HAVE_LIBZ=y HAVE_LIBBZ2=y
?

This should work as is if you have openmpi, zlib, and bzip2 (and the associated
-devel packages, depending on your system).

It seems that in your case FastqBz2Loader.o and FastqGzLoader.o are not compiled
and therefore not linked. FastqBz2Loader.o is compiled and linked only with HAVE_LIBBZ2=y, and FastqGzLoader.o only with HAVE_LIBZ=y.

I suspect that you edited the Makefile.

Can you provide your make command with all its output in pastebin [1] ?

Hopefully, Ray will soon be available as precompiled packages for Debian [2], Fedora [3], and ArchLinux [4]. Packages will probably be distributed in Ubuntu (via Debian) and Red Hat Enterprise Linux (via Fedora).

Just out of curiosity, what operating system are you running Ray on ?

---

[1] http://pastebin.com/
[2] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=692238
[3] https://bugzilla.redhat.com/show_bug.cgi?id=872783
[4] https://github.com/sebhtml/Ray-on-ArchLinux

**cwzkevin** · 11-04-2012, 04:33 PM

Yes, I edited the Makefile. The changes were as follows:

Code:

MAXKMERLENGTH = 96
HAVE_LIBZ = y
HAVE_LIBBZ2 = y

My make command is simply:

Code:

$ make PREFIX=bin

My system is

Code:

$ uname -mrs
Linux 2.6.18-308.8.2.el5 x86_64
$ lsb_release -a
LSB Version:    :core-4.0-amd64:core-4.0-ia32:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-ia32:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-ia32:printing-4.0-noarch
Distributor ID: RedHatEnterpriseServer
Description:    Red Hat Enterprise Linux Server release 5.8 (Tikanga)
Release:        5.8
Codename:       Tikanga

Here is the link for full output http://pastebin.com/Kf35v5SK

Thank you for your help.

**cwzkevin** · 11-04-2012, 04:50 PM

Oh, I see. With the command below, it works now.

Code:

$ make PREFIX=bin HAVE_LIBZ=y HAVE_LIBBZ2=y

My question below seems to be more about general Linux/compiler knowledge than about Ray:
Q: What is the difference between Method 1 and Method 2? Shouldn't they be the same?
Method 1: I edited the Makefile, changed it to "HAVE_LIBZ = y" and "HAVE_LIBBZ2 = y", then ran $ make PREFIX=bin
Method 2: $ make PREFIX=bin HAVE_LIBZ=y HAVE_LIBBZ2=y
Thanks.

**seb567** · 11-04-2012, 05:19 PM

Hi !

Thanks for the logs, they really help in understanding what's going
on.

You can build Ray with these options without any Makefile modification:

$ make clean
$ make MAXKMERLENGTH=96 HAVE_LIBZ=y HAVE_LIBBZ2=y PREFIX=bin
$ make install

$ mpiexec -n 1 bin/Ray -version
Ray version 2.1.0
License for Ray: GNU General Public License version 3
RayPlatform version: 1.1.0
License for RayPlatform: GNU Lesser General Public License version 3

MAXKMERLENGTH: 96 <=========== Here you go !
KMER_U64_ARRAY_SIZE: 3
Maximum coverage depth stored by CoverageDepth: 4294967295
MAXIMUM_MESSAGE_SIZE_IN_BYTES: 4000 bytes
FORCE_PACKING = n
ASSERT = n
HAVE_LIBZ = y <=========== Here you go !
HAVE_LIBBZ2 = y <=========== Here you go !
CONFIG_PROFILER_COLLECT = n
CONFIG_CLOCK_GETTIME = n
__linux__ = y
_MSC_VER = n
__GNUC__ = y
RAY_32_BITS = n
RAY_64_BITS = y
MPI standard version: MPI 2.1
MPI library: Open-MPI 1.5.4
Compiler: GNU gcc/g++ 4.7.2 20120921 (Red Hat 4.7.2-2)
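
In the output above, MAXKMERLENGTH 96 goes with KMER_U64_ARRAY_SIZE 3. That is consistent with packing 2 bits per nucleotide into 64-bit words, so 32 nucleotides fit per word and 96 x 2 = 192 bits = 3 words. This packing scheme is inferred from the numbers, not stated in the thread, and the function name below is hypothetical.

```cpp
#include <cassert>

// Assumes 2 bits per nucleotide packed into 64-bit words (32 nucleotides per
// word), so k-mers of up to maxKmerLength nucleotides need
// ceil(maxKmerLength / 32) words.
constexpr int kmerWordCount(int maxKmerLength) {
    return (maxKmerLength + 31) / 32;
}

// Matches the output above: MAXKMERLENGTH 96 -> KMER_U64_ARRAY_SIZE 3.
static_assert(kmerWordCount(96) == 3, "96 nucleotides need 3 words");
```

Under the same assumption, the -k 71 assembly suggested later in this thread also needs 3 words per k-mer.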

Originally posted by cwzkevin

Oh, I see. With the command below, it works now.

Code:

$ make PREFIX=bin HAVE_LIBZ=y HAVE_LIBBZ2=y

My question below seems to be more about general Linux/compiler knowledge than about Ray:
Q: What is the difference between Method 1 and Method 2? Shouldn't they be the same?
Method 1: I edited the Makefile, changed it to "HAVE_LIBZ = y" and "HAVE_LIBBZ2 = y", then ran $ make PREFIX=bin
Method 2: $ make PREFIX=bin HAVE_LIBZ=y HAVE_LIBBZ2=y
Thanks.

Now, why does editing the Makefile fail?

There are actually many Makefiles (distributed Makefiles):

Ray-v2.1.0/Makefile
Ray-v2.1.0/code/Makefile
Ray-v2.1.0/code/*/Makefile (23)

When you provide the variables on the make command line, they are
given to the child make processes because make exports them. However,
variables assigned within a Makefile are not exported unless you ask for it with 'export'.

It fails because of this:

Ray-v2.1.0/code/plugin_SequencesLoader/Makefile:

SequencesLoader-$(HAVE_LIBBZ2) += plugin_SequencesLoader/BzReader.o
SequencesLoader-$(HAVE_LIBBZ2) += plugin_SequencesLoader/FastqBz2Loader.o
SequencesLoader-$(HAVE_LIBZ) += plugin_SequencesLoader/FastqGzLoader.o

These configuration options are used by the Makefiles, but also by the
C++ code. For example, HAVE_LIBZ is set to y in all the Makefiles,
and the -D HAVE_LIBZ passed to gcc defines HAVE_LIBZ in all C++ files
too.
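
The C++ side of this can be illustrated with a small sketch. A -D HAVE_LIBZ flag decides at compile time which branch exists in the object code. The link error earlier in the thread shows that Loader.o references FastqGzLoader::open. Loader.cpp must therefore have been compiled with HAVE_LIBZ defined, while FastqGzLoader.o was never added to the link. The function names below are hypothetical and not taken from Ray's sources.

```cpp
#include <cassert>
#include <string>

// Hypothetical example of code gated by -D HAVE_LIBZ (names are made up).
// Compiling with `g++ -D HAVE_LIBZ ...` selects the first branch.
std::string haveLibzLine() {
#ifdef HAVE_LIBZ
    return "HAVE_LIBZ = y";
#else
    return "HAVE_LIBZ = n";
#endif
}

bool endsWith(const std::string& s, const std::string& suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// A .gz file can only be loaded if zlib support was compiled in.
bool canLoad(const std::string& file) {
    if (endsWith(file, ".gz")) {
#ifdef HAVE_LIBZ
        return true;   // a gzip-aware loader would be called here
#else
        return false;  // built without zlib support
#endif
    }
    return true;  // plain files need no extra library
}
```

If the HAVE_LIBZ branch called a function defined in an object file that the build never produced, the compile would still succeed. The failure would only appear at link time as an "undefined reference".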

If you really want to edit the Makefile, you have to do it like this:

--- Ray-v2.1.0/Makefile 2012-10-30 18:29:34.000000000 -0400
+++ Ray-v2.1.0-copy/Makefile 2012-11-04 20:05:54.099217300 -0500
@@ -33,13 +33,13 @@
# needs libz
# set to no if you don't have libz
# y/n
-HAVE_LIBZ = n
+export HAVE_LIBZ = y

# support for .bz2 files
# needs libbz2
# set to no if you don't have libbz2
# y/n
-HAVE_LIBBZ2 = n
+export HAVE_LIBBZ2 = y

# use Intel's compiler
# the name of the Intel MPI C++ compiler is mpiicpc

If you know programming, you can send me a patch that fixes this bug
in the Makefile by adding 'export ' in front of the build options.

If you have other questions regarding the Ray build system,
let me know.

Otherwise, I'll put this in my patchwork queue !

***
Cheers, Sébastien

**cwzkevin** · 11-04-2012, 07:38 PM

Thank you very much for the details, I understand now. I greatly appreciate it!

Sorry, I am not a programmer. (I think a programmer would already know that there could be distributed Makefiles besides the one I edited. ^_^)

Now, it is time to try it out.
Thanks!

**kmkocot** · 11-07-2012, 04:17 PM

Hi all,

Quick question: I have paired-end data from a MiSeq that I would like to assemble with Ray. The library was made with a Nextera kit and sequenced using the new 2 X 250 reagent kits. The average fragment size of my library was around 500 bp, but some smaller fragments were present. For those fragments, the read pairs will at least partially overlap. Does Ray have a problem when the two members of a pair of reads overlap? Should I treat the data as non-paired-end?

Thanks!
Kevin

**cwzkevin** · 11-08-2012, 01:21 PM

Hi, I have a question here.
Does Ray expect the sequences in the two paired-end files to be in the same order?
I ask because the sequences in my two paired-end FASTQ files happened to be in different orders last time. They are indeed paired files; the sequences are just in a different order. I ran Ray on these paired-end files and got output1. After I realized the order was not the same, I sorted the FASTQ files to put them in the same order, re-ran Ray, and got output2. The two results seem to be different.
Thank you.

**seb567** · 02-04-2013, 08:24 PM

Originally posted by kmkocot

Hi all,

Quick question: I have paired-end data from a MiSeq that I would like to assemble with Ray. The library was made with a Nextera kit and sequenced using the new 2 X 250 reagent kits. The average fragment size of my library was around 500 bp, but some smaller fragments were present. For those fragments, the read pairs will at least partially overlap. Does Ray have a problem when the two members of a pair of reads overlap? Should I treat the data as non-paired-end?

Thanks!
Kevin

Hi,

Ray will be fine with those.

I suggest you run something like this:

mpiexec -n 16 Ray -k 71 -p file_R1.fastq.gz file_R2.fastq.gz -o MiSeq+Ray

Also, you can use Ray Cloud Browser to visualize your assembly in your web browser.

Demo: http://genome.ulaval.ca/corbeillab/Ray-Cloud-Browser

p.s.: since -k 71 requires a larger MAXKMERLENGTH and your files are gzip-compressed, you'll need to compile with this:

make MAXKMERLENGTH=96 HAVE_LIBZ=y

---
-Sébastien

**seb567** · 02-04-2013, 08:28 PM

Originally posted by cwzkevin

Hi, I have a question here.
Does Ray expect the sequences in the two paired-end files to be in the same order?

yes

Originally posted by cwzkevin

I ask because the sequences in my two paired-end FASTQ files happened to be in different orders last time. They are indeed paired files; the sequences are just in a different order. I ran Ray on these paired-end files and got output1. After I realized the order was not the same, I sorted the FASTQ files to put them in the same order, re-ran Ray, and got output2. The two results seem to be different.
Thank you.

Ray needs both files to list sequences in the same order.

Most sequencing technologies do that by default, and the dominant sequencing technology does too.
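
Since Ray expects mates to appear in the same order in both files, it can help to check this before a run. The sketch below compares read names record by record. It assumes 4-line FASTQ records and names that either end in /1 and /2 or carry the mate number after the first whitespace (as in Casava 1.8-style headers). The helper names are hypothetical and not part of Ray.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Reduce a FASTQ header to the part shared by both mates: cut at the first
// whitespace, then drop a trailing "/1" or "/2".
std::string pairKey(const std::string& header) {
    std::string key = header.substr(0, header.find_first_of(" \t"));
    if (key.size() >= 2 && key[key.size() - 2] == '/' &&
        (key.back() == '1' || key.back() == '2'))
        key.erase(key.size() - 2);
    return key;
}

// Returns the 0-based index of the first record whose mates do not match,
// or -1 if both streams list mates in the same order (4-line records assumed).
long firstOutOfOrderPair(std::istream& r1, std::istream& r2) {
    std::string h1, h2, line;
    long index = 0;
    while (true) {
        bool ok1 = static_cast<bool>(std::getline(r1, h1));
        bool ok2 = static_cast<bool>(std::getline(r2, h2));
        if (!ok1 && !ok2) return -1;   // both files ended together
        if (ok1 != ok2) return index;  // one file has more records
        if (pairKey(h1) != pairKey(h2)) return index;
        for (int i = 0; i < 3; ++i) {  // skip sequence, '+' and quality lines
            std::getline(r1, line);
            std::getline(r2, line);
        }
        ++index;
    }
}
```

A result of -1 means the files are paired in order. Any other value is the first record at which the two files disagree.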

Thanks for the feedback !

-Sébastien
