
  • bwawrik
    replied
    Originally posted by seb567 View Post
    It says "Rank 43 reached 536870000 vertices from seed 0, flow 2"

    That is a lot of vertices !

    Which version of Ray are you using ?

    I believe that this issue was fixed in Ray 2.2.0. The ticket was https://github.com/sebhtml/ray/issues/161
    You can obtain the version of Ray with this command:
    mpiexec -n 1 Ray -version
    We were running v2.1.0. After installing 2.3.0, the data ran great, thank you. I also think your link was right on. Somehow it created an endless loop, and we think it ended up reaching the swap limit on one of our nodes, bouncing the job.

    Regardless, it works great now ! Thanks for the help !

    B



  • seb567
    replied
    Originally posted by bwawrik View Post
    Hi,
    I've been using Ray very successfully on our cluster, but ran into a problem with my last data set. It's a relatively modest MiSeq partial run of about 750k paired reads that originate from a pure bacterial strain. Originally, I ran the data and got a decent assembly, but I discovered that there was some degree of read-through and that the adapters were not trimmed on the 3' end for all reads (not sure why Illumina BaseSpace does not check for that by default). I used Trimmomatic 0.3 to remove the adapters and tried re-assembling with Ray, but every time I do, it crashes. Basically, it goes into some sort of endless loop that ends up using 80 GB of swap, at which point the cluster bounces my job. I sliced my data into three parts and assembled them separately or in combinations, and the data assembles fine that way (so the data files are not corrupted). Only when I combine all three parts do I get the crash. I added the stderr below. The stdout is >84 GB, so I only added the end of the file.

    Any thoughts what the problem might be ?

    Thanks in advance !

    stderr:

    Ray:18680 terminated with signal 11 at PC=2aebbf8198d7 SP=7fff758b6340. Backtrace:
    /usr/mpi/intel/openmpi-1.4.3-qlc/lib64/libopen-pal.so.0(opal_memory_ptmalloc2_int_malloc+0xfd7)[0x2aebbf8198d7]
    /usr/mpi/intel/openmpi-1.4.3-qlc/lib64/libopen-pal.so.0(+0x49a05)[0x2aebbf817a05]
    /usr/lib64/libstdc++.so.6(_Znwm+0x1d)[0x2aebc014809d]
    /home/bstamps/Ray/Ray(_ZNSt6vectorI4KmerSaIS0_EE13_M_insert_auxEN9__gnu_cxx17__normal_iteratorIPS0_S2_EERKS0_+0x130)[0x473bc0]
    /home/bstamps/Ray/Ray(_ZN12SeedExtender28markCurrentVertexAsAssembledEP4KmerP13RingAllocatorPiP12StaticVectoriiP13ExtensionDataPbS9_S4_S9_PSt6vectorIS0_SaIS0_EEP7ChooserP10BubbleDataiP20OpenAssemblerChooseriPSA_I12AssemblySeedSaISK_EE+0x1371)[0x4e96e1]
    /home/bstamps/Ray/Ray(_ZN12SeedExtender29call_RAY_SLAVE_MODE_EXTENSIONEv+0x5f7)[0x4f16f7]
    /home/bstamps/Ray/Ray(_ZN11ComputeCore10runVanillaEv+0x9b)[0x534f7b]
    /home/bstamps/Ray/Ray(_ZN11ComputeCore3runEv+0x6e)[0x53638e]
    /home/bstamps/Ray/Ray(_ZN7Machine5startEv+0x1697)[0x4487a7]
    /home/bstamps/Ray/Ray(main+0x3a)[0x443bba]
    /lib64/libc.so.6(__libc_start_main+0xfd)[0x2aebc0a67cdd]
    /home/bstamps/Ray/Ray[0x443a89]


    end of stdout:

    Rank 43: assembler memory usage: 24483328 KiB
    Rank 43 reached 536865000 vertices from seed 0, flow 2
    Speed RAY_SLAVE_MODE_EXTENSION 9921 units/second
    Rank 43: assembler memory usage: 24483328 KiB
    Rank 43 reached 536866000 vertices from seed 0, flow 2
    Speed RAY_SLAVE_MODE_EXTENSION 12596 units/second
    Rank 43: assembler memory usage: 24483328 KiB
    Rank 43 reached 536867000 vertices from seed 0, flow 2
    Speed RAY_SLAVE_MODE_EXTENSION 12235 units/second
    Rank 43: assembler memory usage: 24483328 KiB
    Rank 43 reached 536868000 vertices from seed 0, flow 2
    Speed RAY_SLAVE_MODE_EXTENSION 13596 units/second
    Rank 43: assembler memory usage: 24483328 KiB
    Rank 43 reached 536869000 vertices from seed 0, flow 2
    Speed RAY_SLAVE_MODE_EXTENSION 10610 units/second
    Rank 43: assembler memory usage: 24483328 KiB
    Rank 43 reached 536870000 vertices from seed 0, flow 2
    Speed RAY_SLAVE_MODE_EXTENSION 12145 units/second
    Rank 43: assembler memory usage: 24483328 KiB
    Job /lsf/7.0/linux2.6-glibc2.3-x86_64/bin/openmpi_wrapper /home/bstamps/Ray/Ray -k 31 -p /scratch/bwawrik/Colin/output_forward_paired2.fastq /scratch/bwawrik/Colin/output_reverse_paired2.fastq -o /scratch/bwawrik/Colin/ray_adapt_rem -minimum-contig-length 1000
    It says "Rank 43 reached 536870000 vertices from seed 0, flow 2"

    That is a lot of vertices !

    Which version of Ray are you using ?

    I believe that this issue was fixed in Ray 2.2.0. The ticket was https://github.com/sebhtml/ray/issues/161


    You can obtain the version of Ray with this command:


    mpiexec -n 1 Ray -version
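
    Once you know the version, checking it against the release that is believed to carry the fix is a simple tuple comparison. This is only a minimal sketch: the helper names are made up for illustration, and because the exact text printed by `Ray -version` is not shown in this thread, the version string is typed in by hand rather than parsed from that output.

```python
def parse_version(text):
    """Turn a dotted version such as '2.1.0' or 'v2.3.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().lstrip("vV").split("."))


def has_fix(installed, fixed_in="2.2.0"):
    """True when the installed version is at least the release believed to carry the fix."""
    return parse_version(installed) >= parse_version(fixed_in)


# Hand-typed version strings; 2.2.0 is the release named in the reply above.
for version in ("2.1.0", "2.2.0", "2.3.0"):
    print(version, "includes fix" if has_fix(version) else "predates fix")
```

    Comparing integer tuples rather than raw strings keeps something like 2.10.0 ordered after 2.9.0.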



  • seb567
    replied
    Originally posted by akshaya.ramesh View Post
    Originally posted by seb567 View Post
    I fixed this build script.



    Thank you for fixing the build script. I have been able to modify the makefile by altering MAXKMERLENGTH, and it is working all right. For some reason, every time I run Ray v2.2.0, the following error comes up (for any k-mer length; I have tried values from 21 to 91; my reads are 101 bp long):

    Rank 0: Assembler panic: no k-mers found in reads.
    Rank 0: Perhaps reads are shorter than the k-mer length (change -k).

    I am sure I am missing something very basic, but I am a newbie; any comments would be greatly appreciated.

    Thanks,
    Akshaya
    How many reads do you have ? This information is available in the file

    RayOutput/NumberOfSequences.txt


    This is usually due to using k-mers that are longer than reads or to assembling 0 reads.
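
    Both causes can be checked on the input before launching a job. The sketch below is an illustration only, not part of Ray: it assumes a plain 4-line-per-record FASTQ file (no wrapped sequences), and the function name is hypothetical. A read has to be at least k bases long to contain a single k-mer.

```python
import io


def usable_reads(fastq_handle, k):
    """Count reads in a 4-line-per-record FASTQ stream, and those with at least k bases."""
    total = 0
    long_enough = 0
    for line_number, line in enumerate(fastq_handle):
        if line_number % 4 == 1:  # the sequence is the second line of each record
            total += 1
            if len(line.strip()) >= k:
                long_enough += 1
    return total, long_enough


# Two made-up records: one 10-base read and one 4-base read.
example = io.StringIO(
    "@read1\nACGTACGTAC\n+\nIIIIIIIIII\n"
    "@read2\nACGT\n+\nIIII\n"
)
total, long_enough = usable_reads(example, k=5)
print(f"{total} reads, {long_enough} with at least 5 bases")
```

    If the first number is 0 the files were not loaded as expected; if the second is 0 the k-mer length is too large for the reads.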



  • seb567
    replied
    Originally posted by ercfrtz View Post
    I am having issues installing. When I run the make prefix=ray-build like the installation file says, I get the following error:

    Code:
    make prefix=ray-build
    make[1]: Entering directory `/home/eric.fritz/Ray-v2.2.0/RayPlatform'
      CXX RayPlatform/memory/ReusableMemoryStore.o
    make[1]: execvp: mpicxx: Not a directory
    make[1]: *** [memory/ReusableMemoryStore.o] Error 127
    make[1]: Leaving directory `/home/eric.fritz/Ray-v2.2.0/RayPlatform'
    make: *** [RayPlatform/libRayPlatform.a] Error 2
    I do have all the requirements on the system. Thanks.
    Can you provide the output of this command:

    type mpicxx
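
    Along the same lines, here is a small check that can be run next to `type mpicxx`. It rests on an assumption that is not confirmed in this thread: that "Not a directory" comes from a PATH entry pointing at a file instead of a directory while mpicxx itself is not found. The function name is hypothetical.

```python
import os
import shutil


def suspicious_path_entries(path_value):
    """Return PATH entries that exist on disk but are not directories."""
    entries = [entry for entry in path_value.split(os.pathsep) if entry]
    return [
        entry
        for entry in entries
        if os.path.exists(entry) and not os.path.isdir(entry)
    ]


# shutil.which returns None when the compiler wrapper is not on PATH.
print("mpicxx resolves to:", shutil.which("mpicxx"))
for entry in suspicious_path_entries(os.environ.get("PATH", "")):
    print("PATH entry is not a directory:", entry)
```

    If mpicxx resolves to None, the MPI compiler wrapper is either not installed or its bin directory is missing from PATH.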



  • seb567
    replied
    Originally posted by yifangt View Post
    More questions about the insert size:
    1) I checked LibraryStatistics.txt and found that the number (303 bp) is quite different from what my lab tech provided (~1 kb);
    2) Does this mean I have to run Ray once to get the insert size, then provide Ray with these parameters and run it again?
    I want to be sure about each point. Ray is really fast and it helps very much. Thanks a lot!
    YT
    Hi !

    You can use the data in the LibraryData.xml to check the actual distribution of your libraries and then maybe show this information to your laboratory technician. The 303 bp reported by Ray is an average for the detected peak in the data.

    For mates, sometimes you will have a low-frequency peak on the right (in your case at 1 kb).
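
    To illustrate what a dominant peak plus a low-frequency peak on the right looks like, here is a rough sketch. It does not read LibraryData.xml (its layout is not described in this thread); it assumes you have already extracted the distribution into a dictionary mapping distance to count, and all the numbers are made up.

```python
def find_peaks(histogram):
    """Return (distance, count) pairs that are higher than both neighbouring bins."""
    distances = sorted(histogram)
    peaks = []
    for i in range(1, len(distances) - 1):
        left, here, right = (histogram[d] for d in distances[i - 1 : i + 2])
        if here > left and here > right:
            peaks.append((distances[i], here))
    # Strongest peak first.
    return sorted(peaks, key=lambda peak: peak[1], reverse=True)


# Made-up counts: a dominant peak near 300 bp and a weak one near 1000 bp.
observed = {100: 5, 200: 400, 300: 9000, 400: 350, 500: 20,
            900: 10, 1000: 120, 1100: 15}
for distance, count in find_peaks(observed):
    print(distance, count)
```

    Real distributions are noisier than this, so some smoothing or binning would likely be needed before looking for local maxima.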

    Thanks for the good words about Ray. It is appreciated !



  • bwawrik
    replied
    problem with RAY assembly

    Hi,
    I've been using Ray very successfully on our cluster, but ran into a problem with my last data set. It's a relatively modest MiSeq partial run of about 750k paired reads that originate from a pure bacterial strain. Originally, I ran the data and got a decent assembly, but I discovered that there was some degree of read-through and that the adapters were not trimmed on the 3' end for all reads (not sure why Illumina BaseSpace does not check for that by default). I used Trimmomatic 0.3 to remove the adapters and tried re-assembling with Ray, but every time I do, it crashes. Basically, it goes into some sort of endless loop that ends up using 80 GB of swap, at which point the cluster bounces my job. I sliced my data into three parts and assembled them separately or in combinations, and the data assembles fine that way (so the data files are not corrupted). Only when I combine all three parts do I get the crash. I added the stderr below. The stdout is >84 GB, so I only added the end of the file.

    Any thoughts what the problem might be ?

    Thanks in advance !

    stderr:

    Ray:18680 terminated with signal 11 at PC=2aebbf8198d7 SP=7fff758b6340. Backtrace:
    /usr/mpi/intel/openmpi-1.4.3-qlc/lib64/libopen-pal.so.0(opal_memory_ptmalloc2_int_malloc+0xfd7)[0x2aebbf8198d7]
    /usr/mpi/intel/openmpi-1.4.3-qlc/lib64/libopen-pal.so.0(+0x49a05)[0x2aebbf817a05]
    /usr/lib64/libstdc++.so.6(_Znwm+0x1d)[0x2aebc014809d]
    /home/bstamps/Ray/Ray(_ZNSt6vectorI4KmerSaIS0_EE13_M_insert_auxEN9__gnu_cxx17__normal_iteratorIPS0_S2_EERKS0_+0x130)[0x473bc0]
    /home/bstamps/Ray/Ray(_ZN12SeedExtender28markCurrentVertexAsAssembledEP4KmerP13RingAllocatorPiP12StaticVectoriiP13ExtensionDataPbS9_S4_S9_PSt6vectorIS0_SaIS0_EEP7ChooserP10BubbleDataiP20OpenAssemblerChooseriPSA_I12AssemblySeedSaISK_EE+0x1371)[0x4e96e1]
    /home/bstamps/Ray/Ray(_ZN12SeedExtender29call_RAY_SLAVE_MODE_EXTENSIONEv+0x5f7)[0x4f16f7]
    /home/bstamps/Ray/Ray(_ZN11ComputeCore10runVanillaEv+0x9b)[0x534f7b]
    /home/bstamps/Ray/Ray(_ZN11ComputeCore3runEv+0x6e)[0x53638e]
    /home/bstamps/Ray/Ray(_ZN7Machine5startEv+0x1697)[0x4487a7]
    /home/bstamps/Ray/Ray(main+0x3a)[0x443bba]
    /lib64/libc.so.6(__libc_start_main+0xfd)[0x2aebc0a67cdd]
    /home/bstamps/Ray/Ray[0x443a89]


    end of stdout:

    Rank 43: assembler memory usage: 24483328 KiB
    Rank 43 reached 536865000 vertices from seed 0, flow 2
    Speed RAY_SLAVE_MODE_EXTENSION 9921 units/second
    Rank 43: assembler memory usage: 24483328 KiB
    Rank 43 reached 536866000 vertices from seed 0, flow 2
    Speed RAY_SLAVE_MODE_EXTENSION 12596 units/second
    Rank 43: assembler memory usage: 24483328 KiB
    Rank 43 reached 536867000 vertices from seed 0, flow 2
    Speed RAY_SLAVE_MODE_EXTENSION 12235 units/second
    Rank 43: assembler memory usage: 24483328 KiB
    Rank 43 reached 536868000 vertices from seed 0, flow 2
    Speed RAY_SLAVE_MODE_EXTENSION 13596 units/second
    Rank 43: assembler memory usage: 24483328 KiB
    Rank 43 reached 536869000 vertices from seed 0, flow 2
    Speed RAY_SLAVE_MODE_EXTENSION 10610 units/second
    Rank 43: assembler memory usage: 24483328 KiB
    Rank 43 reached 536870000 vertices from seed 0, flow 2
    Speed RAY_SLAVE_MODE_EXTENSION 12145 units/second
    Rank 43: assembler memory usage: 24483328 KiB
    Job /lsf/7.0/linux2.6-glibc2.3-x86_64/bin/openmpi_wrapper /home/bstamps/Ray/Ray -k 31 -p /scratch/bwawrik/Colin/output_forward_paired2.fastq /scratch/bwawrik/Colin/output_reverse_paired2.fastq -o /scratch/bwawrik/Colin/ray_adapt_rem -minimum-contig-length 1000



  • akshaya.ramesh
    replied
    Originally posted by seb567 View Post
    I fixed this build script.



    Thank you for fixing the build script. I have been able to modify the makefile by altering MAXKMERLENGTH, and it is working all right. For some reason, every time I run Ray v2.2.0, the following error comes up (for any k-mer length; I have tried values from 21 to 91; my reads are 101 bp long):

    Rank 0: Assembler panic: no k-mers found in reads.
    Rank 0: Perhaps reads are shorter than the k-mer length (change -k).

    I am sure I am missing something very basic, but I am a newbie; any comments would be greatly appreciated.

    Thanks,
    Akshaya



  • ercfrtz
    replied
    I am having issues installing. When I run the make prefix=ray-build like the installation file says, I get the following error:

    Code:
    make prefix=ray-build
    make[1]: Entering directory `/home/eric.fritz/Ray-v2.2.0/RayPlatform'
      CXX RayPlatform/memory/ReusableMemoryStore.o
    make[1]: execvp: mpicxx: Not a directory
    make[1]: *** [memory/ReusableMemoryStore.o] Error 127
    make[1]: Leaving directory `/home/eric.fritz/Ray-v2.2.0/RayPlatform'
    make: *** [RayPlatform/libRayPlatform.a] Error 2
    I do have all the requirements on the system. Thanks.



  • yifangt
    replied
    RE: Ray2.2.0

    More questions about the insert size:
    1) I checked LibraryStatistics.txt and found that the number (303 bp) is quite different from what my lab tech provided (~1 kb);
    2) Does this mean I have to run Ray once to get the insert size, then provide Ray with these parameters and run it again?
    I want to be sure about each point. Ray is really fast and it helps very much. Thanks a lot!
    YT
    Last edited by yifangt; 04-29-2013, 10:36 AM.



  • seb567
    replied
    Originally posted by yifangt View Post
    Hi, Sebastian!
    Several questions while I am trying Ray2.2.0:
    1) When I tried to optimize the installation as suggested in the README.md
    Code:
    The best way to build Ray is to use whole-program optimization.
    With gcc, use this script:
    bash ./scripts/Build-Link-Time-Optimization.sh
    I could not use a k-mer length above 32, even after changing this line:
    Code:
      -D MAXKMERLENGTH=255 \
    The reason I want bigger maxkmer is my read length can be 250bp. Did I miss anything?
    I fixed this build script.




    2) As I got 2 new mate pair libraries for scaffolding, can I make use of the contigs I already have with Ray2.1.0 to combine them together to have "better/longer" scaffold as theoretically expected?
    Code:
    mpiexec -n 20 Ray -k 35 -p $INPATH/LAN4_35_clean_PE_R1.fasta $INPATH/LAN4_35_clean_PE_R2.fasta  -p $INPATH/LAN4_80_clean_PE_R1.fasta $INPATH/LAN4_80_clean_PE_R2.fasta -s $INPATH/S01/S01_071/Contigs.fasta  -o $OUTPATH/S01_035
    Actually, when I ran the settings above, I did not get longer scaffolds/contigs at all, and the contigs were broken! I was expecting the contigs to be longer than those in the single-read file (-s $INPATH/S01/S01_071/Contigs.fasta). What does this mean? Or do I need to run all 3 libraries (1 PE, 2 MP) together again from the beginning?
    Yes, you need to restart from the beginning.


    3) To follow my last post about the MP size, I do not have the standard deviation of the insert size, how do I handle that? As you mentioned
    HTML Code:
    Ray is usually pretty good at estimating your library sizes.
    Does that mean I do not need to provide the insert size for Ray? Thanks a lot!
    I think it does. You should check LibraryStatistics.txt regardless.



  • yifangt
    replied
    Ray2.2.0

    Hi, Sebastian!
    Several questions while I am trying Ray2.2.0:
    1) When I tried to optimize the installation as suggested in the README.md
    Code:
    The best way to build Ray is to use whole-program optimization.
    With gcc, use this script:
    bash ./scripts/Build-Link-Time-Optimization.sh
    I could not use a k-mer length above 32, even after changing this line:
    Code:
      -D MAXKMERLENGTH=255 \
    The reason I want bigger maxkmer is my read length can be 250bp. Did I miss anything?

    2) As I got 2 new mate pair libraries for scaffolding, can I make use of the contigs I already have with Ray2.1.0 to combine them together to have "better/longer" scaffold as theoretically expected?
    Code:
    mpiexec -n 20 Ray -k 35 -p $INPATH/LAN4_35_clean_PE_R1.fasta $INPATH/LAN4_35_clean_PE_R2.fasta  -p $INPATH/LAN4_80_clean_PE_R1.fasta $INPATH/LAN4_80_clean_PE_R2.fasta -s $INPATH/S01/S01_071/Contigs.fasta  -o $OUTPATH/S01_035
    Actually, when I ran the settings above, I did not get longer scaffolds/contigs at all, and the contigs were broken! I was expecting the contigs to be longer than those in the single-read file (-s $INPATH/S01/S01_071/Contigs.fasta). What does this mean? Or do I need to run all 3 libraries (1 PE, 2 MP) together again from the beginning?

    3) To follow my last post about the MP size, I do not have the standard deviation of the insert size, how do I handle that? As you mentioned
    HTML Code:
    Ray is usually pretty good at estimating your library sizes.
    Does that mean I do not need to provide the insert size for Ray? Thanks a lot!
    Last edited by yifangt; 04-23-2013, 08:17 AM.



  • seb567
    replied
    Ray v2.2.0 is now available.

    Hello,

    Ray v2.2.0 is now available worldwide.

    The delay between v2.1.0 and v2.2.0 was quite long.

    Ray v2.2.0 brings a lot of bug fixes and some new features.

    The tarball is available at:







    The most significant changes include:

    * SequencesLoader: the Illumina export format is now supported
    * add build option for MPI I/O
    * avoid infinite loops during read recycling
    * messages must not be passed by value
    * Fixed a linking error caused by ordering
    * FusionTaskCreator: don't lose genomic regions during merging
    * new file GraphPartition.txt shows the distribution of objects
    * readahead operations are used for reading gz files
    * core: fixed a race condition occurring with -route-messages
    * SeedingData: fix regression for seed checkpointing
    * all the code of Ray was ported to this new GraphPath framework

    The GraphPath framework reduces memory usage and avoids some misassembly
    errors by enforcing the de Bruijn graph property.

    * Scaffolder: don't fetch reads from repeated objects

    This fixes running time issues on large genomes with repeats.

    * SeedingData: implemented a staggered mean algorithm

    * Mock: removed the limit on the number of input files
    * Library: implemented checkpointing for paired reads
    * removed all calls to fflush(stdout) and cout.flush()
    * SeedExtender: reduce the verbosity of graph traversal
    * reduced the amount of information in the standard output
    * JoinerTaskCreator: reduced the default verbosity
    * KmerAcademyBuilder: reduced the verbosity for graph construction
    * implemented an adaptive Bloom filter
    * store a path as a sequence instead of a vector of vertices for efficiency
    * SequencesLoader: add support for short file names





    All changes in Ray between v2.1.0 and v2.2.0

    Charles Joly Beauparlant (1):
    Added an example plugin.

    Sébastien Boisvert (160):
    Some work around the minirank model.
    Ported Ray plugins to the mini-ranks RayPlatform.
    Ray plugins were ported to the mini-ranks.
    Moved the destruction of allocators in RayPlatform.
    I ported Ray to some changes in some classes in RayPlatform.
    application_core: the application code was simplified
    Social networks were added to the release procedure
    Code names of old releases were added
    Fixed a linking error caused by ordering
    Fixed the scope of options in build system
    The build system was simplified
    AR and LD are not needed here
    Ray must abort if the output directory exists
    The RayCommand.txt file was fixed for mini-ranks
    Added the name of each rank (or mini-rank) in network test
    The subgraph must be built regardless if it will be used
    Merge branch 'minirank-model' of git://github.com/sebhtml/ray.git
    core: CONFIG_* variables are private
    core: The option -mini-rank-per-rank was added
    ship: removed 6 files in shipped products
    core: don't return parameters by value
    Mock: new plugin called that does nothing
    SequencesLoader: a regression for .bz2 file support was fixed
    messages must not be passed by value
    Ordered all headers
    Updated copyrights
    Documentation: there is only one repository for research tools
    reverted a wrong hunk from commit 7c361f1530d084c6f99
    FusionTaskCreator: don't lose genomic regions during merging
    SeedExtender: properly format extension file name
    Scaffolder: only put one new line after scaffold sequence
    KmerAcademyBuilder: use vertexRank() to find who owns an object
    new file GraphPartition.txt shows the distribution of objects
    the line that shows the process identifier was moved
    CoverageGatherer: kmers.txt should have 1 header only
    recursive make was improved
    readahead operations are used for reading gz files
    SequencesLoader: added the rank number when loading files
    core: the partitioner needs the correct rank number
    core: fixed a race condition occurring with -route-messages
    SeedExtender: display the number of traversed nucleotide symbols
    Seeds: new runtime metrics for seeding algorithms
    new header for SeedLengthDistribution.txt
    new header for any paired read file LibraryN.txt
    SequencesLoader: added a few assertions for read partitions
    new header for CoverageDistribution.txt
    Merge branch 'master' of github.com:sebhtml/ray
    Documentation: added the polytope with 4225 vertices
    SeedingData: fix regression for seed checkpointing
    added documentation for using the torus
    Documentation: added arguments for a 5D torus with 1024 vertices
    Documentation: fixed permissions
    removed the output file called MessagePassingInterface.txt
    renamed the AssemblySeed to GraphPath so it can be reused
    all the code of Ray was ported to this new GraphPath framework
    Documentation: fixed the degree of the polytope
    Scaffolder: don't fetch reads from repeated objects
    SeedExtender: added documentation in the code for repeated vertices
    fixed a couple of compilation warnings
    SeedingData: implemented a staggered mean algorithm
    Scaffolder: replaced getMode() by the new GraphPath framework
    Mock: removed the limit on the number of input files
    remove the limitation regarding the maximum number of files
    moved message handlers from MessageProcessor to SequencesLoader
    Scaffolder: fixed 2 compilation warnings
    Library: implemented checkpointing for paired reads
    SeedingData: reduced amount of printed information
    removed all calls to fflush(stdout) and cout.flush()
    SeedExtender: reduce the verbosity of graph traversal
    reduced the amount of information in the standard output
    JoinerTaskCreator: reduced the default verbosity
    KmerAcademyBuilder: reduced the verbosity for graph construction
    SequencesLoader: reduced verbosity
    VerticesExtractor: reduced verbosity
    reduced verbosity
    reduced verbosity
    SequencesLoader: the Illumina export format is now supported
    added a loader interface for file formats
    SequencesLoader: all supported formats use the interface
    SequencesLoader: implemented a product factory
    Mock: updated documentation for new export format
    Mock: output a single file for library data
    implemented an adaptive Bloom filter
    improved the interface of path objects
    add debug symbols by default
    store a path as a sequence instead of a vector of vertices for efficiency
    Mock: the path storage using blocks is not ready
    SeedingData: enforce de Bruijn graph property for path storage
    SeedingData: use the GraphPath storage code to compute seeds
    SeedingData: refactor code so that m_content is abstracted
    SeedingData: use 2-bit encoding for paths
    SeedingData: plugin options are parsed by plugins
    use constants for symbols
    SeedingData: correctly detect dead ends
    add more information for coding style
    MachineHelper: registerPlugin and resolveSymbols must be last
    SeedingData: tips can not be seeds
    SequencesLoader: add support for short file names
    SeedingData: tips are not valid seeds
    move some handlers in the Scaffolder plugin
    Scaffolder: implement the handler for packed chunks
    fix a race condition during directory probing
    reduce verbosity of components
    add documentation for building on IBM Blue Gene/Q
    add code name for upcoming release
    SequencesLoader: fix regression (added in ca979832) for line widths
    add plugin PathEvaluator to evaluate paths
    PathEvaluator: write ContigPaths checkpoints in parallel
    reserve storage capacity for sequence file
    perform parallel I/O operations
    fix a bug when disabling scaffolding
    use MPI I/O to write Contigs.fasta
    use a file view for each MPI rank
    add build option for MPI I/O
    avoid parallel I/O without MPI I/O
    avoid infinite loops during read recycling
    update polytope documentation
    add comments for old class
    add a new plugin to process spurious seeds
    port some plugins to the simplified RayPlatform API
    iterate on seeds to filter them
    register seed paths in the distributed graph
    hide hash values for Bloom filter
    push the workflow in a helper class
    fetch ancestors of seed heads
    seed lengths must be collected after analysis
    write seed statistics after analysis
    write seed checkpoints after the quality control analysis
    write seed files after analysis (-write-seeds)
    skip seed quality analysis if checkpoints exist
    add steps for better dead end detection
    hide mini-ranks in help if they are disabled
    correct a bunch of bugs for adapters in Ray
    reuse code paths to obtain sequence information
    eliminate seeds that have a dead-end on the left
    discard seeds with dead-ends on the right
    increase the maximum depth for searches
    add a class to fetch the attributes of a DNA sequence
    create a class to fetch annotations in a portable way
    fetch nearby paths to detect bubbles
    fix a bug during the registration of seeds
    remove any seed that is a weak part of a bubble
    add 4 methods that will be implemented later
    fix a regression that prevented the closing of a file
    add new reference in the output
    disable the seed filter when using short kmers
    add a maximum coverage depth for dead end search
    adapt the allowed depth in function of the data
    add design blueprints for the new plugin
    SpuriousSeedAnnihilator: disable debug messages by default
    TaxonomyViewer: rename the plugin to TaxonomyViewer
    remove plugin_ from all plugin directory names
    add new line for publications
    application_core: fix buggy message routing
    SeedExtender: don't traverse path if it's consumed already
    SeedingData: fix a bug for the phix system test
    update the CMakeList.txt
    use git to store version names
    Disable the filtering code during the computation of seeds
    This is Ray v2.2.0




    All changes in RayPlatform between v1.1.0 and v1.1.1


    Sébastien Boisvert (56):
    initial work on miniranks with VirtualMachine and Minirank
    I added some design documentation for mini-ranks.
    spinlocks are more suitable for this job
    added design documentation for mini-ranks.
    First implementation of mini-ranks in RayPlatform
    The core must provide the mini-rank number.
    Documentation: added description of macros.
    Fixed some bugs in the mini-ranks model.
    Moved the destruction of allocators in the core.
    Mini-rank source and mini-rank destination are required.
    The destructor of the middleware must be called.
    A mini-rank must tell the rank that it has messages to send.
    The class MessageQueue does the job of receiving messages.
    Non-blocking queues will be used for the communication.
    The non-blocking message queue for mini-ranks is ready.
    MPI_Recv must be called to get the mini-rank numbers.
    This is the branch for RayPlatform v7.0.0.
    core: The old behavior (no mini-ranks) now works as expected
    core: RayPlatform is responsible for creating mini-ranks
    The old adapter API documentation was removed
    Message reception is now interleaved with send operations.
    More buffers are needed for mini-ranks
    communication: don't register already registered buffers
    The build system is less verbose
    New API call to get the number of mini-ranks per rank
    Added a method to get the MessagesHandler object
    Merge branch 'minirank-model' of github.com:sebhtml/RayPlatform into minirank-model
    Merge branch 'minirank-model' of git://github.com/sebhtml/RayPlatform.git
    handlers: new option to cache operation codes
    communication: messages must be passed with a pointer
    Ordered headers in all files
    Updated copyrights
    The short name was updated in headers
    The website was updated in every file
    a retry is necessary when a message is pushed into a full ring
    Documentation: updated RayPlatform mini-ranks blueprints
    communication: moved writeFiles() in a second method
    communication: removed a few debugging instructions
    Documentation: added gate blueprints
    Documentation: improved design for non-linear scheduling
    routing: renamed the hypercube to polytope
    Documentation: added Torus description
    a radix of 2 produces a hypercube
    use the Q and ASSERT build arguments in RayPlatform
    routing: implemented a new communication graph: the torus
    Merge branch 'master' of github.com:sebhtml/RayPlatform
    core: use specific code to get memory usage on Blue Gene/Q
    the next release will likely be 1.2.0 and not 7.0.0
    add option to provide public access to a master mode
    add the core in each plugin
    add two macros to configure handlers
    fixed directives to compile mini-ranks
    core: fix buggy message routing
    improve the patch for message routing with a configuration
    core: fix a regression for registered handle names
    This is RayPlatform v1.1.0.



  • seb567
    replied
    Originally posted by yifangt View Post
    Hi Sebastien:
    What is the option to include the different mate-pairs information for assembly?
    After searching this forum I handled this mixture of reads by pretending there are multiple paired end reads as:
    Code:
    mpiexec -n 20 Ray -k 53 -p S01_clean_PE_R1.fasta S01_clean_PE_R2.fasta -p S01_MP1_R1.fasta S01_MP1_R2.fasta -p S01_MP2_R1.fasta S01_MP2_R2.fasta -o S01_53_PE_MP
    But in fact the PE reads are paired-end, the MP1 reads are from a 3~5 kb mate-pair library, and the MP2 reads are from an 8~10 kb mate-pair library.
    Is there a way to give Ray the mate-pair distance? Thanks!
    YT
    Ray is usually pretty good at estimating your library sizes.

    You can provide the information manually should you wish to do so.

    Code:
    mpiexec -n 99 Ray -p mate_1.fastq mate_2.fastq 8000 800
    In the example above, 8000 is the average outer distance (distance between reads + read lengths) and 800 is the standard deviation on that quantity.
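
    As a worked example of that definition, the arithmetic can be sketched as follows. The 100 bp read length and 7800 bp gap are made-up numbers chosen to reproduce the 8000 above, both reads are assumed to have the same length, and the helper names are hypothetical.

```python
def outer_distance(read_length, inner_gap):
    """Outer distance = both read lengths plus the unsequenced gap between them."""
    return 2 * read_length + inner_gap


def library_arguments(left, right, mean_outer, std_dev):
    """Build the -p argument list in the order shown in the command above."""
    return ["-p", left, right, str(mean_outer), str(std_dev)]


# Two 100 bp reads separated by 7800 bp span 8000 bp end to end.
mean = outer_distance(read_length=100, inner_gap=7800)
print(" ".join(library_arguments("mate_1.fastq", "mate_2.fastq", mean, 800)))
```

    The printed arguments match the -p part of the command above; each additional library would get its own -p group.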



  • yifangt
    replied
    How to include the mate-pair information in Ray

    Hi Sebastien:
    What is the option to include the different mate-pairs information for assembly?
    After searching this forum I handled this mixture of reads by pretending there are multiple paired end reads as:
    Code:
    mpiexec -n 20 Ray -k 53 -p S01_clean_PE_R1.fasta S01_clean_PE_R2.fasta -p S01_MP1_R1.fasta S01_MP1_R2.fasta -p S01_MP2_R1.fasta S01_MP2_R2.fasta -o S01_53_PE_MP
    But in fact the PE reads are paired-end, the MP1 reads are from a 3~5 kb mate-pair library, and the MP2 reads are from an 8~10 kb mate-pair library.
    Is there a way to give Ray the mate-pair distance? Thanks!
    YT



  • seb567
    replied
    Originally posted by yifangt View Post
    Another question!
    I want to view my assembly with other software, e.g. Tablet, Mauve or OSLay. Is there any way in Ray to convert the output files to ACE, MAQ, SAM or BAM format for those post-assembly programs?
    The two current options are:

    1. use -amos, then use an AMOS-compatible viewer

    2. use -write-kmers, then use Ray Cloud Browser

    Originally posted by yifangt View Post

    In the FAQ section of your site there is question about the AMOS format for the output, but I did not do that. Do I have to run the assembly again and have the -amos option on?

    Yes, you need to run it again.

    Originally posted by yifangt View Post

    But unfortunately the AMOS format is not universal for other programs to read.

    There are two formats for de novo assemblies: AMOS and FASTG. The AMOS format is supported by far more applications.

    Originally posted by yifangt View Post

    I was trying to figure out what the output files are about, but I am not sure which one I should use for those visualization programs, or which one should be used by a perl/shell script for format conversion.
    There are Contigs.fasta and Scaffolds.fasta.

    One thing you can do is to map your fastq sequences on the contigs and use, for example, "samtools tview" to visualize that.
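
    The mapping route could look roughly like the sketch below. The choice of bwa as the mapper is an assumption (only samtools tview is named here), the file names are placeholders, and the helper only prints shell command lines without running anything.

```python
import shlex


def mapping_commands(contigs, reads_1, reads_2, bam="reads_on_contigs.bam"):
    """Return shell command lines for mapping reads onto contigs; nothing is executed."""
    q = shlex.quote
    return [
        # Index the contigs so the mapper can search them.
        f"bwa index {q(contigs)}",
        # Map the paired reads and write a coordinate-sorted BAM file.
        f"bwa mem {q(contigs)} {q(reads_1)} {q(reads_2)} | samtools sort -o {q(bam)} -",
        # tview needs an index for the sorted BAM file.
        f"samtools index {q(bam)}",
        # Browse the alignments against the contigs in the terminal.
        f"samtools tview {q(bam)} {q(contigs)}",
    ]


for command in mapping_commands("Contigs.fasta", "reads_1.fastq", "reads_2.fastq"):
    print(command)
```

    The resulting sorted BAM file can also be opened by other viewers that read BAM, which covers the SAM/BAM part of the question.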


    Another way is to run Ray with -write-kmers and to use Ray Cloud Browser, which is probably the most-interactive web genome viewer you'll find out there.

    Originally posted by yifangt View Post

    Appreciate if you could give me any clue.

    Thanks!

    YT

