  • gringer
    David Eccles (gringer)
    • May 2011
    • 845

    It might be possible to shoehorn Ray into doing something like the 'Inchworm' part of Trinity.

    I've had a bit of a hiatus from work on Ray due to additional projects, but I'm interested in seeing if this will work because the current transcriptome assembly programs have really high memory requirements. The memory requirements are odd because the transcript graphs should be simpler (fewer repeats because you're making things like proteins, so branches should be mostly due to different isoforms), and the transcriptome size is smaller than the genome size.

    My guess is that it would be worth trying something like disabling the genome coverage graph functions (with RNA-seq the mean coverage is per-transcript, but there can be within-transcript bias) and writing out sequences that have some minimum coverage level based on the average coverage of each disconnected graph.
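    The per-component threshold idea above can be illustrated with a toy sketch. It is not Ray's or Trinity's code: it counts k-mers on the forward strand only, treats the de Bruijn graph as undirected when finding components, and uses a hypothetical `fraction` parameter to stand in for "some minimum coverage level based on the average coverage".

```python
from collections import defaultdict


def kmer_counts(reads, k):
    """Count each k-mer in the reads (forward strand only, to keep the toy small)."""
    counts = defaultdict(int)
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return dict(counts)


def components(counts):
    """Split k-mers into connected components of the de Bruijn graph.

    Two k-mers are linked when the (k-1)-suffix of one equals the
    (k-1)-prefix of the other; edge direction is ignored here."""
    by_prefix, by_suffix = defaultdict(list), defaultdict(list)
    for kmer in counts:
        by_prefix[kmer[:-1]].append(kmer)
        by_suffix[kmer[1:]].append(kmer)
    seen, groups = set(), []
    for start in counts:
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:
            kmer = stack.pop()
            group.append(kmer)
            # successors start with our suffix; predecessors end with our prefix
            for other in by_prefix[kmer[1:]] + by_suffix[kmer[:-1]]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        groups.append(group)
    return groups


def filter_by_component_coverage(counts, fraction=0.5):
    """Keep k-mers whose count is at least `fraction` of their own component's mean."""
    kept = set()
    for group in components(counts):
        mean = sum(counts[kmer] for kmer in group) / len(group)
        kept.update(kmer for kmer in group if counts[kmer] >= fraction * mean)
    return kept


# Usage: a well-covered "transcript" with one error k-mer, plus a weak one.
# counts = kmer_counts(["ACGTAC"] * 10 + ["ACGTAG", "TTTTT"], 3)
# filter_by_component_coverage(counts)  # drops TAG, keeps TTT
```

    With these toy reads, TAG (count 1, component mean 8.8) is dropped while TTT (count 3) survives because its own component has mean 3; a single global mean of about 7.8 would have discarded it.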


    • santiagosnchez
      Junior Member
      • Feb 2012
      • 4

      Ray error message: Fatal error

      Hi Sébastien,

      I've been using Ray to assemble a 30-50 Mb fungal genome from 454 and PE Illumina reads. When I was testing the software with raw reads, I had no trouble and the assembly ran correctly. The problem arose when I quality filtered all the reads and created new FASTA and FASTQ files. I'm pasting the error message here:

      What could the problem be?

      Cheers,
      Santiago

      Rank 5: gathering scaffold links [1/3559] [1/28971]
      Rank 2: gathering scaffold links [1/3854] [1/30494]
      Rank 4: gathering scaffold links [1/3726] [1/56682]
      Fatal Error: ReadIndex: 18854336 but Reads: 18635750
      Ray: code/communication/MessageProcessor.cpp:127: void MessageProcessor::call_RAY_MPI_TAG_GET_READ_MARKERS(Message*): Assertion `readId<(int)m_myReads->size()' failed.
      [ipara:21878] *** Process received signal ***
      [ipara:21878] Signal: Aborted (6)
      [ipara:21878] Signal code: (-6)
      [ipara:21878] [ 0] /lib/libpthread.so.0 [0x7ff0190d3a80]
      [ipara:21878] [ 1] /lib/libc.so.6(gsignal+0x35) [0x7ff018da3ed5]
      [ipara:21878] [ 2] /lib/libc.so.6(abort+0x183) [0x7ff018da53f3]
      [ipara:21878] [ 3] /lib/libc.so.6(__assert_fail+0xe9) [0x7ff018d9cdc9]
      [ipara:21878] [ 4] Ray(_ZN16MessageProcessor33call_RAY_MPI_TAG_GET_READ_MARKERSEP7Message+0x454) [0x43fa74]
      [ipara:21878] [ 5] Ray(_ZN7Machine10runVanillaEv+0x99) [0x454f19]
      [ipara:21878] [ 6] Ray(_ZN7Machine5startEv+0x1031) [0x456c51]
      [ipara:21878] [ 7] Ray(main+0x3c) [0x4c0abc]
      [ipara:21878] [ 8] /lib/libc.so.6(__libc_start_main+0xe6) [0x7ff018d901a6]
      [ipara:21878] [ 9] Ray(__gxx_personality_v0+0x201) [0x42cd09]
      [ipara:21878] *** End of error message ***
      mpiexec noticed that job rank 0 with PID 21872 on node ipara exited on signal 15 (Terminated).
      6 additional processes aborted (not shown)


      • seb567
        Senior Member
        • Jul 2008
        • 260

        Originally posted by gringer
        It might be possible to shoehorn Ray into doing something like the 'Inchworm' part of Trinity.

        My guess is that it would be worth trying something like disabling the genome coverage graph functions (with RNA-seq the mean coverage is per-transcript, but there can be within-transcript bias) and writing out sequences that have some minimum coverage level based on the average coverage of each disconnected graph.
        Hello,

        I don't think we can assume that each transcript will be a disconnected-from-the-rest component in the graph.

        Also, I think you should work with the mode k-mer coverage, not the mean k-mer coverage because the mean will be artificially increased by repeats.
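        A minimal sketch of why the mode is the safer summary, using made-up coverage values (the numbers and the helper name `mean_and_mode` are hypothetical): a handful of repeat k-mers with very high counts drag the mean up, while the mode stays at the coverage most k-mers actually have.

```python
from collections import Counter
from statistics import mean


def mean_and_mode(kmer_coverages):
    """Return (mean, mode) of per-k-mer coverage values."""
    mode_value, _ = Counter(kmer_coverages).most_common(1)[0]
    return mean(kmer_coverages), mode_value


# Hypothetical transcript: 90 unique k-mers seen ~20 times each, plus 10 k-mers
# from a sequence repeated elsewhere, whose counts pile up to 200.
coverages = [20] * 90 + [200] * 10
print(mean_and_mode(coverages))  # mean 38, mode 20
```

        Only 10% of the k-mers are repeats, yet the mean nearly doubles; a threshold derived from it would discard genuine low-coverage sequence.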

        We tested Ray on the Schizosaccharomyces pombe dataset from the Trinity paper.

        Ray does quite well, but at present we are focusing on the assembly of metagenomes and on biological abundances using virtual colors.


        Sébastien


        • seb567
          Senior Member
          • Jul 2008
          • 260

          Originally posted by santiagosnchez
          The problem arose when I quality filtered all the reads and created new FASTA and FASTQ files.

          What could the problem be?
          Paired reads are usually stored in two files. For any pair of files, both files of the pair must contain the same number of sequences.

          I suspect that the FASTQ files you generated after filtering don't have matching sequence counts.

          This is because, for any pair of sequences, 0, 1 or 2 sequences can be filtered out. In the 0 and 2 cases there is no problem, because it is a 'keep all' or a 'remove all' scenario.

          But when only 1 sequence is filtered out, its mate should also be filtered out, or perhaps put aside in a file of 'alone' (unpaired) sequences.

          The problem arises because Ray uses unique sequence identifiers, which are computed from the initial partition of the reads (FASTQ identifiers are not used at all).
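          A toy model may help show how unequal counts can end in an out-of-range read index like the one in the log above. It assumes, purely for illustration, that reads are numbered by file position and that left read i pairs with right read i; `mate_id` is a hypothetical helper and this is not Ray's actual numbering scheme.

```python
def mate_id(read_id, n_left, n_right):
    """Toy model, NOT Ray's implementation: reads are numbered by position,
    left-file reads 0..n_left-1 and right-file reads n_left..n_left+n_right-1,
    and left read i is assumed to pair with right read i."""
    total = n_left + n_right
    mate = n_left + read_id  # positional pairing: FASTQ names are never consulted
    if mate >= total:
        raise IndexError(f"mate index {mate} is out of range for {total} reads")
    return mate


print(mate_id(2, n_left=3, n_right=3))  # 5: the last right-file read
try:
    mate_id(2, n_left=3, n_right=2)  # one mate was filtered out of the right file
except IndexError as err:
    print(err)
```

          In this model, even the indices that stay in range are wrong after the first one-sided removal: every later left read is paired with the mate of the read after it.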

          The problem will go away should you provide Ray with a coherent sequence count for each file.
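          Before launching Ray, the sequence counts of a pair can be compared with a short check. This sketch assumes unwrapped 4-line FASTQ records (header, sequence, '+', quality); the function names and file names are placeholders, not part of Ray.

```python
def count_fastq_records(lines):
    """Count FASTQ records, assuming the usual unwrapped 4-line layout
    (@header, sequence, '+', quality) and ignoring blank lines."""
    n_lines = sum(1 for line in lines if line.strip())
    if n_lines % 4:
        raise ValueError(f"{n_lines} non-blank lines is not a multiple of 4")
    return n_lines // 4


def pair_counts_match(left_lines, right_lines):
    """True when both files of a pair hold the same number of sequences."""
    return count_fastq_records(left_lines) == count_fastq_records(right_lines)


# Usage (file names are placeholders):
# with open("filtered_1.fastq") as left, open("filtered_2.fastq") as right:
#     print(pair_counts_match(left, right))
```

          Matching counts are necessary but not sufficient: two files can have equal counts and still be out of order, so the check only rules out the most obvious failure.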


          Sébastien


          • santiagosnchez
            Junior Member
            • Feb 2012
            • 4

            Thanks for replying Sebastien,

            I figured out the problem right after my post. Do you recommend a way to exclude / delete unpaired filtered reads from each file? I've been trying to find some scripts, but no luck.

            By the way, excellent program(!), by far the best assembler I've used.

            Cheers,
            Santiago


            • jtladner
              Junior Member
              • Feb 2010
              • 3

              Ray - Coverage too high

              Hello, I have been using Ray for the de novo assembly of several bacterial genomes. Overall it seems to be a really good program that has been giving me longer contigs than SOAPdenovo.

              However, recently I ran into an error that seems to be due to genome coverage that is too high:

              Rank 0: the minimum coverage is 2
              Rank 0: the peak coverage is 2
              Rank 0: Assembler panic: no peak observed in the k-mer coverage distribution.
              Rank 0: to deal with the sequencing error rate, try to lower the k-mer length (-k)

              At first I thought that I had the opposite problem: not enough coverage. I tried to lower k as suggested, but I kept getting the same error. The only ways I have been able to get Ray to run on this dataset are to either decrease the number of sequences that I input into the program (in which case I get very good contigs) or increase the k-mer length to very high values (e.g., 63).

              If possible, could you explain why high coverage would result in this type of error?

              And can you provide guidelines for the optimal genome coverage for Ray?


              Thank you.

              Jason


              • santiagosnchez
                Junior Member
                • Feb 2012
                • 4

                Sébastien,

                Is there a way to reuse some of Ray's output files in order to avoid some of the initial computations on the same data?

                Cheers,
                Santiago


                • seb567
                  Senior Member
                  • Jul 2008
                  • 260

                  Originally posted by santiagosnchez
                  Do you recommend a way to exclude / delete unpaired filtered reads from each file? I've been trying to find some scripts, but no luck.
                  I don't know any particularly good program for this precise task.
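                  For reference, the re-pairing step can be sketched in a few lines. This is a hypothetical helper, not a tool that ships with Ray; it assumes unwrapped 4-line FASTQ records and that mates share a read name apart from an optional /1 or /2 suffix (names like "@x 1:N:0:1" are also handled, because only the first word is compared).

```python
def fastq_records(lines):
    """Yield (name, record) pairs from an unwrapped 4-line-per-record FASTQ stream."""
    rows = [line.rstrip("\n") for line in lines if line.strip()]
    for i in range(0, len(rows), 4):
        record = rows[i:i + 4]
        # Assumption: mates share a name, apart from an optional /1 or /2 suffix
        name = record[0][1:].split()[0]
        if name.endswith(("/1", "/2")):
            name = name[:-2]
        yield name, record


def repair_pairs(left_lines, right_lines):
    """Return (left_pairs, right_pairs, singletons) as lists of 4-line records.

    left_pairs[i] and right_pairs[i] are mates, so the two lists have equal
    length and can be written to two files with positional pairing intact."""
    right = dict(fastq_records(right_lines))  # holds the whole right file in memory
    left_pairs, right_pairs, singletons = [], [], []
    for name, record in fastq_records(left_lines):
        mate = right.pop(name, None)
        if mate is None:
            singletons.append(record)  # its mate was filtered out
        else:
            left_pairs.append(record)
            right_pairs.append(mate)
    singletons.extend(right.values())  # right reads whose left mate is gone
    return left_pairs, right_pairs, singletons
```

                  The three lists can then be written out with `"\n".join(...)`, and the singletons file given to Ray as single-end reads. Keeping one file in a dictionary is the simplest design, but with tens of millions of reads it needs a lot of memory; sorting both files by name and merging them would be the lower-memory alternative.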


                  • seb567
                    Senior Member
                    • Jul 2008
                    • 260

                    Originally posted by jtladner
                    Hello, I have been using Ray for the de novo assembly of several bacterial genomes. Overall it seems to be a really good program that has been giving me longer contigs than SOAPdenovo.


                    However, recently I ran into an error that seems to be due to genome coverage that is too high:

                    Rank 0: the minimum coverage is 2
                    Rank 0: the peak coverage is 2
                    Rank 0: Assembler panic: no peak observed in the k-mer coverage distribution.
                    Rank 0: to deal with the sequencing error rate, try to lower the k-mer length (-k)
                    This limitation was removed in Ray 2.0 Release Candidate 5.

                    You can try Ray 2.0-rc5.

                    We modified this to enable metagenome assemblies.


                    Originally posted by jtladner


                    At first I thought that I had the opposite problem: not enough coverage. I tried to lower k as suggested, but I kept getting the same error. The only ways I have been able to get Ray to run on this dataset are to either decrease the number of sequences that I input into the program (in which case I get very good contigs) or increase the k-mer length to very high values (e.g., 63).

                    If you plot the coverage distribution, I am sure you will see something that is not smooth, yet I am sure you will also see a sizable peak.

                    To plot your data, enter these commands in your terminal:


                    Code:
                    cd Place-Where-My-Assembly-Is-Located
                    ls CoverageDistribution.txt # make sure you are in the right directory
                    R --vanilla
                    
                    # the next commands will be given to R
                    data=read.table('CoverageDistribution.txt',header=TRUE)
                    pdf('MyCoverageFrequencies.pdf')
                    plot(data[,1],data[,2],xlab='k-mer coverage depth',ylab='Frequency',log='xy',type='l')
                    dev.off()
                    There is also a fancy script that ships with Ray that does that automatically.

                    Code:
                    ~/git-clones/ray/scripts/plot-coverage-distribution.R CoverageDistribution.txt

                    Originally posted by jtladner


                    If possible, could you explain why high coverage would result in this type of error?
                    We bought an Illumina HiSeq 1000 at our institution.

                    One of the acceptance tests was to sequence a whole lane of PhiX, a virus whose genome has just 5386 nucleotides.


                    The coverage distribution was ridiculous:

                    [Figure: k-mer coverage distribution for the PhiX lane]

                    If we zoom in, we can see that the peak is not smooth:

                    [Figure: zoomed view of the coverage peak]

                    This *may* be caused by cluster complexity on the flow cell.

                    *Maybe* your data look like this also, maybe not.
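                    As a rough numeric companion to the plot, the valley (where error k-mers stop dominating) and the peak can be located programmatically. This is a simplified heuristic, not Ray's peak-finding code; it assumes CoverageDistribution.txt has two whitespace-separated columns (coverage, frequency) plus a header line, as the `read.table(..., header=TRUE)` call suggests, and all function names are hypothetical. Smoothing with a moving average, and taking the highest point after the valley rather than the first bump, is meant to cope with a jagged peak like the PhiX one.

```python
def load_distribution(lines):
    """Parse two whitespace-separated columns (coverage, frequency), skipping
    any line whose first field is not an integer, such as a header line."""
    coverages, frequencies = [], []
    for line in lines:
        fields = line.split()
        if len(fields) >= 2 and fields[0].isdigit():
            coverages.append(int(fields[0]))
            frequencies.append(int(fields[1]))
    return coverages, frequencies


def moving_average(values, window):
    """Centered moving average; the window shrinks at both ends."""
    half = window // 2
    smooth = []
    for i in range(len(values)):
        chunk = values[max(0, i - half):i + half + 1]
        smooth.append(sum(chunk) / len(chunk))
    return smooth


def find_valley_and_peak(coverages, frequencies, window=5):
    """Return (valley_coverage, peak_coverage), or None if no valley exists.

    The valley is the first local minimum of the smoothed curve; the peak is
    the highest smoothed point after it, so small bumps are not mistaken for it."""
    smooth = moving_average(frequencies, window)
    for i in range(1, len(smooth) - 1):
        if smooth[i] <= smooth[i - 1] and smooth[i] < smooth[i + 1]:
            peak = max(range(i, len(smooth)), key=lambda j: smooth[j])
            return coverages[i], coverages[peak]
    return None  # monotone curve: no valley, hence no peak to report


# Usage:
# with open("CoverageDistribution.txt") as handle:
#     print(find_valley_and_peak(*load_distribution(handle)))
```

                    A `None` result corresponds to the situation in the error message: a histogram that only decreases, with no separate peak for the genomic k-mers.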


                    Originally posted by jtladner

                    And can you provide guidelines for the optimal genome coverage for Ray?
                    As the saying goes, "the more, the better."

                    You should plot your distributions to assess the quality of your data.
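                    When reading those plots, it helps to know roughly where the peak should sit. A common back-of-the-envelope relation (not taken from Ray's documentation, and assuming uniform, error-free coverage) is base coverage C = N·L/G for N reads of length L on a genome of size G, and k-mer coverage C·(L−k+1)/L, since each read contributes L−k+1 k-mers. The run below is hypothetical.

```python
def base_coverage(n_reads, read_length, genome_size):
    """Expected per-base coverage: total sequenced bases / genome size."""
    return n_reads * read_length / genome_size


def kmer_coverage(n_reads, read_length, genome_size, k):
    """Expected k-mer coverage: each read yields read_length - k + 1 k-mers."""
    if k > read_length:
        raise ValueError("k must not exceed the read length")
    return n_reads * (read_length - k + 1) / genome_size


# Hypothetical run: 1,000,000 reads of 100 bp on a 5 Mb bacterial genome.
print(base_coverage(1_000_000, 100, 5_000_000))      # 20.0
print(kmer_coverage(1_000_000, 100, 5_000_000, 31))  # 14.0
print(kmer_coverage(1_000_000, 100, 5_000_000, 63))  # 7.6
```

                    The same relation shows why both of Jason's workarounds move the peak: subsampling lowers N, and raising k to 63 lowers L−k+1, so each shifts the k-mer coverage peak toward lower values.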




                    • seb567
                      Senior Member
                      • Jul 2008
                      • 260

                      Greetings!

                      Originally posted by santiagosnchez
                      Is there a way to reuse some of Ray's output files in order to avoid some of the initial computations on the same data?

                      Yes, they are called checkpoints.

                      You just have to add -read-write-checkpoints.

                      However, note that checkpoint files (they are binary and have the .ray extension) are only valid for the same command, using the same data, with the same number of MPI ranks.



                      Code:
                      mpiexec -n 1 Ray -help | less

                        Checkpointing

                               -write-checkpoints
                                      Write checkpoint files

                               -read-checkpoints
                                      Read checkpoint files

                               -read-write-checkpoints
                                      Read and write checkpoint files


                      • santiagosnchez
                        Junior Member
                        • Feb 2012
                        • 4

                        So this could be achieved by typing something like:

                        mpiexec -n <#> Ray -o <$$$$> -read-checkpoints
                        (after you did a run with -write-checkpoints)

                        Is it possible to change the k-mer size for instance?

                        Thanks,

                        Santiago


                        • Anelda
                          Member
                          • May 2010
                          • 30

                          RAY on colourspace

                          Hi there,

                          Do you have any news on the colourspace issue? We ran RAY today for the first time and were very impressed, except that we mostly deal with SOLiD data and would eventually need the contigs in base space :-)

                          Thanks!

                          Anelda


                          • steph
                            Junior Member
                            • Dec 2010
                            • 2

                            Problem at compilation with latest GCC version

                            Hi everyone,

                            I encountered a problem when trying to build the latest stable version of Ray (1.7) with the latest version of GCC (v4.7.0).

                            The problem occurred at the make step.

                            With GCC v4.7.0, I got the following errors:

                            Code:
                            code/communication/MessageProcessor.cpp: In member function 'void MessageProcessor::call_RAY_MPI_TAG_ASK_VERTEX_PATH(Message*)':
                            code/communication/MessageProcessor.cpp:1685:7: error: redeclaration of 'int i'
                            code/communication/MessageProcessor.cpp:1675:10: error: 'int i' previously declared here
                            make: *** [code/communication/MessageProcessor.o] Error
                            However, when I used GCC v4.1.2 (which was also installed on this machine) instead, the installation finished correctly.


                            • gringer
                              David Eccles (gringer)
                              • May 2011
                              • 845

                              That's because the more recent versions of GCC do more code checking. Redeclaring variables introduces some scoping issues, and usually means that the coder hasn't realised there's an ambiguity. Luckily, these redeclaration errors are usually easily fixed, for example by changing the name of the inner loop variable to j instead of i.


                              • seb567
                                Senior Member
                                • Jul 2008
                                • 260

                                Originally posted by santiagosnchez
                                mpiexec -n <#> Ray -o <$$$$> -read-checkpoints
                                (after you did a run with -write-checkpoints)

                                Is it possible to change the k-mer size for instance?
                                No, you cannot change the k-mer size if you use the same checkpoint files.

                                There is also the option -read-write-checkpoints, which reads and writes these checkpoints.
