
  • jazz710
    replied
    Can you explain Ray Surveyor in a bit more detail? I'm having a hard time understanding the documentation but I think this could be of use to me.



  • shayan shams
    replied
    Hi folks,
    I have a serious problem with Ray and Open MPI.
    I am using a cluster with 4 nodes, each with 8 cores. Surprisingly, running Ray on a single node with mpirun -np 8 takes less time than running it on two or more nodes. For example, one node takes 5 min, two nodes (mpirun -np 16) take 8 min, three nodes (mpirun -np 24) take 12 min, and so on. Can anybody please help me find the problem?
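
The timings quoted above can be turned into speedup and parallel-efficiency figures, which makes the slowdown explicit. This is a minimal sketch using awk; the function name `scaling_report` is hypothetical, and the only data used are the three runs reported in the post (5, 8 and 12 minutes on 1, 2 and 3 nodes).

```shell
# Hypothetical helper: read "nodes minutes" pairs on stdin and print, for each
# run, the speedup relative to the first run and the parallel efficiency
# (speedup divided by the node count relative to the first run).
scaling_report() {
    awk 'NR == 1 { n0 = $1; t0 = $2 }
         { s = t0 / $2
           printf "nodes=%d minutes=%d speedup=%.2f efficiency=%.2f\n", $1, $2, s, s / ($1 / n0) }'
}

# Timings quoted in the post: 1 node (8 ranks), 2 nodes (16), 3 nodes (24).
printf '1 5\n2 8\n3 12\n' | scaling_report
```

A speedup below 1 means that adding nodes makes the run slower. For a dataset that finishes in 5 minutes, inter-node communication may simply outweigh the extra cores, but that is an assumption the thread does not confirm.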



  • bastianwur
    replied
    Contigs, or scaffolds?
    Have you tried giving the assembler a possible distance for the reads?
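
Ray's `-p` option accepts an optional average outer distance and standard deviation after the two file names, and Ray estimates them itself when they are omitted; this is worth checking against the help output of the installed version. The sketch below only prints a command for review. Every file name, the rank count, the k-mer length and both standard deviations are placeholders, not values from the thread; only the ~250 bp and ~500 bp insert sizes come from the question.

```shell
# Dry run: print a Ray command for two paired-end libraries with explicit
# insert-size hints. File names, "-n 16", "-k 31" and the standard deviations
# (25 and 50) are illustrative assumptions.
ray_two_library_cmd() {
    echo mpiexec -n 16 Ray -k 31 \
        -p lib250_1.fastq lib250_2.fastq 250 25 \
        -p lib500_1.fastq lib500_2.fastq 500 50 \
        -o RayOutput_two_libs
}

ray_two_library_cmd
```

Printing the command first makes it easy to confirm that each library carries its own distance before launching a long run.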



  • hbn
    replied
    Maybe this question has already been asked somewhere, but I cannot find it:

    Is there a way to set the maximum insert size for paired end assembly with Ray? If not, what is the maximum insert size considered?

    I have an assembly which uses both normal insert size Illumina reads (~250 bp) and some longer insert sizes (~500 bp). When adding this last library, the results do not improve, which I think is suspicious. Any ideas?



  • Zapages
    replied
    I am trying to assemble 275 paired-end Illumina reads that I have interleaved together. Previously I successfully ran the interleaved files at a k-mer value of 137. I compiled the latest Ray version with a maximum k-mer size of 600 (technically 599).

    That code was:


    Code:
    mpiexec -n 30 Ray -k 137 -i interleaved.fastq -o Ray_K137

    Now if I try a smaller k-mer value, I run into a weird CHUNK_SIZE error.

    I have tried:

    Code:
    mpiexec -n 10 Ray -k 51 -i interleaved.fastq -o Ray_K51_try3
    Code:
    mpiexec -n 30 Ray -k 51 -i interleaved.fastq -o Ray_K51_try3
    All of these caused the same CHUNK_SIZE error. I even tried it without mpiexec, and I still got the error below.

    Code:
    Rank 0 : VirtualCommunicator (service provided by VirtualCommunicator): 2957916 virtual messages generated 115295 real messages (3.89785%)
    Rank 0 freed 549453824 bytes from the path memory pool (chunks: 131)
    Rank 0: gossiping generated 0 messages (gossips: 0 ---> 0)
    Critical exception: The length of the requested memory exceeds the CHUNK_SIZE: 36423920 > 33554432
    Ray: RayPlatform/memory/MyAllocator.cpp:97: void* MyAllocator::allocate(int): Assertion `false' failed.
    [BioLinux301:05209] *** Process received signal ***
    [BioLinux301:05209] Signal: Aborted (6)
    [BioLinux301:05209] Signal code:  (-6)
    [BioLinux301:05209] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x10340) [0x7f4a34d27340]
    [BioLinux301:05209] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x39) [0x7f4a34987bb9]
    [BioLinux301:05209] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x148) [0x7f4a3498afc8]
    [BioLinux301:05209] [ 3] /lib/x86_64-linux-gnu/libc.so.6(+0x2fa76) [0x7f4a34980a76]
    [BioLinux301:05209] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x2fb22) [0x7f4a34980b22]
    [BioLinux301:05209] [ 5] Ray() [0x533b50]
    [BioLinux301:05209] [ 6] Ray() [0x4f7552]
    [BioLinux301:05209] [ 7] Ray() [0x551768]
    [BioLinux301:05209] [ 8] Ray() [0x5550ab]
    [BioLinux301:05209] [ 9] Ray() [0x5562ea]
    [BioLinux301:05209] [10] Ray() [0x413379]
    [BioLinux301:05209] [11] Ray() [0x40c5bf]
    [BioLinux301:05209] [12] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7f4a34972ec5]
    [BioLinux301:05209] [13] Ray() [0x40e0cf]
    [BioLinux301:05209] *** End of error message ***
    zsh: abort      Ray -k 51 -i interleaved.fastq -o Ray_K51_try3
    I am running this on a BioLinux 8 workstation that has 32 threads; the specs are: Intel Xeon E5-2640 v2 at 2 GHz with 128 GB of RAM.

    I would really appreciate advice on how to proceed.
    Last edited by Zapages; 05-08-2015, 04:22 AM.
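
The two numbers in the "Critical exception" line are easier to read in MiB: the limit, 33554432 bytes, is exactly 32 MiB. This is a small sketch that parses such a line with awk; the function name `chunk_report` is hypothetical, and it assumes the line ends in "<requested> > <limit>" as in the log above. It only does the arithmetic and says nothing about the cause of the failure.

```shell
# Hypothetical helper: take a "Critical exception ... CHUNK_SIZE" line and
# report the requested size, the limit, and the overshoot.
chunk_report() {
    # $1: the log line; its last three fields are "<requested> > <limit>"
    echo "$1" | awk '{
        req = $(NF - 2); limit = $NF
        printf "requested=%.2f MiB limit=%.2f MiB over_by=%d bytes\n", req / 1048576, limit / 1048576, req - limit
    }'
}

chunk_report "Critical exception: The length of the requested memory exceeds the CHUNK_SIZE: 36423920 > 33554432"
```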



  • canderson30
    replied
    Hello,

    When I look through some outputs generated from the AMOS file following assembly, many of the contigs were assigned 0 reads (I used the default bank2contig after seeing that many contigs were not showing up in the generated SAM file). Obviously, this does not make much sense, but I was wondering if anyone else has come across this? I was trying to avoid mapping by using the AMOS file, and now I just want to confirm that the contigs I am getting are 'real', I suppose.

    I thought this may be due to read recycling at first, but reads show up under multiple contigs still. Anyone have other ideas what is causing this issue or how to correct it during assembly?


    Chris



  • seb567
    replied
    Ray 2.3.1

    Hi,

    Ray 2.3.1 is now available on http://denovoassembler.sourceforge.net/download.html.

    Significant changes:

    * This version includes "Surveyor" to compute similarity (or distance) matrices
    for hundreds or possibly thousands of samples.
    * fix compilation error on Apple OS X Mavericks
    * fix infinite loop when running on 2 CPU cores
    * fix a bug when the number of ranks is a prime number



    All changes in Ray:

    Rob Egan (1):
    fix compilation on NERSC's edison machine using PrgEnv-intel

    Sébastien Boisvert (30):
    SequencesLoader: fix bad automatic pairing of sequence files
    SequencesLoader: fix compilation warnings
    Surveyor: verify buffer size before getting producer
    Surveyor: add a variable to store the period
    Surveyor: run in actor-model-only mode
    spawn actors with spawn instead of spawnActor
    Documentation: add some documentation for Surveyor
    SeedExtender: add some assertions
    Searcher: disable verbose outputs
    Surveyor: skip invalid files
    coloring: added comments for coloring subsystem
    update release procedure
    next release will be 2.3.1
    fix infinite loop when running on 2 CPU cores
    fix a bug when the number of ranks is a prime number
    print number of payloads
    add some code to test directed surveys with Surveyor
    fix reproducibility issue for similarity and distance matrices
    Surveyor: support nucleotides in lower case
    report invalid edges as warnings instead of errors
    documentation: add license in README
    Surveyor: report 0 hits when necessary
    SeedingData: provide prototypes for friend functions
    Surveyor: fix compilation issue without debug code
    seeds: add a parameter -minimum-seed-length (default 100)
    add option -graph-only to stop after graph building
    fix compilation error on Apple OS X Mavericks
    use CONFIG_ASSERT instead of ASSERT for optional code
    version 2.3.1
    update releases


    Changes in RayPlatform:

    Rob Egan (1):
    fix compilation on NERSC's edison machine using PrgEnv-intel

    Sébastien Boisvert (15):
    communication: relay buffer bytes instead of buffer 64-bit integers
    core: add a actor-model-only mode
    actors: add playground status with -debug
    core: add buffer statistics with -debug
    actor model: change the method name from spawnActor to spawn
    fix the code for testing message integrity
    fix a regression introduced in a01f97eae41bcd759bfc521d84053552cf38d521
    files: add method to check if a file is valid
    add mini-rank information in the message metadata
    fix mini-rank runtime engine
    print registered message tags in debug mode
    documentation: add LGPLv3 info in README
    communication: some routes don't require routing
    use CONFIG_ASSERT instead of ASSERT for optional code
    fix compilation warning



  • bossanova352
    replied
    Originally posted by seb567
    Short seeds mean that they are not connecting with one another.

    Can you provide a couple of lines from CoverageDistribution.txt (head) ?
    Sure! It does seem to be working now, I'm getting contigs and scaffolds in my output files.

    # KmerCoverage Frequency
    # Any frequency is a even number because of odd k-mer length
    2 158870850
    3 43942818
    4 18999600
    5 10198722
    6 6257290
    7 4165874
    8 2937460
    9 2155282
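
Rather than pasting rows by hand, the file can be summarised in one pass. This is a minimal sketch that assumes the two-column layout shown above ("#" comment lines followed by "coverage frequency" pairs); the function name `coverage_summary` is hypothetical, and the path in the usage comment is taken from the `-o RayOutputTest` command elsewhere in this thread.

```shell
# Hypothetical helper: sum the frequency column of a CoverageDistribution.txt
# file and report the coverage value with the highest frequency.
coverage_summary() {
    awk '!/^#/ && NF == 2 {
             total += $2
             if ($2 > best) { best = $2; mode = $1 }
         }
         END { printf "sum_frequency=%.0f peak_coverage=%s peak_frequency=%.0f\n", total, mode, best }' "$1"
}

# Usage (path is an assumption based on the output directory used in the thread):
#   head RayOutputTest/CoverageDistribution.txt
#   coverage_summary RayOutputTest/CoverageDistribution.txt
```

On the eight rows pasted above the peak is simply the first row (coverage 2), so the whole file is needed before reading anything into the result.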



  • seb567
    replied
    Originally posted by bossanova352
    How silly of me! Well I changed the formatting, but unfortunately I'm still not getting any output from Ray. This is what the file looks like now (all sequences on one line):



    Again, it looks like this step has some clues as to what is going on:



    Fixed! It was another formatting issue, (^M characters were showing up after the one-line formatting). Thanks, Seb! I appreciate the help.
    Short seeds mean that they are not connecting with one another.

    Can you provide a couple of lines from CoverageDistribution.txt (head) ?



  • bossanova352
    replied
    Originally posted by seb567
    Each sequence needs to be on one line (this is a current limitation of fasta support in Ray).

    That's presumably the issue.
    How silly of me! Well I changed the formatting, but unfortunately I'm still not getting any output from Ray. This is what the file looks like now (all sequences on one line):

    >DB775P1:2451TDYACXX:2:1101:1582:1958_1:N:0:CGTACTAG
    AGTTCTGCAAAGACATCATCCAAAATTAGAATGGGTTCTTGTTTACGACGGGATGTATCA
    >DB775P1:2451TDYACXX:2:1101:1582:1958_2:N:0:CGTACTAG
    CCTGGTCAATGGCGATTTCACTACGCATTGGATCTTTTAATTATGCCAGCCACGGTGAAT
    >DB775P1:2451TDYACXX:2:1101:1853:1966_1:N:0:CGTACTAG
    GAGGACCATCCAGGAGTGCATTAAAATAGCCGGCTGAGGAAGTCGATCCTTGAAAGAGGT
    >DB775P1:2451TDYACXX:2:1101:1853:1966_2:N:0:CGTACTAG
    GTCAAGAATGCCATCCGAGCTGCGATGACCAATATCGAGCAAAGTAGCGATGCCCGCGCT
    >DB775P1:2451TDYACXX:2:1101:2768:1957_1:N:0:CGTACTAG
    GTTGGAGCGCTTGGTATCCTGCGCTCCAATATTCATCACAGTGGGAATGACGCCCCCTAC
    Again, it looks like this step has some clues as to what is going on:

    Rank 2 has 12675 seeds
    Rank 2 is creating seeds [2985784/2985784] (completed)
    Rank 2: peak number of workers: 2002, maximum: 32768
    Rank 2 : VirtualCommunicator (service provided by VirtualCommunicator): 19245610
    Rank 2 runtime statistics for seeding algorithm:
    Rank 2 Skipped paths because of dead end for head: 0
    Rank 2 Skipped paths because of dead end for tail: 0
    Rank 2 Skipped paths because of two dead ends: 0
    Rank 2 Skipped paths because of bubble weak component: 0
    Rank 2 Skipped paths because of short length: 2960369
    Rank 2 Skipped paths because of bad ownership: 12740
    Rank 2 Skipped paths because of low coverage: 0
    Rank 2 Eligible paths: 12675
    Rank 2: assembler memory usage: 263224 KiB
    Fixed! It was another formatting issue, (^M characters were showing up after the one-line formatting). Thanks, Seb! I appreciate the help.
    Last edited by bossanova352; 01-14-2014, 11:34 AM.
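
The ^M characters mentioned above are carriage returns, typically left behind by Windows-style line endings. This is a minimal sketch for spotting and removing them; the function names `count_cr_lines` and `strip_cr` are hypothetical, and the file names in the usage comment are placeholders.

```shell
# Count the lines that end in a carriage return (displayed as ^M by many editors).
count_cr_lines() {
    awk '/\r$/ { n++ } END { print n + 0 }' "$1"
}

# Write a copy of the input file ($1) to $2 with every carriage return removed.
strip_cr() {
    tr -d '\r' < "$1" > "$2"
}

# Usage (file names are placeholders):
#   count_cr_lines reads.fa
#   strip_cr reads.fa reads.clean.fa
```

Writing to a new file keeps the original untouched, so the two can be compared before the assembly is rerun.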



  • seb567
    replied
    Originally posted by bossanova352
    Yeah, here it is:
    Each sequence needs to be on one line (this is a current limitation of fasta support in Ray).

    That's presumably the issue.
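
A wrapped FASTA file can be converted to one sequence per line with a short awk script. This is a minimal sketch, not part of Ray; the function name `linearize_fasta` is hypothetical, and the output file name in the usage comment is a placeholder.

```shell
# Hypothetical helper: join wrapped sequence lines so that every FASTA record
# is exactly two lines (header, then the full sequence). Prints to stdout.
linearize_fasta() {
    awk '/^>/ { if (seq != "") print seq; seq = ""; print; next }
         { seq = seq $0 }
         END { if (seq != "") print seq }' "$1"
}

# Usage (output name is a placeholder):
#   linearize_fasta SFBloom_paired_trimmed_1.fa > SFBloom_paired_trimmed_1.oneline.fa
```

The script holds one record in memory at a time, so it should cope with files of several million reads.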



  • bossanova352
    replied
    Originally posted by seb567
    Can you paste the 10 first lines of your file named SFBloom_paired_trimmed_1.fa ?
    Yeah, here it is:

    >DB775P1:2451TDYACXX:2:1101:1582:1958_1:N:0:CGTACTAG
    ATAATCGTTTGCTCGGCTATTTGAGTTGCAGATATTAATTGTTTACGACGGGATGTATCA
    AGTTCTGCAAAGACATCATCCAAAATTAGAATGGGTTC
    >DB775P1:2451TDYACXX:2:1101:1582:1958_2:N:0:CGTACTAG
    ATGACCTACATCTACAAATCGGAGATTTTCCGGCTAAAGGTTATGCCAGCCACGGTGAAT
    CCTGGTCAATGGCGATTTCACTACGCATTGGATCTTTTAAT
    >DB775P1:2451TDYACXX:2:1101:1853:1966_1:N:0:CGTACTAG
    TTCACCTAGAGAATGACCGGCAACAAAGTGGGGCGTAGGAAGTCGATCCTTGAAAGAGGT
    GAGGACCATCCAGGAGTGCATTAAAATAGCCGGCTGAG
    >DB775P1:2451TDYACXX:2:1101:1853:1966_2:N:0:CGTACTAG



  • seb567
    replied
    Originally posted by bossanova352
    The output of NumberOfSequences.txt is:



    This is what I find in the stdoutput file a considerable way through. I think the issue lies in what I've quoted below, as there are seeds before this and everything goes to 0 afterwards. It looks like it's skipping all paths because of short length:



    I'm working with Illumina HiSeq paired-end 100 bp reads which have been trimmed based on quality scores and length, so there should be nothing shorter than 50 bp. Oh, and my command is this:

    mpiexec -n 10 Ray -k 57 -i ../SFBloom_paired_trimmed_1.fa -o RayOutputTest
    Can you paste the 10 first lines of your file named SFBloom_paired_trimmed_1.fa ?



  • bossanova352
    replied
    Originally posted by seb567
    What is the content of RayOutput/NumberOfSequences.txt ?

    Do you have any error in your standard output file ? ("grep Error log.stdout")
    The output of NumberOfSequences.txt is:

    Files: 1

    FileNumber: 0
    FilePath: ../SFBloom_paired_trimmed_1.fa
    NumberOfSequences: 7917760
    FirstSequence: 0
    LastSequence: 7917759

    Summary
    NumberOfSequences: 7917760
    FirstSequence: 0
    LastSequence: 7917759
    This is what I find in the stdoutput file a considerable way through. I think the issue lies in what I've quoted below, as there are seeds before this and everything goes to 0 afterwards. It looks like it's skipping all paths because of short length:

    Rank 8 has 0 seeds
    Rank 8 is creating seeds [147526/147526] (completed)
    Rank 8: peak number of workers: 1887, maximum: 32768
    Rank 8 : VirtualCommunicator (service provided by VirtualCommunicator): 494068 virtual messages generated 4040 real messages (0.817701%)
    Rank 8 runtime statistics for seeding algorithm:
    Rank 8 Skipped paths because of dead end for head: 0
    Rank 8 Skipped paths because of dead end for tail: 0
    Rank 8 Skipped paths because of two dead ends: 0
    Rank 8 Skipped paths because of bubble weak component: 0
    Rank 8 Skipped paths because of short length: 147526
    Rank 8 Skipped paths because of bad ownership: 0
    Rank 8 Skipped paths because of low coverage: 0
    Rank 8 Eligible paths: 0
    Rank 8: assembler memory usage: 139120 KiB
    I'm working with Illumina HiSeq paired-end 100 bp reads which have been trimmed based on quality scores and length, so there should be nothing shorter than 50 bp. Oh, and my command is this:

    mpiexec -n 10 Ray -k 57 -i ../SFBloom_paired_trimmed_1.fa -o RayOutputTest
    Last edited by bossanova352; 01-13-2014, 12:03 PM.
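
Scanning the standard output for the per-rank seed counts shows quickly whether any rank produced seeds at all. This is a minimal sketch that assumes the "Rank N has M seeds" line format shown above; the function name `seed_summary` is hypothetical, and `log.stdout` is the log file name used earlier in this thread.

```shell
# Hypothetical helper: total the "Rank N has M seeds" lines in a Ray log and
# count how many ranks reported zero seeds.
seed_summary() {
    awk '/^Rank [0-9]+ has [0-9]+ seeds$/ {
             seeds += $4; ranks++
             if ($4 == 0) empty++
         }
         END { printf "ranks=%d total_seeds=%d ranks_without_seeds=%d\n", ranks, seeds, empty }' "$1"
}

# Usage:
#   seed_summary log.stdout
```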



  • seb567
    replied
    Originally posted by VidJa
    I'm trying to use Ray with a mix of Illumina SE and 454 data using the -amos option:
    mpiexec -n 10 Ray -s illumina.fasta -s 454.fasta -k 31 -amos -o mix
    454 reads have an avg length of 396bp and the illumina reads are 60bp

    The resulting amos file AMOS.afg can be browsed using Tablet, but I noticed that the original read names are converted to just a number and thus making it impossible to easily trace which reads end up in a particular contig. Is it possible to save the original readnames in the AMOS output?
    This is not currently possible because Ray does not read or store read names at all.

