Can't get Ray working


  • Can't get Ray working

    I recently got access to a very powerful machine for the purpose of de novo sequencing.

    I am using Ray-v2.0.0-rc8

    Unfortunately, I have been unable to get ray working on even the smallest test cases with 3 different compilers (Intel, GCC, some other compiler that comes with the system).

    Most of my runs end like this:
    Rank 49: assembler memory usage: 3813964 KiB
    Rank 75 reached 400 vertices from seed 91, flow 1
    Speed RAY_SLAVE_MODE_EXTENSION 2141 units/second
    Rank 75: assembler memory usage: 3816056 KiB

    Stack walkback for Rank 0 starting:
    [email protected]:113
    [email protected]
    [email protected]
    Machine::start()@0x40746f
    ComputeCore::runVanilla()@0x500957
    MessageProcessor::call_RAY_MPI_TAG_ASK_IS_ASSEMBLED(Message*)@0x45ec8e
    Vertex::isAssembled()@0x4db3a0
    Stack walkback for Rank 0 done
    Process died with signal 11: 'Segmentation fault'
    Forcing core dumps of ranks 0, 5, 36, 64, 8, 29, 56, 10, 12, 109, 13, 52, 66, 110
    View application merged backtrace tree file with: statview atpMergedBT.dot
    _pmiu_daemon(SIGCHLD): [NID 00736] [c1-0c0s0n2] [Sun Jun 24 06:03:13 2012] PE RANK 1 exit signal Killed
    _pmiu_daemon(SIGCHLD): [NID 00767] [c1-0c0s0n1] [Sun Jun 24 06:03:13 2012] PE RANK 98 exit signal Killed
    [NID 00736] 2012-06-24 06:03:13 Apid 6339046: initiated application termination
    Application 6339046 exit codes: 137
    Application 6339046 exit signals: Killed
    Application 6339046 resources: utime ~39431s, stime ~74s
    With input scripts such as
    num="124"
    aprun -n $124 ./fancierRayDEBUG/Ray -o Assembly$num -k 31 \
    -p \
    Sample$num/ERR011117_1.fastq.gz \
    Sample$num/ERR011117_2.fastq.gz \
    -p \
    Sample$num/ERR011118_1.fastq.gz \
    Sample$num/ERR011118_2.fastq.gz \
    -p \
    Sample$num/ERR011119_1.fastq.gz \
    Sample$num/ERR011119_2.fastq.gz \
    -p \
    Sample$num/ERR011120_1.fastq.gz \
    Sample$num/ERR011120_2.fastq.gz \
    -p \
    Sample$num/ERR011121_1.fastq.gz \
    Sample$num/ERR011121_2.fastq.gz \
    -p \
    Sample$num/ERR011122_1.fastq.gz \
    Sample$num/ERR011122_2.fastq.gz \
    -p \
    Sample$num/ERR011123_1.fastq.gz \
    Sample$num/ERR011123_2.fastq.gz >& myOutput$num.out
    Similar output when trying smaller ecoli file with an intimidating number of PEs. I tried a similar simulation with only 64 PEs but the failure was the same.
    Rank 1: assembler memory usage: 3331872 KiB
    Rank 1091: assembler memory usage: 3330848 KiB
    Rank 1135: assembler memory usage: 3330848 KiB
    Application 6339025 resources: utime ~68335618s, stime ~49747s
    (produced neither an error nor any output; was built without debug symbols)
    aprun -n4096 -N16 -d2 ./fancierRay/Ray --show-memory-usage -o secoliAssembly$num -k 23 \
    -p secoliSample$num\SRR001665_1.fastq.gz \
    secoliSample$num\SRR001665_2.fastq.gz \
    -p secoliSample$num\SRR001666_1.fastq.gz \
    secoliSampel$num\SRR001666_2.fastq.gz >& secolimyOutput$num.out
    1. I am trying to figure out how much RAM I need per PE.
    2. Does anybody have a minimal input/output example?
    3. I haven't done much de novo assembly and was wondering if there are better programs for eukaryote genome assembly.
    4. Has anybody tried SRR034, sequences SRR034939-34975?
    Last edited by lednakashim; 06-25-2012, 10:50 AM. Reason: aesthetics

  • #2
    Is the -n 124 argument the number of MPI processes (ranks)? If so, does your machine have 124 cores? The advice I was given was to match ranks to cores.

    If I didn't have my whole cluster smoking on Ray jobs, I'd run your test dataset :-) For ~10 Mb bacterial genomes, on a cluster with 32 GB of RAM per cluster, I am able to assemble 1 Gb of MiSeq data with k=31 (advisable for Illumina data) on just 8 cores of the cluster.



    • #3
      Hello,


      Originally posted by lednakashim View Post
      I recently got access to a very powerful machine for the purpose of de novo sequencing.

      I am using Ray-v2.0.0-rc8

      Unfortunately, I have been unable to get ray working on even the smallest test cases with 3 different compilers (Intel, GCC, some other compiler that comes with the system).

      Changing the compiler will not change much.

      Originally posted by lednakashim View Post

      Most of my runs end like this:

      With input scripts such as

      I don't know much about aprun. Is it like mpiexec or mpirun, but for
      a given supercomputer?

      What is aprun, and what is $124?

      What does the command "aprun -n4096 -N16 -d2" mean?


      I have previously seen segmentation faults due to message corruption caused by
      QLogic Performance Scaled Messaging (PSM) from Intel, Inc.

      Maybe this is a similar issue caused by the middleware.

      Originally posted by lednakashim View Post


      Similar output when trying smaller ecoli file with an intimidating number of PEs. I tried a similar simulation with only 64 PEs but the failure was the same.



      (produced neither an error nor any output; was built without debug symbols)


      1. I am trying to figure out how much ram I need per PE
      How much memory do you have?

      Is your system running out of memory?

      Originally posted by lednakashim View Post

      2. Does anybody have a minimal input output example
      Try this sample: (it is E. coli)

      ftp://ftp.ddbj.nig.ac.jp/ddbj_databa...65_1.fastq.bz2
      ftp://ftp.ddbj.nig.ac.jp/ddbj_databa...65_2.fastq.bz2
      ftp://ftp.ddbj.nig.ac.jp/ddbj_databa...66_1.fastq.bz2
      ftp://ftp.ddbj.nig.ac.jp/ddbj_databa...66_2.fastq.bz2


      You can give these files directly to Ray if compiled with HAVE_LIBBZ2=y


      Originally posted by lednakashim View Post

      3. I haven't done much de-novo assembly and was wondering if there are better programs for eukaryote genome assembly

      There is a list on Wikipedia.


      Originally posted by lednakashim View Post


      4. Has anybody tried SRR034, sequences SRR034939-34975?
      I have not. Is there anything special about it?

      Sébastien



      • #4
        Hi,

        According to this page, aprun is the application launcher in
        the Cray Linux Environment (CLE).


        The option to specify the number of processor cores is -n (like in mpiexec).

        In your first command, you used aprun -n $124.

        In most shells, $1 is the first argument given to a shell program, so $124 is read as $1 followed by the literal text 24. With no first argument, $124 therefore resolves to 24.

        Example:

        [email protected]:~/odin1/cloud$ echo $124
        24
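The expansion is easy to reproduce. The lines below are a minimal sketch: the positional parameter is set by hand with `set --` (the value `X` is only a placeholder for illustration), and `num` mirrors the variable from the original script.

```shell
# Simulate a script that was called with one argument (placeholder value "X").
set -- X
num="124"

bad="$124"       # parsed as ${1} followed by the literal text 24
good="${num}"    # the variable the script was meant to use

echo "bad:  $bad"
echo "good: $good"

# With no positional arguments, $1 is empty and only the literal 24 remains.
set --
no_args="$124"
echo "no args: $no_args"
```

Writing `${num}` with braces makes the intended variable explicit and avoids this class of typo.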


        aprun -n 124 will run your job on 124 processing cores on your system.
        Running only on 24 cores will make these 24 cores consume a lot of memory.
        From your log, it is 3.8 GB per core.
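For reference, the per-rank figure can be converted from the KiB value printed in the log. This is only a unit-conversion sketch; the input is the "Rank 49" line quoted in the first post, and `awk` is used for the fractional part.

```shell
# "Rank 49: assembler memory usage: 3813964 KiB" (from the log in the first post)
kib=3813964

mib=$(( kib / 1024 ))   # integer MiB
gib=$(awk -v k="$kib" 'BEGIN { printf "%.2f", k / (1024 * 1024) }')

echo "${kib} KiB = ${mib} MiB = ${gib} GiB"
```

In binary units this is a little over 3.6 GiB per rank, which is the same order as the roughly 3.8 million KiB reported by each rank.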


        In your second command, you used aprun -n4096 -N16 -d2.

        -d2 makes no sense for Ray because it specifies the number of processor cores for each processing element. This should be 1 for Ray (the default in aprun).

        -N16 sets the number of processing elements per node. I am pretty sure you should not touch that. The scheduler (possibly from Cray, Inc.) should be able to figure that out by itself.

        -n4096 is a lot of processing elements for just a small bacterial genome.
        And for that number of processing cores, you will likely need to enable message routing in Ray.


        This job likely crashed outside of Ray because of a lack of resources.


        I hope my comments will be helpful for you.

        First, you should test your system with Ray using the bacterial genome you already downloaded (SRA001125 - E. coli) with something like 2 or 3 nodes.




        Sébastien



        • #5
          Wow, thanks for the prompt reply!

          The $124 is a typo that was made while I was posting. The post should say $num. Sorry about that :-)
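For clarity, here is roughly what the intended script looks like with `${num}` everywhere. It is a dry-run sketch that only builds and prints the command line instead of launching it; the loop over accession names is my own shorthand for the seven `-p` pairs in the first post, and the paths and Ray options are the ones already shown there.

```shell
num="124"   # number of ranks; also used in the directory and output names

# Build one "-p left right" pair per paired-end library.
args=""
for acc in ERR011117 ERR011118 ERR011119 ERR011120 ERR011121 ERR011122 ERR011123; do
    args="$args -p Sample${num}/${acc}_1.fastq.gz Sample${num}/${acc}_2.fastq.gz"
done

cmd="aprun -n ${num} ./fancierRayDEBUG/Ray -o Assembly${num} -k 31${args}"

# Dry run: print the command rather than executing it.
echo "$cmd"
```

Printing the command first makes it easy to confirm that `-n` really receives 124 before the job is submitted.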

          If N16 is not selected the scheduler will choose the default, in our case N32. Choosing N16 doubles the memory available. Choosing N16 with d2 doubles the memory available for each process at the expense of CPU board utilization. There is a CSC article commenting on this at http://www.csc.fi/english/pages/louh...commands/aprun . Additionally, many of the instructions for using aprun that can be found on the web are specific to the systems that host the instructions. My understanding is that the defaults vary among deployed systems.
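The effect of `-N` on memory per PE is simple division. The sketch below assumes a node with 32 GiB of RAM, which is a HYPOTHETICAL value chosen only to make the arithmetic concrete (the actual node size is not stated in this thread), and it assumes memory is shared evenly among the PEs on a node.

```shell
# Memory available to each PE when a node's RAM is split evenly.
# Arguments: node memory in MiB, PEs per node (the aprun -N value).
mem_per_pe() {
    echo $(( $1 / $2 ))
}

node_mib=32768   # HYPOTHETICAL node size (32 GiB); substitute the real value

default_n32=$(mem_per_pe "$node_mib" 32)
with_n16=$(mem_per_pe "$node_mib" 16)

echo "-N32: ${default_n32} MiB per PE"
echo "-N16: ${with_n16} MiB per PE"
```

Comparing the result with the per-rank usage Ray prints (the "assembler memory usage" lines) gives a quick check of whether a given `-N` leaves enough headroom.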

          I'm going to rerun the assemblies with 16 cores. I have tried toggling the enable message routing flag, but I get similar failures. I will follow up this post with those results when the computer I use becomes available.

          What kind of debug output would you find useful? Perhaps core dumps? I get inconsistent failures for the same kind of setup; the same program will fail with different errors at different stages.
          Last edited by lednakashim; 06-25-2012, 03:09 PM.



          • #6
            Hello everyone...
            I am using Ray to assemble 109 million HiSeq 2000 PE reads. I have given it 12 cores for ranks, with 144 GB of RAM. Can anyone tell me the estimated time it will take and the steps it goes through?
            At present Ray is calculating the vertices.
            Any help will be highly appreciated.



            • #7
              Not without additional information!

              1. What kind of sample (RNA, Bacteria?, higher eukaryotes?)
              2. Your network, are you using an IBM cluster, a Cray cluster? Did you link together a bunch of Dells?



              • #8
                The required time will depend also on the size of what you are assembling.

                Also, if you use interconnected computers, it is important to have low latency. You can check this in the file NetworkTest.txt, which is written to your Ray output directory.

                Originally posted by waterboy View Post
                Hello everyone...
                I am using Ray to assemble 109 million HiSeq 2000 PE reads. I have given it 12 cores for ranks, with 144 GB of RAM. Can anyone tell me the estimated time it will take and the steps it goes through?
                At present Ray is calculating the vertices.
                Any help will be highly appreciated.
