  • Running Velvet on a cluster

    Hi, I was wondering if someone might be able to help.

    I'm running Velvet on a cluster, specifically a Cray. I compiled it with CATEGORIES=6, OPENMP=1 and MAXKMERLENGTH=99.

    When I run velveth with my 6 libraries in my local directory, like so:

    aprun -n 1 -q -N 1 -d $OMP_NUM_THREADS velveth Baudin_velvetKmer51/ 51 -fastq -shortPaired -separate Readset1.fastq Readset2.fastq -fastq -short singleReadset.fastq.........

    (and so on for each of the remaining 5 libraries: shortPaired2, 3, 4, 5, 6), I get no Roadmaps file in the output. velveth gets as far as loading the sequences and then stops.

    The aprun command above is used to assign the job on the cluster with a number of nodes, threads etc.

    I've managed to get it running with a single library, but for some reason it doesn't work for multiple libraries.

    Has anyone had similar problems running Velvet on a cluster and getting this kind of problem?

    Thanks for any help you can provide.

  • #2
    Other information:
    Each node has 64 GB of memory and 16 threads.

    It might be a memory problem with the 6 PE libraries, but no errors were reported. That could be because Velvet does not report when it hits a memory limit; I'm not sure.



    • #3
      Does velvet successfully get through all your sequences?

      It seems to me that you have more than 6 categories: if you are running your first library as both -shortPaired and -short, that is two categories.

      What is the total size of all the files with your reads?



      • #4
        Thanks for replying.

        With some runs it gets through only some of the files.

        There are 429 GB of trimmed reads in total.

        It may be that with 64 GB of RAM I don't have enough memory.

        Each PE file is around 20-30 GB in size and each singleton file is around 3-5 GB.

        I'll re-compile with 3 categories, run with 1 library and see what happens, then re-compile with 6 categories and run with 2 libraries and see what happens.



        • #5
          I have recently been running a dataset that is just under 70 GB, and it is using about 55 GB of memory, so yes, memory will be a problem if your files are that large.
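
As a rough back-of-the-envelope check, the figures quoted in this thread (about 55 GB of memory for just under 70 GB of reads, 429 GB of trimmed reads, 64 GB per node) can be scaled linearly. This is only a sketch: linear scaling with file size is an assumption, since Velvet's real memory use also depends on k-mer length, coverage and error rate, and the function name is made up for illustration.

```python
def estimate_memory_gb(input_gb, ref_input_gb=70.0, ref_memory_gb=55.0):
    """Scale one observed (input size, memory used) pair linearly to another input size.

    The default reference pair is the observation quoted in this thread;
    assuming memory grows in proportion to file size is a simplification.
    """
    return input_gb * ref_memory_gb / ref_input_gb


node_ram_gb = 64.0      # memory per node, as stated in the thread
total_reads_gb = 429.0  # total trimmed reads, as stated in the thread

needed_gb = estimate_memory_gb(total_reads_gb)
verdict = "fits within" if needed_gb <= node_ram_gb else "exceeds"
print(f"Estimated need: ~{needed_gb:.0f} GB, which {verdict} a {node_ram_gb:.0f} GB node")
```

Under that assumption the full read set would need several times the memory of a single node, which is consistent with velveth dying while loading sequences.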

          But you also need to compile Velvet with the number of categories that you actually plan to use. So if you compile Velvet with CATEGORIES=6 and then use 12 categories, you will probably have problems whether you run out of memory or not.

          What velvet output are you getting when it stops?
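
One way to compare a velveth command line against the CATEGORIES value used at compile time is to list the short-read flags it contains. The sketch below is a hypothetical helper, not part of Velvet: it reports the distinct -short/-shortPaired flags and the highest library number among them. Whether Velvet counts -short and -shortPaired with the same number as one category or two is something to confirm in the Velvet manual, so both figures are returned. The file names in the example are placeholders.

```python
import re

# Matches -short, -shortPaired, -short2, -shortPaired2, ... and captures the number.
SHORT_FLAG = re.compile(r"^-short(Paired)?(\d*)$")


def summarize_short_flags(args):
    """Return (sorted distinct short-read flags, highest library number used)."""
    flags = set()
    highest = 0
    for arg in args:
        match = SHORT_FLAG.match(arg)
        if match:
            flags.add(arg)
            # A flag without a numeric suffix refers to the first library.
            number = int(match.group(2)) if match.group(2) else 1
            highest = max(highest, number)
    return sorted(flags), highest


# Placeholder command line with two libraries (file names are hypothetical).
example_args = [
    "velveth", "out_dir/", "51",
    "-fastq", "-shortPaired", "-separate", "lib1_R1.fastq", "lib1_R2.fastq",
    "-fastq", "-short", "lib1_single.fastq",
    "-fastq", "-shortPaired2", "-separate", "lib2_R1.fastq", "lib2_R2.fastq",
]
flags, highest = summarize_short_flags(example_args)
print("distinct flags:", flags)
print("highest library number:", highest)
```

If either figure is larger than the CATEGORIES value the binary was built with, recompiling before rerunning is the safer option.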



          • #6
            It looks like it is a memory limit.

            I'm now running all 6 libraries separately with Velvet. Some are still running. Others have already failed at velveth; those are the ones where the PE library is around 80 GB in size. The ones still running are around 40 GB.

            The output I get is just the Velveth output. It gets up to a certain number of reads loading into Sequences, and then dies. Roadmaps is never generated, so when Velvetg comes along, it complains no Roadmaps are available.
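
Since velveth can die without an obvious error, it may help to check its output directory before launching velvetg. The following is a minimal sketch, assuming velveth writes files named Sequences and Roadmaps into the output directory (as described in this thread); the function name is hypothetical, and the demo uses a temporary directory in place of a real run.

```python
import tempfile
from pathlib import Path


def missing_velveth_outputs(out_dir, expected=("Sequences", "Roadmaps")):
    """Return the expected velveth output files that are absent or empty."""
    out = Path(out_dir)
    missing = []
    for name in expected:
        path = out / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


# Demo: simulate a run that wrote Sequences but died before writing Roadmaps.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "Sequences").write_text(">read1\nACGT\n")
    demo_missing = missing_velveth_outputs(tmp)

print("missing outputs:", demo_missing)
```

A job script could run this check between the two steps and skip velvetg whenever the returned list is not empty.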

            Would you know of an alternative assembler which is more memory efficient, and perhaps does swapping of files onto disk instead of loading into memory?

            I've heard of MIRA, but I'm not sure whether its efficiency has improved. There's also SOAPdenovo, but I think that might have even more memory overhead.
