  • #31
    Originally posted by [email protected] View Post
    I tried using the Velvet formula for my RAM calculation, and the answer is about 500 GB for human-sized genomes. Is that your experience too? Also, approximately how many days would it take? The HPC environment here has a max walltime of 96 hrs.

    Another question is regarding trimming: did you use any specific software, or did you write your own scripts?
    Firstly, I did not use a human-sized genome; I was doing a metagenomic assembly of three known bacterial genomes. Secondly, my computing time varied because I was not the only one using the HPC. If you are asking about quality trimming of sequence reads (Illumina or 454 pyrosequencing), I have seen people suggest seqtrim, prinseq and clean_reads. I did not do any quality trimming during my comparative study. I have also developed my own quality-trimming script. I compared it with prinseq and clean_reads, and overall it performs better than both. Sorry, I can't share the script now as I am going to publish it. I will post a notification soon after publication.
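
For reference, a RAM figure like the one quoted above can be reproduced with a short calculation. This is a minimal sketch that assumes "the velvet formula" means the widely circulated empirical velvetg estimate (result in kB, genome size in Mb, read count in millions). The coefficients are the ones commonly quoted for that estimate and are an assumption here, and the read length, read count, and k value in the example are hypothetical.

```python
def velvet_ram_kb(read_size, genome_size_mb, num_reads_millions, k):
    """Empirical velvetg RAM estimate in kB (coefficients are an assumption)."""
    return (-109635
            + 18977 * read_size
            + 86326 * genome_size_mb
            + 233353 * num_reads_millions
            - 51092 * k)


def kb_to_gib(kb):
    """Convert kB to GiB."""
    return kb / (1024 ** 2)


if __name__ == "__main__":
    # Hypothetical human-scale run: 100 bp reads, 3000 Mb genome,
    # 900 million reads (about 30x coverage), k = 31.
    kb = velvet_ram_kb(100, 3000, 900, 31)
    print(f"estimated RAM: {kb_to_gib(kb):.0f} GiB")
```

With these hypothetical inputs the estimate comes out at roughly 450 GiB, in the same ballpark as the figure in the question. The read-count term is a large share of the total, so anything that reduces the number of reads going in lowers the estimate noticeably.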



    • #32
      Originally posted by Himalaya View Post
      Firstly, I did not use a human-sized genome; I was doing a metagenomic assembly of three known bacterial genomes. Secondly, my computing time varied because I was not the only one using the HPC. If you are asking about quality trimming of sequence reads (Illumina or 454 pyrosequencing), I have seen people suggest seqtrim, prinseq and clean_reads. I did not do any quality trimming during my comparative study. I have also developed my own quality-trimming script. I compared it with prinseq and clean_reads, and overall it performs better than both. Sorry, I can't share the script now as I am going to publish it. I will post a notification soon after publication.
      Thank you Himalaya!



      • #33
        Originally posted by [email protected] View Post
        I tried using the Velvet formula for my RAM calculation, and the answer is about 500 GB for human-sized genomes. Is that your experience too?
        Memory consumption is massively impacted by read quality - trimming is strongly recommended to reduce the pain.

        Originally posted by [email protected] View Post
        Also, approximately how many days would it take? The HPC environment here has a max walltime of 96 hrs.
        Depends a lot on the hardware - for me, SOAP takes about 2 days, but it can be split into separate steps. I don't have experience of Velvet with full-scale genomes, but I suspect you'll go well over that time.

        Originally posted by [email protected] View Post
        Another question is regarding trimming: did you use any specific software, or did you write your own scripts?
        I wrote this bad boy to do what I needed.
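
As an illustration of what 3'-end quality trimming does, here is a minimal sketch. It is not the script referred to above, and not the algorithm used by seqtrim, prinseq, or clean_reads. It assumes Phred+33 quality encoding; the function names, the default threshold of 20, and the minimum length of 30 are hypothetical choices.

```python
def trim_3prime(seq, qual, min_q=20, offset=33):
    """Drop trailing bases whose Phred score is below min_q."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    end = len(seq)
    # Walk back from the 3' end until a base meets the threshold.
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def trim_fastq_records(records, min_q=20, min_len=30):
    """Yield trimmed (name, seq, qual) tuples, skipping reads left too short."""
    for name, seq, qual in records:
        seq, qual = trim_3prime(seq, qual, min_q)
        if len(seq) >= min_len:
            yield name, seq, qual
```

Low-quality read ends tend to introduce erroneous k-mers into the assembly graph, which is one reason trimming is expected to reduce memory use.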



        • #34
          Originally posted by acopeland View Post
          I won't make any claim about allpaths-lg being universally runnable, but I can say that we have been using it quite successfully on microbial and fungal projects.

          To get workable binaries you will need gcc-4.3.2 or newer and boost-1.38 or newer (building these can be a bear). I can post configure commands if that's useful to anyone.

          Finally, I strongly encourage anyone testing allpaths-lg to download the test data supplied by the Broad (ftp://ftp.broadinstitute.org/pub/crd....genome.tar.gz) and get this working before running your own data.
          I have the same question, and my gcc is gcc-4.5.2. When I run allpaths-lg using the test data, I get:
          /bin/sh: PrepareAllPathsInput: not found
          make:********* Error 127
          Can you help me? I want to run this software.
          Thanks
          Last edited by erhuangzi; 04-08-2012, 05:40 PM. Reason: I got it
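
A "not found" message from /bin/sh together with make error 127 generally means the shell could not locate the command on PATH. A minimal pre-flight check, as a sketch: the tool names are taken from this thread as written, and the helper name is hypothetical.

```python
import shutil


def missing_tools(tools):
    """Return the commands from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # Command names as they appear in this thread; adjust to the installed names.
    missing = missing_tools(["PrepareAllPathsInput", "RunAllPathsLG"])
    if missing:
        print("not on PATH:", ", ".join(missing))
    else:
        print("all tools found")
```

If a tool is reported missing, adding the installation's bin directory to PATH (or loading the relevant environment module) before running make is the usual remedy.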



          • #35
            Originally posted by francesco.vezzi View Post
            Hi
            In the end, it seems that nobody is able to run ALLPATHS. Is that true?

            F.
            I'm running it with these options:
            RunAllPathsLG PRE=<my dir> REFERENCE_NAME=refs DATA_SUBDIR=data RUN=output OVERWRITE=TRUE USE_LONG_JUMPS=False REFERENCE_FASTA=refs/reference.fasta

            And I get this error:




            Error: file /scratch/hpc/raquel/allpaths/refs/data/frag_reads_orig.fastb is supposed to already exist, but doesn't.
            ForceAssert(IsRegularFile( *it )) at system/MiscUtil.cc:996 failed in function
            int MakeMgr::RunMake(int)


            Mon Apr 29 11:24:19 2013. Abort. Stopping.

            Generating a backtrace...

            Dump of stack:

            0. CRD::exit(int), in Exit.cc:49
            1. yes, in Assert.h:52
            2. main, in RunAllPathsLG.cc:3134
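
The path in the error message appears to be the PRE, REFERENCE_NAME, and DATA_SUBDIR values joined together, followed by frag_reads_orig.fastb, so the run is looking for already-prepared input under <PRE>/refs/data. A minimal sketch for checking that location before launching; the helper names are hypothetical, and only the file named in the error above is checked.

```python
from pathlib import Path


def expected_input(pre, reference_name, data_subdir,
                   filename="frag_reads_orig.fastb"):
    """Build the path the error message complains about."""
    return Path(pre) / reference_name / data_subdir / filename


def input_is_ready(pre, reference_name, data_subdir):
    """True if the prepared fragment-read file exists as a regular file."""
    return expected_input(pre, reference_name, data_subdir).is_file()


if __name__ == "__main__":
    # Values taken from the command line and error message in this post.
    pre = "/scratch/hpc/raquel/allpaths"
    print(expected_input(pre, "refs", "data"))
    print("ready" if input_is_ready(pre, "refs", "data") else "input not prepared yet")
```

If the file is absent, the input preparation step either has not been run or wrote its output to a different directory than the one the assembler was pointed at.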



            • #36
              ALLPATHS is exiting in the queue, showing state "E", and there are no error messages in the log file.

              My PBS script:
              #!/bin/bash
              #PBS -l walltime=48:00:00
              #PBS -N 268_allpaths
              #PBS -q workq
              #PBS -l select=40:ncpus=16:mpiprocs=16
              #PBS -l place=scatter:excl
              #PBS -V

              # comment begins with # followed by space......[IMPORTANT]
              # Go to the directory from which you submitted the job
              # cd $PBS_O_WORKDIR

              module load all_paths-2.2
              # path of all_paths
              # path : /app/allpathslg
              module load openmpi-1.6.4


              # ulimit (stack) is needed by the allpaths program
              ulimit -s 100000

              # prepare data for allpaths:
              PrepareAllPathsInput \
              DATA_DIR=$PWD/scratch/268_allpaths \
              PLOIDY=1 \
              IN_GROUPS_CSV=/scratch/268_allpaths/in_groups.csv \
              IN_LIBS_CSV=/scratch/268_allpaths/in_libs.csv \
              OVERWRITE=True \
              | tee prepare.out

              # Assemble data:
              allpathslg \
              PRE=$PWD \
              DATA_SUBDIR=data \
              RUN=run \
              SUBDIR=test \
              OVERWRITE=True \
              | tee -a assemble.out


              My CSV files:

              in_groups.csv

              file_name, library_name, group_name
              /scratch/268_allpaths/SO_2511_268_R1.fastq, illumina, frags
              /scratch/268_allpaths/SO_2511_268_R2.fastq, illumina, frags
              /scratch/268_allpaths/SO_2511_268_R1.fastq.gz, illumina_short, jumping
              /scratch/268_allpaths/SO_2511_268_R2.fastq.gz, illumina_short, jumping

              in_libs.csv
              library_name, project_name, organism_name, type, paired, frag_size, insert_size, read_orientation, genomic_start, genomic_end
              illumina, test assembly, test, fragment, 1, 300bp, 480bp, inward, 0, 0
              illumina, test assembly, test, fragment, 1, 300bp, 480bp, inward, 0, 0
              illumina, test assembly, test, jumping, 1, , 2k, outward, 0, 0
              illumina, test assembly, test, jumping, 1, , 2k, outward, 0, 0
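
One thing that can be checked mechanically in files like these is whether every library_name used in in_groups.csv has a row in in_libs.csv, and whether library names in in_libs.csv are unique. In the files as posted, in_groups.csv refers to illumina_short, which in_libs.csv never defines, and illumina appears four times. A minimal sketch of such a check; the function names are hypothetical and this is not part of ALLPATHS-LG.

```python
import csv
import io
from collections import Counter


def read_rows(text):
    """Parse CSV text with a header row, stripping spaces around fields."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    rows = [[field.strip() for field in row] for row in reader if row]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def check_libraries(groups_text, libs_text):
    """Return (undefined library names, duplicated library names)."""
    groups = read_rows(groups_text)
    libs = read_rows(libs_text)
    lib_counts = Counter(row["library_name"] for row in libs)
    undefined = sorted({g["library_name"] for g in groups} - set(lib_counts))
    duplicated = sorted(name for name, n in lib_counts.items() if n > 1)
    return undefined, duplicated


if __name__ == "__main__":
    # File names as used in this post; run from the directory holding them.
    with open("in_groups.csv") as g, open("in_libs.csv") as l:
        undefined, duplicated = check_libraries(g.read(), l.read())
    print("libraries used but not defined:", undefined)
    print("libraries defined more than once:", duplicated)
```

Either list being non-empty is worth resolving before submitting the job, since it is cheaper to catch on a login node than after waiting in the queue.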
