
  • #16
    Originally posted by genetics_jo
    My question is this, is this length of time normal for velvetg for assembling such a large dataset or has velvetg just run into a continuous loop and it will never run to completion?
    I used to work on Velvet several years ago, and discussed its code and parameters on our blog many times in those days.

    e.g.

    Appropriate choice of the ‘exp_cov’ (expected coverage) parameter in Velvet is very important for getting an assembly right. In the figure from that post, we show data from a calculation on a set of reads taken from a 3 kb region of a genome and reassembled with varying exp_cov values. The x-axis of the chart shows exp_cov and the y-axis shows the size of the largest scaffold assembled by Velvet.
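A parameter sweep of the kind described above could be set up roughly as follows. This is only a sketch: the directory name `asm_k31`, the list of coverage values, and the insert length of 300 are assumptions chosen for illustration, and the function only builds the `velvetg` command lines without running them. The `-exp_cov` and `-ins_length` options are the ones velvetg uses for expected coverage and paired-end insert length.

```python
from typing import List


def velvetg_sweep(assembly_dir: str, exp_covs: List[float], ins_length: int) -> List[List[str]]:
    """Build one velvetg command line per exp_cov value.

    The commands are returned as argument lists and are NOT executed here.
    assembly_dir is assumed to be a directory already prepared by velveth.
    """
    commands = []
    for cov in exp_covs:
        commands.append([
            "velvetg", assembly_dir,
            "-exp_cov", str(cov),            # expected coverage being varied
            "-ins_length", str(ins_length),  # paired-end insert length (assumed value)
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical values; pick a range around the coverage you actually expect.
    for cmd in velvetg_sweep("asm_k31", [10, 15, 20, 25, 30], ins_length=300):
        print(" ".join(cmd))
```

Because velvetg writes its results (for example contigs.fa) into the assembly directory itself, each run in such a sweep would overwrite the previous one, so the output should be copied aside between runs before comparing the largest scaffold sizes.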


    If you have used the Velvet genome assembler, you have possibly noticed a file named ‘Roadmaps’ being created by the ‘velveth’ program. Here is a brief explanation of the format of the ‘Roadmaps’ file, as explained by Daniel Zerbino, the author of Velvet.


    About a year ago, we briefly explained the format of the Roadmaps file generated by the Velvet assembly program. Our explanation was brief and not very helpful for understanding all the entries in the Roadmaps file. Reader SRB asked us to provide more details by working through an example.


    The problem with Velvet (especially velvetg) is that it is not at all optimized for large genomes, and the time it takes to produce output can be unpredictable. Moreover, you can trust its contig step, but not its scaffolding. However, contigs like the ones Velvet produces can easily be generated by SOAPdenovo2 or Minia in much less time. For example, with the hardware you are describing, SOAPdenovo2 will give you output in hours, not days.

    I know this is not the answer you asked for, and you already mentioned using other assemblers.
    Last edited by samanta; 04-11-2014, 03:38 PM.
    http://homolog.us



    • #17
      Originally posted by genetics_jo
      One other question...I've seen some folks say the paired end fastq files need to be merged together into a single file for "shortPaired" use in Velvet...and seen some say that the two paired end files need to be kept separate and let velvet read and coordinate reads. Which one is it? For example if I have files Humulus_lane1_read1_1.fastq and Humulus_lane1_read1_2.fastq, should these two files be merged together or kept separately for velvet to work properly?
      In the version I worked with two years ago, when trying to assemble a large genome (~600MB in size), the paired reads needed to be merged into one file, with the two reads of each pair on consecutive records:

      FASTA Line 1-2 (read1 left)
      FASTA Line 3-4 (read1 right)

      etc.
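The interleaved layout above can be produced with a short script. The following is a minimal sketch, not a Velvet tool: the function names are hypothetical, and it assumes unwrapped FASTQ input (four lines per read, rather than the two lines per read of the FASTA layout listed above) with both files holding the reads of each pair in the same order and with no trailing blank lines.

```python
from itertools import islice
from typing import Iterator, List, TextIO


def fastq_records(handle: TextIO) -> Iterator[List[str]]:
    """Yield FASTQ records as lists of 4 newline-terminated lines."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        # Make sure every line ends with a newline, including the last line of the file.
        yield [line.rstrip("\n") + "\n" for line in record]


def interleave(left: TextIO, right: TextIO, out: TextIO) -> int:
    """Write left and right records alternately; return the number of pairs written."""
    pairs = 0
    right_iter = fastq_records(right)
    for left_rec in fastq_records(left):
        right_rec = next(right_iter, None)
        if right_rec is None:
            raise ValueError("right file has fewer reads than left file")
        out.writelines(left_rec)   # read N, left
        out.writelines(right_rec)  # read N, right
        pairs += 1
    if next(right_iter, None) is not None:
        raise ValueError("right file has more reads than left file")
    return pairs
```

For the files in the question, one would open Humulus_lane1_read1_1.fastq and Humulus_lane1_read1_2.fastq for reading, open an output file (say, a hypothetical interleaved.fastq) for writing, and pass the three handles to `interleave`. The merged file could then be given to velveth along the lines of `velveth out_dir 31 -shortPaired -fastq interleaved.fastq`, where `out_dir` and the k-mer length 31 are placeholders.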

      The difficulty I faced was that the scaffolds changed completely unpredictably with small changes in input parameters (exp_cov). It is not as if you can run everything once, press a button, and trust the output.

      That led me to move on to other assemblers. Also, I make sure I understand the code/algorithm of any assembler I use.
      Last edited by samanta; 04-11-2014, 03:41 PM.
      http://homolog.us



      • #18
        Originally posted by genetics_jo
        That's also what I've observed with the previous runs of velvet. The program is still "running" but RAM use and % of processor have remained the same now for several days. Would have thought if it wasn't going to work it would have crashed?

        A large part of the work is in removing 'tips' and 'bubbles' and then simplifying the graph. During that phase the program works entirely in memory, so you do not see any input/output.
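To give a feel for why this phase shows steady CPU and RAM use but no disk activity, here is a toy sketch of tip removal on a small in-memory graph. It is not Velvet's actual implementation: the graph representation, the function name `clip_tips`, and the `max_tip_len` cutoff are all assumptions made for illustration. A "tip" here is a short dead-end node hanging off a node that has another way forward.

```python
from typing import Dict, Set


def clip_tips(successors: Dict[str, Set[str]],
              lengths: Dict[str, int],
              max_tip_len: int) -> Dict[str, Set[str]]:
    """Repeatedly remove short dead-end nodes attached to a branching node.

    successors maps each node to the set of nodes that follow it.
    lengths gives the sequence length of each node.
    """
    graph = {node: set(nxt) for node, nxt in successors.items()}
    changed = True
    while changed:  # keep scanning until a full pass removes nothing
        changed = False
        for node in list(graph):
            if graph[node]:
                continue  # has successors, so it is not a dead end
            preds = [p for p in graph if node in graph[p]]
            # A tip is short and hangs off a single predecessor that has another branch.
            if (lengths[node] < max_tip_len
                    and len(preds) == 1
                    and len(graph[preds[0]]) > 1):
                graph[preds[0]].discard(node)
                del graph[node]
                changed = True
    return graph
```

Each pass walks the whole graph in memory and reads or writes nothing on disk, which matches the behaviour described above; on a graph with hundreds of millions of nodes, repeated passes of this kind (plus bubble removal) can plausibly take a very long time.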
        http://homolog.us
