**samanta** · 04-11-2014, 03:33 PM

Originally posted by genetics_jo

My question is this, is this length of time normal for velvetg for assembling such a large dataset or has velvetg just run into a continuous loop and it will never run to completion?

I worked with Velvet several years ago and discussed its code and parameters on our blog many times back then.

e.g.

An Explanation of Velvet Parameter exp_cov

http://www.homolog.us/blogs/blog/2012/06/08/an-explanation-of-velvet-parameter-exp_cov/

Appropriate choice of the ‘exp_cov’ (expected coverage) parameter in Velvet is very important to get an assembly right. In the following figure, we show data from a calculation on a set of reads taken from a 3Kb region of a genome, and reassembling them with varying exp_cov parameters. X-axis in the chart shows the exp_cov and y-axis shows the size of the largest scaffold assembled by Velvet.

Format of Velvet Output File 'Roadmaps'

http://www.homolog.us/blogs/blog/2011/12/06/format-of-velvet-roadmap-file/

If you have used the Velvet genome assembler, you have probably noticed a file named ‘Roadmaps’ created by the ‘velveth’ program. Here is a brief explanation of the format of the ‘Roadmaps’ file, as explained by Daniel Zerbino, the author of Velvet.

More Details on Velvet Roadmaps File Based on Reader's Question

http://www.homolog.us/blogs/blog/2012/12/27/more-details-on-velvet-roadmaps-file-based-on-readers-question/

About a year back, we briefly explained the format of the Roadmaps file generated by the Velvet assembly program. That explanation was short and did not help much in understanding all entries in the Roadmaps file. Reader SRB asked us to provide more details by working through an example.

The problem with Velvet (especially velvetg) is that it is not optimized for large genomes, and its run time can be unpredictable. Moreover, you can trust its contig step, but not its scaffolding. Contigs comparable to Velvet's, however, can be produced by SOAPdenovo2 or Minia in much less time. For example, with the hardware you describe, SOAPdenovo2 will give you output in hours, not days.

I know this is not the answer you asked for, and you already mentioned using other assemblers.

**samanta** · 04-11-2014, 03:35 PM

Originally posted by genetics_jo

One other question...I've seen some folks say the paired end fastq files need to be merged together into a single file for "shortPaired" use in Velvet...and seen some say that the two paired end files need to be kept separate and let velvet read and coordinate reads. Which one is it? For example if I have files Humulus_lane1_read1_1.fastq and Humulus_lane1_read1_2.fastq, should these two files be merged together or kept separately for velvet to work properly?

In the version I worked with two years ago, when trying to assemble a large genome (~600 Mb), the paired reads needed to be merged into one file.

FASTA Line 1-2 (read1 left)
FASTA Line 3-4 (read1 right)

etc.
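For FASTQ input the same layout applies, with four lines per read instead of two. Here is a minimal sketch of the merge in Python, assuming both files list the pairs in the same order and use unwrapped four-line records. The function names and the output file name are just for illustration and are not part of Velvet.

```python
from itertools import islice, zip_longest


def read_fastq_records(handle):
    """Yield FASTQ records as lists of four newline-terminated lines."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        # Normalise line endings so a missing final newline cannot glue records together.
        yield [line.rstrip("\r\n") + "\n" for line in record]


def interleave_fastq(left_path, right_path, out_path):
    """Write left/right mates alternately to out_path; return the number of pairs."""
    pairs = 0
    with open(left_path) as left, open(right_path) as right, open(out_path, "w") as out:
        for l_rec, r_rec in zip_longest(read_fastq_records(left), read_fastq_records(right)):
            if l_rec is None or r_rec is None:
                raise ValueError("input files contain different numbers of reads")
            out.writelines(l_rec)  # read N, left mate
            out.writelines(r_rec)  # read N, right mate
            pairs += 1
    return pairs
```

With the files from the question, this would be called as `interleave_fastq("Humulus_lane1_read1_1.fastq", "Humulus_lane1_read1_2.fastq", "Humulus_lane1_interleaved.fastq")`, where the output name is hypothetical. The script does not check that the read names in each pair actually match, so it is worth spot-checking the first few records.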

The difficulty I faced was that the scaffolds changed unpredictably with small changes in input parameters (exp_cov). It is not as if you run everything once, press a button and trust the output.
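One thing that makes exp_cov easy to misjudge: as described in the Velvet manual, coverage values are expressed as k-mer coverage rather than base coverage, with Ck = C * (L - k + 1) / L for base coverage C, read length L and hash length k. A small sketch of that conversion follows; the function name and the numbers are only an illustration, not values from this thread.

```python
def kmer_coverage(base_coverage, read_length, k):
    """Convert base coverage C to k-mer coverage Ck = C * (L - k + 1) / L."""
    if not 0 < k <= read_length:
        raise ValueError("k must be between 1 and the read length")
    return base_coverage * (read_length - k + 1) / read_length


# Illustrative numbers only: 30x base coverage, 100 bp reads, k = 31.
print(kmer_coverage(30, 100, 31))  # 21.0
```

This gives only a starting point; the value still needs to be checked against the observed coverage distribution of the assembly.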

That led me to move on to other assemblers. Also, I make sure I understand the code/algorithm of any assembler I use.

**samanta** · 04-11-2014, 04:30 PM

Originally posted by genetics_jo

That's also what I've observed with the previous runs of velvet. The program is still "running" but RAM use and % of processor have remained the same now for several days. Would have thought if it wasn't going to work it would have crashed?

A large part of the work is in removing 'tips' and 'bubbles' and then simplifying the graph. During that phase you will not see any input/output activity.
