**Thorondor** · 04-26-2011, 01:17 AM

If you don't want to compare paired-end to single-end, you don't need to do "subsetting".

The last line just tells you your N50, how many reads were used (you can also ask velvet to output an UnusedReads.fa file), etc. I am not sure whether the number of nodes is the number of your contigs. You can check that with grep -c ">" contigs.fa.

**AdrianP** · 04-26-2011, 06:35 AM

Thank you for your previous reply.
Is there any way to see which contig is largest? Somehow sort contigs by their size?

**Thorondor** · 04-26-2011, 07:25 AM

Sure, there are a lot of ways. ;-) Use a script. The id of each contig even contains its length (given in kmers, so the length in bp is length + kmer - 1), so it is really easy.

Here are some perl scripts that might help:

http://wiki.bioinformatics.ucdavis.edu/index.php/Data_Analysis
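
If you would rather not depend on the id format, a minimal Python sketch of the sorting idea could look like the one below. It measures each sequence directly instead of parsing the length out of the id. The function names `read_fasta` and `contigs_by_length` are made up for this illustration and are not part of velvet or of the scripts linked above.

```python
def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def contigs_by_length(handle):
    """Return (header, length_in_bp) tuples, longest contig first."""
    lengths = [(header, len(seq)) for header, seq in read_fasta(handle)]
    return sorted(lengths, key=lambda item: item[1], reverse=True)
```

Something like `contigs_by_length(open("contigs.fa"))[0]` would then give the id and bp length of the largest contig.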

**AdrianP** · 04-27-2011, 06:34 PM

If Velvet assembles poor (small) contigs when other programs with the same settings (coverage and insert size) do much, much better, what can I conclude?

By the way, most of those scripts are for fasta and I have fastq. Is there a script that converts?

Adrian

**Thorondor** · 04-27-2011, 11:17 PM

http://brianknaus.com/software/srtoolbox/fastq2fasta.pl

First hit in Google. ;-) Also, trimming "programs" normally take fastq as input and output fasta.
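
For reference, the conversion itself is small. Below is a rough Python sketch, assuming plain four-line fastq records (header, sequence, "+", qualities) with no line-wrapped sequences. The function name `fastq_to_fasta` is just an illustrative choice and has nothing to do with the perl script linked above.

```python
def fastq_to_fasta(fastq_handle, fasta_handle):
    """Convert four-line FASTQ records to FASTA; return the record count.

    Assumes each record is exactly four lines (no wrapped sequences).
    """
    count = 0
    while True:
        header = fastq_handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected '@' at record start, got: {header!r}")
        sequence = fastq_handle.readline().rstrip("\n")
        fastq_handle.readline()  # '+' separator line, ignored
        fastq_handle.readline()  # quality string, dropped in FASTA
        fasta_handle.write(">" + header[1:].rstrip("\n") + "\n" + sequence + "\n")
        count += 1
    return count
```

Called with two open file handles (one for reading the fastq, one for writing the fasta), it returns how many reads were written.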

Velvet needs good coverage to do well because it is de Bruijn graph based. Since I don't know what data you ran velvet on and what you expect, I can't help much. Try smaller kmers, try different parameters, try multiple kmers...

**AdrianP** · 04-28-2011, 02:25 AM

One thing that is puzzling me is that Genegenious takes a few days to assemble the data that velvet assembles in 10-15 minutes. Is it normal that velvet runs so fast?

**Thorondor** · 04-28-2011, 02:58 AM

I have no clue what genegenious is or what algorithm it is based on. If it is an overlap-based method, then yes, it is possible, and it depends on the kmer, the number of reads you have, expected coverage, read length...

**AdrianP** · 05-03-2011, 10:14 AM

I was wondering a bit more about Velvet's last line output.

What is n50? (I understand that it is a measurement of quality, the higher the better???)
What is max?
What is total?
What are nodes? (this isn't as important since i believe it is related to the graph that velvet builds, and I do not use the graph, just the final contigs.fa file. Should I use the graph?)

Also, what is the difference between shortpaired and shortpaired2? Something to do with insert libraries...

Thanks a lot!

**Thorondor** · 05-04-2011, 12:47 AM

come on, do a bit more research on your own. :P

read the velvet paper:

Velvet: Algorithms for de novo short read assembly using de Bruijn graphs

http://genome.cshlp.org/content/18/5/821.short

You don't want to use the graph and if you know what a graph is you should also know what nodes are. Nodes are the vertices of a graph.
shortpaired2 is the same as shortpaired but for a separate insert size library (also stated in the manual).

**AdrianP** · 05-04-2011, 09:33 AM

I have read the manual a few times, but I guess what I was asking is: what does "separate insert size library" mean? Separate from what?

As for the research paper, I had a look at it before and I can't find a definition for n50. It jumps straight to using that term, even in the abstract. I would not have asked if I had not done the research myself.

I found the answer here after googling n50: http://seqanswers.com/forums/showthread.php?t=2332

**Thorondor** · 05-04-2011, 10:19 AM

yes, it is the first hit when you google "definition: N50"

Separate from "shortpaired", I assume. So you can use 2 different PE libraries with different insert sizes, but maybe I am wrong.

Total is their calculated base pairs total, but keep in mind that the reported length of each contig is the "real" bp length minus kmer plus 1, as mentioned somewhere.^^

max might be the longest contig, but I don't remember it exactly, since you need to do more statistics anyway. ;-)
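
If you want to recompute these numbers yourself, here is a small Python sketch using the common definition of N50: the contig length at which contigs of that length or longer hold at least half of all assembled bases. The function name `assembly_stats` is made up for this example, and since velvet reports lengths in kmers, its own last-line values may differ from what you get on bp lengths.

```python
def assembly_stats(lengths):
    """Return (n50, max_length, total) for a list of contig lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # N50 is the first length at which the cumulative sum
        # (longest contigs first) reaches half of the total.
        if 2 * running >= total:
            return length, ordered[0], total
```

For example, contigs of lengths 2 through 10 sum to 54, and the three longest (10, 9, 8) already cover 27, so the N50 is 8.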

**tonybolger** · 05-05-2011, 01:48 AM

Originally posted by AdrianP:

If Velvet assembles poor (small) contigs when other programs with the same settings (coverage and insert size) do much, much better, what can I conclude?

Could be many things: appropriate vs inappropriate settings, more/less sensitive to data error vs coverage, or simply wrong/right tool for this particular job.

You also could have one tool making a decent size but completely incorrect assembly, with another making a cautious but correct assembly. N50/size isn't everything. Can you validate your results somehow, e.g. against another closely-related known genome?

Either way, getting the best out of a dataset may require months of trial and error, tweaking etc. Even getting a particular tool to run properly and produce decent output can take weeks and can make a massive difference - the DBG assemblers all seem to have glass jaws. Pre-filtering the data seems to make a massive difference to most of them though.

Incidentally, I don't have a lot of experience with velvet in particular, since it's simply too heavy for my project (>1GBase genome).

**AdrianP** · 05-05-2011, 03:31 AM

Yeah actually my next genome to work with is a mitGenome and is about 70k, pretty cool.

I will start working with consed. It is not an easy program to work with but, as I understand it, incredibly useful.

**seb567** · 05-31-2011, 10:16 AM

Originally posted by AdrianP:

Greetings to you all,

After making sure I know how to use velvet I will also try Ray and SOAP.

Hi !

I am the author of Ray, so if you have any questions, ask away.

Basically, with Ray, you will need to convert your two files to fasta or fastq format.

There is a script in maq for that.

http://maq.sourceforge.net/

maq-0.7.1/scripts/fq_all2std.pl export2std 100611_s_4_1_seq_GDR-7.txt > 100611_s_4_1_seq_GDR-7.txt.fastq
maq-0.7.1/scripts/fq_all2std.pl export2std 100611_s_4_2_seq_GDR-7.txt > 100611_s_4_2_seq_GDR-7.txt.fastq

Ray is available at http://denovoassembler.sf.net

Then, using Ray, you assemble these reads:

mpirun -np 8 Ray -k 31 -p 100611_s_4_1_seq_GDR-7.txt.fastq 100611_s_4_2_seq_GDR-7.txt.fastq -o Ray-test-1.4.0

I encourage you to explore the files written by Ray:

ls Ray-test-1.4.0.*

see http://denovoassembler.sourceforge.n...00000000000000

Velvet & paired-ends