Hi, I did a Velvet de novo assembly for a species that has no previously sequenced genome, so how can I validate my assembly?
-
That's the 64 million dollar question.
You'll have to talk to the biologists in your collaboration to see what is known that you can check (e.g. any ESTs or other previously sequenced bits like important genes, an experimentally characterized genome size, GC percentage), and which related organisms you might be able to compare it to.
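As a rough first check against those known values, something like the following Python sketch compares the total assembly length and GC content with an expected genome size and GC percentage. The function names, the 20% size tolerance and the 2-point GC tolerance are illustrative assumptions, not established thresholds. Ns (which Velvet uses to fill scaffold gaps) are left out of the GC denominator.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into {header: sequence}."""
    seqs = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line.upper())
    return {n: "".join(parts) for n, parts in seqs.items()}


def assembly_summary(seqs):
    """Contig count, total length, and GC% (Ns and other ambiguity codes excluded from GC%)."""
    total = sum(len(s) for s in seqs.values())
    gc = sum(s.count("G") + s.count("C") for s in seqs.values())
    acgt = sum(s.count(b) for s in seqs.values() for b in "ACGT")
    return {
        "contigs": len(seqs),
        "total_length": total,
        "gc_percent": 100.0 * gc / acgt if acgt else 0.0,
    }


def compare_to_expected(summary, expected_size, expected_gc, size_tol=0.2, gc_tol=2.0):
    """Flag whether length and GC% fall within (hypothetical) tolerances of expected values."""
    size_ratio = summary["total_length"] / expected_size
    gc_diff = summary["gc_percent"] - expected_gc
    return {
        "size_ratio": size_ratio,
        "size_ok": abs(size_ratio - 1.0) <= size_tol,
        "gc_diff": gc_diff,
        "gc_ok": abs(gc_diff) <= gc_tol,
    }
```

A large deviation in either value doesn't tell you what went wrong, only that something deserves a closer look.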
Specific to Velvet, I'm sure there is plenty of good advice in the documentation and mailing list archive about common pitfalls, etc.
-
I agree with Peter, this is the question we'd all love to be able to answer with confidence!
This is a question of having multiple pieces of evidence to give you a confidence level as to your assembly. With current technology it is still impossible to "prove" an assembly is correct, but you can get pretty damn close.
Optical mapping is a complementary technology which might be helpful for independent verification of contig order (particularly for large contigs >100 kb).
Sequencing with another technology, particularly 454, might give some clues as to the extent of misassemblies. Paired-end 454 data will be even more helpful.
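If the extra (or original) paired reads are mapped back to the contigs with a short-read aligner, pairs whose mates land too far apart, in the wrong orientation, or on different contigs point at possible misassemblies or breaks. Below is a minimal Python sketch of that classification. It assumes each pair has already been reduced to (contig, 0-based leftmost position, strand) for both mates, a single fixed read length, and a forward/reverse orientation; the expected orientation and insert range depend on the library protocol, so treat those parameters as placeholders.

```python
from collections import Counter


def classify_pair(pair, min_insert, max_insert, read_len, expected=("+", "-")):
    """Classify one mapped read pair.

    pair = (contig_a, pos_a, strand_a, contig_b, pos_b, strand_b), positions 0-based leftmost.
    `expected` is the strand order of (left mate, right mate); ("+", "-") assumes a
    forward/reverse library and is a placeholder for your protocol's orientation.
    """
    contig_a, pos_a, strand_a, contig_b, pos_b, strand_b = pair
    if contig_a != contig_b:
        return "different_contigs"
    # Put the leftmost mate first so orientation and distance are measured consistently.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pos_a, strand_a), (pos_b, strand_b)]
    )
    if (left_strand, right_strand) != expected:
        return "bad_orientation"
    # Outer distance from the start of the left mate to the end of the right mate.
    insert = right_pos + read_len - left_pos
    if not min_insert <= insert <= max_insert:
        return "bad_distance"
    return "concordant"


def summarize_pairs(pairs, min_insert, max_insert, read_len):
    """Count pairs in each category."""
    return Counter(classify_pair(p, min_insert, max_insert, read_len) for p in pairs)
```

Pairs on different contigs are expected near contig ends in a fragmented assembly; they are more of a concern when they cluster in the middle of a contig.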
You could do de novo assembly with other assemblers and see if they agree, but this is probably weak/circumstantial evidence.
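One crude way to put a number on "agreement" is the overlap in k-mer content between two assemblies. The Python sketch below uses canonical k-mers (a k-mer and its reverse complement count as the same) so contig orientation doesn't matter, and reports the Jaccard index. This is a swapped-in illustration, not a standard validation tool. Because k-mer sets ignore contig order, it measures sequence content agreement only and won't detect most misjoins, which fits the "weak evidence" caveat.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical_kmers(seqs, k):
    """Set of canonical k-mers (lexicographically smaller of k-mer and its reverse complement)."""
    kmers = set()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) - set("ACGT"):
                continue  # skip k-mers containing N or other ambiguity codes
            kmers.add(min(kmer, revcomp(kmer)))
    return kmers


def kmer_agreement(contigs_a, contigs_b, k=21):
    """Jaccard index of the canonical k-mer sets of two assemblies (1.0 = identical content)."""
    a = canonical_kmers(contigs_a, k)
    b = canonical_kmers(contigs_b, k)
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```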
Another method of verifying an assembly is to design primers to amplify the entire genome in overlapping segments of, say, 10 kb and check them on a gel. This of course relies on you having a finished genome sequence to check against.
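Planning the overlapping segments is simple arithmetic. This hypothetical Python helper lays out 10 kb segments with a chosen overlap along a linear sequence and returns the expected product sizes to compare against the gel. The 1 kb default overlap is an assumption, and actual primer design (e.g. with Primer3) and circular genomes are not handled.

```python
def tile_amplicons(genome_length, amplicon=10_000, overlap=1_000):
    """Overlapping (start, end) segments, 0-based half-open, covering a linear sequence.

    The last segment is shortened to end at genome_length.
    """
    if not 0 <= overlap < amplicon:
        raise ValueError("overlap must be >= 0 and smaller than the amplicon length")
    if genome_length <= 0:
        return []
    step = amplicon - overlap
    tiles = []
    start = 0
    while True:
        end = min(start + amplicon, genome_length)
        tiles.append((start, end))
        if end == genome_length:
            return tiles
        start += step


def expected_bands(tiles):
    """Expected PCR product size for each segment, to compare against the gel."""
    return [end - start for start, end in tiles]
```

A missing band, or one of the wrong size, localizes a problem to a roughly 10 kb window.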
You might find an easier question to answer is "what level of assembly accuracy will permit me to answer my scientific question?"
-
You [the OP] might be oversimplifying a tad.
Any assembly will be composed of:
- correct contigs
- fragmented contigs
- chimeric contigs
- spurious contigs
and may suffer from:
- missing contigs
I would consider chimeras and spurious contigs to be distinguished by length: spurious contigs are an artifact of the de Bruijn method and are very short. I don't think chimeras are very common in Velvet compared to other assemblers; any ambiguity normally results in fragments.
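Since spurious contigs mostly show up as a mass of very short sequences, a quick length breakdown shows how much of the assembly they account for. In this Python sketch the cutoff is a hypothetical parameter you would choose yourself (there is no universal value); it counts contigs below the cutoff, the fraction of assembled bases they hold, and a per-order-of-magnitude length histogram.

```python
from collections import Counter


def length_report(lengths, min_len):
    """Summarize how many contigs (and what share of bases) fall below min_len."""
    short = [n for n in lengths if n < min_len]
    total = sum(lengths)
    return {
        "n_short": len(short),
        "n_kept": len(lengths) - len(short),
        "short_base_fraction": sum(short) / total if total else 0.0,
    }


def length_histogram(lengths):
    """Count contigs per order of magnitude: '<100', '100-999', '1000-9999', ..."""
    bins = Counter()
    for n in lengths:
        if n < 100:
            bins["<100"] += 1
        else:
            lo = 10 ** (len(str(n)) - 1)  # n is assumed to be a positive int
            bins[f"{lo}-{10 * lo - 1}"] += 1
    return bins
```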
Velvet assemblies performed under high-stringency conditions (high k-mer length, high coverage cutoff, i.e. cvCut, set with velvetg's -cov_cutoff) will minimize chimeric, fragmented and spurious contigs at the expense of more missing contigs.
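Velvet's contigs.fa headers look like NODE_<id>_length_<len>_cov_<cov>, and as I understand the Velvet manual the length there is given in k-mers (add k - 1 for nucleotides) and the coverage is k-mer coverage. The Python sketch below parses those headers and shows how contig count and total length shrink as a coverage threshold is raised. Filtering finished contigs like this only approximates the trade-off: velvetg's -cov_cutoff removes low-coverage nodes from the graph before contigs are built, so re-running velvetg with different values is the real test. The ">= cutoff" comparison is my choice, not necessarily velvetg's.

```python
import re

# Velvet contig header, e.g. "NODE_12_length_845_cov_23.517752"
HEADER = re.compile(r"NODE_(\d+)_length_(\d+)_cov_([0-9.]+)")


def parse_velvet_header(header, k):
    """Return node id, length in nucleotides (header length + k - 1) and k-mer coverage."""
    m = HEADER.search(header)
    if m is None:
        raise ValueError(f"not a Velvet NODE header: {header!r}")
    return {
        "node": int(m.group(1)),
        "length_nt": int(m.group(2)) + k - 1,
        "cov": float(m.group(3)),
    }


def coverage_tradeoff(headers, k, cutoffs):
    """For each cutoff, count contigs with coverage >= cutoff and their total length."""
    nodes = [parse_velvet_header(h, k) for h in headers]
    rows = []
    for cutoff in cutoffs:
        kept = [n for n in nodes if n["cov"] >= cutoff]
        rows.append((cutoff, len(kept), sum(n["length_nt"] for n in kept)))
    return rows
```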
To validate a de novo short-read assembly, especially a transcriptome, which by its very nature will never form long contigs, you need to decide whether you are willing to accept some bad with the good or insist on just the good and get less of it. This is a classic signal-to-noise problem.
One way to judge an assembly is to run Velvet under varying parameters and see if the results converge. If you get wildly different results you can examine which contigs are spliced or fragmented under different settings and make your own judgments from there.
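For the parameter sweep itself (e.g. re-running velveth with several hash lengths and velvetg with several -cov_cutoff values), a small table of summary statistics per run makes convergence easier to see. The Python sketch below computes contig count, total length and N50 for each run, plus the relative spread of N50 across runs; the run labels and input format are hypothetical. Similar summary numbers don't prove the runs produced the same contigs, so the contig-level comparison suggested above is still needed.

```python
def n50(lengths):
    """Smallest contig length L such that contigs of length >= L hold at least half the bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def compare_runs(runs):
    """runs: {label: [contig lengths]} -> ({label: (n_contigs, total, n50)}, relative N50 spread)."""
    stats = {label: (len(ls), sum(ls), n50(ls)) for label, ls in runs.items()}
    n50s = [s[2] for s in stats.values()]
    top = max(n50s) if n50s else 0
    spread = (top - min(n50s)) / top if top else 0.0
    return stats, spread
```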
-
Good answer Zigster!
I'd add the final possibility of "correct" contigs containing consensus errors due to transposed nucleotides in repeats which have been resolved using paired-end information, as discussed in my blog post at http://pathogenomics.bham.ac.uk/blog...nome-assembly/
-
Hi Zigster and nicklomen, thanks for your replies, they were very useful. My idea for validation is this: if we have Sanger sequences from the species we are assembling, we can BLAST them against the assembled Solexa contigs and take the assembly that covers the most Sanger sequences (e.g. more than 90 percent) as a valid assembly. What do you think?
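The idea can be scripted with BLAST+: build a database from the contigs (makeblastdb -in contigs.fa -dbtype nucl -out contigs), search the Sanger reads against it (blastn -query sanger.fa -db contigs -outfmt 6 -out hits.tsv), then measure how much of each Sanger read is covered. The Python sketch below reads the default 12-column tabular output, merges overlapping query ranges per read, and reports the fraction of reads covered at or above a threshold. query_lengths is a dict of Sanger read lengths you would build from the FASTA; the 95% identity filter and the function names are illustrative assumptions.

```python
from collections import defaultdict


def merged_length(intervals):
    """Total length covered by 1-based inclusive (start, end) intervals, overlaps counted once."""
    covered = 0
    cur_lo = cur_hi = None
    for lo, hi in sorted(intervals):
        if cur_hi is None or lo > cur_hi + 1:
            if cur_hi is not None:
                covered += cur_hi - cur_lo + 1
            cur_lo, cur_hi = lo, hi
        else:
            cur_hi = max(cur_hi, hi)
    if cur_hi is not None:
        covered += cur_hi - cur_lo + 1
    return covered


def query_coverage(blast_lines, query_lengths, min_pident=95.0):
    """Fraction of each query covered by hits in BLAST -outfmt 6 lines.

    Columns used: 1 qseqid, 3 pident, 7 qstart, 8 qend (default 12-column layout).
    Queries with no hits get 0.0.
    """
    intervals = defaultdict(list)
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        qseqid, pident = fields[0], float(fields[2])
        qstart, qend = int(fields[6]), int(fields[7])
        if pident >= min_pident:
            intervals[qseqid].append((min(qstart, qend), max(qstart, qend)))
    return {q: merged_length(intervals[q]) / qlen for q, qlen in query_lengths.items()}


def fraction_well_covered(coverage, threshold=0.9):
    """Share of queries whose coverage is at least `threshold`."""
    if not coverage:
        return 0.0
    return sum(c >= threshold for c in coverage.values()) / len(coverage)
```

Coverage here is pooled across all hits, so a Sanger read split over two contigs still counts as covered; requiring a single hit to span the read would be a stricter test that also penalizes fragmentation.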