Originally posted by Torst
Q. Is the reference still a draft (contigs/scaffolds) or is it closed?
Q. What does "not quite as good" mean?
assembly  scaffolds  total size  longest  mean   scaffolds >0.1 Mb  scaffolds >1 Mb  N50
new1      1101       51.8 Mb     2.08 Mb  47 kb  101 (9.2%)         16 (1.5%)        0.6 Mb
new2      2172       5.22 Mb     2.08 Mb  21 kb  101 (4.0%)         13 (0.5%)        0.6 Mb
oldref    1386       50.9 Mb     5.05 Mb  36 kb  64 (4.6%)          14 (1%)          1.2 Mb
The "new1" assembly is the PE data assembled with VelvetOptimiser (optimal k-mer size = 85). The "new2" assembly uses the PE + MP data (optimal k-mer size = 87).
I have also run all the draft assemblies through the CEGMA pipeline as a benchmarking exercise, and they all give ~98-99% completeness for the KOG set of conserved protein sequences.
On inspection, apart from the longest scaffold and the N50, the new1 assembly looks pretty good relative to the oldref assembly. Adding the MP data clearly splits scaffolds, roughly doubling their number, and slightly reduces the number of scaffolds > 1 Mb. So I'm unsure at this point whether this is a good or a bad result: are the scaffolds breaking up because the PE contigs were poorly scaffolded, or because the MP data is problematic?
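The columns in the table above can be recomputed from scaffold lengths, which makes it easy to compare assemblies on identical definitions. Below is a minimal sketch, assuming plain FASTA scaffold files; the function names are hypothetical, gap Ns count toward scaffold length, and ">0.1 Mb" is taken as strictly greater than 100,000 bp (the original post does not say which convention was used).

```python
from typing import Dict, List, Optional, TextIO


def fasta_lengths(handle: TextIO) -> List[int]:
    """Return the length of every record in a FASTA stream (gap Ns included)."""
    lengths: List[int] = []
    current: Optional[int] = None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Largest length L such that scaffolds of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids rounding issues
            return length
    return 0  # only reached for an empty list


def assembly_stats(lengths: List[int]) -> Dict[str, float]:
    """Summarise scaffold lengths (bp) using the same columns as the table."""
    n = len(lengths)
    total = sum(lengths)
    over_100k = sum(1 for x in lengths if x > 100_000)
    over_1m = sum(1 for x in lengths if x > 1_000_000)
    return {
        "scaffolds": n,
        "total_bp": total,
        "longest_bp": max(lengths, default=0),
        "mean_bp": total / n if n else 0.0,
        "n_gt_0.1Mb": over_100k,
        "pct_gt_0.1Mb": 100.0 * over_100k / n if n else 0.0,
        "n_gt_1Mb": over_1m,
        "pct_gt_1Mb": 100.0 * over_1m / n if n else 0.0,
        "N50_bp": n50(lengths),
    }
```

For example, `assembly_stats(fasta_lengths(open("new1_scaffolds.fa")))` (hypothetical file name) would give one row of the table. Note that the percentage columns are counts divided by the total number of scaffolds, so they move whenever many small scaffolds are added, even if the large scaffolds are unchanged.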
I would also add that I have tried assembling the data with SOAPdenovo using the same k-mer sizes (see below).
Q. Are you de novo assembling your genome, or trying to improve the reference?

Some comments:
* the Velvet -long option might help, but look at the Columbus module instead, which is the reference-guided part of Velvet
* the two genome assemblies are likely broken at the same places (mostly large/tricky repeats), so fixing one with the other won't gain much unless there were other coverage/bias problems.
* there are tools like GapFiller, IMAGE etc. for improving assemblies by aligning reads back to them
* you may wish to use SSPACE etc. instead of Velvet for the mate pair scaffolding
* Illumina Mate Pair data can be very dodgy sometimes: http://thegenomefactory.blogspot.com...sequences.html
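One common way to check whether an MP library is "dodgy" is to map its reads back to an existing assembly and tally pair orientation and separation. Raw Illumina mate-pair libraries typically contain some short-insert paired-end pairs, which map with the opposite orientation to the genuine long-insert mates. The sketch below is an illustration under stated assumptions, not taken from the linked post: it assumes you have already extracted, for each pair mapped to the same contig, each mate's leftmost position, strand and aligned length (for example from a SAM file). The `Mate` type, the function names and the 1,000 bp cutoff are all hypothetical.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Tuple


class Mate(NamedTuple):
    pos: int       # 0-based leftmost aligned position on the contig
    reverse: bool  # True if aligned to the reverse strand
    aln_len: int   # aligned length on the reference


def five_prime(m: Mate) -> int:
    # A reverse-strand read's 5' end is its rightmost aligned base.
    return m.pos + m.aln_len - 1 if m.reverse else m.pos


def orientation(a: Mate, b: Mate) -> str:
    """Classify a same-contig pair as FR (reads face inward), RF (outward) or same-strand."""
    if a.reverse == b.reverse:
        return "same-strand"
    fwd, rev = (b, a) if a.reverse else (a, b)
    return "FR" if five_prime(fwd) < five_prime(rev) else "RF"


def separation(a: Mate, b: Mate) -> int:
    """Distance between the two 5' ends, inclusive."""
    return abs(five_prime(a) - five_prime(b)) + 1


def summarise(pairs: Iterable[Tuple[Mate, Mate]]) -> Counter:
    return Counter(orientation(a, b) for a, b in pairs)


def pe_contamination(pairs: Iterable[Tuple[Mate, Mate]],
                     max_pe_insert: int = 1000,
                     pe_orientation: str = "FR") -> float:
    """Fraction of opposite-strand pairs that look like short paired-end inserts.

    pe_orientation is "FR" for raw Illumina MP reads; use "RF" if both reads
    were reverse-complemented before mapping (orientations then swap).
    """
    n_opposite = 0
    n_short_pe = 0
    for a, b in pairs:
        kind = orientation(a, b)
        if kind == "same-strand":
            continue
        n_opposite += 1
        if kind == pe_orientation and separation(a, b) <= max_pe_insert:
            n_short_pe += 1
    return n_short_pe / n_opposite if n_opposite else 0.0
```

A high short-insert fraction, or a broad spread of separations among the supposed long-insert pairs, would be consistent with MP data that splits scaffolds rather than joining them; it is a hint to investigate, not proof.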
Have you tried SOAPdenovo2 (just released)?
I have run SOAPdenovo with the same k-mer size as in Velvet; however, if I include the MP library (using exactly the same reverse-complemented read files that Velvet is quite happy with), SOAPdenovo throws a segmentation fault/core dump, which is perhaps another reason to suspect that something isn't quite cricket with the MP library. The SOAPdenovo PE assembly is not nearly as good on the metrics above as either the PE or the PE+MP Velvet assembly. I am running GapFiller to see if I can improve it, but I'm not optimistic...
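Illumina mate-pair reads typically come off the sequencer in reverse-forward (RF) orientation, so they are often reverse-complemented into forward-reverse (FR) orientation before assembly, as was done here. A minimal sketch of that step, plus a check that the two read files are still in step, is below. The function names are hypothetical, and the sketch assumes plain 4-line FASTQ records whose mate names differ only by a /1 or /2 suffix or by a comment after the first space. A malformed record or out-of-step pair is just one thing worth ruling out; this does not diagnose the SOAPdenovo crash.

```python
from itertools import zip_longest
from typing import Iterator, TextIO, Tuple

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from 4-line FASTQ records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield header, seq, qual


def revcomp_fastq(src: TextIO, dst: TextIO) -> int:
    """Write the reverse complement of every record; qualities are reversed to stay aligned."""
    n = 0
    for header, seq, qual in read_fastq(src):
        dst.write(f"{header}\n{revcomp(seq)}\n+\n{qual[::-1]}\n")
        n += 1
    return n


def pair_name(header: str) -> str:
    # Drop the leading '@', anything after the first space, and a trailing /1 or /2.
    parts = header[1:].split()
    name = parts[0] if parts else ""
    return name[:-2] if name.endswith(("/1", "/2")) else name


def check_pairing(r1: TextIO, r2: TextIO) -> int:
    """Return the number of pairs, raising if the two files fall out of step."""
    n = 0
    for (h1, _, _), (h2, _, _) in zip_longest(
        read_fastq(r1), read_fastq(r2), fillvalue=(None, None, None)
    ):
        if h1 is None or h2 is None:
            raise ValueError(f"files have different record counts after {n} pairs")
        if pair_name(h1) != pair_name(h2):
            raise ValueError(f"mates out of step at pair {n + 1}: {h1!r} vs {h2!r}")
        n += 1
    return n
```

Running `check_pairing` on the reverse-complemented MP files before handing them to SOAPdenovo would at least confirm that both files have the same number of records and that the mates still line up.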
Thanks for your recommendations, Torsten. Now that you have a few more details, maybe you have some thoughts on the effect the MP library has when it is included in the assembly.