**Brian Bushnell** · 02-16-2015, 10:34 AM

I do recommend Quast; it's easy to use and provides some useful statistics on assembly continuity (N50/L50). Without a reference, though, it is not really a complete solution, because those continuity statistics are mainly what it has to offer.
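For readers unfamiliar with the continuity statistics mentioned above, here is a minimal sketch of how N50 and L50 are usually defined: sort the contigs from longest to shortest and accumulate lengths until at least half of the total assembly length is reached. N50 is the length of the contig at that point and L50 is the number of contigs needed. The function name and the example lengths are hypothetical, and this is not Quast's own code.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig")
    half_total = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        # The first contig that pushes the running sum to half of the
        # total assembly length defines both statistics.
        if running >= half_total:
            return length, count


# Hypothetical contig lengths, for illustration only.
example_lengths = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = n50_l50(example_lengths)
print(f"N50={n50} L50={l50}")
```

A higher N50 and a lower L50 both indicate a more contiguous assembly, but neither says anything about correctness.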

We typically use BBMap to evaluate the quality of assemblies that lack a reference, as it provides a nice summary of error statistics. If you map the reads to each assembly, it will tell you:

% of reads mapped (higher is better)
% of reads properly paired (higher is better)
% of reads that mapped ambiguously (lower is better)
% of reads that matched the reference perfectly (higher is better)

...and also the overall error rate, and rates of each individual error type (substitutions, insertions, and deletions) on a per-base and per-read level. In each case, of course, lower is better. You can also use it to directly output per-contig coverage stats (with the covstats=file flag), which is sometimes useful for spotting collapsed repeats or contaminant contigs.
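A minimal sketch of the per-assembly mapping described above, written as a small Python wrapper that only builds the command lines. It assumes `bbmap.sh` is on the PATH and uses the `ref=`, `in=`, `in2=` and `nodisk` parameters in addition to the `covstats=` flag mentioned in the post; the file names and the helper function are hypothetical.

```python
def build_bbmap_command(assembly, reads1, reads2=None, covstats=None):
    """Build a bbmap.sh argument list that maps reads to one assembly."""
    # nodisk keeps the index in memory instead of writing it to disk,
    # which is convenient when several assemblies are mapped in turn.
    cmd = ["bbmap.sh", f"ref={assembly}", f"in={reads1}", "nodisk"]
    if reads2:
        cmd.append(f"in2={reads2}")  # second file of a read pair
    if covstats:
        cmd.append(f"covstats={covstats}")  # per-contig coverage table
    return cmd


# Hypothetical file names, for illustration only.
assemblies = ["assembly_a.fasta", "assembly_b.fasta"]
for asm in assemblies:
    cmd = build_bbmap_command(
        asm, "reads_1.fq.gz", "reads_2.fq.gz", covstats=asm + ".covstats.txt"
    )
    print(" ".join(cmd))
    # To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Each run prints its mapping summary when it finishes, so the percentages and error rates can be compared side by side across assemblies built from the same reads.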

**FrancescaRaffini** · 02-16-2015, 11:20 PM

Thank you very much, Brian Bushnell. May I ask whether your package has already been used with ddRAD data?

**Brian Bushnell** · 02-16-2015, 11:40 PM

That's possible, but I don't know. I have never worked with ddRAD data, and I am unaware of it being used at JGI.

**sarvidsson** · 02-17-2015, 12:18 AM

I don't think Quast can be used for ddRAD data. What did you use to "assemble" your data - Stacks? I would evaluate the data based on the number of polymorphic "stacks"/loci (where >2 samples are homozygous for each allele) covered by at least 10x coverage in 2/3 of your samples. With a few thousand such loci, you have a pretty nice dataset. If not, there is a long list of things that can go wrong (especially in the wet lab)...

@ BB: Typical ddRAD protocols produce quite small fragments (100-250 bp) representing small islands with strictly defined borders (restriction enzyme digested), so you rarely get any contigs larger than possible read pair overlap when pushing the data through a de novo assembler.
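The locus-counting criterion suggested above could be sketched roughly as follows. This is only one reading of it: "2/3 of your samples" is taken as at least two thirds, ">2 samples" as three or more, and only calls that meet the depth threshold are counted as homozygotes. The data layout (locus -> sample -> (genotype, depth)) and all names are hypothetical and do not correspond to any Stacks output format.

```python
from collections import Counter
from fractions import Fraction


def is_informative(calls, n_samples, min_depth=10,
                   min_fraction=Fraction(2, 3), min_homozygotes=3):
    """Check one locus; calls maps sample -> ((allele1, allele2), depth)."""
    good = [gt for gt, depth in calls.values() if depth >= min_depth]
    # Samples absent from `calls` count as not covered.
    if len(good) < min_fraction * n_samples:
        return False
    # Count homozygous samples per allele among the well-covered calls.
    homozygotes = Counter(a for a, b in good if a == b)
    supported = [a for a, n in homozygotes.items() if n >= min_homozygotes]
    return len(supported) >= 2


def count_informative_loci(loci, n_samples):
    """Count loci that pass the depth and polymorphism criteria."""
    return sum(is_informative(calls, n_samples) for calls in loci.values())


# Hypothetical genotype calls for six samples, for illustration only.
samples = ["s1", "s2", "s3", "s4", "s5", "s6"]
aa, gg = ("A", "A"), ("G", "G")
loci = {
    "locus_1": dict(zip(samples, [(aa, 12), (aa, 15), (aa, 20),
                                  (gg, 11), (gg, 30), (gg, 10)])),
    "locus_2": dict(zip(samples, [(aa, 25)] * 6)),              # monomorphic
    "locus_3": dict(zip(samples, [(aa, 12)] * 3 + [(gg, 3)] * 3)),  # low depth
}
print(count_informative_loci(loci, n_samples=len(samples)))
```

Running the same count on the loci produced by each parameter set gives one number per run that can be compared directly.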

**Brian Bushnell** · 02-17-2015, 12:33 AM

Thanks for the explanation. It does not sound like standard methods of assembly and assembly evaluation are relevant here.

**FrancescaRaffini** · 02-17-2015, 07:02 AM

Thank you both for your suggestions.

Sarvidsson, yes, I am using Stacks. Since I will run several de novo assemblies with different parameters, I need some quality measures to estimate how good each assembly is with a given set of parameters (e.g. the error rate, which I can't measure with most known methods since I don't have a reference genome, replicates, or multiple generations). Your method looks interesting, but I am afraid it is not enough on its own to evaluate assembly quality.

denovo assembly evaluation

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News