**Brian Bushnell** · 06-27-2017, 11:22 AM

The peak-calling and ploidy estimation in KmerCountExact are not very sophisticated, but I certainly expect them to do a better job than that! To my eye this is a very obvious diploid. I'm not sure why the ploidy is sometimes estimated at 1; I'll have to look into that. The X*2 and X*3 peaks are 2-copy and 3-copy repeats, and they are very pronounced, so the organism looks fairly repetitive; those peaks are normally much smaller.

It does look to me from the picture that the het rate is pretty low, though (assuming that X=63 is indeed the haploid genomic peak).

So, as for your questions...

1) None of them are exactly right, but certainly all the kmer lengths that predict a ploidy of 1 are wrong! I'd use the estimates from K=31. The model for calculating genome size and het rate assumes all SNPs are at least K bases from each other. As you increase K (or as the het rate increases), that assumption holds less well, which is why you can't usefully use a super-long K for these calculations, but K=31 is still pretty short. Also, as you increase K, semi-repetitive sequences are better resolved as being unique (which is why the estimated genome size increases with K), so there's a tradeoff between high and low K.

2) Answered.

3) The het rate is based on the assumption that all heterozygous events are SNPs or very short indels. If the polymorphism is mostly larger events such as rearrangements, the estimate would be pretty incorrect...
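
To make the "SNPs at least K bases apart" assumption concrete, here is a minimal Python sketch of that kind of model. It is not the KmerCountExact code; the function names and the inputs (`het_kmers`, the number of distinct kmers in the half-coverage peak, and `hom_kmers`, the number in the full-coverage peak) are assumptions made for illustration. Under the assumption, each isolated SNP replaces K shared kmer positions with 2*K haplotype-specific kmers, so het/hom = 2*K*h / (1 - K*h), which can be solved for the het rate h. Repeats and end effects are ignored.

```python
def estimate_het_rate(het_kmers: float, hom_kmers: float, k: int) -> float:
    """Per-base het rate under the isolated-SNP assumption (illustrative only).

    het_kmers: distinct kmers in the half-coverage peak (hypothetical input)
    hom_kmers: distinct kmers in the full-coverage peak (hypothetical input)
    """
    if k <= 0 or hom_kmers <= 0 or het_kmers < 0:
        raise ValueError("k and hom_kmers must be positive, het_kmers non-negative")
    ratio = het_kmers / hom_kmers
    # ratio = 2*k*h / (1 - k*h)  =>  h = ratio / (k * (2 + ratio))
    return ratio / (k * (2 + ratio))


def estimate_haploid_size(het_kmers: float, hom_kmers: float) -> float:
    """Haploid size in kmer positions: shared kmers count once, and each
    pair of haplotype-specific kmers covers one position."""
    return hom_kmers + het_kmers / 2


if __name__ == "__main__":
    # Synthetic numbers, not taken from the thread.
    het, hom, k = 62_000, 969_000, 31
    print(f"het rate ~ {estimate_het_rate(het, hom, k):.4%}")
    print(f"haploid size ~ {estimate_haploid_size(het, hom):,.0f} kmer positions")
```

The formula also shows why a long K breaks down: once SNPs fall within K bases of each other, a single kmer can span two of them, so fewer than 2*K new kmers appear per SNP and the rate is underestimated.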

**svitlana** · 07-03-2017, 03:27 AM

Thank you, Brian, for this complete and very useful answer! So, according to these estimates, the main difficulty in assembling this genome will be its repetitive content rather than its degree of polymorphism. That is very useful information.

Genome size estimation with BBTools: what is the right k value?
