**HESmith** · 12-03-2015, 02:24 PM

I don't know if the polyploid assemblers will work with your sample. Ignoring repetitive elements (which are difficult enough in haploids), each variant in a polyploid genome is represented at (approximately) an integer multiple of the haploid coverage. The coverage corresponding to one haploid genome, as well as the ploidy, can be estimated from k-mer frequencies. For example, a tetraploid genome will typically have its tallest k-mer peak at 4X the frequency of the first observable peak (the former represents the non-variant sequences, while the latter represents variants unique to one haploid genome), with peaks at 2X and 3X as well. Your sample has an unknown number of strains with non-integer representations, and you expect the majority of their genomes to be identical. Your predicted distribution of k-mer frequencies would therefore be one large peak corresponding to your depth of coverage, with no obvious peaks at lower coverage.
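To make the k-mer frequency idea concrete, here is a minimal Python sketch of how such a spectrum could be computed from reads held in memory. The function name `kmer_spectrum` and the choice to count canonical k-mers (the lexicographically smaller of a k-mer and its reverse complement) are assumptions for illustration; for real data sets a dedicated k-mer counter would be far more practical.

```python
from collections import Counter

# Translation table for reverse-complementing DNA (A<->T, C<->G).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    revcomp = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, revcomp)


def kmer_spectrum(reads, k):
    """Map each k-mer multiplicity to the number of distinct k-mers seen that often."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" in kmer:
                continue  # skip k-mers containing ambiguous bases
            counts[canonical(kmer)] += 1
    # Histogram: multiplicity -> number of distinct k-mers with that multiplicity
    return Counter(counts.values())


if __name__ == "__main__":
    toy_reads = ["ACGTA", "ACGTA"]  # hypothetical toy input
    for depth, n_kmers in sorted(kmer_spectrum(toy_reads, 3).items()):
        print(depth, n_kmers)
```

Plotting multiplicity against the number of distinct k-mers gives the histogram described above: a polyploid would show peaks at roughly integer multiples of the first peak, whereas a mixture of near-identical strains at unequal abundance would be expected to show one dominant peak.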

Perhaps the best approach would be to align the data to an E. coli reference with stringent parameters (e.g., minimal mismatches and gaps) to remove the bulk of the data, then try to assemble the unaligned reads with a metagenome assembler. Note that I have not attempted such an approach, so it may not work.
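The filtering step of that idea could be sketched as follows, assuming the aligner has already been run with whatever stringent settings you choose and has written SAM text that includes the unaligned reads. The function name `unaligned_fastq` is hypothetical; the sketch relies only on the SAM flag bits defined in the SAM specification (0x4 unmapped, 0x100 secondary, 0x800 supplementary) and does not deal with read pairing.

```python
def unaligned_fastq(sam_lines):
    """Yield FASTQ records for reads the aligner flagged as unmapped."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        if flag & 0x900:
            continue  # secondary or supplementary record, not a primary read
        if flag & 0x4:  # bit 0x4 set means the read is unmapped
            name, seq, qual = fields[0], fields[9], fields[10]
            yield f"@{name}\n{seq}\n+\n{qual}\n"


if __name__ == "__main__":
    # Hypothetical two-record example: r1 aligned, r2 unaligned.
    sam = [
        "@HD\tVN:1.6\n",
        "r1\t0\tref\t1\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGA\tFFFF\n",
    ]
    print("".join(unaligned_fastq(sam)), end="")
```

The resulting FASTQ records would then be the input to the metagenome assembler; whether enough strain-specific sequence survives the filter is exactly the untested part.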

**maxsalm** · 12-04-2015, 06:02 AM

Hi there! You could try the CORTEX assembler: http://www.ncbi.nlm.nih.gov/pubmed/23172865

**Brian Bushnell** · 12-04-2015, 10:29 AM

The best approach would probably be to sequence and assemble with very long reads (PacBio) in order to separate the strains. Short reads are not very good for phasing.

**rhinoceros** · 12-07-2015, 06:26 AM

Give a metagenome assembler such as IDBA-UD a try. It takes read coverage into account. Chances are high that your strains were not present in the isolated DNA in equal abundance. Also set a very strict threshold for similarity, e.g., 99%.

Genome assembly of mixed E. coli samples
