**HESmith** · 12-03-2015, 02:24 PM

I don't know whether the polyploid assemblers will work with your sample. Ignoring repetitive elements (which are difficult enough in haploids), in a polyploid genome each variant is represented at (approximately) an integer multiple of the haploid coverage. The coverage of one haploid genome, as well as the ploidy, can be estimated from kmer frequencies. For example, a tetraploid genome will typically have its tallest kmer peak at 4X the frequency of the first observable peak (the former represents the non-variant sequences, while the latter represents variants unique to one haploid genome), with peaks at 2X and 3X as well.

Your sample, by contrast, has an unknown number of strains present in non-integer proportions, and you expect the majority of their genomes to be identical. The predicted distribution of kmer frequencies would therefore be one large peak corresponding to your depth of coverage, with no obvious peaks at lower coverage.
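To make the peak reasoning concrete, here is a minimal Python sketch (not any particular tool's method) that builds a kmer frequency histogram from reads, treats local maxima as peaks, and estimates ploidy as the ratio of the tallest peak's multiplicity to the first peak's. The function names and the `min_multiplicity` cutoff for ignoring low-count error kmers are illustrative assumptions; real histograms are noisy, and dedicated kmer counters and model-fitting tools handle this far more robustly.

```python
from collections import Counter

# Maps each base to its complement for reverse-complementing k-mers.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
VALID_BASES = frozenset("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmer_spectrum(reads, k):
    """Return a histogram: multiplicity -> number of distinct canonical k-mers."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= VALID_BASES:  # skip k-mers containing N or other symbols
                counts[canonical(kmer)] += 1
    return Counter(counts.values())


def find_peaks(hist, min_multiplicity=2):
    """Return multiplicities whose bin is a local maximum, skipping low-count (likely error) k-mers."""
    peaks = []
    for m in sorted(hist):
        if m < min_multiplicity:
            continue
        height = hist[m]
        if height > hist.get(m - 1, 0) and height >= hist.get(m + 1, 0):
            peaks.append(m)
    return peaks


def estimate_ploidy(hist, min_multiplicity=2):
    """Ratio of the tallest peak's multiplicity to the first peak's, rounded to an integer."""
    peaks = find_peaks(hist, min_multiplicity)
    if not peaks:
        return None
    first = peaks[0]
    tallest = max(peaks, key=lambda m: hist[m])
    return round(tallest / first)
```

For a tetraploid-like histogram with peaks at multiplicities 10, 20, 30 and 40, where 40 is the tallest, `estimate_ploidy` returns 4. For a mixed-strain sample like yours, with a single dominant peak, it returns 1, which is why a ploidy-based approach gives little to work with here.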

Perhaps the best approach would be to align the data to an E. coli reference with stringent parameters (e.g., minimal mismatches and gaps) to remove the bulk of the data, then try to assemble the unaligned reads with a metagenome assembler. Note that I have not attempted such an approach, so it may not work.
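A real pipeline would do this with a short-read aligner and then keep the unmapped reads. As a self-contained stand-in, the sketch below swaps in a simpler technique, an exact kmer screen rather than alignment: a read is treated as reference-like only if all of its kmers occur on either strand of the reference, which behaves roughly like a zero-mismatch filter. Everything here (function names, `k`, `min_fraction`) is hypothetical; the reads left in `remaining` are the ones you would hand to a metagenome assembler.

```python
# Maps each base to its complement; used to index both strands of the reference.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def reference_kmers(reference: str, k: int) -> set:
    """All k-mers on both strands of the reference."""
    ref = reference.upper()
    kmers = set()
    for strand in (ref, revcomp(ref)):
        for i in range(len(strand) - k + 1):
            kmers.add(strand[i:i + k])
    return kmers


def split_reads(reads, ref_kmers, k, min_fraction=1.0):
    """Split reads into (reference_like, remaining).

    A read counts as reference-like only if at least `min_fraction` of its
    k-mers occur in the reference; min_fraction=1.0 mimics a stringent,
    zero-mismatch filter, so reads carrying strain-specific variants are kept.
    """
    reference_like, remaining = [], []
    for read in reads:
        read = read.upper()
        n = len(read) - k + 1
        if n <= 0:  # too short to screen; keep it for assembly
            remaining.append(read)
            continue
        hits = sum(read[i:i + k] in ref_kmers for i in range(n))
        (reference_like if hits / n >= min_fraction else remaining).append(read)
    return reference_like, remaining


ref = reference_kmers("ACGTACGTTGCA", 4)
ref_like, rest = split_reads(["ACGTACGT", "ACGAACGT"], ref, 4)
```

Here `ACGTACGT` matches the reference exactly and is removed, while `ACGAACGT`, which differs at one base, stays in `rest` for assembly.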

**maxsalm** · 12-04-2015, 06:02 AM

Hi there! You could try the CORTEX assembler: http://www.ncbi.nlm.nih.gov/pubmed/23172865

**Brian Bushnell** · 12-04-2015, 10:29 AM

The best approach would probably be to sequence and assemble with very long reads (PacBio) in order to separate the strains. Short reads are not very good for phasing.
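A small illustration of the phasing point, under simplifying assumptions: reads are already placed on reference coordinates without indels, given as hypothetical `(start, sequence)` pairs, and two variant sites sit 495 bp apart. Only a read that spans both sites shows which alleles occur together in the same strain, so 150 bp reads give no linkage here while a single long read does.

```python
from collections import Counter


def linked_alleles(reads, site_a, site_b):
    """Count (allele at site_a, allele at site_b) pairs seen on reads spanning both sites.

    `reads` holds (start, sequence) pairs placed on reference coordinates
    without indels; this input format is an assumption for illustration.
    """
    pairs = Counter()
    for start, seq in reads:
        end = start + len(seq)  # exclusive end coordinate
        if start <= site_a < end and start <= site_b < end:
            pairs[(seq[site_a - start], seq[site_b - start])] += 1
    return pairs


# Two variant sites 495 bp apart.
SITE_A, SITE_B = 5, 500

# Short reads: neither one covers both sites.
short_reads = [(0, "A" * 150), (400, "A" * 150)]

# One 1,000 bp read carrying allele G at SITE_A and allele T at SITE_B.
long_seq = "A" * 5 + "G" + "A" * 494 + "T" + "A" * 499
long_reads = [(0, long_seq)]

print(linked_alleles(short_reads, SITE_A, SITE_B))  # Counter()
print(linked_alleles(long_reads, SITE_A, SITE_B))   # Counter({('G', 'T'): 1})
```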

**rhinoceros** · 12-07-2015, 06:26 AM

Give a metagenome assembler such as IDBA-UD a try. It takes read coverage into account, and chances are high that your strains were not present in the isolated DNA in equal abundance. Also set a very strict similarity threshold, e.g. 99%.
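To show what a 99% threshold means in practice, here is a minimal sketch that computes percent identity over a pairwise alignment (equal-length strings with `-` for gaps) and merges two sequences only when identity reaches the threshold. This is only the arithmetic behind such a threshold, not how IDBA-UD implements its similarity setting; the function names are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Fraction of alignment columns with identical bases (gap-vs-base counts as a mismatch)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":  # ignore columns that are gaps in both sequences
            continue
        columns += 1
        if x == y:
            matches += 1
    return matches / columns if columns else 0.0


def should_merge(a: str, b: str, threshold: float = 0.99) -> bool:
    """Treat two aligned sequences as the same strain only at or above the threshold."""
    return percent_identity(a, b) >= threshold
```

At 99%, two aligned 200 bp segments differing at one position (99.5% identity) would be merged, while three mismatches (98.5%) keep them apart, so regions where strains differ by more than about 1% stay separate.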

Genome assembly of mixed E. coli samples