**Torst** · 04-24-2010, 09:39 PM

Originally posted by saima

While doing a mapping assembly, how many gaps are acceptable before finishing the sequence? If there are a lot of gaps in the assembly output, and a few gaps are very long, even 20,000 to 40,000 bp, what should we do?

It depends on the size and complexity of your genome. See http://www.ncbi.nlm.nih.gov/pubmed/20064230

If you are working on a human or related genome, this is a good guide: http://www.genome.gov/10001812

**saima** · 04-25-2010, 11:26 PM

Hi Torst,
Thanks for your reply, but my genome is a bacterial genome, Streptococcus agalactiae. Can you guide me for it?

Thanks

**Torst** · 04-26-2010, 06:35 PM

Originally posted by saima

Thanks for your reply, but my genome is a bacterial genome, Streptococcus agalactiae. Can you guide me for it? Thanks

A genome is never finished until it is closed into its constituent chromosomes and plasmids. It all depends on what purpose you have for it. There are already three closed S. agalactiae genomes in GenBank (2603, A909, NEM316). If you are only interested in SNPs, there is no need to close/finish the genome. If large-scale structure and repeat distribution are important, then you'll need to do more Sanger sequencing / PCRs to disambiguate.

**Torst** · 04-26-2010, 06:37 PM

Originally posted by saima

If there are a lot of gaps in the assembly output, and a few gaps are very long, even 20,000 to 40,000 bp, what should we do?

By any chance was this done with 454 sequencing? The Newbler assembler tends to put in VERY LONG runs of "N"s where it thinks contigs are joined, but in practice there isn't really a 40,000 N gap.
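To check whether the long gaps are just padded runs of "N" in the assembled sequence, something like the following could be used. It is only a sketch in plain Python: the function names `parse_fasta` and `n_runs` and the demo sequence are made up for illustration and are not part of Newbler or any other assembler.

```python
import re


def parse_fasta(text):
    """Yield (name, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the record name.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def n_runs(sequence, min_len=1):
    """Return (start, end, length) for each run of N (0-based, end-exclusive)."""
    return [
        (m.start(), m.end(), m.end() - m.start())
        for m in re.finditer(r"[Nn]+", sequence)
        if m.end() - m.start() >= min_len
    ]


if __name__ == "__main__":
    # Toy scaffold, not real data.
    demo = ">scaffold1 demo\nACGTNNNNNACGT\nNNNACGT\n"
    for name, seq in parse_fasta(demo):
        for start, end, length in n_runs(seq, min_len=3):
            print(name, start, end, length)
```

Listing the runs with a large `min_len` (say 1000) gives a quick inventory of which "gaps" are placeholder padding and where they sit.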

**kmcarr** · 04-27-2010, 04:46 AM

Originally posted by saima

Hi,

While doing a mapping assembly, how many gaps are acceptable before finishing the sequence? If there are a lot of gaps in the assembly output, and a few gaps are very long, even 20,000 to 40,000 bp, what should we do?

Thanks

It's been observed for many species of bacteria that there exists a "pan-genome", which includes the entirety of the genomic content for the species, but any single strain will contain only a subset of this. There is always a core genome, which every strain shares. Other sections of the genome may be present or absent. These optional parts of the genome may provide optional phenotypes, e.g. virulence factors, alternate metabolic potential, etc.
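As a rough illustration of the pan/core/accessory idea, here is a small Python sketch using sets. The strain names and gene labels are invented placeholders, not real S. agalactiae annotations, and `core_and_accessory` is a hypothetical helper.

```python
def core_and_accessory(strain_genes):
    """Split gene content into pan, core and accessory sets.

    strain_genes maps a strain name to an iterable of gene identifiers.
    """
    gene_sets = [set(genes) for genes in strain_genes.values()]
    pan = set().union(*gene_sets)  # everything seen in any strain
    core = set.intersection(*gene_sets) if gene_sets else set()  # in every strain
    accessory = pan - core  # present in some strains only
    return pan, core, accessory


if __name__ == "__main__":
    # Made-up toy data for illustration only.
    toy = {
        "strain_A": ["geneA", "geneB", "geneC"],
        "strain_B": ["geneA", "geneB", "geneD"],
        "strain_C": ["geneA", "geneB"],
    }
    pan, core, accessory = core_and_accessory(toy)
    print(sorted(pan), sorted(core), sorted(accessory))
```

In a mapping assembly, accessory regions of the reference that your strain lacks would show up as stretches with no coverage.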

If the strain you are sequencing is different from the strain you are using as a reference sequence, it is quite possible that these large gaps represent sequence elements that are simply absent from your strain. Among the three completed genomes of S. agalactiae, the difference in genome size between the largest and smallest is > 80,000 bp. Is there a large number of reads which you could not map to the reference genome? If so, these may represent sequence elements present in your strain which are absent in the reference. If there is a large pile of unmapped reads, try performing a de novo assembly of these.
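If your mapper writes SAM output, pulling out the unmapped reads for a separate de novo assembly could look roughly like this. It is a minimal sketch that assumes standard tab-separated SAM records, where bit 0x4 of the FLAG field marks an unmapped read; the function names and demo records are hypothetical, and other mapper output formats would need different parsing.

```python
def unmapped_reads(sam_lines):
    """Yield (name, sequence, quality) for SAM records flagged as unmapped."""
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        if int(fields[1]) & 0x4:  # FLAG bit 0x4: segment unmapped
            yield fields[0], fields[9], fields[10]


def to_fastq(records):
    """Format (name, sequence, quality) tuples as FASTQ text."""
    return "".join(f"@{name}\n{seq}\n+\n{qual}\n" for name, seq, qual in records)


if __name__ == "__main__":
    # Toy SAM records, not real data.
    demo = [
        "@HD\tVN:1.6",
        "r1\t0\tref\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tGGCC\tFFFF",
    ]
    print(to_fastq(unmapped_reads(demo)), end="")
```

The resulting FASTQ can then be handed to whatever de novo assembler you normally use.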

**Torst** · 04-28-2010, 06:33 PM

Originally posted by kmcarr

If the strain you are sequencing is different from the strain you are using as a reference sequence, it is quite possible that these large gaps represent sequence elements that are simply absent from your strain. Among the three completed genomes of S. agalactiae, the difference in genome size between the largest and smallest is > 80,000 bp. Is there a large number of reads which you could not map to the reference genome? If so, these may represent sequence elements present in your strain which are absent in the reference. If there is a large pile of unmapped reads, try performing a de novo assembly of these.

I have to apologise, I misread the original post, which stated 'mapping assembly' rather than 'de novo assembly'.

I agree with everything kmcarr has said. An 80 kbp difference between strains is not unusual; in fact it is what makes it a "different strain" (one could argue about taxonomy for hours, of course). A mapping assembly needs to be treated carefully: if you allowed reads to map to multiple places, or allowed partial reads to map (local alignment), some inferences could be faulty.
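One way to be careful here is to set aside ambiguous or partial alignments before drawing conclusions. The sketch below assumes SAM output and uses two heuristics: a minimum MAPQ (how multi-mapping reads are scored is aligner-dependent, so the threshold is an assumption) and the fraction of the read that was clipped according to the CIGAR string. The names `clipped_fraction` and `is_confident` and the default thresholds are illustrative only.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def clipped_fraction(cigar):
    """Fraction of the original read that was soft- or hard-clipped."""
    if cigar == "*":
        return 0.0
    clipped = total = 0
    for count, op in CIGAR_OP.findall(cigar):
        count = int(count)
        if op in "SH":  # clipped bases, i.e. a partial (local) alignment
            clipped += count
        if op in "MIS=XH":  # operations that consume bases of the original read
            total += count
    return clipped / total if total else 0.0


def is_confident(fields, min_mapq=20, max_clip=0.1):
    """True if a split SAM record looks uniquely and (nearly) fully aligned."""
    flag, mapq = int(fields[1]), int(fields[4])
    if flag & 0x4:  # unmapped
        return False
    if flag & 0x100 or flag & 0x800:  # secondary or supplementary alignment
        return False
    return mapq >= min_mapq and clipped_fraction(fields[5]) <= max_clip


if __name__ == "__main__":
    # Toy record: primary alignment, MAPQ 60, 10 of 50 bases soft-clipped.
    record = "r1\t0\tref\t100\t60\t10S40M\t*\t0\t0\t*\t*".split("\t")
    print(clipped_fraction(record[5]), is_confident(record))
```

Regions of the reference supported only by reads that fail such a filter deserve a second look before being called present in your strain.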

How many gaps???
