
**Bukowski** · 04-21-2010, 02:25 AM

Originally posted by 454andSolid:

Hi all,

We have been running de novo assembly of a eukaryotic genome, using 454 titanium together with gsAssembler. When we compare our assembly with cloned cDNA fragments (sequenced with Sanger) we find some homopolymer errors. So we were wondering:

- Are there any reports on how common these errors are (especially in coding regions)?

- How have people dealt with these problems? We were thinking about running Illumina or SOLiD (which would give us 50-100x coverage) and using these data to correct the homopolymer run errors. Do you know of any programs or papers that might help?

thanks
/Jakub

As of this writing, I have been looking for a way to use SOLiD data to correct 454 homopolymer errors, and have come up short. I know some people are working on this, but with NGS workflows focused on resequencing and SNP detection, finishing de novo 454 assemblies with additional data, especially from SOLiD runs, seems to be a sadly neglected area.

I'd be delighted to hear otherwise from someone.

**colindaven** · 04-21-2010, 03:15 AM

There are a couple of other threads on this forum about this, and several papers are out there too; searching PubMed should turn up some good information.

Using solexa to correct 454 homopolymer errors (SEQanswers): http://seqanswers.com/forums/showthread.php?t=3635&highlight=homopolymer

As far as I know, the only implemented script is the one Torst mentions in that thread.

**Torst** · 04-24-2010, 07:59 PM

Originally posted by 454andSolid:

We have been running de novo assembly of a eukaryotic genome, using 454 titanium together with gsAssembler. When we compare our assembly with cloned cDNA fragments (sequenced with Sanger) we find some homopolymer errors. So we were wondering:
- Are there any reports on how common these errors are (especially in coding regions)?
- How have people dealt with these problems? We were thinking about running Illumina or SOLiD (which would give us 50-100x coverage) and using these data to correct the homopolymer run errors. Do you know of any programs or papers that might help?

Homopolymer errors can occur wherever the true sequence has about three or more of the same base in a row. If such runs are more common in coding regions, then coding regions will be affected more; it's genome dependent. In bacteria, which are coding-dense, this means nearly all homopolymer errors result in frameshifts in genes :-(
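To get a feel for how exposed a given assembly is, one could simply list the runs of three or more identical bases in each contig. Here is a minimal sketch in Python; the function name `homopolymer_runs` and the default threshold of 3 are illustrative assumptions taken from the rule of thumb above, not part of any existing tool.

```python
import re

def homopolymer_runs(seq, min_len=3):
    """Return (start, end, base, length) for every run of at least
    min_len identical bases. Coordinates are 0-based, end-exclusive.
    min_len=3 is an assumed threshold, not a fixed property of 454 data."""
    seq = seq.upper()
    # One base followed by at least (min_len - 1) copies of itself.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.end(), m.group(1), m.end() - m.start())
            for m in pattern.finditer(seq)]

# Example: two candidate error sites in a short toy sequence.
print(homopolymer_runs("ACGTTTTAGGGC"))
# [(3, 7, 'T', 4), (8, 11, 'G', 3)]
```

Intersecting these coordinates with gene annotations would give a rough count of how many candidate sites fall inside coding regions for a particular genome.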

We use Illumina and SOLiD short reads to correct the 454 contigs or scaffolds produced by gsAssembler/Newbler, rather than the reads themselves.

As colindaven said, I explain in this thread http://seqanswers.com/forums/showthread.php?t=3635 how our software Nesoni could be used for this purpose. The key is using a read mapper that is good at detecting indels; detecting SNPs is not much use for fixing homopolymer errors.
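To make the contig-correction idea concrete, here is a rough Python sketch of the final step: taking indel calls from a short-read mapping and applying only those that lengthen or shorten a run of a single base. This is not Nesoni's algorithm or interface. The `(pos, ref, alt)` tuple format (0-based, with a shared leading anchor base, similar in spirit to VCF) and both function names are assumptions made for illustration.

```python
def is_homopolymer_indel(ref, alt):
    """True if ref -> alt only adds or removes copies of a single base."""
    if len(ref) == len(alt):
        return False  # same length means a substitution, not an indel
    shorter, longer = sorted((ref, alt), key=len)
    if not longer.startswith(shorter):
        return False  # complex change, leave it alone
    extra = longer[len(shorter):]
    return len(set(extra)) == 1

def apply_indels(contig, indels):
    """Apply homopolymer indels given as (pos, ref, alt), 0-based.
    Edits are applied right to left so earlier coordinates stay valid.
    Calls whose ref does not match the contig are skipped."""
    seq = contig
    for pos, ref, alt in sorted(indels, key=lambda call: call[0], reverse=True):
        if seq[pos:pos + len(ref)] != ref:
            continue
        if is_homopolymer_indel(ref, alt):
            seq = seq[:pos] + alt + seq[pos + len(ref):]
    return seq

# Hypothetical calls: insert a T, delete a G, and a SNP that is ignored.
calls = [(2, "G", "GT"), (6, "AG", "A"), (0, "A", "T")]
print(apply_indels("ACGTTTAGGGC", calls))
# ACGTTTTAGGC
```

In practice the indel calls would come from whatever indel-aware mapper and caller you trust, and you would want a coverage or consensus threshold before accepting any of them.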

**454andSolid** · 05-02-2010, 08:40 AM

I will try using Nesoni with our transcriptome data.

Thanks for the advice!


correcting homopolymer run errors
