  • 454andSolid
    Junior Member
    • May 2009
    • 8

    correcting homopolymer run errors

    Hi all,

    We have been running de novo assembly of a eukaryotic genome, using 454 Titanium together with gsAssembler. When we compare our assembly with cloned cDNA fragments (sequenced with Sanger) we find some homopolymer errors. So we were wondering:

    - Are there any reports on how common these errors are (especially in coding regions)?

    - How have people dealt with these problems? We were thinking about running Illumina or SOLiD (which would give us 50-100x coverage) and using these data to correct the homopolymer run errors. Do you know of any programs or papers that might help?
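A minimal sketch of how the Sanger comparison could be tallied: globally align an assembled region against its cDNA clone and flag each indel whose inserted or missing bases repeat a flanking base. The scoring values (match +1, mismatch -1, linear gap -2) and the function names are illustrative assumptions, not gsAssembler output; a real comparison would use a dedicated aligner on whole transcripts.

```python
from typing import List, Tuple

MATCH, MISMATCH, GAP = 1, -1, -2  # illustrative scores, linear gap penalty


def global_align(a: str, b: str) -> Tuple[str, str]:
    """Needleman-Wunsch global alignment; returns the two gapped rows."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP

    def sub(i: int, j: int) -> int:
        return MATCH if a[i - 1] == b[j - 1] else MISMATCH

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + GAP,
                              score[i][j - 1] + GAP)
    # Trace back from the bottom-right cell, preferring diagonal moves.
    i, j, row_a, row_b = n, m, [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            row_a.append(a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(row_a)), "".join(reversed(row_b))


def classify_indels(assembly: str, cdna: str) -> List[Tuple[str, str, bool]]:
    """List (kind, bases, is_homopolymer) for each indel in assembly vs. cDNA.

    'insertion' = extra bases in the assembly; 'deletion' = bases missing
    from it. An indel counts as homopolymer-associated when all its bases
    are identical and match a base immediately flanking it.
    """
    asm_row, ref_row = global_align(assembly, cdna)
    events, col = [], 0
    while col < len(asm_row):
        if asm_row[col] != "-" and ref_row[col] != "-":
            col += 1
            continue
        if ref_row[col] == "-":
            kind, gap_row, base_row = "insertion", ref_row, asm_row
        else:
            kind, gap_row, base_row = "deletion", asm_row, ref_row
        start = col
        while col < len(gap_row) and gap_row[col] == "-":
            col += 1
        bases = base_row[start:col]
        flanks = {base_row[k] for k in (start - 1, col) if 0 <= k < len(base_row)}
        events.append((kind, bases, len(set(bases)) == 1 and bases[0] in flanks))
    return events


if __name__ == "__main__":
    print(classify_indels("ACGTAAAAGC", "ACGTAAAGC"))  # [('insertion', 'A', True)]
    print(classify_indels("ACGATGCA", "ACGTGCA"))      # [('insertion', 'A', False)]
```

Counting the `True` entries per kilobase of coding sequence would give a rough homopolymer error rate for the regions covered by the clones.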

    thanks
    /Jakub
    Last edited by 454andSolid; 04-21-2010, 02:36 AM.
  • Bukowski
    Senior Member
    • Jan 2010
    • 388

    #2
    As of this writing, I've been looking for a way to use SOLiD data to correct 454 homopolymer errors and have come up short. I know some people are working on this, but with NGS workflows focused on resequencing and SNP detection, finishing de novo 454 assemblies with additional data, especially from SOLiD runs, seems to be a sadly neglected area.

    I'd be delighted to hear otherwise from someone.


    • colindaven
      Senior Member
      • Oct 2008
      • 417

      #3
      There are a couple of other threads on this forum about this. Several papers are out there too; searching PubMed should turn up some good information.



      As far as I know, the only implemented script is the one mentioned here by Torst.


      • Torst
        Senior Member
        • Apr 2008
        • 275

        #4
        Homopolymer errors can occur wherever the true sequence has roughly three or more identical bases in a row. If such runs are more frequent in coding regions, those regions will be affected more, so it's genome dependent. In bacteria, which are coding-dense, this means most homopolymer errors land in genes and cause frame-shifts :-(
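To see where such errors could fall, a sketch like the following lists homopolymer runs in a sequence and checks whether an indel of a given length would shift the reading frame. The three-base cutoff follows the rough figure above and is exposed as a hypothetical `min_len` parameter; the function names are illustrative.

```python
import re
from typing import List, Tuple

# Maximal runs of one base: the backreference \1 repeats the captured base.
RUN = re.compile(r"([ACGT])\1*")


def homopolymer_runs(seq: str, min_len: int = 3) -> List[Tuple[int, str, int]]:
    """Return (start, base, length) for every run of at least min_len identical bases."""
    return [
        (m.start(), m.group(1), len(m.group(0)))
        for m in RUN.finditer(seq.upper())
        if len(m.group(0)) >= min_len
    ]


def is_frameshift(indel_length: int) -> bool:
    """An indel in coding sequence shifts the frame unless its length is a multiple of 3."""
    return indel_length % 3 != 0


if __name__ == "__main__":
    print(homopolymer_runs("ATGAAAACCCTG"))    # [(3, 'A', 4), (7, 'C', 3)]
    print(is_frameshift(1), is_frameshift(3))  # True False
```

A typical 454 over- or under-call changes a run by one base, which is why it shows up as a frame-shift rather than a single missing or extra codon.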

        We use Illumina and SOLiD short reads to correct 454 scaffolds produced by gsAssembler/Newbler. We don't correct the reads themselves but rather the contigs or scaffolds that gsAssembler assembles.
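Correcting the contigs rather than the reads can be pictured as a pileup consensus: for each contig position, collect what every mapped short read shows in its place (the base itself, an empty string for a deletion, or several bases for an insertion) and replace the contig base when enough reads agree. This is a hypothetical sketch, not how Nesoni or any other tool is implemented; the pileup format and the `min_depth`/`min_fraction` thresholds are assumptions.

```python
from collections import Counter
from typing import Dict, List


def correct_contig(contig: str,
                   pileup: Dict[int, List[str]],
                   min_depth: int = 5,
                   min_fraction: float = 0.8) -> str:
    """Replace contig bases where mapped short reads agree on something else.

    pileup (hypothetical format) maps a 0-based contig position to what each
    covering read shows in place of that base: the base itself, "" for a
    deletion, or a longer string such as "AA" (the base plus an insertion).
    """
    corrected = []
    for pos, base in enumerate(contig):
        calls = pileup.get(pos, [])
        replacement = base
        if len(calls) >= min_depth:
            allele, count = Counter(calls).most_common(1)[0]
            if count / len(calls) >= min_fraction:
                replacement = allele
        corrected.append(replacement)
    return "".join(corrected)


if __name__ == "__main__":
    # 454 over-call: 9 of 10 short reads skip the fourth A at position 7.
    print(correct_contig("ACGTAAAAGC", {7: [""] * 9 + ["A"]}))  # ACGTAAAGC
    # 454 under-call: 10 reads show an extra A at position 5.
    print(correct_contig("ACGTAAGC", {5: ["AA"] * 10}))  # ACGTAAAGC
```

Because an aligner may place a homopolymer indel at any base of the run, the evidence would need to be shifted to one consistent position (for example, the leftmost base of the run) before voting, otherwise the support is split across columns.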

        As colindaven said, I explain in this thread http://seqanswers.com/forums/showthread.php?t=3635 how our software, Nesoni, can be used for this purpose. The key is to use a read mapper that is good at detecting indels; detecting SNPs is of little use in fixing homopolymer errors.
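Why an indel-aware mapper matters can be shown on a toy example: a short read drawn from the true sequence is compared with a contig that carries one extra A in a homopolymer. Without gaps, the best placement still leaves several mismatches, so a mapper tolerating only a couple of mismatches would drop the read; with gaps allowed, the same read aligns with a single deletion. Both functions are brute-force illustrations with made-up sequences, not how production mappers search a genome.

```python
def best_ungapped_mismatches(read: str, ref: str) -> int:
    """Fewest mismatches over every gap-free placement of read inside ref (len(read) <= len(ref))."""
    return min(
        sum(a != b for a, b in zip(read, ref[off:off + len(read)]))
        for off in range(len(ref) - len(read) + 1)
    )


def best_gapped_edits(read: str, ref: str) -> int:
    """Semi-global edit distance: the whole read must align, but it may
    start and end anywhere in ref."""
    prev = [0] * (len(ref) + 1)  # row 0: starting anywhere in ref is free
    for i in range(1, len(read) + 1):
        cur = [i] + [0] * len(ref)  # read bases cannot be skipped for free
        for j in range(1, len(ref) + 1):
            cur[j] = min(prev[j - 1] + (read[i - 1] != ref[j - 1]),  # match/mismatch
                         prev[j] + 1,      # read base absent from ref
                         cur[j - 1] + 1)   # ref base absent from read
        prev = cur
    return min(prev)  # ending anywhere in ref is free


if __name__ == "__main__":
    true_seq = "CTGACGAAATCGGTCA"
    contig = "CTGACGAAAATCGGTCA"  # 454 over-call: AAA became AAAA
    read = true_seq               # short read from the true sequence
    print(best_ungapped_mismatches(read, contig))  # 6
    print(best_gapped_edits(read, contig))         # 1
```

The single gapped edit is exactly the evidence a contig-correction step needs, whereas the ungapped view either discards the read or reports a cluster of spurious SNPs.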


        • 454andSolid
          Junior Member
          • May 2009
          • 8

          #5
          I will try using Nesoni with our transcriptome data.

          Thanks for the advice!
