  • correcting homopolymer run errors

    Hi all,

    We have been running a de novo assembly of a eukaryotic genome using 454 Titanium reads and gsAssembler. When we compare our assembly with cloned cDNA fragments (Sanger-sequenced), we find some homopolymer errors. So we were wondering:

    - Are there any reports on how common these errors are (especially in coding regions)?

    - How have people dealt with these problems? We were thinking about running Illumina or SOLiD (which would give us 50-100x coverage) and using these data to correct the homopolymer run errors. Do you know of any programs or papers that might help?

    thanks
    /Jakub
    Last edited by 454andSolid; 04-21-2010, 02:36 AM.
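One way to put a number on "how common", given an assembly-vs-cDNA comparison like the one above, is to walk each pairwise alignment and count how many indels lengthen or shorten a homopolymer run. The sketch below assumes the alignment is already available as two equal-length gapped strings ('-' for gaps, never a gap in both sequences in the same column). `classify_indels` and its `min_run` threshold of 3 are hypothetical choices for illustration, not part of gsAssembler or any existing tool.

```python
from typing import Tuple


def classify_indels(assembly_aln: str, cdna_aln: str, min_run: int = 3) -> Tuple[int, int]:
    """Count indels in a gapped pairwise alignment, split into
    (homopolymer indels, other indels).

    Assumption: both strings have equal length, use '-' for gaps, and
    never have a gap in the same column. An indel counts as a homopolymer
    indel if its extra bases are all one base and the run of that base
    (in whichever sequence carries the extra bases) is >= min_run long.
    """
    if len(assembly_aln) != len(cdna_aln):
        raise ValueError("aligned sequences must have equal length")
    homo, other = 0, 0
    n = len(assembly_aln)
    i = 0
    while i < n:
        # Decide which sequence (if any) has a gap in this column.
        if assembly_aln[i] == "-":
            gapped, full = assembly_aln, cdna_aln
        elif cdna_aln[i] == "-":
            gapped, full = cdna_aln, assembly_aln
        else:
            i += 1
            continue
        # Extend to the end of this gap run in the gapped sequence.
        j = i
        while j < n and gapped[j] == "-":
            j += 1
        extra = full[i:j]  # bases present in only one of the two sequences
        if len(set(extra)) == 1:
            base = extra[0]
            # Measure the full run of `base` around the gap in `full`.
            left = i
            while left > 0 and full[left - 1] == base:
                left -= 1
            right = j
            while right < n and full[right] == base:
                right += 1
            if right - left >= min_run:
                homo += 1
                i = j
                continue
        other += 1
        i = j
    return homo, other
```

For example, `classify_indels("ACGTTT-GCA", "ACGTTTTGCA")` reports one homopolymer indel (a TTTT run called as TTT). Summing these counts over all cDNA alignments and dividing by the number of aligned bases, or by the number of homopolymer runs, would give a rough error rate.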

  • #2
    Originally posted by 454andSolid
    Hi all,

    We have been running de novo assembly of a eukaryotic genome, using 454 titanium together with gsAssembler. When we compare our assembly with cloned cDNA fragments (sequenced with Sanger) we find some homopolymer errors. So we were wondering:

    - Are there any reports on how common these errors are (especially in coding regions)?

    - How have people dealt with these problems? We were thinking about running Illumina or SOLiD (which would give us 50-100x coverage) and using these data to correct the homopolymer run errors. Do you know of any programs or papers that might help?

    thanks
    /Jakub
    At the time of writing, I have been looking for ways to use SOLiD data to correct 454 homopolymer errors and have come up short. I know some people are working on this, but with NGS workflows focused on resequencing and SNP detection, finishing de novo 454 assemblies with additional data, especially from SOLiD runs, seems to be a sadly neglected area.

    I'd be delighted to hear otherwise from someone.



    • #3
      There are a couple of other threads on this forum about this, and several papers have been published as well; a PubMed search should turn up some good information.



      As far as I know, the only implemented script is the one mentioned here by Torst.



      • #4
        Originally posted by 454andSolid
        We have been running de novo assembly of a eukaryotic genome, using 454 titanium together with gsAssembler. When we compare our assembly with cloned cDNA fragments (sequenced with Sanger) we find some homopolymer errors. So we were wondering:
        - Are there any reports on how common these errors are (especially in coding regions)?
        - How have people dealt with these problems? We were thinking about running Illumina or SOLiD (which would give us 50-100x coverage) and using these data to correct the homopolymer run errors. Do you know of any programs or papers that might help?
        Homopolymer errors can occur wherever the true sequence has about three or more identical bases in a row. If such runs are more common in coding regions, those regions will be affected more, so it is genome dependent. In bacteria, which are coding-dense, this means most homopolymer errors land in genes and cause frame-shifts :-(
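The two points above (runs of about three or more identical bases, and frame-shifts in coding sequence) can be made concrete with a small sketch. The function names and the `min_len=3` default are illustrative assumptions. A typical 454 homopolymer error adds or drops a single base, and any indel whose length is not a multiple of 3 shifts the reading frame.

```python
import re
from typing import List, Tuple


def homopolymer_runs(seq: str, min_len: int = 3) -> List[Tuple[int, str, int]]:
    """Return (0-based start, base, length) for each run of at least
    min_len identical A/C/G/T bases (min_len must be >= 1)."""
    # ([ACGT]) captures one base; \1{k,} requires k or more repeats of it.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.group(1), len(m.group(0)))
            for m in pattern.finditer(seq.upper())]


def causes_frameshift(indel_len: int) -> bool:
    """An insertion or deletion shifts the reading frame unless its
    length is a multiple of 3."""
    return indel_len % 3 != 0
```

For example, `homopolymer_runs("ACGTTTTGAAAC")` finds a TTTT run at position 3 and an AAA run at position 8, each of which is a candidate site for a 454 error; `causes_frameshift(1)` is `True`.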

        We use Illumina and SOLiD short reads to correct 454 scaffolds produced by gsAssembler/Newbler. We don't correct the reads themselves but rather the contigs or scaffolds assembled by gsAssembler.

        As colindaven said, I explain on this thread http://seqanswers.com/forums/showthread.php?t=3635 how our software Nesoni can be used for this purpose. The key is using a read mapper that is good at detecting indels; detecting SNPs is not much use for fixing homopolymer errors.
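A minimal sketch of the contig-correction idea follows, assuming the indel evidence has already been summarised per contig position from an indel-aware short-read mapper. The `IndelCall` record, the `apply_indels` function and the 0.8 support threshold are all hypothetical; this is not how Nesoni is implemented, nor its output format. Calls are assumed not to overlap.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IndelCall:
    # Hypothetical summary of indel evidence from short reads mapped to a
    # contig; not the output format of Nesoni or any other real tool.
    pos: int        # 0-based contig position where the edit starts
    deleted: str    # contig bases to remove (may be "")
    inserted: str   # bases to insert in their place (may be "")
    support: int    # reads supporting the edit
    depth: int      # reads covering the position


def apply_indels(contig: str, calls: List[IndelCall], min_fraction: float = 0.8) -> str:
    """Apply non-overlapping indel calls whose read support is at least
    min_fraction of the depth, and return the corrected contig."""
    accepted = [c for c in calls if c.depth > 0 and c.support / c.depth >= min_fraction]
    # Edit from right to left so earlier positions are not shifted by later edits.
    for c in sorted(accepted, key=lambda c: c.pos, reverse=True):
        if contig[c.pos:c.pos + len(c.deleted)] != c.deleted:
            raise ValueError(f"call at {c.pos} does not match the contig sequence")
        contig = contig[:c.pos] + c.inserted + contig[c.pos + len(c.deleted):]
    return contig
```

For example, if 45 of 50 reads show an extra T in the TTT run of `"ACGTTTGCA"`, `apply_indels` with `IndelCall(3, "", "T", 45, 50)` returns `"ACGTTTTGCA"`, while a call seen in only 5 of 50 reads is left out. Requiring a high support fraction is one simple way to avoid introducing errors from a few mis-mapped reads.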



        • #5
          I will try using Nesoni with our transcriptome data.

          Thanks for the advice!
