  • Extending contigs by remapping

    Hello guys, I have a really crappy de novo assembly and would like to know the best way to extend my contigs.

    Currently I am progressing very slowly by using Geneious to assemble my Illumina data to reference, with my existing contigs as the reference.

    I know there is a Consed script called addSolexaReads.perl that creates an ACE file containing a reference sequence with reads aligned to it. The problem is that it does not generate a new consensus from the reads that have been mapped to it.

    What I am interested in are the edges of the contigs: how could my Solexa data extend these edges further, until two ends meet and close a gap, merging them into one contig?

    I would like to make this an iterative process somehow.


    Thank you.

  • #2
    I've tried that too, but really, if reads could bridge those gaps, your assembler probably would have bridged them. Some of your contigs will be flanked by repetitive sequence, so simply read-walking off the edges won't help there.

    You could try using paired-end data to see which contigs ought to be adjacent to each other.
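
    One way to act on that suggestion, sketched under assumptions: reads from a standard forward-reverse paired-end library have already been mapped to the contigs, each mapped read has been reduced to a (pair id, contig, position, strand) record, and the insert size is roughly known. `MateHit`, `contig_end`, `count_links` and the default thresholds are hypothetical names for illustration, not part of any tool. Pairs whose two mates sit near the ends of two different contigs count as votes for those ends being adjacent.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple


class MateHit(NamedTuple):
    """Simplified record for one mapped read of a pair (hypothetical input format)."""
    pair_id: str
    contig: str
    pos: int       # 0-based leftmost mapping position on the contig
    forward: bool  # True if the read maps to the forward strand


def contig_end(hit: MateHit, contig_len: Dict[str, int], insert_size: int) -> Optional[str]:
    """Name the contig end this read points off, assuming forward-reverse (FR) pairs.

    A forward read near the right end points off "R"; a reverse read near the
    left end points off "L". Reads far from either end return None.
    """
    if hit.forward and contig_len[hit.contig] - hit.pos <= insert_size:
        return f"{hit.contig}:R"
    if not hit.forward and hit.pos <= insert_size:
        return f"{hit.contig}:L"
    return None


def count_links(hits: Iterable[MateHit], contig_len: Dict[str, int],
                insert_size: int = 500, min_links: int = 3) -> Dict[Tuple[str, str], int]:
    """Count read pairs joining the ends of two different contigs; keep well-supported links."""
    pairs: Dict[str, List[MateHit]] = defaultdict(list)
    for hit in hits:
        pairs[hit.pair_id].append(hit)
    links: Counter = Counter()
    for mates in pairs.values():
        if len(mates) != 2 or mates[0].contig == mates[1].contig:
            continue  # unpaired, multi-mapped, or both mates on the same contig
        ends = [contig_end(m, contig_len, insert_size) for m in mates]
        if None in ends:
            continue  # at least one mate is not near a contig end
        links[tuple(sorted(ends))] += 1
    return {link: n for link, n in links.items() if n >= min_links}
```

    A result such as `{('A:R', 'B:L'): 3}` would suggest that the right end of contig A faces the left end of contig B. Real scaffolders also check orientation consistency and the implied gap size, which this sketch skips.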



    • #3
      Not exactly what I was looking for.

      I was able to read-walk the contig ends and extend the contig by about 3 kb. I am sure it can go even further if I keep doing it; I just need an iterative process.
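
      A minimal sketch of making read walking iterative, assuming the reads fit in memory as plain uppercase strings and that an exact k-mer seed match is good enough (no mismatch tolerance, no quality scores). `walk_right`, `revcomp` and the thresholds are hypothetical, not taken from Geneious, Consed or IMAGE. The stopping rules matter: as other replies in this thread point out, extending into repeats or poorly covered sequence gives bogus contig ends, so each round stops at the first column with too few reads or conflicting bases.

```python
from collections import Counter
from typing import List

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (uppercase ACGTN only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def walk_right(contig: str, reads: List[str], k: int = 15, min_depth: int = 3,
               min_agree: float = 0.8, max_rounds: int = 100) -> str:
    """Iteratively extend the right end of `contig` with reads containing its last k bases.

    Each round: find reads (either strand) containing the k-mer seed at the contig end,
    take the bases they carry beyond the seed, and append a column-by-column majority
    consensus. A column is accepted only if at least `min_depth` reads cover it and the
    majority base reaches `min_agree`; the first failing column ends the round.
    """
    oriented = reads + [revcomp(r) for r in reads]  # try both strands of every read
    for _ in range(max_rounds):
        seed = contig[-k:]
        tails = []
        for read in oriented:
            i = read.find(seed)  # exact match only, first occurrence
            if i != -1 and i + k < len(read):
                tails.append(read[i + k:])  # bases beyond the current contig end
        added = []
        longest = max((len(t) for t in tails), default=0)
        for col in range(longest):
            column = [t[col] for t in tails if col < len(t)]
            if len(column) < min_depth:
                break  # too few reads: likely low coverage, stop here
            base, count = Counter(column).most_common(1)[0]
            if count / len(column) < min_agree:
                break  # reads disagree: possible repeat or error, stop here
            added.append(base)
        if not added:
            break  # no supported extension this round
        contig += "".join(added)
    return contig
```

      To extend the left end, reverse-complement the contig, walk right, and reverse-complement the result: `revcomp(walk_right(revcomp(contig), reads))`. Any gap closed this way should still be checked by mapping the reads back.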



      • #4
        Maybe you would like to take a look at PAGIT.



        When doing a mapping assembly, most of the gaps will correspond to large indels or to the edges of inverted and/or translocated fragments. Those gaps are not necessarily repeats or sequence that is impossible to assemble, but rather the rearrangements mentioned above. So mapping reads to the contig edges in order to close such gaps makes sense. A de novo assembler should be able to do it.

        However, if these contigs were obtained from a de novo assembly, then there is not much left to do. In the package above there is a program called IMAGE2 which claims to be able to extend contigs by mapping reads to their edges, but I have not tried it myself.



        • #5
          IMAGE is pretty much what you're looking for. Sometimes it helps, sometimes it does nothing at all and sometimes it will insert errors.



          • #6
            But how can you tell the difference if you do not have a reference to compare it to?



            • #7
              It was a variant-searching project, so some Sanger sequencing was used to verify variants. It might be worth mentioning that a coworker ran IMAGE that time, so I'm not sure what parameters he used. I've since worked on that same dataset, used IMAGE, and got improved results for the assembly (at least in the same regions where we had errors).

              Edit: So what I would suggest is mapping your reads back to your assembly, or something similar, to verify that your gap closure is supported.
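
              One simple way to check that a closed gap is supported, sketched under assumptions: reads have been mapped back to the new assembly, each alignment has been reduced to a 0-based half-open (start, end) interval on the contig (for example from a BAM file via pysam's `reference_start` and `reference_end`), and the region checked is the former gap plus some margin. `weakest_link` and `flank` are hypothetical names. Depth alone can look fine even where no single read actually crosses a junction, so the sketch counts, for every position, the reads that cross it with some anchoring sequence on both sides, and reports the minimum.

```python
from typing import Iterable, Tuple


def weakest_link(alignments: Iterable[Tuple[int, int]], region_start: int,
                 region_end: int, flank: int = 20) -> int:
    """Minimum number of reads bridging any position in [region_start, region_end).

    A read aligned to the half-open interval [s, e) bridges position p if it has at
    least `flank` aligned bases on both sides: p - s >= flank and e - p >= flank.
    A result of 0 means some junction in the region is not bridged by any read.
    """
    alns = list(alignments)
    worst = None
    for p in range(region_start, region_end):
        n = sum(1 for s, e in alns if s <= p - flank and e >= p + flank)
        worst = n if worst is None else min(worst, n)
    return worst if worst is not None else 0
```

              With 100 bp reads tiled every 10 bp, `weakest_link(alns, 200, 300)` returns 6; removing the reads that start between 150 and 250 drops it to 0, which would flag the closure as unsupported.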



              • #8
                Originally posted by AdrianP:
                "I have a really crappy de novo assembly"
                Maybe you should address what made your assembly worse than you expected in the first place: not enough data, noisy data, redundancy, artifacts, etc.

                The problem with iteratively extending contigs is that while the mappings are unique in the first iteration, in the second iteration the new regions were probably not assembled in the first place because they were repetitive, degenerate or low-coverage, so when you map to those regions you will get bogus results.

                IMAGE2 is different in that it searches for misassembled regions that weren't covered, or that have many pairs mapping to another contig, and then breaks the contigs so that the pieces can be rescaffolded, potentially to different locations.
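
                The kind of check described above can be sketched as follows. This illustrates the idea only and is not IMAGE2's actual algorithm: it assumes that per-position read depth, and a per-position count of reads whose mate maps to another contig, have already been computed from a read-to-assembly mapping. `candidate_breaks`, the window size and the 0.5 threshold are hypothetical. Flagged windows are places where a contig might be broken before rescaffolding.

```python
from typing import List, Tuple


def candidate_breaks(depth: List[int], foreign: List[int], window: int = 100,
                     max_foreign_frac: float = 0.5) -> List[Tuple[int, int]]:
    """Flag windows that may hide a misassembly (illustrative, not IMAGE2's method).

    depth[i]   : number of reads covering contig position i
    foreign[i] : how many of those reads have their mate mapped to another contig
    A window is flagged if any position in it has zero coverage, or if the share of
    reads whose mate lies elsewhere exceeds max_foreign_frac (hypothetical threshold).
    """
    flagged = []
    for start in range(0, len(depth), window):
        end = min(start + window, len(depth))
        d, f = depth[start:end], foreign[start:end]
        # min(d) == 0 short-circuits, so sum(d) is never zero in the division
        if min(d) == 0 or sum(f) / sum(d) > max_foreign_frac:
            flagged.append((start, end))
    return flagged
```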



                • #9
                  Dear Friends,

                  Thank you for replying. I indeed had a very tricky situation, which I have now resolved completely, and I am willing to share my experience.

                  I needed to assemble the mitochondrial genome of a species without knowing whether it is linear or circular. My starting material was 3 contigs with BLAST hits to mitochondrial genes. Furthermore, I had some contigs with BLAST hits to both mitochondrial and nuclear genes (which likely means these are mitochondrial genes in the nucleus and are not part of the mtDNA map).

                  So I did read walking on these 3 contigs and was eventually able to obtain a circular contig. The issue with this contig is that its coverage is just unbelievably horrific! Nothing is consistent. I cannot explain it in words, so I have attached a picture.



                  Figure 1: A) Genome coverage, displayed as a graph. B) The PCR products designed, displayed in alternating colors. C) Regions of the genome confirmed by Sanger sequencing of the PCR products shown. Numbers in teal mark parts that were not covered by Sanger sequencing (although there is plenty of coverage from Illumina reads). Sanger sequencing covers 84% of the mitochondrial genome sequence. D) Gel showing the presence and size of the PCR products shown in B).


                  What do you guys think?
                  Last edited by AdrianP; 03-07-2012, 02:26 PM.
