  • Extending contigs by remapping

    Hello guys, I have a really crappy de novo assembly and would like to know the best way to extend my contigs.

    Currently I am progressing very slowly by using Geneious to map my Illumina data onto my existing contigs (assemble to reference).

    I know there is a consed script called addSolexaReads.perl that will create an ace file containing a reference sequence and the reads aligned to it. The problem is that it does not generate a new consensus based on the reads that have been mapped.

    What I am interested in are the edges of the contigs, and how the Solexa data I have could extend these edges further until two ends meet and close a gap, creating one contig.

    I would like to make this an iterative process somehow.


    Thank you.

  • #2
    I've tried that too, but really, if reads could bridge those gaps, your assembler probably would have bridged them already. Some of your contigs will be flanked by repetitive sequence, so simply read-walking off the edges won't help there.

    You could try using paired end data to see which contigs ought to be adjacent to each other.
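A minimal sketch of that idea, assuming the pairs have already been mapped and reduced to (contig of mate 1, contig of mate 2) tuples. The function name and the `min_links` threshold are made up for illustration, and orientation and insert size are ignored here:

```python
from collections import Counter

def contig_links(pair_hits, min_links=3):
    """Count read pairs whose two mates map to different contigs.

    pair_hits: iterable of (contig_of_mate1, contig_of_mate2) tuples.
    Returns {(contig_a, contig_b): n_pairs} for contig pairs supported
    by at least min_links read pairs.
    """
    links = Counter()
    for c1, c2 in pair_hits:
        if c1 != c2:
            # Sort so that (A, B) and (B, A) count as the same link.
            links[tuple(sorted((c1, c2)))] += 1
    return {pair: n for pair, n in links.items() if n >= min_links}

# Hypothetical mapping results, not real data.
hits = [("c1", "c2"), ("c2", "c1"), ("c1", "c2"), ("c1", "c1"), ("c2", "c3")]
print(contig_links(hits, min_links=2))  # {('c1', 'c2'): 3}
```

Contig pairs with many links are candidates for being neighbours. A single link is more likely to be a chimeric pair or a mismapping, which is why a threshold is used.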



    • #3
      Not exactly what I was looking for.

      I was able to read-walk the contig ends and extend the contig by about 3 kb. I am sure it can go even further if I keep doing it; I just need an iterative process.
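A toy sketch of what such an iterative read-walking loop could look like. This is not how Geneious, consed or any other tool does it. It assumes error-free reads that are already on the same strand as the contig, uses exact string matching instead of a real mapper, and the names `extend_right`, `seed_len` and `min_support` are invented for the example:

```python
from collections import Counter

def extend_right(contig, reads, seed_len=8, min_support=2, max_rounds=100):
    """Greedily extend the 3' end of `contig` one base per round.

    In each round, every read containing the last `seed_len` bases of the
    contig votes for the base that follows the seed. Extension stops when
    no base has enough support, when two different bases are both well
    supported (a branch), or after max_rounds.
    """
    for _ in range(max_rounds):
        seed = contig[-seed_len:]
        votes = Counter()
        for read in reads:
            pos = read.find(seed)
            while pos != -1:
                nxt = pos + seed_len
                if nxt < len(read):
                    votes[read[nxt]] += 1
                pos = read.find(seed, pos + 1)
        ranked = votes.most_common(2)
        if not ranked or ranked[0][1] < min_support:
            break  # no read supports a further base
        if len(ranked) == 2 and ranked[1][1] >= min_support:
            break  # two supported continuations, possibly a repeat boundary
        contig += ranked[0][0]
    return contig

# Hypothetical 34 bp "genome" and perfect 16 bp reads, each seen twice.
genome = "ATGGCGTACGTTAGCCTGAATCGGATACCTTGAC"
reads = [genome[i:i + 16] for i in range(len(genome) - 15)] * 2
print(extend_right(genome[:12], reads) == genome)  # True
```

Stopping at a branch instead of picking the more common base is a deliberate choice here: walking through a branch is where wrong joins tend to come from.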



      • #4
        PAGIT

        Maybe you would like to take a look at this:

        Wellcome Sanger Institute tools directory


        When doing a mapping assembly, most of the gaps will correspond to large indels or to the edges of inverted and/or translocated fragments. Those gaps are not necessarily repetitive or impossible-to-assemble sequence, but rather the rearrangements mentioned above. So using contig-edge mapping of the reads to close such gaps makes sense. A de novo assembler should be able to do it.

        However, if these contigs were obtained from a de novo assembly, then there is not much left to do. The package above includes a program called IMAGE2, which claims to be able to extend contigs by mapping reads to their edges, but I have not tried it myself.



        • #5
          IMAGE is pretty much what you're looking for. Sometimes it helps, sometimes it does nothing at all and sometimes it will insert errors.



          • #6
            But how can you tell the difference if you do not have a reference to compare it to?



            • #7
              It was a variant-searching project, so some Sanger sequencing was used to verify variants. It might be worth mentioning that a coworker ran IMAGE that time, so I'm not sure what parameters he used. I later worked on that same dataset, used IMAGE myself, and got improved results for the assembly (at least in the same regions where we had errors).

              Edit: So what I would suggest is mapping your reads back to your assembly, or something similar, to verify that your gap closure is supported.
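A minimal sketch of that check, assuming the reads have already been mapped back and each alignment is available as a half-open (start, end) interval on the contig. In practice the depth would come from the mapper's output (for example via `samtools depth`). The function names, the `min_depth` value and the coordinates below are made up:

```python
def depth_profile(length, alignments):
    """Per-base read depth from half-open (start, end) alignment intervals."""
    diff = [0] * (length + 1)
    for start, end in alignments:
        # Clamp to the contig so stray coordinates cannot index out of range.
        start = min(max(start, 0), length)
        end = min(max(end, 0), length)
        if end > start:
            diff[start] += 1
            diff[end] -= 1
    depth, running = [], 0
    for change in diff[:length]:
        running += change
        depth.append(running)
    return depth

def weak_positions(depth, region, min_depth=3):
    """Positions inside region=(start, end) whose depth is below min_depth."""
    start, end = region
    return [i for i in range(start, end) if depth[i] < min_depth]

# Hypothetical 20 bp contig whose closed gap spans positions 8..13.
alignments = [(0, 10), (2, 12), (4, 14), (11, 20), (12, 20), (13, 20)]
depth = depth_profile(20, alignments)
print(weak_positions(depth, (8, 14)))  # [10]
```

Any position reported inside the closed gap is a place where few or no reads actually span the join, so the closure there deserves a second look (or a PCR).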



              • #8
                Originally posted by AdrianP
                I have a really crappy denovo assembly
                Maybe you should address what made your assembly less than what you expected in the first place. Not enough data/noisy data/redundancy/artifacts etc.

                The problem with iteratively extending contigs is that, while the mappings are unique in the first iteration, by the second iteration you are mapping to new regions that were probably not assembled in the first place because they were repetitive, degenerate or poorly covered, so you will get bogus results.

                IMAGE2 is different in that it searches for misassembled regions that weren't covered, or that have many pairs mapping to another contig, then breaks the contigs so that the pieces can be rescaffolded, potentially to different locations.
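One rough way to spot this before walking any further is to check whether the current end of a contig also occurs elsewhere in the assembly. A minimal sketch using exact matching on both strands; the function names, the value of `k` and the sequences are invented for illustration, and a real check would tolerate mismatches:

```python
def revcomp(seq):
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def count_occurrences(text, pattern):
    """Number of (possibly overlapping) exact occurrences of pattern in text."""
    count, pos = 0, text.find(pattern)
    while pos != -1:
        count += 1
        pos = text.find(pattern, pos + 1)
    return count

def end_copy_number(contigs, name, k=12):
    """How often the last k bases of contig `name` occur in the assembly.

    Both strands are searched. A value of 1 means only the contig's own
    end was found; anything higher suggests the end sits in a repeat.
    """
    seed = contigs[name][-k:]
    rc = revcomp(seed)
    total = 0
    for seq in contigs.values():
        total += count_occurrences(seq, seed)
        if rc != seed:  # do not count palindromic seeds twice
            total += count_occurrences(seq, rc)
    return total

# Hypothetical contigs, not real data.
contigs = {
    "c1": "TTGACCGTAGGCATCA",
    "c2": "CCGGCATCATTTAGG",
    "c3": "AAGTCCGATTGCAGT",
}
print(end_copy_number(contigs, "c1", k=6))  # 2
print(end_copy_number(contigs, "c3", k=6))  # 1
```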



                • #9
                  Dear Friends,

                  Thank you for replying. I did indeed have a very tricky situation, which I have now resolved completely, and I am willing to share my experience.

                  I needed to assemble the mitochondrial genome of a species, not knowing whether it is linear or circular. My starting material was 3 contigs that had BLAST hits to mitochondrial genes. Furthermore, I had some contigs that hit both mitochondrial and nuclear genes (which likely means that these are mitochondrial genes in the nucleus and are not part of the mtDNA map).

                  So I did read walking on these 3 contigs and eventually was able to obtain a circular contig. The issue with this contig is that its coverage is just unbelievably horrific! Nothing is consistent. I cannot explain it, so I need to attach a picture.



                  Figure 1: A) The coverage of the genome is displayed as a graph. B) The designed PCR products are displayed in alternating colors. C) Regions of the genome confirmed by Sanger sequencing of the PCR products shown. Numbers in teal mark parts that were not covered by Sanger sequencing (however, there is plenty of coverage with Illumina reads). Sanger sequencing covers 84% of the mitochondrial genome sequence. D) The gel showing the presence and size of the PCR products shown in B).
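For anyone wanting to compute a figure like the 84% above from their own data, here is a minimal sketch. It assumes the Sanger-confirmed regions are given as half-open (start, end) intervals that do not wrap around the origin of the circular sequence. The function name and the coordinates are made up and are not the ones from the figure:

```python
def covered_fraction(genome_length, intervals):
    """Fraction of a genome covered by the union of (start, end) intervals."""
    covered, current_end = 0, 0
    for start, end in sorted(intervals):
        start = max(start, current_end)  # skip the part already counted
        if end > start:
            covered += end - start
            current_end = end
    return covered / genome_length

# Hypothetical confirmed regions on a 1000 bp sequence.
confirmed = [(0, 300), (250, 500), (600, 900), (650, 700)]
print(covered_fraction(1000, confirmed))  # 0.8
```

Overlapping PCR products are merged before counting, so bases confirmed by two amplicons are not counted twice.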


                  What do you guys think?
                  Last edited by AdrianP; 03-07-2012, 02:26 PM.
