  • merging scaffolds from several SOAPdenovo assemblies into a single consensus assembly

    Does anyone have experience with merging scaffolds from a few assemblies into a single consensus assembly? When I run SOAPdenovo with different k values I get scaffolds with different characteristics. Higher k values tend to give me a few longer scaffolds and a better N50, but a lower mean scaffold size and more short scaffolds than smaller values of k. It seems like a good idea to merge assemblies generated with a few of these k-mer values into a single consensus assembly, right?

    I have looked at a few tools to do this. One of them is called Reconciliator http://www.genome.umd.edu/reconcilia...structions.htm but it would require me to convert several SOAPdenovo assembly output files into "Sanger/WashU" format, which looks like it could involve quite a bit of work. I am not sure that generating all of these required input files would be worth the effort.

    I also found minimus2 which looks like it is primarily designed to merge two assemblies at a time, and I have 4 I would like to merge. I could do pairwise merging with the previous consensus, but that seems like it could lead to problems... I am also finding some complaints on forums about how the program deals with N's.

    The last program I found is called MAIA, but it is distributed as a MATLAB package that depends on a few MATLAB toolboxes (yuck), and it looks like it also requires a "closely related reference genome", which I definitely do not have.

    Thanks for any suggestions or experiences with this.

    -John
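For comparing runs at different k on the same footing, the statistics mentioned above (N50, mean scaffold size, number of scaffolds) can be computed directly from each scaffold FASTA file. The following is a minimal sketch, not part of SOAPdenovo or any merging tool; the function names are made up for illustration, and it assumes plain FASTA input read line by line.

```python
from statistics import mean


def read_fasta_lengths(lines):
    """Yield the length of each sequence found in FASTA-formatted lines."""
    length = None  # None until the first header line is seen
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def n50(lengths):
    """Return the length L such that sequences of length >= L
    together cover at least half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def summarize(lengths):
    """Collect the basic scaffold statistics for one assembly."""
    lengths = list(lengths)
    return {
        "count": len(lengths),
        "total": sum(lengths),
        "mean": mean(lengths) if lengths else 0,
        "n50": n50(lengths),
        "longest": max(lengths, default=0),
    }


# Toy input standing in for one assembly's scaffold file.
toy = [">s1", "ACGTACGT", ">s2", "ACGT", "AC", ">s3", "ACGT", ">s4", "AC"]
print(summarize(read_fasta_lengths(toy)))
```

For a real run, pass an open file handle instead of the toy list, once per k value, and compare the resulting dictionaries side by side.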

  • #2
    Another option is the recently-released Zorro (which is based on minimus2 but makes using NGS data friendlier). However, it is also for pairwise merging.

    Comment


    • #3
      Maybe have a look at older threads:

      Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc


      Zorro also looks like an interesting tool!
      Boetsie

      Comment


      • #4
        Hi~

        Maybe TGICL would be helpful.



        Oben

        Comment


        • #5
          Another tool is GAM.

          I haven't tried it yet, though.

          Comment


          • #6
            Originally posted by boetsie
            Hi,
            GAM was Sanger-based and currently supports only assemblies from Arachne and PCAP. We are working on an NGS version that supports BAM files.

            Comment


            • #7
              GAM-NGS was published a few weeks ago:

              Genomic Assemblies Merger for NGS. Contribute to vice87/gam-ngs development by creating an account on GitHub.


              Background: In recent years more than 20 assemblers have been proposed to tackle the hard task of assembling NGS data. A common heuristic when assembling a genome is to use several assemblers and then select the best assembly according to some criteria. However, recent results clearly show that some assemblers lead to better statistics than others on specific regions but are outperformed on other regions or on different evaluation measures. To limit these problems we developed GAM-NGS (Genomic Assemblies Merger for Next Generation Sequencing), whose primary goal is to merge two or more assemblies in order to enhance contiguity and correctness of both. GAM-NGS does not rely on global alignment: regions of the two assemblies representing the same genomic locus (called blocks) are identified through reads' alignments and stored in a weighted graph. The merging phase is carried out with the help of this weighted graph that allows an optimal resolution of local problematic regions.

              Results: GAM-NGS has been tested on six different datasets and compared to other assembly reconciliation tools. The availability of a reference sequence for three of them allowed us to show how GAM-NGS is a tool able to output an improved reliable set of sequences. GAM-NGS is also a very efficient tool able to merge assemblies using substantially less computational resources than comparable tools. In order to achieve such goals, GAM-NGS avoids global alignment between contigs, making its strategy unique among other assembly reconciliation tools.

              Conclusions: The difficulty to obtain correct and reliable assemblies using a single assembler is forcing the introduction of new algorithms able to enhance de novo assemblies. GAM-NGS is a tool able to merge two or more assemblies in order to improve contiguity and correctness. It can be used on all NGS-based assembly projects and it shows its full potential with multi-library Illumina-based projects.

              With more than 20 available assemblers it is hard to select the best tool. In this context we propose a tool that improves assemblies (and, as a by-product, perhaps even assemblers) by merging them and selecting the generating that is most likely to be correct.


              It looks very promising, but I needed help from a very good admin to get it installed. For my large datasets, however, it stopped repeatedly about 20 minutes into one of the last steps.
              Last edited by luc; 07-27-2013, 10:09 AM.

              Comment


              • #8
                Originally posted by luc
                GAM-NGS was published a few weeks ago:


                It looks very promising, but I needed help from a very good admin to get it installed. For my large datasets, however, it stopped repeatedly about 20 minutes into one of the last steps.

                Please contact the corresponding author; he will surely help you!

                As for installation, that is strange: I was able to install it easily myself, even in my home directory, without needing access to any system path.

                Comment


                • #9
                  Hi,

                  I have a related question. I have a draft genome (sequenced with PGM) and I want to get the most out of the data. Is merging assemblies produced by different software useful?

                  Thanks!

                  Comment


                  • #10
                    Originally posted by mberacochea
                    Hi,

                    I have a related question. I have a draft genome (sequenced with PGM) and I want to get the most out of the data. Is merging assemblies produced by different software useful?

                    Thanks!

                    It depends on the assemblies you want to merge. If they are very similar to each other, there is little point in merging them. For example, if you run ABySS at different k-mer lengths, the short k-mer assemblies are usually subsets of the longer k-mer assemblies. If, on the other hand, you are comparing an assembly done with AllPaths-LG and one done with ABySS, you might get different results, and merging them would be very useful.
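One rough way to gauge how similar two assemblies are before deciding to merge them is to compare their k-mer content. The sketch below is only an illustration of that idea and is not how GAM-NGS or any of the tools in this thread work. The function names are hypothetical, and the comparison word size k=21 is an arbitrary choice unrelated to the k used for assembly. A containment near 1.0 suggests one assembly is close to a subset of the other. Holding every k-mer in a Python set is only practical for small genomes.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_DNA = frozenset("ACGT")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_kmers(sequences, k=21):
    """Collect strand-independent k-mers, skipping any that contain N
    or other non-ACGT characters."""
    kmers = set()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if not _DNA.issuperset(kmer):
                continue
            # Store whichever strand sorts first so both strands match.
            kmers.add(min(kmer, reverse_complement(kmer)))
    return kmers


def containment(a, b):
    """Fraction of the k-mers in a that also occur in b."""
    return len(a & b) / len(a) if a else 0.0


def jaccard(a, b):
    """Shared k-mers divided by all distinct k-mers in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy sequences standing in for two assemblies (k=3 only for the demo).
long_asm = canonical_kmers(["AAACCC"], k=3)
short_asm = canonical_kmers(["AAAC"], k=3)
print(containment(short_asm, long_asm), jaccard(short_asm, long_asm))
```

If the containment of one assembly in the other is very high in both directions, merging is unlikely to add much. Lower values point to regions that only one assembly reconstructed.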

                    Comment


                    • #11
                      I thought so. I haven't assembled the genome with other tools yet; I will do that to check.

                      Thanks!

                      Comment
