
  • Minimus2

    Hi,

    I have a question about minimus2. I am using it to join two velvet assemblies of the same Illumina data, produced with different hash lengths. Each assembly is approximately 32 Mb. The resulting minimus2 merged assembly is twice the size of the input data.

    Has anyone observed a similar problem? What is the source of this size increase, and how can it be solved? I tried changing the overlap value, but it did not change much.

    Thanks for your advice!
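
One way to put numbers on the size increase described above is to compare contig count, total length, and N50 for the input and merged FASTA files. The following is a minimal sketch, not part of the original post; the function names `read_fasta` and `assembly_stats` and the toy sequences are hypothetical, and real files would be opened with `open(path)` instead of `io.StringIO`.

```python
import io
from typing import Dict, Iterator, List, Tuple


def read_fasta(handle) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from an open FASTA handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the record name.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def assembly_stats(lengths: List[int]) -> Dict[str, int]:
    """Return contig count, total bases, and N50 for a list of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running, n50 = 0, 0
    for length in ordered:
        running += length
        if running * 2 >= total:  # first contig that reaches half the total
            n50 = length
            break
    return {"contigs": len(ordered), "total_bp": total, "n50": n50}


if __name__ == "__main__":
    # Tiny in-memory stand-ins for real FASTA files (hypothetical data).
    inputs = io.StringIO(">k31_1\nACGTACGTAC\n>k61_1\nACGTACGTAC\n")
    merged = io.StringIO(">m1\nACGTACGTAC\n>m2\nACGTACGTAC\n")
    in_stats = assembly_stats([len(seq) for _, seq in read_fasta(inputs)])
    out_stats = assembly_stats([len(seq) for _, seq in read_fasta(merged)])
    print("input :", in_stats)
    print("merged:", out_stats)
```

If the merged total is close to the sum of both input assemblies, that would suggest almost nothing was merged, rather than sequence being duplicated by the merge itself.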

  • #2
    I would presume (assuming you have fed the contigs to Minimus correctly) that the two assemblies are sufficiently different that Minimus struggles to find overlapping regions to join together.

    You could use something like MUMmer to check this.

    In any case, I'm not sure that the approach you are taking, mixing the results of two assemblies with different k-mer lengths, is likely to give a better result.



    • #3
      I agree with Nick overall in that joining two assemblies using k1 and k2 will probably not gain much UNLESS you had trimmed your reads to variable length, and a stack of your reads were shorter than one of the k values, and hence couldn't be used.

      Minimus2 couldn't join them due to lack of overlap I guess, or maybe you didn't run it correctly. It is a bit confusing - I use a Perl script wrapper which I have attached (it needs BioPerl installed).
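
The attached Perl wrapper is not reproduced in this thread. As a rough illustration of the input preparation that minimus2 expects, here is a minimal Python sketch under stated assumptions: the two assemblies are concatenated into one FASTA file, and the number of sequences in the first assembly is passed to minimus2 as `REFCOUNT`. The function name `prepare_minimus2_input` and the `A_`/`B_` header prefixes are hypothetical; the prefixes are only a precaution against identical contig names in the two velvet runs.

```python
def prepare_minimus2_input(first: str, second: str, combined: str) -> int:
    """Concatenate two FASTA files and return the record count of the first.

    Headers are prefixed with A_ / B_ (an assumption of this sketch) so that
    contigs with the same name in both assemblies stay distinguishable.
    """
    refcount = 0
    with open(combined, "w") as out:
        for tag, path in (("A", first), ("B", second)):
            with open(path) as handle:
                for line in handle:
                    line = line.rstrip()
                    if not line:
                        continue
                    if line.startswith(">"):
                        if tag == "A":
                            refcount += 1
                        line = f">{tag}_{line[1:]}"
                    out.write(line + "\n")
    return refcount


# The returned count would then be used roughly as follows (commands as
# described in the AMOS minimus2 documentation; check them against your
# installed version):
#   toAmos -s combined.seq -o combined.afg
#   minimus2 combined -D REFCOUNT=<returned count>
```

A wrong `REFCOUNT` is one plausible way to "run it incorrectly", since minimus2 uses it to tell the first set of sequences from the second.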


      • #4
        Thanks a lot for the advice and thoughts. The idea of merging assemblies of k1 and k2 (for instance kmer 31 and 61) was to get a more continuous consensus assembly. But I discovered a few problems. Minimus can't deal efficiently with Ns, and splitting contigs at Ns contradicts the whole idea of getting longer contigs. Short contigs (abundant in velvet assemblies) are not always merged the way you would expect. Finally, velvet assemblies produced for different kmers do seem to differ a lot (worrying).

        I think I ran minimus2 correctly, since I tested it on the sample dataset and it worked. In any case, thanks for the script; it is very helpful.
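
For reference, the N-splitting workaround mentioned above can be sketched in a few lines of Python. This is an illustration only, not the poster's code; the function name `split_at_n_runs`, the `_partN` naming scheme, and the `min_run` and `min_len` parameters are hypothetical.

```python
import re
from typing import List, Tuple


def split_at_n_runs(name: str, seq: str, min_run: int = 1,
                    min_len: int = 1) -> List[Tuple[str, str]]:
    """Split a contig at runs of at least `min_run` Ns.

    Pieces shorter than `min_len` are dropped. Shorter N runs are left in
    place inside the pieces.
    """
    pattern = re.compile(r"[Nn]{%d,}" % min_run)
    pieces, start = [], 0
    for match in pattern.finditer(seq):
        if match.start() - start >= min_len:
            pieces.append(seq[start:match.start()])
        start = match.end()
    if len(seq) - start >= min_len:
        pieces.append(seq[start:])
    return [(f"{name}_part{i}", piece) for i, piece in enumerate(pieces, 1)]


if __name__ == "__main__":
    for part_name, part_seq in split_at_n_runs("c1", "ACGTNNNNACGTTNAC", min_run=2):
        print(part_name, part_seq)
```

As the post points out, every split throws away the scaffolding information the Ns carried, so this trades contiguity for compatibility with the merger.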



        • #5
          Did you try changing the program call from make-consensus to make-consensus_poly within the runAmos script? I outlined the change in this thread:



          It seemed to do a better job of handling N's and other ambiguity codes for me.

          This seems to be the only place this program is referenced:



          • #6
            I would agree, in that the multiple-kmer approach has significantly increased the number of full-length contigs in our Illumina assemblies, and makes much more sense than testing for a single optimal kmer. I've been using either cd-hit to cluster the separate runs or cap3 to assemble them. My recent trial of minimus2 gave yields similar to our cd-hit results, i.e. it reduced the dataset by ~1/4. Have you considered using velvet -long for your final assembly?



            • #7
              Thanks a lot for the minimus2 thread!

              We tried to use the velvet -long option but ran into memory problems on our system.

              This might also be a useful tip: we discovered many overlapping contigs within a single velvet assembly whose overlap is shorter than a kmer and which are therefore not merged by velvet. Currently, we are trying to merge such contigs...
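
The kind of sub-kmer overlap described above can be illustrated with a simple exact suffix-prefix check. This is a minimal sketch and not the posters' method: it only finds exact, same-strand overlaps, whereas real data would also need reverse complements and some tolerance for mismatches. The function names and the `min_overlap` and `max_overlap` parameters are hypothetical.

```python
from typing import Optional


def suffix_prefix_overlap(left: str, right: str, min_overlap: int = 10,
                          max_overlap: Optional[int] = None) -> int:
    """Return the longest exact overlap between the end of `left` and the
    start of `right`, or 0 if it is shorter than `min_overlap`."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    limit = min(len(left), len(right))
    if max_overlap is not None:
        limit = min(limit, max_overlap)
    # Try the longest candidate first so the first hit is the best one.
    for size in range(limit, min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return size
    return 0


def merge_contigs(left: str, right: str, overlap: int) -> str:
    """Join two contigs, keeping the overlapping bases only once."""
    return left + right[overlap:]


if __name__ == "__main__":
    a, b = "AAACCCGGGTTT", "GGGTTTACGT"
    size = suffix_prefix_overlap(a, b, min_overlap=4)
    print(size, merge_contigs(a, b, size))
```

Setting `max_overlap` to k - 1 restricts the search to overlaps shorter than the kmer used for the assembly, which is the case described in the post.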
