  • Minimus2

    Hi,

    I have a question about Minimus2. I am using it to join two Velvet assemblies of the same Illumina data, produced with different hash lengths. Each assembly is approximately 32 Mb. The resulting Minimus2 merged assembly is twice the size of the input data.

    Has anyone observed a similar problem? What is the source of this size increase, and how can it be solved? I tried changing the overlap value, but it did not change much.

    Thanks for your advice!

  • #2
    I would presume (assuming you have fed the contigs to Minimus correctly) that the two assemblies are sufficiently different that Minimus struggles to find overlapping regions to join together.

    You could use something like MUMmer to check this.

    In any case, I'm not sure that mixing the results of two assemblies with different k-mer lengths is likely to give a better result.
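A quick sanity check before any alignment is to compare contig count, total length, and N50 for the two inputs and the merged output. The following is a minimal Python sketch, assuming plain FASTA input; the function names `read_fasta` and `assembly_stats` are made up for illustration and are not part of Minimus2 or MUMmer.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            # Use the first word of the header as the contig name.
            name, chunks = (parts[0] if parts else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def assembly_stats(lengths):
    """Return (contig count, total bases, N50) for a list of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running, n50 = 0, 0
    for length in ordered:
        running += length
        # N50 is the length of the contig that takes the running sum
        # past half of the total assembly size.
        if running * 2 >= total:
            n50 = length
            break
    return len(ordered), total, n50
```

To use it, open each FASTA file, pass the file handle to `read_fasta`, and feed the sequence lengths to `assembly_stats`. If the merged total is close to the sum of the two inputs, very little was actually joined.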

    • #3
      I agree with Nick overall in that joining two assemblies using k1 and k2 will probably not gain much UNLESS you had trimmed your reads to variable length, and a stack of your reads were shorter than one of the k values, and hence couldn't be used.
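To put a number on that caveat: a read shorter than k contains no k-mer of that length, so it contributes nothing at that k. A small Python sketch, assuming the read lengths after trimming are already in a list; the function name and the example lengths are hypothetical:

```python
def usable_read_fraction(read_lengths, k):
    """Fraction of reads that are at least k bases long.

    A read shorter than k contains no k-mer, so it cannot contribute
    to an assembly built with that k.
    """
    if not read_lengths:
        return 0.0
    usable = sum(1 for n in read_lengths if n >= k)
    return usable / len(read_lengths)


# Made-up trimmed read lengths, for illustration only.
lengths = [25, 36, 50, 75, 75, 100]
for k in (31, 61):
    print(k, round(usable_read_fraction(lengths, k), 2))
```

If the fraction drops sharply between the two k values, the larger-k assembly is working from noticeably less data, which is the situation where combining the two might add something.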

      Minimus2 couldn't join them due to lack of overlap I guess, or maybe you didn't run it correctly. It is a bit confusing - I use a Perl script wrapper which I have attached (it needs BioPerl installed).

      • #4
        Thanks a lot for the advice and thoughts. The idea of merging assemblies of k1 and k2 (for instance k-mer 31 and 61) was to get a more contiguous consensus assembly. But I discovered a few problems. Minimus can't deal efficiently with Ns, and splitting contigs at Ns contradicts the whole idea of getting longer contigs. Short contigs (abundant in Velvet assemblies) are not always merged the way you would expect. Finally, Velvet assemblies produced for different k-mers do seem to differ a lot (worrying).

        I think I ran minimus2 correctly, since I tested it on the sample dataset and it worked. In any case, thanks for the script; it is very helpful.
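For reference, splitting contigs at runs of N can be sketched in Python as below. This is only an illustration of the workaround mentioned above, not something Minimus2 does itself. The function name, the `_partN` naming scheme, and the `min_gap` and `min_length` parameters are all assumptions.

```python
import re


def split_at_gaps(name, sequence, min_gap=1, min_length=1):
    """Split one contig at runs of N that are at least min_gap long.

    Returns a list of (piece_name, piece_sequence) tuples. Runs of N
    shorter than min_gap are left inside the pieces, and pieces shorter
    than min_length are dropped.
    """
    min_length = max(min_length, 1)  # never emit empty pieces
    gap = re.compile(r"[Nn]{%d,}" % max(min_gap, 1))
    pieces = [p for p in gap.split(sequence) if len(p) >= min_length]
    return [("%s_part%d" % (name, i), p) for i, p in enumerate(pieces, 1)]
```

Raising `min_gap` keeps short N runs inside a piece, which limits how much contiguity is lost, but the pieces still contain the Ns that caused trouble in the first place.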

        • #5
          Did you try changing the program call from make-consensus to make-consensus_poly within the runAmos script? I outlined the change in this thread:



          It seemed to do a better job of handling N's and other ambiguity codes for me.
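The edit itself is a one-word substitution. A hedged Python sketch of it, working on the script text as a string: the function name is hypothetical, and the sample line in the usage note is made up rather than copied from the real minimus2 script.

```python
import re


def use_poly_consensus(script_text):
    """Replace calls to make-consensus with make-consensus_poly.

    The lookahead skips names that continue past "make-consensus",
    so an already converted call is left unchanged.
    """
    return re.sub(r"\bmake-consensus(?![\w-])", "make-consensus_poly", script_text)
```

For example, `use_poly_consensus("make-consensus -b my.bank")` gives `"make-consensus_poly -b my.bank"`, and running it a second time changes nothing. Work on a copy of the script so the original stays available.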

          This seems to be the only place this program is referenced:

          • #6
            I would agree that the multiple k-mer approach has significantly increased the number of full-length contigs in our Illumina assemblies, and it makes much more sense than testing for a single optimal k-mer. I've been using either cd-hit to cluster the separate runs or cap3 to assemble them. My recent trial of minimus2 gave yields similar to our cd-hit results, i.e. it reduced the dataset by ~1/4. Have you considered using velvet -long for your final assembly?
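To illustrate the kind of redundancy that clustering removes when runs are pooled, here is a deliberately naive Python sketch that drops any contig found verbatim, on either strand, inside a longer contig that was kept. It is not how cd-hit or cap3 work (cd-hit clusters by an identity threshold, cap3 assembles overlaps); the function names are hypothetical, and the all-against-all comparison is only practical for small sets.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string (N stays N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def drop_contained(contigs):
    """Remove contigs that occur exactly inside a longer kept contig.

    contigs: dict mapping contig name to sequence.
    Returns a new dict holding only the contigs that were kept.
    """
    kept, kept_upper = {}, []
    # Visit the longest contigs first so that shorter ones are compared
    # against everything that could contain them.
    for name, seq in sorted(contigs.items(), key=lambda kv: (-len(kv[1]), kv[0])):
        forward = seq.upper()
        reverse = reverse_complement(forward)
        if any(forward in other or reverse in other for other in kept_upper):
            continue
        kept[name] = seq
        kept_upper.append(forward)
    return kept
```

Exact containment is a much stricter test than a percent-identity cutoff, so this removes less than cd-hit would on real data.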

            • #7
              Thanks a lot for the minimus2 thread!

              We tried to use the velvet -long option but ran into memory problems on our system.

              This might also be a useful tip: we discovered many overlapping contigs within a single Velvet assembly whose overlap is shorter than the k-mer, and which are therefore not merged by Velvet. Currently, we are trying to merge such contigs...
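One way to look for such pairs is to search for exact suffix-to-prefix matches shorter than k. The Python sketch below is a rough illustration under stated assumptions: the function names and the `min_len` default are hypothetical, it compares the forward strand only, and it checks every pair of contigs, so it suits small test sets rather than a whole assembly.

```python
def suffix_prefix_overlap(left, right, min_len, max_len):
    """Length of the longest exact overlap between a suffix of left and a
    prefix of right, within min_len..max_len. Returns 0 if there is none."""
    min_len = max(min_len, 1)
    longest = min(max_len, len(left), len(right))
    for size in range(longest, min_len - 1, -1):
        if left[-size:] == right[:size]:
            return size
    return 0


def short_overlaps(contigs, k, min_len=10):
    """List (left_name, right_name, size) for contig pairs whose exact
    suffix-to-prefix overlap is shorter than k but at least min_len."""
    hits = []
    for left_name, left_seq in contigs.items():
        for right_name, right_seq in contigs.items():
            if left_name == right_name:
                continue
            size = suffix_prefix_overlap(left_seq, right_seq, min_len, k - 1)
            if size:
                hits.append((left_name, right_name, size))
    return hits
```

Very short exact matches occur by chance, so `min_len` should not be set too low, and any reported pair is a candidate to inspect rather than a confirmed join.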
