  • Microbiome assembly and its normalization

    Hello,

    I am working on 24 human gut microbiome samples generated on an Illumina HiSeq, almost 800 GB of data in total.
    I have optimized and assembled the data using Velvet at several different k-mer sizes, with the assembly size varying from 35 to 115 Mb. My question is: do I need a justification for assembling all the data at different k-mers, or should I go with only a single k-mer assembly? Also, how do I proceed with assembly normalization?

  • #2
    For your assembly, do you mean you pooled all the data together and then assembled with Velvet? I think this paper may be a good reference



    • #3
      My suggestion is to try a slew of different assemblers and different k-mer values.

      In terms of genome assembly, the larger the k-mer, the better the assembly tends to be, since longer k-mers reduce redundancy (ambiguity from repeats) in the sequence.

      At the same time, it all depends on how much depth of coverage you want. With a smaller k-mer value you will have more depth of coverage, but also a higher chance of mis-assembling the reads.

      To overcome this trade-off, I would suggest taking the assembled contigs from the different k-mer runs and re-assembling them to form "super contigs".

      This will help you achieve greater coverage (or reduce redundant coverage of a specific genome by collapsing regions that were assembled multiple times) and reduce any bias that each individual assembler might have.
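      The multi-k "super contig" idea above can be sketched roughly as follows; this is a minimal Python illustration on toy sequences (the file handling and the final merging/re-assembly step are assumptions, not a real pipeline):

```python
# Hypothetical sketch of the "super contig" idea: pool contigs from several
# k-mer assemblies and keep one copy of each sequence, treating a sequence
# and its reverse complement as the same contig. A real merging step
# (re-running an assembler on the pool) would follow.

def revcomp(seq):
    """Reverse-complement a DNA sequence."""
    comp = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}
    return "".join(comp[b] for b in reversed(seq))

def pool_contigs(assemblies):
    """Pool contigs from multiple assemblies, dropping exact duplicates
    (including reverse-complement duplicates)."""
    seen, pooled = set(), []
    for contigs in assemblies:
        for seq in contigs:
            key = min(seq, revcomp(seq))   # canonical orientation
            if key not in seen:
                seen.add(key)
                pooled.append(seq)
    return pooled

# Toy example: two "assemblies" from different k-mers share one contig
# (the second copy is reverse-complemented).
k31 = ["ATGGCCTTA", "GGGTTTCCC"]
k51 = ["TAAGGCCAT", "ACACACACA"]   # first contig = revcomp of "ATGGCCTTA"
print(pool_contigs([k31, k51]))    # 3 unique contigs
```

      In practice a dedicated merging tool, or simply re-running an assembler on the pooled contigs, would handle the actual "super contig" construction; the point is only that regions assembled at several k-mers should be collapsed first.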

      Hope this helps a bit.

      -Zapages



      • #4
        Yes, thanks for the great help!



        • #5
          IMO there are several newer assemblers out there that typically outperform Velvet in terms of both accuracy and speed. I typically use SPAdes for bacterial de novo assembly.


          see these as well for reference:

          De novo genome assembly is the process of reconstructing a complete genomic sequence from countless small sequencing reads. Due to the complexity of this task, numerous genome assemblers have been developed to cope with different requirements and the different kinds of data provided by sequencers within the fast evolving field of next-generation sequencing technologies. In particular, the recently introduced generation of benchtop sequencers, like Illumina's MiSeq and Ion Torrent's Personal Genome Machine (PGM), popularized the easy, fast, and cheap sequencing of bacterial organisms to a broad range of academic and clinical institutions.

          With a strong pragmatic focus, here, we give a novel insight into the line of assembly evaluation surveys as we benchmark popular de novo genome assemblers based on bacterial data generated by benchtop sequencers. Therefore, single-library assemblies were generated, assembled, and compared to each other by metrics describing assembly contiguity and accuracy, and also by practice-oriented criteria as for instance computing time. In addition, we extensively analyzed the effect of the depth of coverage on the genome assemblies within reasonable ranges and the k-mer optimization problem of de Bruijn Graph assemblers.

          Our results show that, although both MiSeq and PGM allow for good genome assemblies, they require different approaches. They not only pair with different assembler types, but also affect assemblies differently regarding the depth of coverage where oversampling can become problematic. Assemblies vary greatly with respect to contiguity and accuracy but also by the requirement on the computing power. Consequently, no assembler can be rated best for all preconditions. Instead, the given kind of data, the demands on assembly quality, and the available computing infrastructure determines which assembler suits best.

          The data sets, scripts and all additional information needed to replicate our results are freely available at ftp://ftp.cebitec.uni-bielefeld.de/pub/GABenchToB.



          • #6
            I've actually used Minia for metagenome assembly with good success.

            Minia
            http://minia.genouest.org/

            Determining an optimal k-mer size for a metagenome is tough. My suggestion would be to try several.

            KmerGenie
            http://arxiv.org/pdf/1304.5665.pdf
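            For intuition about why the k-mer size matters, here is a toy Python sketch of the k-mer abundance histogram that tools like KmerGenie analyze; this is vastly simplified and is not KmerGenie's actual method, just an illustration:

```python
# Count canonical k-mers in a set of reads, then tabulate how many distinct
# k-mers occur at each abundance. Histogram shape changes with k, which is
# (roughly) what k-mer-choice tools exploit to suggest an optimum.
from collections import Counter

def revcomp(seq):
    comp = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(comp[b] for b in reversed(seq))

def kmer_histogram(reads, k):
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            counts[min(kmer, revcomp(kmer))] += 1   # canonical form
    hist = Counter(counts.values())                  # abundance -> #kmers
    return dict(sorted(hist.items()))

# Toy example: two identical reads plus one distinct read.
reads = ["ATGGCCTTAGG", "ATGGCCTTAGG", "CCCGGGTTTAA"]
print(kmer_histogram(reads, 5))
```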



            • #7
              Originally posted by fanli View Post
              IMO there are several newer assemblers out that typically outperform velvet in terms of both accuracy and speed. I typically use SPAdes for bacterial de novo assembly:


              see these as well for reference:

              http://journals.plos.org/plosone/art...l.pone.0107014
              Hi fanli,
              All of these seem to be genome assemblers developed and tuned on isolate genomic data, I think.



              • #8
                I have to say, SPAdes is slow even on a single microbe; I doubt you could run it on 800 Gbp of metagenomic reads.

                We had been using SOAPdenovo and sometimes Ray for our metagenomes, but are now using MEGAHIT, which is faster and uses less memory than SOAPdenovo.

                "also how do I proceed to assembly normalization"
                Can you clarify? I have written a normalization program to reduce high-depth reads prior to assembly, but I'm not sure that's what you are looking for.
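                For reference, the general idea behind k-mer-based depth normalization before assembly looks roughly like this; this is a toy Python sketch of the digital-normalization concept, not the actual algorithm of any real tool, and the parameter values are illustrative only:

```python
# Stream reads, track k-mer counts over the reads kept so far, and discard
# a read whose median k-mer count already meets a target depth. High-depth
# regions are thinned while low-depth reads are retained.
from collections import Counter
from statistics import median

def normalize(reads, k=4, target=2):
    counts = Counter()
    kept = []
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        if median(counts[km] for km in kmers) < target:
            kept.append(read)
            counts.update(kmers)   # only kept reads contribute counts
    return kept

# Toy example: a read repeated five times is kept only until its k-mers
# reach the target depth; a novel read is always kept.
reads = ["ATGGCCTA"] * 5 + ["CCGGTTAA"]
print(normalize(reads))
```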
