
  • #16
    Genome assembly.

    Comment


    • #17
      Originally posted by kenietz View Post
      Genome assembly.
      Please check the Minia program discussed here. You can assemble a 3 Gbase genome using about 6-8 GB of RAM.



      You can also check the slides posted here -



      If you would like to split the reads into parts, the paper by Titus Brown in the first link should help you.

      Please email me (samanta at homolog.us) if you need more explanation of the algorithms, because I do not check the forum frequently. The state of the art is far ahead of running Velvet with 512 GB of RAM, etc.
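To give a feel for where that kind of memory saving can come from: as far as I understand it, Minia stores the k-mer set of the de Bruijn graph in a Bloom filter plus a small correction structure, instead of an explicit hash table. The sketch below is not Minia's code; it is a toy Bloom filter for k-mer membership, and the class name, hash choice, and sizes are all assumptions made for illustration. For scale, at an assumed 16 bits per k-mer, 3 billion k-mers would take roughly 6 GB.

```python
import hashlib


class KmerBloomFilter:
    """Toy Bloom filter for k-mer membership (illustrative, not Minia's code)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # one bit per slot

    def _positions(self, kmer):
        # Derive num_hashes bit positions from salted BLAKE2b digests.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(
                kmer.encode(), digest_size=8, salt=i.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, kmer):
        for pos in self._positions(kmer):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, kmer):
        # May return a false positive, never a false negative.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(kmer)
        )


def kmers(seq, k):
    """Yield every k-mer of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


# Hypothetical toy read; real inputs would be millions of reads.
bloom = KmerBloomFilter(num_bits=8 * 1024, num_hashes=4)
for kmer in kmers("ACGTACGTTAGC", 5):
    bloom.add(kmer)

print("ACGTA" in bloom)  # True: inserted k-mers are always found
print(len(bloom.bits), "bytes used, regardless of how many k-mers are added")
```

The memory footprint is fixed by `num_bits`, not by the number of reads; the price is a small false-positive rate, which a real assembler has to deal with separately.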
      http://homolog.us

      Comment


      • #18
        If I classify the reads into different chromosomes using bwa, can I assemble each chromosome de novo on a 64 GB machine?

        Comment


        • #19
          Originally posted by ymc View Post
          If I classify the reads into different chromosomes using bwa, can I assemble each chromosome de novo on a 64 GB machine?
          Interesting question.

          i) For the kind of de novo assembly we are talking about, the chromosome sequences are not known. If they were known, why would you need de novo assembly in the first place?

          ii) Where reference chromosomes exist and you are trying to do a reassembly, yes, it is possible to reduce the RAM requirement by partitioning the reads. However, remember that the RAM requirement for error-free reads is capped no matter how many reads you have, because the number of distinct k-mers cannot exceed what the genome itself contains. In a world with sequencing errors, the RAM requirement goes up roughly linearly with the number of reads.



          iii) If you are trying to reassemble a human genome using BWA, you are most likely interested in the parts of a chromosome with indels, etc. Unfortunately, BWA may not be able to capture the reads from those regions and assign them to the reference chromosome.
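Point ii) can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: a random toy genome, uniformly sampled single-strand reads, substitution errors only, and made-up default parameters (`genome_len`, `read_len`, `k`, `error_rate`). It counts distinct k-mers, which is roughly what a de Bruijn graph assembler has to hold in memory.

```python
import random


def simulate_distinct_kmers(genome_len=2000, read_len=50, k=15,
                            error_rate=0.01, coverages=(5, 10, 20, 40),
                            seed=0):
    """Return (coverage, distinct k-mers without errors, with errors) tuples."""
    rng = random.Random(seed)
    genome = "".join(rng.choice("ACGT") for _ in range(genome_len))
    results = []
    for cov in coverages:
        n_reads = cov * genome_len // read_len
        clean, noisy = set(), set()
        for _ in range(n_reads):
            start = rng.randrange(genome_len - read_len + 1)
            read = genome[start:start + read_len]
            # Copy of the read with random substitution errors.
            bad = "".join(
                rng.choice([b for b in "ACGT" if b != base])
                if rng.random() < error_rate else base
                for base in read
            )
            for i in range(read_len - k + 1):
                clean.add(read[i:i + k])
                noisy.add(bad[i:i + k])
        results.append((cov, len(clean), len(noisy)))
    return results


rows = simulate_distinct_kmers()
for cov, clean, noisy in rows:
    print(f"{cov:>3}x  error-free: {clean:>5}  with errors: {noisy:>6}")
```

The error-free count can never exceed `genome_len - k + 1`, whatever the coverage, while the count with errors keeps growing as more reads are added, since each error can introduce up to `k` new k-mers.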
          http://homolog.us

          Comment


          • #20
            Originally posted by kenietz View Post
            @SES:
            Thank you for the information. The client wants to try out 10x coverage at first and then proceed with higher coverage. Yeah, I got it that SGA would probably be able to do the job. Now I am reading about Readjoiner. I'm still considering whether to take the job at all.

            Btw, what kind of power would I really need to assemble a 3 Gb genome?
            You can also request SOAPdenovo2 from BGI. Its RAM requirement is much lower than SOAPdenovo's, especially when you use the k-mer skipping option.
            http://homolog.us

            Comment


            • #21
              Originally posted by samanta View Post
              Interesting question.

              i) For the kind of de novo assembly we are talking about, the chromosome sequences are not known. If they were known, why would you need de novo assembly in the first place?
              I want to have better variant phasing than GATK's ReadBackedPhasing. Will that route do a better job?

              Comment


              • #22
                Originally posted by ymc View Post
                I want to have better variant phasing than GATK's ReadBackedPhasing. Will that route do a better job?
                In my understanding, that is a different class of problem, one that none of the solutions suggested above (SGA, diginorm, SOAPdenovo, etc.) is designed to handle. Typical de Bruijn graph-based genome assembly programs are designed to assemble genomes where no assembly exists yet. Haplotype difference is a second-order issue that those programs are not expected to handle by design. In some situations (long indels), they may assemble two separate contigs for a chromosomal region, but that is fortuitous.

                Of late, people are recognizing the need for algorithms that handle problems of the type you mention. Please take a look at the following two papers and check their programs, which are freely distributed at their websites.
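To make the haplotype point concrete, here is a minimal sketch, assuming two made-up haplotype sequences (`hap_a`, `hap_b`) that differ by a single SNP. It builds a toy de Bruijn graph and shows that the variant appears as one branch (a "bubble"), which assemblers typically collapse into a single path rather than keep as phase information. The function names are hypothetical and not taken from any of the programs mentioned.

```python
from collections import defaultdict


def de_bruijn_edges(seqs, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in any sequence."""
    graph = defaultdict(set)
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branch_nodes(graph):
    """Nodes with more than one successor, i.e. where paths diverge."""
    return sorted(node for node, succ in graph.items() if len(succ) > 1)


def kmer_set(seq, k):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


# Hypothetical haplotypes: identical except for a C/T SNP at position 11.
hap_a = "ACGTTGCATGCCAGTACGGATCCA"
hap_b = "ACGTTGCATGCTAGTACGGATCCA"
k = 5

graph = de_bruijn_edges([hap_a, hap_b], k)
print(branch_nodes(graph))    # ['ATGC']
print(sorted(graph["ATGC"]))  # ['TGCC', 'TGCT']

# Each allele contributes k k-mers of its own; everything else is shared.
print(len(kmer_set(hap_a, k) - kmer_set(hap_b, k)))  # 5
```

The graph records that two alleles exist at this site, but nothing in it links this bubble to the next one along the chromosome, which is the information phasing actually needs.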



                Genome assembly methods produce haplotype phase ambiguous assemblies due to limitations in current sequencing technologies. Determining the haplotype phase of an individual is computationally challenging and experimentally expensive. However, haplotype phase information is crucial in many bioinforma …


                The paper mentioned in the following link is not directly relevant to your problem, but could help with de novo assembly of a highly polymorphic genome, where the assumption of no haplotype difference breaks down -

                http://homolog.us

                Comment


                • #23
                  Originally posted by samanta View Post
                  Please check the Minia program discussed here. You can assemble a 3 Gbase genome using about 6-8 GB of RAM.



                  You can also check the slides posted here -



                  If you would like to split the reads into parts, the paper by Titus Brown in the first link should help you.

                  Please email me (samanta at homolog.us) if you need more explanation of the algorithms, because I do not check the forum frequently. The state of the art is far ahead of running Velvet with 512 GB of RAM, etc.
                  This looks very interesting indeed. It's difficult to compare the results directly but I hope this project continues to develop. Thanks for posting.

                  Comment


                  • #24
                    Originally posted by ymc View Post
                    If I classify the reads into different chromosomes using bwa, can I assemble each chromosome de novo on a 64 GB machine?
                    ymc, I wrote up a piece on the HapCompass algorithm that you may find interesting -

                    http://homolog.us

                    Comment


                    • #25
                      How about the Fermi assembler by Heng Li? Is it not faster and more accurate than SGA or Readjoiner?

                      Comment


                      • #26
                        Readjoiner Features

                        It was interesting to read the article on Readjoiner and notice that it has several features that improve on SGA. Is Readjoiner MPI compatible? I read that it is multithreaded; how good is the scalability?

                        However, I notice that the tool does not perform well on erroneous reads, as you showed with your E. coli data. Would it be possible to integrate a data cleaner and filters into Readjoiner itself?


                        Also, on the Plantagora metrics it seems that Readjoiner performs worse than SGA! It produced more insertions and deletions and more misassembled contig bases than SGA or Edena!

                        Comment


                        • #27
                          There is also the IDBA assembler:



                          It seems to work pretty well and does not use a lot of memory. I used it for de novo RNA-seq assembly and got good-sized transcripts.

                          Comment
