  • de novo transcriptome assembly

    Hi,

    I am new to the field of NGS data analysis. I have just started working on de novo transcriptome assembly and have come across many available assemblers, such as SOAPdenovo, Velvet, and ABySS.

    Which assembler is best for paired-end Illumina sequencing data (90 bp reads)?

    How can I choose between K-mer lengths?

    Your answers and help would be appreciated.

  • #2
    Try Trans-ABySS or Oases. They are more specialized for transcriptome assembly than the genome assemblers (SOAPdenovo, Velvet, ABySS).

    • #3
      Thank you rwenang.

      Can anybody tell me more about how to set K-mer lengths for de novo transcriptome assembly, and how N50 is calculated?

      • #4
        Hello Niharika,

        I have been doing something similar with paired-end Solexa data (2 x 75 nt). We are using Oases, which is part of the Velvet pipeline. This is what you need to do: (i) run an assembly with Velvet, keeping the read tracking option on, and (ii) run Oases on the Velvet result for the transcriptome assembly. Both steps are explained in the Oases manual.
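The two steps above can be sketched as command lines. The snippet below only builds and prints them for a few K values; it does not run anything. The flags (`-shortPaired`, `-read_trkg yes`, `-ins_length`) are taken from the Velvet and Oases manuals and should be checked against the installed versions. The file name `reads.fastq` (assumed to hold interleaved pairs), the insert length of 200, the K values, and the `asm` directory prefix are all placeholders, and the function name is made up for this example.

```python
from typing import List


def build_velvet_oases_commands(
    reads: str, k: int, ins_length: int, prefix: str = "asm"
) -> List[List[str]]:
    """Return the command lines for one K value, without running them."""
    if k % 2 == 0:
        # Velvet only works with odd K-mer (hash) lengths.
        raise ValueError("K must be odd for Velvet")
    out_dir = f"{prefix}_k{k}"  # hypothetical naming scheme, one directory per K
    return [
        # (i) hash the paired-end reads at this K
        ["velveth", out_dir, str(k), "-fastq", "-shortPaired", reads],
        # (i, continued) build the graph with read tracking switched on
        ["velvetg", out_dir, "-read_trkg", "yes"],
        # (ii) run Oases on the Velvet output directory
        ["oases", out_dir, "-ins_length", str(ins_length)],
    ]


if __name__ == "__main__":
    # Placeholder sweep over a few K values; adjust to your read length and RAM.
    for k in (21, 25, 31):
        for cmd in build_velvet_oases_commands("reads.fastq", k, ins_length=200):
            print(" ".join(cmd))
```

Keeping one output directory per K makes it easy to compare the resulting assemblies afterwards.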

        For my data, I tried a few different K-mer lengths and settled on K=21, which gave the best N50. You also need to keep the available memory in mind, because it limits how many K-mer lengths you can experiment with. Oases uses a lot more RAM than Velvet, and Velvet itself needs a lot of memory.
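On the N50 question: N50 is the contig length L such that contigs of length L or longer contain at least half of all assembled bases. A minimal sketch of that calculation follows; the function names `fasta_lengths` and `n50` are made up for this example, and the FASTA reader assumes plain, uncompressed text.

```python
from typing import Iterable, List


def fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the sequence length of every record in FASTA-formatted lines."""
    lengths: List[int] = []
    current = 0
    seen_header = False
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if seen_header:
                lengths.append(current)  # close the previous record
            seen_header = True
            current = 0
        elif line:
            current += len(line)
    if seen_header:
        lengths.append(current)  # close the last record
    return lengths


def n50(lengths: Iterable[int]) -> int:
    """Length at which the running sum of sorted contigs reaches half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # no contigs at all


if __name__ == "__main__":
    print(n50([500, 400, 300, 200, 100]))
```

For example, contigs of 500, 400, 300, 200 and 100 bp total 1500 bp; the two longest already cover 900 bp, so the N50 is 400. To compare assemblies built with different K values, pass the lines of each contig or transcript FASTA file through `fasta_lengths` and then `n50`.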

        Good luck,
        Manoj

        P. S.

        1. SOAPdenovo is for genome assembly. It cannot do transcriptomes, as far as I know.
        2. ABySS is a parallel version of velvet. So, trans-ABySS is equivalent to OASES. However, I would recommend trying velvet first, because the parallel installation of ABySS requires some more effort.

        ---------------------

        Last edited by samanta; 01-31-2011, 10:47 AM.
        http://homolog.us

        • #5
          Originally posted by samanta:


          2. ABySS is a parallel version of velvet. So, trans-ABySS is equivalent to OASES.
          To whomever it may concern:

          I am afraid you are obviously wrong here.

          ABySS is not a parallel version of Velvet.




          ABySS paper in Genome Research (2009)


          Velvet paper in Genome Research (2008)


          Trans-ABySS paper in Nature Methods (2010)

          • #6
            I should have said that ABySS implements a parallel version of the de Bruijn graph, whereas Velvet is a single-node de Bruijn assembler, but we are splitting hairs here.

            Let's hear from the authors of papers you quoted -


            Velvet paper -

            "We have developed a new set of algorithms, collectively called “Velvet,” to manipulate de Bruijn graphs for genomic sequence assembly. A de Bruijn graph is a compact representation based on short words"



            Abyss paper -

            "The field of short read de novo assembly developed from pioneering work on de Bruijn graphs by Pevzner et al. (Pevzner and Tang 2001; Pevzner et al. 2001). The de Bruijn graph representation is prevalent in current short read assemblers, with Velvet (Zerbino and Birney 2008), ALLPATHS (Butler et al. 2008), and EULER-SR (Chaisson and Pevzner 2008) all following this approach."

            "To assemble the very large data sets produced by sequencing individual human genomes, we have developed ABySS (Assembly By Short Sequencing). The primary innovation in ABySS is a distributed representation of a de Bruijn graph, which allows parallel computation of the assembly algorithm across a network of commodity computers." [emphasis mine]
            http://homolog.us

            • #7
              Originally posted by samanta:
              I should have said that ABySS implements a parallel version of the de Bruijn graph, whereas Velvet is a single-node de Bruijn assembler, but we are splitting hairs here.
              I agree with you that these two programs implement a similar algorithmic approach for the assembly of genomes using de Bruijn graphs.


              But saying that "ABySS is a parallel version of Velvet" is false and undervalues the work done over the years by the numerous researchers in that very field.



              The use of paired-end reads in Velvet is described in a PLoS ONE paper (2009).
              Background: Despite the short length of their reads, micro-read sequencing technologies have shown their usefulness for de novo sequencing. However, especially in eukaryotic genomes, complex repeat patterns are an obstacle to large assemblies.
              Principal Findings: We present a novel heuristic algorithm, Pebble, which uses paired-end read information to resolve repeats and scaffold contigs to produce large-scale assemblies. In simulations, we can achieve weighted median scaffold lengths (N50) of above 1 Mbp in Bacteria and above 100 kbp in more complex organisms. Using real datasets we obtained a 96 kbp N50 in Pseudomonas syringae and a unique 147 kbp scaffold of a ferret BAC clone. We also present an efficient algorithm called Rock Band for the resolution of repeats in the case of mixed length assemblies, where different sequencing platforms are combined to obtain a cost-effective assembly.
              Conclusions: These algorithms extend the utility of short read only assemblies into large complex genomes. They have been implemented and made available within the open-source Velvet short-read de novo assembler.


              For ABySS, I think the contigs are merged according to a threshold on the number of bridging pairs.

              Originally posted by samanta:
              Let's hear from the authors of papers you quoted -


              Velvet paper -

              "We have developed a new set of algorithms, collectively called “Velvet,” to manipulate de Bruijn graphs for genomic sequence assembly. A de Bruijn graph is a compact representation based on short words"
              Precisely !

              The said manipulation of these graphs is what makes Velvet so popular !

              Furthermore, you can get acquainted with Dr. Zerbino's PhD thesis to fully apprehend the concepts he created for manipulating de Bruijn graphs.

              Genome assembly and comparison using de Bruijn graphs


              The novelty, I think, is the use of long read markers and short read markers.
              (Sections 2.3.4 & 2.3.5 of his thesis)

              Originally posted by samanta:
              Abyss paper -

              "The field of short read de novo assembly developed from pioneering work on de Bruijn graphs by Pevzner et al. (Pevzner and Tang 2001; Pevzner et al. 2001). The de Bruijn graph representation is prevalent in current short read assemblers, with Velvet (Zerbino and Birney 2008), ALLPATHS (Butler et al. 2008), and EULER-SR (Chaisson and Pevzner 2008) all following this approach."
              Same thing here. Professor Pavel Pevzner introduced the use of the de Bruijn graph in 2001. In the EULER papers, Eulerian paths are used to manipulate the de Bruijn graph in order to obtain an assembly.


              So this cited paragraph highlights the importance of the de Bruijn graph representation, not how this graph is processed to yield an assembly.
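To make the distinction concrete, here is a toy construction of the graph representation itself: nodes are (k-1)-mers and every k-mer found in a read adds one edge. This is only an illustration under simplifying assumptions (no reverse complements, no error correction, no coverage tracking), and it is not how Velvet, ABySS or EULER are implemented; the function name `de_bruijn_graph` is made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def de_bruijn_graph(reads: Iterable[str], k: int) -> Dict[str, Set[str]]:
    """Map each (k-1)-mer prefix to the set of (k-1)-mer suffixes that follow it."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for read in reads:
        # Slide a window of width k along the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # The k-mer is an edge from its prefix node to its suffix node.
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


if __name__ == "__main__":
    # Two overlapping toy reads, K=3.
    for node, successors in sorted(de_bruijn_graph(["ACGTA", "CGTAC"], 3).items()):
        print(node, "->", sorted(successors))
```

Everything that distinguishes the assemblers happens after this step, when the graph is simplified and traversed.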

              Originally posted by samanta:
              "To assemble the very large data sets produced by sequencing individual human genomes, we have developed ABySS (Assembly By Short Sequencing). The primary innovation in ABySS is a distributed representation of a de Bruijn graph, which allows parallel computation of the assembly algorithm across a network of commodity computers." [emphasis mine]
              I think the true innovation of this paper is not only the distributed de Bruijn graph, but also a working assembler that generates contigs for a human genome.

              Cheers !

              -seb

              • #8
                Thank you. I fully agree with what you said. I tend to get sloppy in my message board comments.
                http://homolog.us

                • #9
                  SOAPdenovo has also been used for transcriptome assembly:

                  "De Novo Analysis of Transcriptome Dynamics in the Migratory Locust during the Development of Phase Traits"

                  I would also recommend a paper about Trans-ABySS. It explains the functionality of the trans-... add-on:
                  "De novo assembly and analysis of RNA-Seq Data"

                  In my experience, ABySS is far less demanding regarding memory.
