de novo assembly of transcriptomes

  • de novo assembly of transcriptomes

    Hi All,

    It strikes me that most discussions about assembly -- and perhaps even most assembly software -- are centered around assembly of genomes. So I wonder about what special challenges might arise in transcriptome assembly.

    Specifically,
    (1) What level of coverage should one expect to give a reasonably complete picture of at least a part of all transcripts in the sample?

    (2) Are there any thoughts on tuning assembly settings for this application?

    ...any thoughts? I'm currently dealing with a 454 dataset and using Velvet, CAP3, and Euler-SR (although the output from this latter program remains totally unclear to me).
    Last edited by EMeyer; 07-17-2008, 11:45 AM.

  • #2
    Originally posted by EMeyer
    It strikes me that most discussions about assembly -- and perhaps even most assembly software -- are centered around assembly of genomes. So I wonder about what special challenges might arise in transcriptome assembly.
    ...any thoughts? I'm currently dealing with a 454 dataset and using Velvet, CAP3, and Euler-SR (although the output from this latter program remains totally unclear to me).
    Velvet has now been forked into "Oases" which modifies the model to handle transcripts of different coverages.

    Roche's 454 gsAssembler 2.0 now has a "cDNA" check-box that does something similar for 454 reads.


    • #3
      Torst - thanks, that's interesting.

      We saw poor results for de novo bacterial transcriptome assembly compared with reference-based mapping (we had a known genome).
      The problem is that low-coverage transcripts are still very interesting, and the assembly programs did not pick these up.


      • #4
        Originally posted by colindaven
        Torst - thanks, that's interesting.

        We saw poor results for bacterial transcriptome assembly in contrast to reference based mapping - we had a known genome.
        The problem is that low coverage transcripts are still very interesting and these weren't picked up by the assembly programs.
        Oh, bacterial transcriptome. Cool.

        Have a look at MIRA. A quick assembly with "--job=denovo,est,accurate,454" should tell you whether you like the results or not.

        As I'm the author of that beast, I'd be interested in what bacterial transcriptomes look like: do they also contain a couple of highly represented genes (sometimes with thousands of reads per gene)? If not, then "--job=denovo,genome,accurate,454" might also be an option (with or without -CLec).

        Regards,
        B.


        • #5
          Hi BaCH,

          we used Illumina sequencing.
          15 million 36 bp reads.

          95% rRNA, which was excluded.
          >1000 expressed genes

          Some genes/operons were of course highly expressed, but as I said, lowly expressed genes were also very interesting, given that arrays cannot detect them reliably.

          The de novo assembler we used, Velvet, which I rate highly, output <100 genes. That's a big difference between reference-based mapping and de novo assembly. Overall we were very happy with the accuracy and biological relevance of our results.
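A back-of-the-envelope sketch of the numbers in this post, assuming that "15m" means 15 million reads and that the 95% rRNA share is counted in reads. The function name `usable_yield` is made up for illustration; it only shows how little sequence remains for the non-rRNA transcripts once the rRNA reads are discarded.

```python
def usable_yield(total_reads: int, read_length: int, rrna_percent: int) -> tuple[int, int]:
    """Return (non-rRNA reads, non-rRNA bases) after discarding the rRNA share."""
    # Integer arithmetic avoids floating-point rounding surprises.
    kept_reads = total_reads * (100 - rrna_percent) // 100
    return kept_reads, kept_reads * read_length


# Figures taken from the post: 15 million reads, 36 bp each, 95% rRNA.
kept_reads, kept_bases = usable_yield(15_000_000, 36, 95)
print(f"{kept_reads:,} reads, {kept_bases:,} bases left after rRNA removal")
```

Under these assumptions roughly 750,000 reads (about 27 Mb) remain, and a few highly expressed operons will take a large part of that, so lowly expressed genes may receive very few 36 bp reads each.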


          • #6
            Originally posted by colindaven
            we used Illumina sequencing.
            15 million 36 bp reads.
            I generally do not recommend de novo assembly with 36mers (nor with 50mers, btw). With a bacterial transcriptome, however, this could work.

            B


            • #7
              I think one of the challenges in transcriptome assembly is to correctly reconstruct paralogous genes so that one doesn't end up creating chimeric transcripts. This is especially troublesome for closely related paralogs, which on many occasions are biologically the most interesting ones. E.g., hemoglobins:

              http://www.ensembl.org/Homo_sapiens/...274420-5667019

              Without a genomic assembly to anchor the reads using the more divergent non-exonic regions, proper analysis of paralogs would be a great feature.

              Velvet/Oases and another method from the Broad presented at CSHL-BG2010 are two examples of methods that try to deal with the different "paths" in a transcript graph, which would represent the different paralogs of a single gene family.



              • #8
                Originally posted by EMeyer
                Hi All,

                Specifically,
                (1) What level of coverage should one expect to give a reasonably complete picture of at least a part of all transcripts in the sample?

                (2) Are there any thoughts on tuning assembly settings for this application?

                ...any thoughts? I'm currently dealing with a 454 dataset and using Velvet, CAP3, and Euler-SR (although the output from this latter program remains totally unclear to me).
                Have a look here:
                http://www.ncbi.nlm.nih.gov/pubmed/?...ome+sequencing
                We ran a single 454 plate on a reptile transcriptome, and we could not use Velvet, MIRA, or Newbler successfully. We got too many singletons, whose mean length was similar to the mean length of the reads.
                Regards,
                jordi
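One way to check the symptom described above is to compare the mean length of the singletons with the mean length of the input reads. This is a minimal sketch, assuming both sets are available as FASTA; the toy sequences and the helper names `fasta_lengths` and `mean_length` are made up for illustration.

```python
from statistics import mean


def fasta_lengths(lines):
    """Yield the length of each sequence in an iterable of FASTA lines."""
    length = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)  # sequences may be wrapped over several lines
    if length is not None:
        yield length


def mean_length(lines):
    """Mean sequence length of a FASTA stream, or 0.0 if it is empty."""
    lengths = list(fasta_lengths(lines))
    return mean(lengths) if lengths else 0.0


# Toy data only; real input would be open("reads.fasta") and similar.
reads = [">r1", "ACGTACGT", ">r2", "ACGTAC"]
singletons = [">s1", "ACGTACG", "T", ">s2", "ACGTAC"]

ratio = mean_length(singletons) / mean_length(reads)
print(f"singleton/read mean length ratio: {ratio:.2f}")
```

A ratio close to 1 would mean the singletons are essentially unassembled reads, which matches what was seen here.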


                • #9
                  Hi everybody,

                  I'd like to assemble a transcriptome (a complex eukaryotic transcriptome) de novo and was wondering if somebody who has already performed a similar assembly could give me a hint on an appropriate method to try. My dataset is described below. I was thinking of using MIRA, but I am not quite sure whether I should first map my short reads to the 454 dataset and keep only those that have no match for the assembly, or whether I should feed all my reads in together, and with what command.

                  In my first test (only using 454 data) I used:

                  mira --project=test --job=denovo,genome,normal,454 -AS:urd=no

                  I am starting to read the manual now but I would like to run different test assemblies while I read it.

                  Best,

                  Yvan

                  My dataset:
                  ~1.2 million 454 Ti reads (average length 300 bp, high AT content of ~65%), 40+ million Illumina directional reads (average length 50 bp), and 20 million other, non-directional Illumina reads (72 bp).

                  The base calling of the Illumina reads was done with Ibis, so they are currently in Sanger FASTQ format.

                  Finally, the reads are all single-end reads.
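The "keep the short reads that have no match" idea from this post could be sketched as below. It assumes the mapping step has already been run with some mapper and has produced a set of IDs for reads that matched the 454 data; how that set is obtained is outside the sketch. The names `read_fastq` and `unmatched_reads` and the toy records are hypothetical.

```python
def read_fastq(lines):
    """Yield (read_id, sequence, quality) from 4-line FASTQ records.

    Assumes well-formed, unwrapped FASTQ with no blank lines.
    """
    stripped = (line.rstrip("\n") for line in lines)
    # Group the stream into consecutive chunks of four lines.
    for header, seq, _plus, qual in zip(*[iter(stripped)] * 4):
        read_id = header[1:].split()[0]  # drop '@' and any description
        yield read_id, seq, qual


def unmatched_reads(fastq_lines, matched_ids):
    """Yield only the reads whose ID is not in matched_ids."""
    for read_id, seq, qual in read_fastq(fastq_lines):
        if read_id not in matched_ids:
            yield read_id, seq, qual


# Toy data only; matched_ids would come from the mapper's output.
fastq = [
    "@r1 extra", "ACGT", "+", "IIII",
    "@r2", "GGCC", "+", "IIII",
    "@r3", "TTAA", "+", "IIII",
]
matched_ids = {"r2"}

kept = list(unmatched_reads(fastq, matched_ids))
print([read_id for read_id, _, _ in kept])
```

Because the filter streams over the records, only the set of matched IDs has to be held in memory, which matters with tens of millions of short reads.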


                  • #10
                    Originally posted by yvan.wenger
                    Hi everybody,
                    In my first test (only using 454 data) I used:

                    mira --project=test --job=denovo,genome,normal,454 -AS:urd=no
                    Looks OK for a first test. Further parametrisation should be tried only once the results of that run are known and analysed (for example, whether to increase or decrease some thresholds, such as the one for nasty sequences).

                    Originally posted by yvan.wenger
                    My dataset:
                    ~1.2 million 454 Ti reads (average length 300 bp, high AT content of ~65%), 40+ million Illumina directional reads (average length 50 bp), and 20 million other, non-directional Illumina reads (72 bp).
                    Ummm, no, I do not think that feeding 60+ million reads to MIRA is a good idea. It might work, but chances are good it might not.

                    B.
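For a sense of scale, the dataset quoted above can be totalled as follows. This is a rough sketch that treats "40+" million as exactly 40 million and uses the quoted average read lengths, so both totals are lower-bound estimates rather than measured values.

```python
# (number of reads, average read length in bp), as quoted in the post.
datasets = {
    "454 Titanium": (1_200_000, 300),
    "Illumina directional": (40_000_000, 50),   # "40+" million, so a lower bound
    "Illumina non-directional": (20_000_000, 72),
}

total_reads = sum(n for n, _ in datasets.values())
total_bases = sum(n * length for n, length in datasets.values())

for name, (n, length) in datasets.items():
    print(f"{name}: {n:,} reads, {n * length:,} bases")
print(f"total: {total_reads:,} reads, {total_bases:,} bases")
```

That comes to at least 61.2 million reads and about 3.8 Gb of sequence, of which the 454 reads contribute under 10% of the bases but are the only long reads.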


                    • #11
                      Trans-ABySS attempts to address the challenges of de novo assembly of transcriptome data (such as the occurrence of multiple isoforms per gene):

                      http://www.bcgsc.ca/platform/bioinfo...re/trans-abyss


                      • #12
                        Hi all!
                        I used MIRA to assemble some 454 reads previously annotated by BLAST. Some of them were assembled into several contigs, but others appeared in "contigdebrislist". I searched the literature but found little. Any idea why these reads didn't assemble properly? What is this debris list?
                        Thanks in advance.


                        • #13
                          You might want to have a look here http://mira-assembler.sourceforge.ne...ect_faq_debris

                          If you have more specific questions regarding MIRA you may also want to subscribe to the MIRA mailing list.

                          Sven
