  • de novo assembly of transcriptomes

    Hi All,

    It strikes me that most discussions about assembly -- and perhaps even most assembly software -- are centered around assembly of genomes. So I wonder about what special challenges might arise in transcriptome assembly.

    Specifically,
    (1) What level of coverage is needed to give a reasonably complete picture of at least a subset of the transcripts in the sample?

    (2) Are there any thoughts on tuning assembly settings for this application?

    ...any thoughts? I'm currently dealing with a 454 dataset and using Velvet, CAP3, and Euler-SR (although the output from this latter program remains totally unclear to me).
    Last edited by EMeyer; 07-17-2008, 11:45 AM.
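One rough way to frame question (1) is the simple Lander-Waterman/Poisson model, which assumes reads start uniformly at random along a transcript (real RNA-seq coverage is not uniform, so treat this as an optimistic sketch). A transcript of length L that receives n reads of length r has mean coverage c = n·r/L, and the expected fraction of its bases covered by at least one read is 1 - e^(-c). The read numbers below are hypothetical.

```python
import math

def expected_coverage(reads_on_transcript: int, read_length: int, transcript_length: int) -> float:
    """Mean per-base coverage c = n * r / L for a single transcript."""
    return reads_on_transcript * read_length / transcript_length

def fraction_covered(coverage: float) -> float:
    """Expected fraction of bases hit by at least one read under a Poisson model."""
    return 1.0 - math.exp(-coverage)

# Hypothetical example: a 1,500 bp transcript receiving 20 reads of 250 bp (454-like lengths).
c = expected_coverage(20, 250, 1500)
print(f"coverage = {c:.2f}x, expected fraction covered = {fraction_covered(c):.3f}")
```

Because expression levels span several orders of magnitude, the useful question is how many reads the rarest transcripts of interest receive, not what the average coverage is.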

  • #2
    Originally posted by EMeyer:
    It strikes me that most discussions about assembly -- and perhaps even most assembly software -- are centered around assembly of genomes. So I wonder about what special challenges might arise in transcriptome assembly.
    ...any thoughts? I'm currently dealing with a 454 dataset and using Velvet, CAP3, and Euler-SR (although the output from this latter program remains totally unclear to me).

    Velvet has now been forked into "Oases", which modifies the model to handle transcripts of different coverages.

    Roche's 454 gsAssembler 2.0 now has a "cDNA" check-box to do the same for 454 reads.
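Why a genome-style model struggles here: genome assemblers generally expect roughly uniform coverage, and may discard graph nodes below a single coverage threshold (Velvet's `-cov_cutoff` is one such setting, applied to graph nodes rather than whole transcripts). The toy sketch below works at the transcript level with made-up coverages, only to show how one global threshold keeps highly expressed genes and drops the lowly expressed ones.

```python
# Hypothetical per-transcript k-mer coverages (made-up numbers for illustration).
transcript_coverage = {
    "highly_expressed_A": 850.0,
    "highly_expressed_B": 420.0,
    "moderate_C": 35.0,
    "low_D": 4.0,
    "low_E": 1.5,
}

def apply_global_cutoff(coverages: dict[str, float], cutoff: float) -> tuple[list[str], list[str]]:
    """Split transcripts into kept/dropped using a single coverage threshold."""
    kept = [name for name, cov in coverages.items() if cov >= cutoff]
    dropped = [name for name, cov in coverages.items() if cov < cutoff]
    return kept, dropped

# A cutoff derived from the "typical" (median) coverage, as one might choose for a genome.
median_cov = sorted(transcript_coverage.values())[len(transcript_coverage) // 2]
kept, dropped = apply_global_cutoff(transcript_coverage, cutoff=median_cov / 2)
print("kept:", kept)
print("dropped:", dropped)
```

A transcript-aware assembler has to avoid this trade-off, since no single threshold suits coverages that differ by several hundredfold.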



    • #3
      Torst - thanks, that's interesting.

      We saw poor results for bacterial transcriptome assembly compared with reference-based mapping (we had a known genome).
      The problem is that low-coverage transcripts are still very interesting, and the assembly programs did not pick them up.



      • #4
        Originally posted by colindaven:
        Torst - thanks, that's interesting.

        We saw poor results for bacterial transcriptome assembly in contrast to reference based mapping - we had a known genome.
        The problem is that low coverage transcripts are still very interesting and these weren't picked up by the assembly programs.

        Oh, bacterial transcriptome. Cool.

        Have a look at MIRA. A quick assembly with "--job=denovo,est,accurate,454" should tell you whether you like the results or not.

        As I'm the author of that beast, I'd be interested to know what bacterial transcriptomes look like: do they also contain a few highly represented genes (sometimes with thousands of reads per gene)? If not, then "--job=denovo,genome,accurate,454" might also be an option (with or without -CLec).

        Regards,
        B.
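To try both suggestions side by side, a small helper can build the two MIRA command lines. It uses only the options quoted in this thread (`--project=`, `--job=denovo,...,accurate,454`, and `-CLec`, copied verbatim from the post above); the project names are hypothetical, and the helper only builds argument lists rather than running MIRA, since input files and version-specific options are not covered here.

```python
def mira_args(project: str, job_type: str, with_clec: bool = False) -> list[str]:
    """Build a MIRA argument list for a de novo, accurate, 454 job.

    job_type is "est" or "genome", the two variants suggested in this thread.
    """
    if job_type not in ("est", "genome"):
        raise ValueError(f"unexpected job type: {job_type!r}")
    args = ["mira", f"--project={project}", f"--job=denovo,{job_type},accurate,454"]
    if with_clec:
        args.append("-CLec")  # option string taken verbatim from the post
    return args

# Hypothetical project names; each list could be passed to subprocess.run(...).
for args in (mira_args("tx_est", "est"), mira_args("tx_genome", "genome", with_clec=True)):
    print(" ".join(args))
```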



        • #5
          Hi BaCH,

          we used Illumina sequencing.
          15 million 36 bp reads.

          95% rRNA, which was excluded.
          >1000 expressed genes

          Some genes/operons were of course highly expressed, but like I say, lowly expressed genes were also very interesting given arrays cannot detect them reliably.

          The de novo assembler we used, Velvet (which I rate highly), output fewer than 100 genes. That's a big difference between the reference-based and de novo approaches. Overall we were very happy with the accuracy and biological relevance of our results.
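A back-of-the-envelope calculation shows why lowly expressed genes are hard to assemble in a dataset like this. The read count, read length, and rRNA fraction come from the post; the 1 kb mean gene length is a hypothetical assumption, and spreading reads evenly across genes is a deliberate oversimplification.

```python
total_reads = 15_000_000   # from the post: 15 million reads
read_length = 36           # bp, from the post
rrna_fraction = 0.95       # 95% rRNA, excluded
n_genes = 1000             # ">1000 expressed genes": a lower bound, so the mean below is an upper bound
mean_gene_length = 1000    # bp, hypothetical assumption

mrna_reads = total_reads * (1 - rrna_fraction)
mrna_bases = mrna_reads * read_length
mean_coverage = mrna_bases / (n_genes * mean_gene_length)
print(f"non-rRNA reads: {mrna_reads:,.0f}")
print(f"mean coverage per gene: {mean_coverage:.1f}x")
```

Since highly expressed genes and operons take a disproportionate share of the reads, many genes sit far below this mean, which is consistent with the de novo assembly recovering far fewer genes than mapping.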



          • #6
            Originally posted by colindaven:
            we used Illumina sequencing.
            15 million 36 bp reads.

            I generally do not recommend de novo assembly with 36mers (nor with 50mers, by the way). With bacterial transcriptomes, however, this could work.

            B
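One reason short reads are hard to assemble de novo is that the k-mer coverage a de Bruijn assembler sees is lower than the base coverage. The Velvet manual relates the two as Ck = C * (L - k + 1) / L, where C is base coverage and L is read length. The sketch applies this to 36 bp and 50 bp reads; the 20x base coverage and the k values are illustrative assumptions.

```python
def kmer_coverage(base_coverage: float, read_length: int, k: int) -> float:
    """Velvet-style k-mer coverage: Ck = C * (L - k + 1) / L."""
    if not 0 < k <= read_length:
        raise ValueError("k must be between 1 and the read length")
    return base_coverage * (read_length - k + 1) / read_length

# Illustrative: 20x base coverage and several odd k values (Velvet uses odd k).
for read_length in (36, 50):
    for k in (21, 25, 31):
        ck = kmer_coverage(20.0, read_length, k)
        print(f"L={read_length} k={k}: Ck = {ck:.1f}x")
```

With 36mers, choosing k large enough to resolve repeats leaves only a few k-mers per read, so effective coverage drops sharply.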



            • #7
              I think one of the challenges in transcriptome assembly is to reconstruct paralogous genes correctly without creating chimeric transcripts. This is especially troublesome for closely related paralogs, which are often the most biologically interesting ones, e.g., the hemoglobins.

              Without a genome assembly to anchor the reads via the more divergent non-exonic regions, proper handling of paralogs will be a great feature.

              Velvet/Oases and another method from the Broad presented at CSHL-BG2010 are two examples of methods that try to deal with the different "paths" in a transcript graph, which would represent the different paralogs of a single gene family.
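To see why closely related paralogs produce chimeras in a de Bruijn graph: whenever two sequences share an identical stretch of at least k bases, they share k-mers, and the graph merges them at those nodes. The sketch below measures the fraction of shared k-mers between two made-up paralog fragments that differ at two positions; the sequences and k values are hypothetical.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All distinct k-mers (substrings of length k) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def shared_kmer_fraction(a: str, b: str, k: int) -> float:
    """Fraction of a's distinct k-mers that also occur in b."""
    ka, kb = kmers(a, k), kmers(b, k)
    return len(ka & kb) / len(ka) if ka else 0.0

# Hypothetical paralog fragments differing at two positions.
para1 = "ATGGTGCACCTGACTCCTGAGGAGAAGTCTGCCGTTACTGCC"
para2 = "ATGGTGCATCTGACTCCTGAGGAGAAGTCTGCCGTTACAGCC"
for k in (11, 21, 31):
    print(f"k={k}: {shared_kmer_fraction(para1, para2, k):.2f} of k-mers shared")
```

Shared k-mers become graph nodes with two ways in and two ways out; an assembler that picks the wrong exit stitches the two paralogs into one chimeric transcript.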



              • #8
                Originally posted by EMeyer:
                Hi All,

                Specifically,
                (1) What level of coverage should one expect to give a reasonably complete picture of at least a part of all transcripts in the sample?

                (2) Are there any thoughts on tuning assembly settings for this application?

                ...any thoughts? I'm currently dealing with a 454 dataset and using Velvet, CAP3, and Euler-SR (although the output from this latter program remains totally unclear to me).

                Have a look here:
                Next generation sequencing technology affords new opportunities in ecological genetics. This paper addresses how an ecological genetics research program focused on a phenotype of interest can quickly move from no genetic resources to having various functional genomic tools. 454 sequencing and its er …

                We sequenced a reptile transcriptome on a single 454 plate and did not manage to use Velvet, MIRA, or Newbler successfully. We got too many singletons whose mean length was similar to the mean length of the reads.
                Regards,
                jordi
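A quick way to quantify the symptom described above (many singletons about as long as the input reads) is to compare contig and singleton statistics. This is a minimal sketch with hypothetical length lists; in practice the lengths would come from parsing the assembler's output files, which is not shown here.

```python
def n50(lengths: list[int]) -> int:
    """Length L such that contigs of length >= L contain at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input

def summarize(contig_lengths: list[int], singleton_lengths: list[int], total_reads: int) -> dict[str, float]:
    """Basic numbers for judging whether an assembly actually merged reads."""
    mean_singleton = sum(singleton_lengths) / len(singleton_lengths) if singleton_lengths else 0.0
    return {
        "contigs": len(contig_lengths),
        "contig_N50": n50(contig_lengths),
        "singleton_fraction": len(singleton_lengths) / total_reads,
        "mean_singleton_length": mean_singleton,
    }

# Hypothetical numbers: 5 contigs, 4 singletons out of 40 input reads of ~300 bp.
summary = summarize([1200, 800, 650, 400, 380], [310, 290, 305, 280], total_reads=40)
print(summary)
```

A high singleton fraction together with a mean singleton length close to the mean read length means most reads were simply left unassembled.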



                • #9
                  Hi everybody,

                  I'd like to assemble a complex eukaryotic transcriptome de novo and was wondering if someone who has already performed a similar assembly could suggest an appropriate method to try. My dataset is described below. I was thinking of using MIRA, but I am not sure whether I should first map my short reads to the 454 dataset and keep only the short reads without a match for the assembly, or feed all my reads in together, and with what command.
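The first option (keep only short reads without a match to the 454 data) can be prototyped with a crude exact k-mer lookup. This is a hypothetical stand-in for a real read mapper, not how MIRA or any aligner works: a read counts as matched if any of its k-mers occurs in the 454 contigs on either strand. The sequences and k are made up.

```python
# Complement table for DNA; N maps to itself.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def kmer_index(contigs: list[str], k: int) -> set[str]:
    """All k-mers from the contigs, on both strands."""
    index: set[str] = set()
    for contig in contigs:
        for strand in (contig, revcomp(contig)):
            for i in range(len(strand) - k + 1):
                index.add(strand[i:i + k])
    return index

def unmatched_reads(reads: list[str], index: set[str], k: int) -> list[str]:
    """Reads sharing no exact k-mer with the index (reads shorter than k count as unmatched)."""
    return [
        read for read in reads
        if not any(read[i:i + k] in index for i in range(len(read) - k + 1))
    ]

K = 15  # hypothetical k; a real mapper would also tolerate mismatches
contigs_454 = ["ATGCGTACGTTAGCCTAGGCTAACGTAGCTAGGCTTAA"]  # made-up 454 contig
read_fwd = "GCGTACGTTAGCCTAGG"  # occurs in the contig
short_reads = [read_fwd, revcomp(read_fwd), "TTTTTTTTTTGGGGGGGGGG"]
leftover = unmatched_reads(short_reads, kmer_index(contigs_454, K), K)
print(leftover)  # reads that would go on to a separate short-read assembly
```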

                  In my first test (only using 454 data) I used:

                  mira --project=test --job=denovo,genome,normal,454 -AS:urd=no

                  I am starting to read the manual now, but I would like to run different test assemblies while I read it.

                  Best,

                  Yvan

                  My dataset:
                  ~1.2 million 454 Titanium reads (average length 300 bp, high AT content of ~65%), 40+ million directional Illumina reads (average length 50 bp), and 20 million other, non-directional Illumina reads (72 bp).

                  The base calling of the Illumina reads was done with Ibis, so they are currently in Sanger FASTQ format.

                  Finally, the reads are all single-end reads.
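Sanger-format FASTQ stores each base quality as a character whose ASCII code is the Phred score plus 33, and a Phred score Q corresponds to an error probability of 10^(-Q/10). If you want to sanity-check the Ibis output or quality-trim before assembly, decoding is short; the record below is made up.

```python
def phred33_scores(quality: str) -> list[int]:
    """Decode a Sanger (Phred+33) FASTQ quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality]

def error_probability(q: int) -> float:
    """Phred definition: P(error) = 10^(-Q/10)."""
    return 10 ** (-q / 10)

# Hypothetical FASTQ record: header, sequence, '+', qualities.
record = ["@read_1", "ACGTTAGC", "+", "IIIII#5?"]
scores = phred33_scores(record[3])
print(scores)
print([round(error_probability(q), 4) for q in scores])
```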



                  • #10
                    Originally posted by yvan.wenger:
                    Hi everybody,
                    In my first test (only using 454 data) I used:

                    mira --project=test --job=denovo,genome,normal,454 -AS:urd=no

                    Looks OK for a first test. Further parametrisation should be tried only once the results are known and analysed (e.g., whether to increase or decrease some thresholds, such as the one for nasty sequences).

                    Originally posted by yvan.wenger:
                    My dataset:
                    ~1.2 million 454 Titanium reads (average length 300 bp, high AT content of ~65%), 40+ million directional Illumina reads (average length 50 bp), and 20 million other, non-directional Illumina reads (72 bp).

                    Ummm, no, I do not think feeding 60+ million reads to MIRA is a good idea. It might work, but chances are good it won't.

                    B.
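Totalling the data from the dataset description gives a sense of scale. The read counts and mean lengths come from the post (taking "40+" as exactly 40 million, so the totals are lower bounds); the result is total sequenced bases only, not a memory or runtime estimate for MIRA.

```python
# (read count, mean read length in bp) per dataset, from the post.
datasets = {
    "454 Titanium": (1_200_000, 300),
    "Illumina directional": (40_000_000, 50),
    "Illumina non-directional": (20_000_000, 72),
}

total_reads = sum(n for n, _ in datasets.values())
total_bases = sum(n * length for n, length in datasets.values())
for name, (n, length) in datasets.items():
    print(f"{name}: {n * length / 1e9:.2f} Gb")
print(f"total: {total_reads / 1e6:.1f} M reads, {total_bases / 1e9:.2f} Gb")
```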



                    • #11
                      Trans-ABySS attempts to address the challenges of de novo assembly of transcriptome data (such as the occurrence of multiple isoforms per gene):

                      http://www.bcgsc.ca/platform/bioinfo...re/trans-abyss



                      • #12
                        Hi all!
                        I used MIRA to assemble some 454 reads previously annotated by BLAST. Some of them were assembled into several contigs, but others appeared in the "contigdebrislist". I searched the literature but found little. Any idea why these reads didn't assemble properly? What is this debris list?
                        Thanks in advance.
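A practical first step is to list which of the BLAST-annotated reads ended up outside the contigs, grouped by annotation. This sketch assumes you already have read IDs with their annotations and the set of read IDs placed in contigs; parsing MIRA's output files is not shown, and all IDs and gene names here are hypothetical.

```python
from collections import defaultdict

def unassembled_by_annotation(annotations: dict[str, str], assembled_ids: set[str]) -> dict[str, list[str]]:
    """Group reads that are not in any contig by their BLAST annotation."""
    groups: dict[str, list[str]] = defaultdict(list)
    for read_id, gene in annotations.items():
        if read_id not in assembled_ids:
            groups[gene].append(read_id)
    return dict(groups)

# Hypothetical read annotations and contig membership.
annotations = {"r1": "geneA", "r2": "geneA", "r3": "geneB", "r4": "geneC", "r5": "geneC"}
assembled = {"r1", "r2", "r4"}
print(unassembled_by_annotation(annotations, assembled))
```

Genes represented by only one or two reads in the output may be natural debris candidates, since those reads had little to overlap with; the MIRA documentation linked below lists the actual reasons reads end up there.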



                        • #13
                          You might want to have a look here http://mira-assembler.sourceforge.ne...ect_faq_debris

                          If you have more specific questions regarding MIRA you may also want to subscribe to the MIRA mailing list.

                          Sven
