  • TopHat or ABySS for transcriptome analysis?

    I've got a guy here claiming that ABySS is more suitable for transcriptome analysis than TopHat+Cufflinks for an organism with an established assembly and gene catalog. I disagree.

    Both ABySS papers have emphasized the parallelizability of the method; surely, though, parallelization and the other computational gymnastics of ABySS are peripheral to the objective of accurately assembling transcripts and quantifying their levels.

    ABySS was designed to be a de novo genome assembler using DNAseq reads. I cannot see how ABySS can be more biologically suitable/accurate than TopHat+Cufflinks. By design, ABySS cannot leverage the additional information provided by aligning RNAseq reads onto the reference assembly, thereby tossing out an entire dimension of information in its transcriptome assembly. In contrast, the TopHat+Cufflinks pipeline can and does make use of a reference genome.

    I thought I'd turn this question over to the SEQanswers community for its collective insight.
    Last edited by Ichinichi; 03-28-2010, 12:15 PM.

  • #2
    Abyss or not to Abyss

    Dear Ichinichi,
    I tried Abyss and was a little disappointed. I wrote a Perl script that chopped a transcript up into overlapping fragments. When I ran Abyss to reassemble the fragments, it got it wrong and produced several shorter transcripts. I lost faith then, but maybe the transcript contained repeats or there was another issue that I was not aware of.
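A minimal Python sketch of the kind of "shredding" test described above, assuming fixed-length, error-free, forward-strand fragments taken at a constant step. The read length and step are illustrative choices, and this is not poisson200's actual Perl script:

```python
from typing import List


def shred(transcript: str, read_len: int, step: int) -> List[str]:
    """Cut a sequence into overlapping, error-free fragments of length read_len.

    Fragments start `step` bases apart, so neighbours overlap by
    read_len - step bases. If stepping would miss the 3' end, one extra
    fragment ending exactly at the last base is added.
    """
    if read_len <= 0 or step <= 0:
        raise ValueError("read_len and step must be positive")
    if len(transcript) < read_len:
        return []  # too short to yield a single full-length fragment
    last_start = len(transcript) - read_len
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)  # make sure the 3' end is covered
    return [transcript[s:s + read_len] for s in starts]


if __name__ == "__main__":
    print(shred("ACGTACGTAC", read_len=4, step=3))  # ['ACGT', 'TACG', 'GTAC']
```

Real RNA-seq reads would also include reverse-complement fragments, sequencing errors and uneven coverage, none of which this sketch models.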

    I also met someone trying to construct a novel bacterial genome using SOLiD and Illumina. He was really struggling, and I don't think he liked my suggestion of adding in some longer 454 reads. Yes, the genome does provide more information. In the coming weeks I will receive some 454 transcriptome data, and I will combine it with the short-read data. Assuming Abyss can handle different read lengths/technologies, I will tell you whether Abyss can successfully construct RefSeq human transcripts.

    Just my pennies' worth.

    Kind regards,

    P.

    • #3
      Hi p200...(waaaaait a minute...that sounds familiar! If your user name is some sort of convoluted, subtle reference to statistics and wetlab experimentation, I laud you!)

      I think that the crux of my objection to the thesis that ABySS is more suitable for transcriptome analysis is that I cannot accept the implicit corollary that ABySS is as biologically accurate on real-world data withOUT alignment information as Tophat+Cufflinks is accurate WITH alignment information.

      • #4
        Originally posted by Ichinichi:
        Hi p200...(waaaaait a minute...that sounds familiar! If your user name is some sort of convoluted, subtle reference to statistics and wetlab experimentation, I laud you!)
        Or..... that I just visit France a lot and eat 200 fish a year :-)

        I think that the crux of my objection to the thesis that ABySS is more suitable for transcriptome analysis is that I cannot accept the implicit corollary that ABySS is as biologically accurate on real-world data withOUT alignment information as Tophat+Cufflinks is accurate WITH alignment information.
        Yep, when you have a genome. It seems that the more sources of evidence you include, the more precise your answer will be. Well, that is my logic. Couple that with my rough test above, and I agree. However, I am sure Abyss is very useful as a tool that complements things like tophat/cufflinks. It will be interesting to see how Abyss assembles the 454 data when I get it. It could be useful here, as tophat likes reads that are all the same length and I am guessing the 454 reads may vary in length a little. I am sure Abyss is a great bit of software in some scenarios.

        • #5
          Originally posted by poisson200:
          I am sure Abyss is a great bit of software in some scenarios.
          like, for instance, for those of us working without reference genomes...

          • #6
            Originally posted by peromhc:
            like, for instance, for those of us working without reference genomes...
            if you don't have a reference genome, then you should be thinking about performing two biological replicate rounds of DNAseq and using ABySS (or any other seq assembler), as designed, to assemble a quickie reference genome, instead of turning a blind eye and assembling questionable transcriptomes.

            poisson200 already performed an overly simple test: he computationally "shredded" a known transcript into a set of "reads" and found that ABySS cannot adequately assemble those reads into a transcript de novo, even in his ideal case where there is no noise in the "reads". this result suggests that while the de Bruijn graph approach may be suitable for genomic assembly (per Simpson et al., 2009), ABySS is likely unsuitable for transcriptome assembly, especially when an existing reference genome assembly is available. that brings us back to the original question: whether ABySS is as appropriate as, or more appropriate than, TopHat+Cufflinks for transcriptome analysis when there is a reference assembly, to use your emphasis.


            Last edited by ECO; 03-30-2010, 07:52 PM. Reason: Removed unnecessary jab.
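To illustrate why an error-free shredding test can still come back in pieces (poisson200's own guess was repeats), here is a toy de Bruijn graph assembler in Python. It is a deliberate simplification, not how ABySS is implemented: forward strand only, no error correction, no paired-end information. Nodes are (k-1)-mers, each k-mer is an edge, and contigs are the maximal non-branching paths. A repeat of k-1 or more bases whose copies have different flanking sequence creates a branch, and the walk stops there:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def build_graph(reads: Iterable[str], k: int):
    """Return successor/predecessor maps of a de Bruijn graph on (k-1)-mers."""
    succ: Dict[str, Set[str]] = defaultdict(set)
    pred: Dict[str, Set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            succ[kmer[:-1]].add(kmer[1:])  # edge: prefix -> suffix
            pred[kmer[1:]].add(kmer[:-1])
    return succ, pred


def unitigs(reads: Iterable[str], k: int) -> List[str]:
    """Spell every maximal non-branching path (isolated cycles are skipped)."""
    succ, pred = build_graph(reads, k)
    nodes = set(succ) | set(pred)

    def simple(node: str) -> bool:
        # exactly one way in and one way out: a contig can continue through it
        return len(pred.get(node, ())) == 1 and len(succ.get(node, ())) == 1

    contigs = []
    for start in nodes:
        if simple(start):
            continue  # interior node; reached from a branch point or a tip
        for nxt in succ.get(start, ()):
            contig = start + nxt[-1]
            while simple(nxt):
                nxt = next(iter(succ[nxt]))
                contig += nxt[-1]
            contigs.append(contig)
    return contigs


if __name__ == "__main__":
    repeat = "GATTACA"  # 7-base repeat, present twice with different flanks
    transcript = "CCC" + repeat + "TGT" + repeat + "GGG"
    reads = [transcript[i:i + 10] for i in range(len(transcript) - 9)]  # error-free 10-mers
    print(sorted(unitigs(reads, k=5)))  # ['CCCGATT', 'GATTACA', 'TACAGGG', 'TACATGTGATT']
    print(unitigs(reads, k=9))          # ['CCCGATTACATGTGATTACAGGG']
```

With k = 5 the noise-free reads come back as four shorter contigs; with k = 9, so that k - 1 exceeds the 7-base repeat, the same reads assemble into the single full-length transcript. Real assemblers also use read pairs and other information to resolve such branches, which this sketch does not.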

            • #7
              Originally posted by peromhc:
              like, for instance, for those of us working without reference genomes...
              also, i'm of the opinion that this statement is incorrect since none of the validation in Birol et al. (2009) is performed withOUT referring to a reference genome first; none of the validations put forth demonstrate any de novo transcriptome assembly performance of ABySS.

              in section 3 validations are of the form:

              1. look at a UCSC annotation
              2. look at what their visual representation ABySS-Explorer shows
              3. point out patterns that are now obvious in 2 GIVEN 1

              nowhere do the authors show us any patterns that are themselves indicative of real or missing annotations; a pattern visualized by ABySS-Explorer given a UCSC annotation does NOT entail that there is an annotation given a visualized pattern.

              if the visualized pattern were indicative of an annotation, then the ABySS assembly and ABySS-Explorer visualization would have predictive properties. but Birol et al. do not claim, nor show that ABySS+ABySS-Explorer have predictive ability.

              the only instance where the paper takes a step in the right direction is in subsection 3.4, where they briefly present a "novel transcript" by looking at a pattern observed by their ABySS-Explorer. but for some reason, they fail to show quantitative results demonstrating that their prediction is a bona fide transcript! one RT-PCR. we're not even given one RT-PCR.
              Last edited by Ichinichi; 03-30-2010, 03:47 PM.

              • #8
                Overly sarcastic or aggressive tones will not be tolerated, while thoughtful, intelligent, and respectful discussion is welcomed. Please be respectful of all forum members at all times.

                • #9
                  Originally posted by nilshomer:
                  Overly sarcastic or aggressive tones will not be tolerated, while thoughtful, intelligent, and respectful discussion is welcomed. Please be respectful of all forum members at all times.
                  my apologies to the community.

                  • #10
                    Hi all,

                    I'm an author of ABySS. If you'd like any help in using ABySS, the ABySS users' mailing list is quite active:
                    ABySS <[email protected]>

                    I'd be happy to help, poisson200, with your assembly of synthetic reads of a transcript.

                    Aligning short reads to a reference while allowing for large gaps can be tricky, although this field is constantly improving. Around a small exon, one short read can contain two large gaps, which makes alignment even more difficult. The advantage of assembling transcriptome data first is that the resulting contigs, rather than the short reads, are aligned to the reference, and contigs can be aligned more easily and with more specificity than short reads.

                    Cheers,
                    Shaun

                    • #11
                      Originally posted by sjackman:
                      Aligning short reads to a reference while expecting large gaps can be tricky, although this field is constantly improving. Around a small exon, one short read can contain two large gaps, and aligning becomes even more difficult. The advantage of assembling transcriptome data lies in aligning contigs to the reference, which can be aligned more easily and with more specificity than short reads.
                      I think peromhc and I are under the impression that a major intended application of ABySS, per Birol et al. (2009), is de novo transcriptome assembly; that is, cases where there does NOT exist a reference genome assembly. peromhc's earlier post seems to imply that he/she believes that, when one does not have a reference assembly, one can just assemble next-gen reads of mRNA using ABySS and get an accurate, workable transcriptome, and I agree that this application is what Birol et al. seem to suggest. But this usage of ABySS does not seem to be echoed by your post above. In fact, it seems that by "de novo transcriptome assembly" you mean only that the alignment step is done after read assembly as opposed to before.

                      If small exons are the only biologically relevant advantage (i.e. aside from the parallelizability, etc.), then TopHat+Cufflinks addresses your first point explicitly by treating gaplessly alignable reads (within a single exon) separately from reads that result in gapped alignments (crossing splice junctions). Furthermore, how small an exon are we talking about? And how many exons that small are known to exist in well-annotated genomes?
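For readers less familiar with the gapless/gapped distinction: in SAM output a spliced alignment is recorded with `N` (skipped reference region) operations in its CIGAR string, and TopHat reports spliced alignments this way. The Python sketch below, run on invented CIGAR strings, separates gapless from gapped alignments and reports how many aligned read bases fall in each exon block. The last example is a 75 bp read straddling a 50 bp exon, the two-gap case discussed above. It only illustrates the two classes and is not how TopHat finds junctions internally:

```python
import re
from typing import List, Tuple

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return ops


def exon_segments(cigar: str) -> List[int]:
    """Aligned read bases in each block separated by an intron-like N gap."""
    segments = [0]
    for length, op in parse_cigar(cigar):
        if op == "N":
            segments.append(0)      # skipped reference region starts a new block
        elif op in "MI=X":
            segments[-1] += length  # read bases placed in the current block
    return segments


def classify(cigar: str) -> str:
    return "gapped" if len(exon_segments(cigar)) > 1 else "gapless"


if __name__ == "__main__":
    for cigar in ["75M", "40M2000N35M", "12M1500N50M2200N13M"]:
        print(cigar, classify(cigar), exon_segments(cigar))
```

Soft-clipped bases (`S`) are deliberately not counted, since they are not placed on the reference.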

                      • #12
                        Hi Ichinichi,

                        Assembling transcriptome data can be useful when a reference is not available as well as when one is available. If a reference is not available, de novo assembly is the only option. When a reference genome is available, contigs assembled by de novo assembly are often aligned back to the reference to annotate the reference.

                        Many annotated human exons are shorter than 50 bp -- sorry I don't have an exact number handy. If you Google for
                        "annotated exons are shorter than"
                        the first (only) hit indicates that
                        "It should be noted that more than 10% of EMBL/GenBank annotated exons are shorter than 50 bp, with more than half of these shorter than 30 bp."
                        I haven't verified these numbers myself.
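One way to check a figure like this against a specific annotation is to count exons directly. A minimal Python sketch, assuming a GTF file (feature type in column 3, 1-based inclusive coordinates in columns 4 and 5) and counting each distinct exon interval once even when several transcripts share it. It does not separate experimentally supported models from predicted ones (the concern raised later in this thread); how to filter for that depends on the annotation source:

```python
from typing import Iterable, Set, Tuple


def short_exon_fraction(gtf_lines: Iterable[str], threshold: int = 50) -> float:
    """Fraction of distinct exon intervals shorter than `threshold` bp."""
    exons: Set[Tuple[str, int, int, str]] = set()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon records count
        seqname, start, end, strand = fields[0], int(fields[3]), int(fields[4]), fields[6]
        exons.add((seqname, start, end, strand))  # dedupe exons shared by transcripts
    if not exons:
        return 0.0
    short = sum(1 for _, start, end, _ in exons if end - start + 1 < threshold)
    return short / len(exons)


if __name__ == "__main__":
    # Two toy records (a 30 bp and a 200 bp exon), for illustration only.
    demo = [
        'chr1\tdemo\texon\t100\t129\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
        'chr1\tdemo\texon\t200\t399\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    ]
    print(short_exon_fraction(demo))  # 0.5
```

With a real annotation one would pass an open file handle, e.g. `short_exon_fraction(open("annotation.gtf"))`, where the filename is a placeholder.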

                        Reads of 75 bp can span a 50 bp exon with two gaps. In this case the portions of the read in the two flanking exons add up to only 25 bp, so each is shorter than 25 bp, which can be difficult to align uniquely.
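The arithmetic can be made explicit. A read of length R that covers an internal exon of length E completely, with bases in both neighbouring exons, has R - E bases to share between its two overhangs, so each overhang is at most R - E - 1. The Python sketch below enumerates those placements; `min_anchor` is a hypothetical per-segment minimum an aligner might require, not a parameter taken from any specific tool:

```python
from typing import List, Tuple


def overhang_splits(read_len: int, exon_len: int) -> List[Tuple[int, int]]:
    """(left, right) overhangs for a read spanning a whole internal exon.

    Both overhangs must be at least 1 bp, otherwise the read has fewer than
    two gaps; together they use the read_len - exon_len spare bases.
    """
    spare = read_len - exon_len
    return [(left, spare - left) for left in range(1, spare)]


def anchored(splits: List[Tuple[int, int]], min_anchor: int) -> List[Tuple[int, int]]:
    """Placements where both overhangs reach a (hypothetical) minimum anchor."""
    return [s for s in splits if min(s) >= min_anchor]


if __name__ == "__main__":
    splits = overhang_splits(75, 50)
    print(len(splits), max(max(s) for s in splits))  # 24 24
    for min_anchor in (8, 12, 20):
        print(min_anchor, len(anchored(splits, min_anchor)))  # 8 10 / 12 2 / 20 0
```

With a 20 bp minimum, for example, none of the 24 two-gap placements of a 75 bp read over a 50 bp exon would be anchored on both sides.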

                        Cheers,
                        Shaun

                        • #13
                          Originally posted by sjackman:
                          Assembling transcriptome data can be useful when a reference is not available as well as when one is available. If a reference is not available, de novo assembly is the only option. When a reference genome is available, contigs assembled by de novo assembly are often aligned back to the reference to annotate the reference.
                          "It should be noted that more than 10% of EMBL/GenBank annotated exons are shorter than 50 bp, with more than half of these shorter than 30 bp."
                          Hi Shaun. Thanks for taking the time to explain.

                          When a reference is not available, DNAseq would give data more appropriate for ABySS and similar next-gen assemblers. According to Simpson et al. (2009), ABySS would output a great assembly, which can support downstream transcriptomic analysis.

                          With respect to short exons, I would be wary of taking their statement at face value; a large proportion of EMBL/GenBank records are predicted models, neither experimentally determined nor validated. The inclusion of predicted models in determining the proportion of exons that are short (<50 bp) is misleading and inconclusive, since no one has gone and confirmed these instances.

                          To explore further, even ceding the point on short exons, the question at hand is whether de novo transcriptome assembly is better than transcriptome assembly with a reference assembly when one is available. TopHat+Cufflinks will handle short exon cases given its separate, ordered handling of ungapped and gapped alignments. Furthermore, assuming that the reference assembly is not suspect, T+C transcriptome assemblies inherently incorporate spatial organization, whereas de novo transcriptome assembly seems to introduce the uncertainty of orientation and spatial ambiguity unnecessarily.

                          I just feel that suggesting the use of a tool like ABySS for so-called de novo transcriptomic assembly and analysis will mislead "casual users" into a false sense of security about the robustness of such an application of ABySS and similar next-gen de novo assembly methods.

                          • #14
                            Tophat for colorspace reads?

                            Does the current version of TopHat support colorspace reads? I know Bowtie does.

                            • #15
                              The new version of TopHat supports colorspace now.
                              http://kevin-gattaca.blogspot.com/
