  • bowtie2 vs. TopHat

    hello everyone,
    I have been using Bowtie2 for Illumina 2x100 paired-end RNA-Seq datasets. I understand TopHat was built because the older Bowtie couldn't do gapped alignment. Now that Bowtie2 does that, what is the status of TopHat usage?
    • Would it be right to align using Bowtie2 and reach Cufflinks directly?
    • What would I be missing if I don't use TopHat but the new Bowtie?


    Kindly advise. I have followed this strategy ->
    Bowtie2 --> SAM/BAM --> Cufflinks (with GTF file) --> transcripts with FPKM

    So far, for the 8 datasets processed, I have obtained ~99% overall alignment, with ~85% "properly paired".
    What am I missing by not using TopHat? Any suggestions or ideas, please.

    ---Bowtie2 STDOUT for one of the datasets ---
    Time loading reference: 00:00:08
    Time loading forward index: 00:00:19
    Time loading mirror index: 00:00:11
    Multiseed full-index search: 15:44:57
    70363764 reads; of these:
    70363764 (100.00%) were paired; of these:
    8612578 (12.24%) aligned concordantly 0 times
    34458617 (48.97%) aligned concordantly exactly 1 time
    27292569 (38.79%) aligned concordantly >1 times
    ----
    8612578 pairs aligned concordantly 0 times; of these:
    5036749 (58.48%) aligned discordantly 1 time
    ----
    3575829 pairs aligned 0 times concordantly or discordantly; of these:
    7151658 mates make up the pairs; of these:
    1238386 (17.32%) aligned 0 times
    2331211 (32.60%) aligned exactly 1 time
    3582061 (50.09%) aligned >1 times
    99.12% overall alignment rate
    Time searching: 15:45:35
    Overall time: 15:45:35
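For anyone checking how the 99.12% figure follows from the counts above, here is a minimal sketch in Python. It assumes the overall rate is simply the fraction of individual mates that aligned in any way (concordant pairs, discordant pairs, or single mates); the variable names are illustrative and not part of Bowtie2.

```python
import re

# Counts copied from the Bowtie2 summary posted above.
SUMMARY = """\
70363764 reads; of these:
70363764 (100.00%) were paired; of these:
8612578 (12.24%) aligned concordantly 0 times
34458617 (48.97%) aligned concordantly exactly 1 time
27292569 (38.79%) aligned concordantly >1 times
5036749 (58.48%) aligned discordantly 1 time
1238386 (17.32%) aligned 0 times
2331211 (32.60%) aligned exactly 1 time
3582061 (50.09%) aligned >1 times
"""

# One regular expression per line of interest (names are illustrative).
PATTERNS = {
    "pairs": r"(\d+) \([\d.]+%\) were paired",
    "conc_once": r"(\d+) \([\d.]+%\) aligned concordantly exactly 1 time",
    "conc_multi": r"(\d+) \([\d.]+%\) aligned concordantly >1 times",
    "disc_once": r"(\d+) \([\d.]+%\) aligned discordantly 1 time",
    "mates_once": r"(\d+) \([\d.]+%\) aligned exactly 1 time",
    "mates_multi": r"(\d+) \([\d.]+%\) aligned >1 times",
}

counts = {name: int(re.search(pat, SUMMARY).group(1))
          for name, pat in PATTERNS.items()}

# Each aligned pair contributes two mates; unpaired hits contribute one each.
aligned_mates = (2 * (counts["conc_once"] + counts["conc_multi"] + counts["disc_once"])
                 + counts["mates_once"] + counts["mates_multi"])
total_mates = 2 * counts["pairs"]
rate = 100.0 * aligned_mates / total_mates

# Share of pairs that aligned concordantly to more than one location.
multi_pct = 100.0 * counts["conc_multi"] / counts["pairs"]

print(f"overall alignment rate: {rate:.2f}%")
print(f"concordant multi-mapped pairs: {multi_pct:.2f}%")
```

Note that a high overall rate says nothing about whether reads crossing exon-exon junctions were placed correctly, and that the multi-mapped share is reported separately from it.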
    Last edited by amitm; 03-27-2012, 01:18 AM. Reason: EDIT

  • #2
    With 99.12% alignment rate, there is hardly any room for improvement! Is it a prokaryote?

    In theory, you should use TopHat for RNA-seq because it considers splicing. Bowtie2 does not do gapped alignment in that sense (spliced alignment), although it allows for short gaps. Of course, for simpler organisms with no introns, there is not much point in using TopHat.

    • #3
      Originally posted by kopi-o
      With 99.12% alignment rate, there is hardly any room for improvement! Is it a prokaryote?

      In theory, you should use TopHat for RNA-seq because it considers splicing. Bowtie2 does not do gapped alignment in that sense (spliced alignment), although it allows for short gaps. Of course, for simpler organisms with no introns, there is not much point in using TopHat.
      hi,
      Nah, it's human cell line RNA. Yep, that's what I have been thinking, but since I am interested in transcript isoform quantification, I want to ensure the efficacy of the pipeline. I have also visualized the BAM file in IGV, and it looks fine.

      But what I may be missing by not using TopHat has been nagging me. I have started an alignment with TopHat and will compare the two results. I will post an update if I find any differences in the BAM files.

      thanks

      • #4
        I think the main difference between Tophat and Bowtie2 is this:

        Say you have a read that spans two exons.

        With Tophat, that read will be mapped to both exons in the mapping-to-splice-junctions phase.

        With Bowtie2 (--local setting, I believe?), that read will be soft-trimmed until it maps to only one of the two exons, whichever gives the higher mapping score.
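If that description holds, the difference shows up in the CIGAR string of the SAM record: a spliced alignment carries an `N` operation (skipped reference region, i.e. the intron), while a soft-trimmed local alignment carries an `S` operation. A minimal sketch in Python; the two CIGAR strings are made-up examples for a 100 bp read, not output from either tool.

```python
import re

# CIGAR operations defined by the SAM specification.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference.
REF_CONSUMING = set("MDN=X")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) tuples."""
    return [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]


def describe(cigar):
    """Summarise whether an alignment is spliced and how much is soft-clipped."""
    ops = parse_cigar(cigar)
    return {
        "spliced": any(op == "N" for _, op in ops),
        "soft_clipped_bases": sum(n for n, op in ops if op == "S"),
        "reference_span": sum(n for n, op in ops if op in REF_CONSUMING),
    }


# Hypothetical read spanning two exons separated by a 1500 bp intron.
spliced = describe("60M1500N40M")  # junction-aware alignment keeps both parts
clipped = describe("60M40S")       # local alignment keeps only the first exon

print(spliced)
print(clipped)
```

In the second case the last 40 bases are still in the record but contribute nothing to coverage or to junction evidence.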

        Someone please correct me if I'm mistaken there.

        • #5
          Something is fishy. There is no way you should get that high of alignment with 100x100 human RNA sequencing using bowtie2 unless the library is messed up. The IGV plot you show is highly biased to the 3' exon and in the top sample the exonic regions are not easily distinguished from the introns.

          • #6
            Originally posted by Jon_Keats
            Something is fishy. There is no way you should get that high of alignment with 100x100 human RNA sequencing using bowtie2 unless the library is messed up. The IGV plot you show is highly biased to the 3' exon and in the top sample the exonic regions are not easily distinguished from the introns.
            Along these lines, >40% multiply mapped reads is likely one of the problems. Have you looked at read quality (k-mer frequency, etc.)?

            • #7
              If you run bowtie2 in local mode it will absolutely align over 90% of your data.

              As others have mentioned, Tophat was not made because bowtie could not do gapped alignments; it was made because no aligner at the time could align reads to the genome across splice junctions. Spliced alignment is separate from gapped alignment, which Tophat now also reports thanks to bowtie2.

              If you do not use Tophat in your cufflinks pipeline, cufflinks will be missing valuable information about how the aligned reads join exons (in fact, join transcripts) together. cufflinks was designed to make use of that information. The only way to get bowtie2 to generate those types of alignments would be to align to a transcriptome and then convert the alignments back to genomic coordinates (something that Tophat does as part of its alignment pipeline). Then you'd be missing out on novel alignment information, though.
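To illustrate what converting a transcriptome alignment back to genomic coordinates involves, here is a minimal sketch in Python. It is not Tophat's actual code; the function name, the 1-based inclusive exon intervals, and the example exons are all assumptions for illustration.

```python
def transcript_to_genome(exons, strand, offset):
    """Map a 0-based offset along a spliced transcript to a genomic position.

    exons  -- list of (start, end) genomic intervals, 1-based and inclusive
    strand -- "+" or "-"
    offset -- 0-based position counted from the transcript's 5' end
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    # Walk the exons in transcript order: ascending on "+", descending on "-".
    ordered = sorted(exons, reverse=(strand == "-"))
    remaining = offset
    for start, end in ordered:
        length = end - start + 1
        if remaining < length:
            return start + remaining if strand == "+" else end - remaining
        remaining -= length
    raise ValueError("offset lies beyond the end of the transcript")


# Hypothetical two-exon transcript with a 900 bp intron between the exons.
exons = [(1000, 1099), (2000, 2099)]

print(transcript_to_genome(exons, "+", 99))   # last base of the first exon
print(transcript_to_genome(exons, "+", 100))  # first base of the second exon
```

A read covering transcript offsets 60 to 159 would therefore land on both sides of the intron, which is exactly the junction evidence a genome-only Bowtie2 alignment cannot report. It also shows the limitation mentioned above: only junctions already present in the transcript annotation can be recovered this way.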

              If you want to get the best results out of your pipeline, it's not 100% alignment you should be going for but alignments to the genome that include spliced alignments. Those alignments are the most powerful thing for assembling transcripts and for providing evidence of new exons and alternative splicing. For example, you might see coverage that looks like a new exon from bowtie2, but only with Tophat would you also be able to see whether reads aligning to that new exon also have junctions with annotated exons from a nearby gene.
              /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
              Salk Institute for Biological Studies, La Jolla, CA, USA */

              • #8
                So you can supply TopHat with a GTF file of annotated transcripts using the --GTF option. Reads are then mapped to those transcripts first, followed by the whole genome, with or without novel junction discovery in this second stage. As I understand it, this is the behaviour since TopHat 1.4.
                I'm curious to know how it was before 1.4. I think you could already give TopHat a GTF file, but it used it second. Am I right? If so, what is the difference between using it [the GTF file] first and using it second after the genome?
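As a rough sketch of the two modes described above, the Python helper below assembles a TopHat command line with the annotation supplied via `-G` (the short form of `--GTF`), optionally adding `--no-novel-juncs` to switch off novel junction discovery. The helper name and all file paths are hypothetical, and the command is only built, not executed.

```python
def tophat_command(index, reads1, reads2, gtf=None,
                   novel_junctions=True, out_dir="tophat_out"):
    """Build an argument list for a paired-end TopHat run (illustrative only)."""
    cmd = ["tophat", "-o", out_dir]
    if gtf is not None:
        # Supply annotated transcripts so reads are mapped to them first.
        cmd += ["-G", gtf]
        if not novel_junctions:
            # Restrict spliced alignments to junctions from the annotation.
            cmd.append("--no-novel-juncs")
    cmd += [index, reads1, reads2]
    return cmd


# Hypothetical index prefix and file names.
with_discovery = tophat_command("hg19", "r1.fastq", "r2.fastq", gtf="genes.gtf")
annotated_only = tophat_command("hg19", "r1.fastq", "r2.fastq",
                                gtf="genes.gtf", novel_junctions=False)

print(" ".join(with_discovery))
print(" ".join(annotated_only))
```

The resulting list could be passed to `subprocess.run` once the paths point at real files.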

                Carmen

                • #9
                  I don't think it ever did a transcriptome alignment stage back then. I was never entirely sure what including the GTF was doing back then because of that. I think they looked at it as a guide to help resolve messy/unclear junction conditions.
                  /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
                  Salk Institute for Biological Studies, La Jolla, CA, USA */

                  • #10
                    Hmmmm.... So I'm guessing it used it to validate the potential junctions it had found in its initial mapping to the genome?

                    But then it would never find new stuff :/

                    Or maybe it looked for junctions close enough to what it had found to correct those to "perfection"... ?
