  • SHeaph
    Member
    • Nov 2012
    • 13

    Tophat2 Alignment Rate

    Hi,

    I am new to Tophat and I have an alignment rate query.

    I am trying to map RNASeq paired reads (100bp) to a reference genome using Tophat (v2.0.6) and I am getting an alignment rate of 26%. The --solexa-quals option improved the alignment rate slightly; altering other options from their default settings had no effect. I have 34,000 reads and I was expecting a higher rate.

    tophat -p 2 --solexa-quals --b2-very-sensitive Homo_sapiens.GRCh37.67.cDNA_ncRNA UC07i.qfiltered.Bhg19.1.fq UC07i.qfiltered.Bhg19.2.fq

    I used the same reads in Bowtie version 2.0.2 and returned an alignment rate of 90.5%

    bowtie2 -p2 -x Homo_sapiens.GRCh37.67.cDNA_ncRNA -1 UC07i.qfiltered.Bhg19.1.fq -2 UC07i.qfiltered.Bhg19.2.fq -S UC07i.sam --very-sensitive-local

    I understand that Bowtie2 is not appropriate for mapping RNASeq reads, as it does not account for RNA splicing, and that Tophat uses end-to-end alignment.

    Is it possible to improve the alignment rate for Tophat or can I assume that 26% is an appropriate rate?

    The adapter sequences have been removed so I don't think trimming would be useful.

    Thanks in advance!
  • dpryan
    Devon Ryan
    • Jul 2011
    • 3478

    #2
    It looks like you're mapping against some subset of the transcriptome. Try mapping against the actual genome with both (I would recommend using a reference GTF file with tophat) and I expect tophat will perform more favourably.
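A minimal sketch of what such a run might look like, written as a small Python helper that assembles the TopHat command line. The `-G` (reference GTF), `-p` and `-o` options are standard TopHat options; the index base name `GRCh37_genome`, the annotation file `Homo_sapiens.GRCh37.67.gtf` and the helper name `build_tophat_command` are hypothetical placeholders, and the executable name depends on the local installation.

```python
import shlex
from typing import List, Optional


def build_tophat_command(
    index_base: str,
    reads_1: str,
    reads_2: str,
    gtf: Optional[str] = None,
    threads: int = 2,
    output_dir: str = "tophat_out",
    executable: str = "tophat2",  # assumption: may be "tophat" on some systems
) -> List[str]:
    """Assemble a paired-end TopHat command as an argument list."""
    cmd = [executable, "-p", str(threads), "-o", output_dir]
    if gtf is not None:
        # -G supplies a reference annotation (GTF) to guide spliced alignment.
        cmd += ["-G", gtf]
    # Positional arguments: Bowtie2 index base name, then the two mate files.
    cmd += [index_base, reads_1, reads_2]
    return cmd


if __name__ == "__main__":
    command = build_tophat_command(
        "GRCh37_genome",                   # hypothetical genome index name
        "UC07i.qfiltered.Bhg19.1.fq",
        "UC07i.qfiltered.Bhg19.2.fq",
        gtf="Homo_sapiens.GRCh37.67.gtf",  # hypothetical annotation file
    )
    # Print a shell-ready command line; pass `command` to subprocess.run to execute it.
    print(" ".join(shlex.quote(part) for part in command))
```

The index here would need to be built from the genome sequence rather than from the cDNA/ncRNA FASTA used in the original command.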

    • DunderChief
      Junior Member
      • Aug 2012
      • 6

      #3
      I would be cautious about switching to solexa scores just because you get a higher alignment rate. It seems like you have a problem unrelated to your quality scores and you'll probably end up confusing things further. If you run fastQC, it will automatically determine the version of your quality scores.
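As a rough illustration of how a quality encoding can be guessed from the data itself, the sketch below looks at the lowest quality character in a FASTQ file. This is a simplified heuristic under stated assumptions (four-line FASTQ records, enough reads to include some low-quality bases) and is not FastQC's actual implementation; the function names are made up for this example.

```python
from typing import Iterable, Iterator


def quality_lines(fastq_lines: Iterable[str]) -> Iterator[str]:
    """Yield the quality string (4th line) of each 4-line FASTQ record."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 3:
            yield line.rstrip("\n")


def guess_quality_encoding(fastq_lines: Iterable[str]) -> str:
    """Guess the quality offset from the lowest ASCII code observed.

    Phred+33 (Sanger, Illumina 1.8+) can use characters below ';' (59).
    Solexa+64 scores start at ';' (59); Phred+64 scores start at '@' (64).
    """
    lowest = min(
        (ord(ch) for qual in quality_lines(fastq_lines) for ch in qual),
        default=None,
    )
    if lowest is None:
        return "unknown"
    if lowest < 59:
        return "phred33"
    if lowest < 64:
        return "solexa64"
    # Ambiguous: a Phred+33 file with only high-quality bases also lands here.
    return "phred64"


if __name__ == "__main__":
    with open("UC07i.qfiltered.Bhg19.1.fq") as handle:
        print(guess_quality_encoding(handle))
```

If the guess comes back as `phred33`, the --solexa-quals option is probably not appropriate for those reads.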

      I doubt this would make that big of a difference anyway, but are you sure you're using tophat v2? Depending on how it is set up on your system, the command is typically tophat2, not tophat.

      Also, how are you determining the alignment rate? When I first started using tophat, I got very confused by their log files. Calculate the % mapped the same way for both tophat2 and bowtie2 results.
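One way to compute the percentage consistently is to count distinct mapped reads in the SAM output and divide by the number of input reads, as in the sketch below. It assumes SAM text input (a BAM file such as TopHat's accepted_hits.bam would first need converting, for example with `samtools view`), and that `total_reads` is the number of reads in the input FASTQ files with mates counted separately. The function name is hypothetical.

```python
from typing import Iterable

UNMAPPED = 0x4      # SAM flag bit: read is unmapped
FIRST_MATE = 0x40   # SAM flag bit: first read in the pair
SECOND_MATE = 0x80  # SAM flag bit: second read in the pair


def mapped_read_rate(sam_lines: Iterable[str], total_reads: int) -> float:
    """Percentage of input reads that have at least one reported alignment.

    Reads are counted once each, however many alignment lines they have,
    so multi-mapped reads do not inflate the rate.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    mapped = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & UNMAPPED:
            continue
        mate = 1 if flag & FIRST_MATE else 2 if flag & SECOND_MATE else 0
        mapped.add((name, mate))
    return 100.0 * len(mapped) / total_reads
```

Dividing by the input read count matters because an alignment file that contains only mapped reads would otherwise appear to have a 100% rate, while one that also lists unmapped reads would not.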

      • SHeaph
        Member
        • Nov 2012
        • 13

        #4
        Thanks for your help.
        Last edited by SHeaph; 11-20-2012, 01:21 PM.

        • SHeaph
          Member
          • Nov 2012
          • 13

          #5
          I was using a cDNA and non-coding RNA library to map the reads against. Can I use this as a GTF? If so, would any reads left unmapped here then be mapped against the actual genome?

          Much appreciated.

          • dpryan
            Devon Ryan
            • Jul 2011
            • 3478

            #6
            You can't use the multifasta file as an annotation (since it doesn't actually annotate anything), but since its name suggests that it's from Ensembl, you might just use the normal Ensembl genome sequence and annotation.

            You can probably save some time by just downloading the premade indices (and I think GTF files, though I don't recall exactly) from here.

            BTW, don't be surprised if mapping things this way leads to slightly lower alignment rates, as the results are going to be both more reliable and easier to analyse downstream (at least for common analyses).
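To illustrate the distinction made above between a multifasta file and an annotation: a FASTA file holds only sequences, whereas each GTF line records where a feature lies on the genome. The sketch below parses one GTF line into its nine tab-separated columns. The example line in the usage note is made up for illustration, and the attribute parsing is simplified (it assumes attribute values contain no semicolons).

```python
from typing import Dict, NamedTuple


class GtfFeature(NamedTuple):
    seqname: str   # chromosome or contig the feature lies on
    source: str
    feature: str   # e.g. "exon" or "CDS"
    start: int     # 1-based, inclusive
    end: int       # 1-based, inclusive
    score: str
    strand: str
    frame: str
    attributes: Dict[str, str]


def parse_gtf_line(line: str) -> GtfFeature:
    """Split one GTF record into typed fields."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    attributes: Dict[str, str] = {}
    for item in cols[8].split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attributes[key] = value.strip().strip('"')
    return GtfFeature(
        cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
        cols[5], cols[6], cols[7], attributes,
    )
```

For a made-up line such as `1<TAB>example<TAB>exon<TAB>100<TAB>200<TAB>.<TAB>+<TAB>.<TAB>gene_id "geneA"; transcript_id "txA";`, the parsed record carries the genomic coordinates and gene/transcript identifiers that tophat needs from an annotation, none of which a FASTA header provides.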
