  • wariobrega
    Member
    • Jul 2012
    • 11

    GTF usage in Tophat

    I am trying to use TopHat to find novel splice junctions in zebrafish RNA-seq data generated with the Illumina CAGE protocol. I am quite new to TopHat and am running several trials to find the best combination of options for my samples, yet I don't completely understand the --GTF option (paired with the --transcriptome-index option).

    As stated in the TopHat manual for the --GTF option:
    Supply TopHat with a set of gene model annotations and/or known transcripts, as a GTF 2.2 or GFF3 formatted file. If this option is provided, TopHat will first extract the transcript sequences and use Bowtie to align reads to this virtual transcriptome first. Only the reads that do not fully map to the transcriptome will then be mapped on the genome. The reads that did map on the transcriptome will be converted to genomic mappings (spliced as needed) and merged with the novel mappings and junctions in the final tophat output.
    Please note that the values in the first column of the provided GTF/GFF file (column which indicates the chromosome or contig on which the feature is located), must match the name of the reference sequence in the Bowtie index you are using with TopHat. You can get a list of the sequence names in a Bowtie index by typing:

    bowtie-inspect --names your_index

    So before using a known annotation file with this option please make sure that the 1st column in the annotation file uses the exact same chromosome/contig names (case sensitive) as shown by the bowtie-inspect command above.
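The name check described in the manual excerpt can be sketched in Python. This is a minimal sketch, not part of TopHat: the helper names `gtf_seqnames`, `index_seqnames` and `missing_from_index` are made up for illustration, and the comparison assumes each line printed by `bowtie-inspect --names` is exactly the name that should appear in column 1 of the annotation.

```python
import subprocess


def gtf_seqnames(gtf_path):
    """Collect the distinct values found in column 1 of a GTF/GFF file."""
    names = set()
    with open(gtf_path) as handle:
        for line in handle:
            # Skip comment/header lines and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            names.add(line.split("\t", 1)[0])
    return names


def index_seqnames(index_basename):
    """Run `bowtie-inspect --names` and return the reported sequence names."""
    result = subprocess.run(
        ["bowtie-inspect", "--names", index_basename],
        check=True, capture_output=True, text=True,
    )
    return {ln.strip() for ln in result.stdout.splitlines() if ln.strip()}


def missing_from_index(gtf_names, index_names):
    """Annotation names absent from the index (comparison is case sensitive)."""
    return sorted(set(gtf_names) - set(index_names))
```

With a hypothetical annotation file `genes.gtf` and the index `your_index`, `missing_from_index(gtf_seqnames("genes.gtf"), index_seqnames("your_index"))` would list the annotation names that need renaming; an empty list means every name in the annotation is present in the index.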
    As far as I understand, TopHat uses the GTF file to build a transcriptome index (provided the GTF file matches the Bowtie index in terms of sequence names and positions). This index can be reused via the --transcriptome-index option.

    After that, TopHat aligns the reads against this "GTF index" and discards all the reads that perfectly match it, focusing on the reads that do not align to the index to find new splice sites. Is this correct? If it is, then two questions arise:

    1) Will the reads be aligned against this GTF index without even trying to splice them? Does the perfect match happen before the splicing algorithm runs?

    2) Which reference is better to use with this option, a reference genome or a reference transcriptome? And why?

    Thanks for your answers!

    Daniele
  • glados
    Member
    • Mar 2012
    • 59

    #2
    As I understand it, TopHat first uses the annotation GTF to map all the reads that match known genes. After that you are left with a set of reads that did not match known genes; these are mapped to the genome as usual. They may represent novel genes or something else. For this you have to supply a reference gene model, i.e. the known transcriptome; the genome you are supposed to supply to TopHat in the form of a Bowtie index.
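The manual excerpt quoted in the first post says that reads mapped on the transcriptome are "converted to genomic mappings (spliced as needed)". A minimal sketch of what such a conversion involves is shown here; it is an illustration, not TopHat's actual code. The function name `transcript_to_genome` is hypothetical, and it assumes a plus-strand transcript whose exons are given as sorted, 0-based, half-open genomic intervals.

```python
def transcript_to_genome(exons, t_start, length):
    """Convert an ungapped alignment on a transcript into genomic blocks.

    exons   -- sorted list of (start, end) genomic intervals, 0-based half-open
    t_start -- 0-based offset of the read's first base within the transcript
    length  -- read length in bases
    Returns a list of (start, end) genomic blocks; more than one block means
    the read spans at least one exon-exon junction.
    """
    blocks = []
    offset = 0          # transcript offset of the current exon's first base
    pos = t_start       # next transcript position still to be placed
    remaining = length  # bases still to be placed
    for ex_start, ex_end in exons:
        ex_len = ex_end - ex_start
        if remaining > 0 and pos < offset + ex_len:
            g_start = ex_start + (pos - offset)
            take = min(remaining, ex_end - g_start)
            blocks.append((g_start, g_start + take))
            remaining -= take
            pos += take
        offset += ex_len
    if remaining:
        raise ValueError("read extends past the end of the transcript")
    return blocks
```

For a transcript with exons `[(100, 150), (200, 260)]`, a 20-base read starting at transcript offset 40 aligns contiguously to the transcript sequence but comes back as two genomic blocks, `(140, 150)` and `(200, 210)`, i.e. a spliced alignment across the known junction. The read never had to be split by a splice-aware search, because the transcript sequence already has the intron removed.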


    • wariobrega
      Member
      • Jul 2012
      • 11

      #3
      Originally posted by glados
      As I understand it, TopHat first uses the annotation GTF to map all the reads that match known genes. After that you are left with a set of reads that did not match known genes; these are mapped to the genome as usual. They may represent novel genes or something else. For this you have to supply a reference gene model, i.e. the known transcriptome; the genome you are supposed to supply to TopHat in the form of a Bowtie index.
      Thank you, glados. I found out what was not working!


      • carmeyeii
        Senior Member
        • Mar 2011
        • 137

        #4
        So you can supply TopHat with a GTF file of annotated transcripts via the --GTF option; the reads are mapped to those transcripts first and then to the whole genome, with or without novel junction discovery in this second stage. As I understand it, this has been the behavior since TopHat 1.4.
        I'm curious to know how it worked before 1.4. I think you could already give TopHat a GTF file, but it was used second. Am I right? If so, what is the difference between using the GTF file first and using it second, after the genome?


        • archana2287
          Junior Member
          • Feb 2015
          • 5

          #5
          Hello everyone

          The TopHat manual says:

          -T/--transcriptome-only Only align the reads to the transcriptome and report only those mappings as genomic mappings.

          How does this differ from -G? (As I understand it, -G does the same thing: it extracts the reads that map to the transcripts in the given GTF file.)


          I ran the mapping in two different ways.
          TopHat mapping without -T:

          python tophat.py -p 8 -G jsn.gff -o LIB_SG323_FJSN_Trans refernece.fa 1_fastq_1 1_fastq_2

          and with both -T and -G:

          python tophat.py -p 8 -T -G jsn.gff -o LIB_SG323_FJSN_Trans refernece.fa 1_fastq_1 1_fastq_2

          I get different FPKM values. How does running TopHat with the first command differ from running it with the second?


          • westerman
            Rick Westerman
            • Jun 2008
            • 1104

            #6
            Let's look at the manual about the '-G' option

            Supply TopHat with a set of gene model annotations and/or known transcripts, as a GTF 2.2 or GFF3 formatted file. If this option is provided, TopHat will first extract the transcript sequences and use Bowtie to align reads to this virtual transcriptome first. Only the reads that do not fully map to the transcriptome will then be mapped on the genome. The reads that did map on the transcriptome will be converted to genomic mappings (spliced as needed) and merged with the novel mappings and junctions in the final tophat output.
            Compare to '-T'

            Only align the reads to the transcriptome and report only those mappings as genomic mappings.
            I hope it is obvious that the two options map reads in different ways. The mappings reported with '-G' alone should be a superset of those reported with '-T'.
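The superset relation between the two manual excerpts can be made concrete with a toy model. This is only a sketch of the described behavior, not TopHat's implementation: the function `map_reads` and its arguments are hypothetical, and alignment is reduced to membership in two precomputed sets of read IDs.

```python
def map_reads(reads, transcriptome_hits, genome_hits, transcriptome_only=False):
    """Toy model of the two modes described in the manual.

    reads              -- iterable of read IDs
    transcriptome_hits -- IDs of reads that align fully to a known transcript
    genome_hits        -- IDs of reads that align somewhere on the genome
    transcriptome_only -- True mimics '-T -G', False mimics '-G' alone
    Returns a dict mapping each reported read ID to the stage that placed it.
    """
    reported = {}
    for read in reads:
        if read in transcriptome_hits:
            # Stage 1: the read aligns to the virtual transcriptome.
            reported[read] = "transcriptome"
        elif not transcriptome_only and read in genome_hits:
            # Stage 2 (skipped with -T): leftover reads go to the genome.
            reported[read] = "genome"
    return reported
```

Calling `map_reads` on the same reads with `transcriptome_only=False` and then `True` gives two result sets in which every read reported by the second call is also reported by the first. In this toy model the reads placed in the genome stage are the only difference between the two runs, which is one plausible reason for downstream FPKM values to differ.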
