  • GFF3 annotation file

    Hi All,

    I would like to ask everyone how to use this GFF3 annotation file. The Bowtie index I use names the chromosomes "1", "2", "3", ... instead of "chr1", "chr2", "chr3", ..., so I cannot upload the junctions to UCSC, because UCSC matches chromosome names exactly (they are case sensitive).
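
    A minimal sketch of one workaround, assuming the junctions file is TopHat's tab-separated junctions.bed with the chromosome name in the first column: rewrite Ensembl-style names ("1", "2", ...) to UCSC-style names ("chr1", "chr2", ...) before uploading. The "MT" to "chrM" mapping is an assumption based on the usual Ensembl/UCSC naming difference for the mitochondrial genome; unplaced contigs may need a mapping table of their own. All function names are hypothetical.

    ```python
    # Hypothetical helper: convert Ensembl-style chromosome names in BED lines
    # (e.g. TopHat's junctions.bed) to UCSC-style names so that UCSC accepts them.
    from typing import Iterable, Iterator

    # Assumed exceptions to the simple "add chr" rule.
    SPECIAL_NAMES = {"MT": "chrM"}

    # Lines that describe the track rather than a feature are passed through.
    HEADER_PREFIXES = ("track", "browser", "#")


    def to_ucsc_name(chrom: str) -> str:
        """Return a UCSC-style name, e.g. '1' -> 'chr1', 'MT' -> 'chrM'."""
        if chrom.startswith("chr"):
            return chrom  # already UCSC-style, leave it alone
        return SPECIAL_NAMES.get(chrom, "chr" + chrom)


    def rename_bed_lines(lines: Iterable[str]) -> Iterator[str]:
        """Rewrite column 1 of each feature line; keep header and blank lines."""
        for line in lines:
            if not line.strip() or line.startswith(HEADER_PREFIXES):
                yield line
                continue
            fields = line.rstrip("\n").split("\t")
            fields[0] = to_ucsc_name(fields[0])
            yield "\t".join(fields) + "\n"


    def rename_bed_file(src: str, dst: str) -> None:
        """Example usage: rename_bed_file('junctions.bed', 'junctions.ucsc.bed')."""
        with open(src) as fin, open(dst, "w") as fout:
            fout.writelines(rename_bed_lines(fin))
    ```

    Renaming only the output leaves the Bowtie index untouched; the alternative is rebuilding the index with "chr"-prefixed names, in which case any annotation file used alongside it would presumably need the same names.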

    I just read the section of the TopHat manual on providing TopHat with an annotation file, but I don't know how to use this annotation file. So far I have simply run TopHat with "--solexa1.3-quals" and got results. Should I use this file before running this command?
    Can some experienced SEQers give me some hints?

    I really appreciate your help.

  • #2
    This depends on how you want to treat your data. Giving TopHat the annotation file will force it to look for the junctions contained therein, even if it would not have considered them otherwise. There is a gtf2gff3 script available online (google the term) that you can use to make a GFF3 file for hg18 from the hg18 knownGene table (which is downloadable in GTF format).
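
    For illustration only, here is a minimal sketch of the core of a GTF-to-GFF3 conversion; it is not the gtf2gff3 script mentioned above. It assumes a nine-column, tab-separated GTF in which every exon line carries a transcript_id attribute. GTF groups exons by that attribute, whereas GFF3 links each exon to a parent feature through Parent=, so the sketch emits one transcript line per transcript_id (spanning its exons) followed by the exons. Function names are hypothetical, and attribute values containing characters such as ";" or "=" would additionally need GFF3 escaping, which this sketch skips.

    ```python
    # Hypothetical, minimal GTF -> GFF3 converter for exon records only.
    import re
    from typing import Dict, Iterable, List

    # Matches key "value" pairs such as: gene_id "uc001aaa.2";
    ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


    def parse_gtf_attributes(field: str) -> Dict[str, str]:
        """Parse 'gene_id "G"; transcript_id "T";' into {'gene_id': 'G', ...}."""
        return dict(ATTR_RE.findall(field))


    def gtf_to_gff3(lines: Iterable[str]) -> List[str]:
        # transcript_id -> [seqid, source, start, end, strand]
        transcripts: Dict[str, list] = {}
        exon_lines: List[str] = []
        for line in lines:
            if line.startswith("#") or not line.strip():
                continue
            seqid, source, feature, start, end, score, strand, _frame, attrs = (
                line.rstrip("\n").split("\t")
            )
            if feature != "exon":
                continue  # CDS, start_codon, etc. are ignored in this sketch
            tid = parse_gtf_attributes(attrs)["transcript_id"]
            s, e = int(start), int(end)
            if tid in transcripts:
                # Widen the transcript span to cover every exon seen so far.
                transcripts[tid][2] = min(transcripts[tid][2], s)
                transcripts[tid][3] = max(transcripts[tid][3], e)
            else:
                transcripts[tid] = [seqid, source, s, e, strand]
            # Phase applies only to CDS features, so exons get '.' in column 8.
            exon_lines.append("\t".join(
                [seqid, source, "exon", start, end, score, strand, ".", f"Parent={tid}"]
            ))
        out = ["##gff-version 3"]
        for tid, (seqid, source, s, e, strand) in transcripts.items():
            out.append("\t".join(
                [seqid, source, "transcript", str(s), str(e), ".", strand, ".", f"ID={tid}"]
            ))
        out.extend(exon_lines)
        return out
    ```

    Parent lines are written before their exons for readability; a real converter would also handle genes, CDS records, and escaping.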

    HTH,

    Shurjo

    • #3
      Hi shurjo,

      Thanks for your reply. I already have the GFF3 file for mouse, Mus_musculus.NCBIM37.56.gff3, but I still have no clue when I should use this GFF file: before or after running TopHat? Sorry, I am a bit confused.

      Many thanks!

      • #4
        I am not sure what exactly you want, but if you:

        1) want to use a GFF file to find out about gene expression: since version 1.0.12, the TopHat documentation says: "TopHat no longer calculates gene expression. Users interested in expression calculations should consider using Cufflinks for gene- and isoform-level expression calculations."

        or

        2) want to provide your own junctions: search the manual for "Supplying your own junctions" and you'll see the "-G/--GFF <GFF3 file>" flag explained.
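
        To make "supplying your own junctions" concrete: a splice junction implied by an annotation is the intron between two consecutive exons of the same transcript. The sketch below is a hypothetical illustration, not TopHat's actual code. It groups GFF3 exon lines by their Parent attribute, sorts each group by start coordinate, and reports the gap between neighbouring exons in GFF3's 1-based, inclusive coordinates. It assumes exon features carry a Parent= attribute pointing at their transcript; TopHat's own internal junction representation may use different coordinates.

        ```python
        # Hypothetical illustration: list the introns (splice junctions) implied by
        # the exon records of a GFF3 file.
        from collections import defaultdict
        from typing import Dict, Iterable, List, Set, Tuple

        # (seqid, intron_start, intron_end, strand), 1-based and inclusive.
        Junction = Tuple[str, int, int, str]


        def parse_gff3_attributes(field: str) -> Dict[str, str]:
            """Parse 'ID=E1;Parent=T1' into {'ID': 'E1', 'Parent': 'T1'}."""
            attrs = {}
            for item in field.strip().split(";"):
                if "=" in item:
                    key, value = item.split("=", 1)
                    attrs[key] = value
            return attrs


        def junctions_from_gff3(lines: Iterable[str]) -> Set[Junction]:
            # Parent transcript ID -> list of (seqid, start, end, strand) exons.
            exons: Dict[str, List[Tuple[str, int, int, str]]] = defaultdict(list)
            for line in lines:
                if line.startswith("#") or not line.strip():
                    continue
                f = line.rstrip("\n").split("\t")
                if len(f) != 9 or f[2] != "exon":
                    continue
                # An exon may be shared by several transcripts: Parent=T1,T2
                for parent in parse_gff3_attributes(f[8]).get("Parent", "").split(","):
                    if parent:
                        exons[parent].append((f[0], int(f[3]), int(f[4]), f[6]))

            junctions: Set[Junction] = set()
            for parts in exons.values():
                parts.sort(key=lambda exon: exon[1])  # order exons along the chromosome
                for (seqid, _, left_end, strand), (_, right_start, _, _) in zip(parts, parts[1:]):
                    if right_start > left_end + 1:  # skip abutting/overlapping exons
                        junctions.add((seqid, left_end + 1, right_start - 1, strand))
            return junctions
        ```

        Returning a set means an intron shared by several transcripts is reported once.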

        svl

        • #5
          Neither before nor after, but during the TopHat run :-). Pass it to TopHat with the -G option, like so:

          tophat --mate-inner-dist 240 --mate-std-dev 25 ~/bin/bowtie/bowtie-0.12.1/indexes/hg18_inclusive 108971.read1.fa 108971.read2.fa -m 2 -p 4 -G /home/sensh/pipeline_test/GFF3/UCSC_knowngenes_hg18_tweaked.gff3

          • #6
            Thanks Shurjo and svl!

            I just want to provide my own junctions, so I ran the following (the data file bic.txt, the index files, and the GFF3 file are all in the same folder):

            tophat --solexa1.3-quals Mus_musculus.NCBIM37.56 bic.txt -G mus_musculus.NCBIM37.56.gff3

            But I got an error: "Error: you must set the mean inner distance between mates with -r", and my data are not paired-end.

            Thanks in advance!

            • #7
              Originally posted by Wei-HD
              tophat --solexa1.3-quals Mus_musculus.NCBIM37.56 bic.txt -G mus_musculus.NCBIM37.56.gff3
              Maybe you have to put all options before the index base and the reads. The manual says:

              Usage: tophat [options]* <index_base> <reads1_1[,...,readsN_1]> [reads1_2,...readsN_2]
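
              Why the position matters can be demonstrated with Python's standard getopt module. Assuming TopHat's wrapper script parses its command line with getopt.getopt (an assumption, not confirmed in this thread), parsing stops at the first non-option argument, so everything after the index name is treated as positional. In the command above, "-G" and the GFF3 file name would then end up as extra read-file arguments, which would plausibly explain why TopHat treats the data as paired-end and asks for -r. The option specs below are a hypothetical subset for demonstration, not TopHat's full option list.

              ```python
              # Demonstrates how getopt.getopt treats options placed after positional args.
              # SHORT_OPTS/LONG_OPTS are a hypothetical subset of TopHat-like options.
              import getopt
              from typing import List, Tuple

              SHORT_OPTS = "r:G:p:m:"
              LONG_OPTS = ["solexa1.3-quals", "GFF=", "mate-inner-dist="]

              # The command line as typed in the post (options after the reads)...
              AS_TYPED = ["--solexa1.3-quals", "Mus_musculus.NCBIM37.56", "bic.txt",
                          "-G", "mus_musculus.NCBIM37.56.gff3"]
              # ...and reordered as the usage line suggests (all options first).
              REORDERED = ["--solexa1.3-quals", "-G", "mus_musculus.NCBIM37.56.gff3",
                           "Mus_musculus.NCBIM37.56", "bic.txt"]


              def parse(argv: List[str]) -> Tuple[List[Tuple[str, str]], List[str]]:
                  """Split argv into (options, positional arguments) the way getopt does."""
                  return getopt.getopt(argv, SHORT_OPTS, LONG_OPTS)


              if __name__ == "__main__":
                  for argv in (AS_TYPED, REORDERED):
                      opts, args = parse(argv)
                      print("options:   ", opts)
                      print("positional:", args)
              ```

              With AS_TYPED, only --solexa1.3-quals is recognized as an option and four items (including "-G") are left as positional arguments; with REORDERED, "-G" is parsed together with its file name and only the index base and bic.txt remain. The reordered command would therefore be: tophat --solexa1.3-quals -G mus_musculus.NCBIM37.56.gff3 Mus_musculus.NCBIM37.56 bic.txt. Python's getopt.gnu_getopt, by contrast, allows options and positional arguments to be intermixed.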
