  • How to use a gene annotation model in TopHat2

    I'm using Galaxy to analyze RNA-seq results for differential gene expression in Drosophila melanogaster.

    TopHat2 says that I can supply a gene annotation model by uploading a file.

    Where do I get that file? Drosophila melanogaster obviously has an annotated genome, but when I go to download it, I don't know what I'm looking for; there are so many files. (http://flybase.org/static_pages/docs/datafiles.html#)

    There are genes, insertions, alleles... do I need all of them, or is there a single complete file?

  • #2
    You get the annotations from a GTF file. An example file for fruit fly can be found here: ftp://ftp.ensembl.org/pub/release-77...a_melanogaster. You want to make sure that the GTF file corresponds to the genome build you are using for alignments.

    BTW: are you using a custom local mirror of Galaxy? I don't see an option to supply a GTF file at http://usegalaxy.org (the PSU public Galaxy).


    • #3
      Thanks!

      Is there a way to import that from BioMart in Galaxy? I clicked through to Drosophila, but then the options stop...

      Is it a custom local mirror of Galaxy?
      Maybe? Some features of the Galaxy I use differ from those on usegalaxy.org, and I have a specific login that is not an email address.

      Using a gene annotation model is an option in TopHat2, but the tutorial I'm following doesn't say where they got one or which file type it requires.


      • #4
        BioMart does not export results as GTF files. You can probably do that using the UCSC Main table browser in Galaxy.

        See the TopHat2 manual page for how to use annotation: http://ccb.jhu.edu/software/tophat/manual.shtml (look for the --GTF option). TopHat takes annotation in GFF3/GTF format.
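A GTF file is plain text with nine tab-separated columns (seqname, source, feature, start, end, score, strand, frame, attributes). As a rough sketch for inspecting a file before handing it to TopHat's --GTF option, the Python below parses GTF lines into records. It is an illustration under stated assumptions, not TopHat's own parser: the names GtfRecord, parse_gtf_line and parse_gtf are hypothetical, and it only understands GTF-style attributes (key "value";), not GFF3's key=value form.

```python
import re
from dataclasses import dataclass, field


@dataclass
class GtfRecord:
    seqname: str
    source: str
    feature: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    strand: str
    attributes: dict = field(default_factory=dict)


# GTF attribute pairs look like: gene_id "g1"; transcript_id "t1";
# (GFF3's key=value attributes are NOT handled by this sketch.)
_ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_gtf_line(line):
    """Parse one GTF line; return None for comment and blank lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    seqname, source, feature, start, end, _score, strand, _frame, attrs = cols
    return GtfRecord(
        seqname=seqname,
        source=source,
        feature=feature,
        start=int(start),
        end=int(end),
        strand=strand,
        attributes=dict(_ATTR_RE.findall(attrs)),
    )


def parse_gtf(lines):
    """Yield GtfRecord objects from an iterable of GTF lines, skipping comments."""
    for line in lines:
        rec = parse_gtf_line(line)
        if rec is not None:
            yield rec
```

For example, parse_gtf_line on a made-up exon line whose nine tab-separated fields are `chr2L`, `test`, `exon`, `100`, `200`, `.`, `+`, `.`, `gene_id "g1"; transcript_id "t1";` returns a record with start=100, end=200 and attributes {'gene_id': 'g1', 'transcript_id': 't1'}.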


        • #5
          Thanks for the help!

          I was able to use Galaxy to get the annotation by:

          right-clicking on "UCSC Main table browser" in Galaxy (opening it in a new tab; clicking it directly didn't do anything), entering the settings for dmel, and sending the output to Galaxy.


          • #6
            Be careful to make sure that the GTF data from UCSC is for the same genome build that you used for the rest of the analysis. Otherwise the annotations could be inaccurate.
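One rough way to catch a build mismatch is to compare the sequence names and lengths in the reference FASTA against the seqnames and end coordinates in the GTF. The sketch below does that; the function names fasta_lengths and check_gtf_against_reference are hypothetical, and this is not a Galaxy or TopHat feature. Passing the check is necessary but not sufficient: UCSC's dm3 and dm6 both use names like chr2L, so matching names do not prove matching builds. A naming mismatch (e.g. chr2L in a UCSC file versus 2L in an Ensembl file) or features running past a chromosome end are strong warning signs, though.

```python
def fasta_lengths(lines):
    """Return {sequence name: length} from FASTA lines.

    The name is taken as the first whitespace-separated word of each '>' header.
    """
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def check_gtf_against_reference(gtf_lines, ref_lengths):
    """Compare GTF features with reference sequence names and lengths.

    Returns (sorted seqnames missing from the reference,
             list of (seqname, end) for features ending past the sequence end).
    """
    missing = set()
    out_of_range = []
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        seqname, end = cols[0], int(cols[4])  # column 1 = seqname, column 5 = end
        if seqname not in ref_lengths:
            missing.add(seqname)
        elif end > ref_lengths[seqname]:
            out_of_range.append((seqname, end))
    return sorted(missing), out_of_range
```

Both functions accept any iterable of lines, so open file handles for your own reference FASTA and GTF can be passed in directly. An empty "missing" list and an empty "out of range" list only mean the files are consistent by these two tests.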


            • #7
              The reference genome is: dmel 2006 BDGP R5/dm3 (dm3)
              The UCSC assembly is dm3.

              I'm assuming those are the same, i.e. "dm3" is the build used for both?


              • #8
                That is correct.

                I had mainly added that note for the benefit of others, who may find this thread by searching, as something to be aware of.



                • #9
                  That was actually why I included the info.
