  • Issues with Tophat

    Hi All-

    I am analyzing RNA-Seq data for the first time using TopHat. When I run the following command in the terminal, the TopHat help page is printed and nothing else seems to happen. No error or exception is displayed, and I have no idea what is wrong with my command. Any help would be greatly appreciated. I am following the Illumina tutorial for single-read data: http://www.illumina.com/documents/pr...ysisTopHat.pdf

    My command:

    --GTF Mus_musculus/UCSC/mm10/Annotation/Genes/genes.gtf --library-type fr-firststrand --num-threads 1 --output-dir Mouse_output S1.fastq

  • #2
    Are you using the same version of TopHat mentioned in that guide (v.1.4.0)? Current version is v.2.0.13.

    Example from the document you linked.

    Code:
    $ tophat --GTF <iGenomesFolder>/Annotation/Genes/genes.gtf --library-type <LibraryType> --num-threads 1 --output-dir <SampleOutputFolder> <iGenomesFolder>/Sequence/BowtieIndex/genome <SampleID>.fastq
    For names like <something here>, you need to substitute the real file or directory paths present on your system. This is the standard Unix convention for marking the variable parts of a command. You are not providing the path to the Bowtie index for the genome, which TopHat needs.

    So your command would become something like this:

    Code:
    $ tophat --GTF Mus_musculus/UCSC/mm10/Annotation/Genes/genes.gtf --library-type fr-firststrand --num-threads 1 --output-dir Mouse_output /path_to/Sequence/BowtieIndex/genome S1.fastq
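
One way to keep the placeholder parts straight is to put each one in a shell variable and assemble the command from them. This is only a sketch: the variable and function names are made up for illustration, /path_to is the same stand-in used above (not a real location), and the command is printed rather than run so it can be checked first.

```shell
#!/usr/bin/env bash
# Sketch: build the TopHat command from named pieces so nothing is left out.
# All names here are illustrative; /path_to is a stand-in, not a real path.
igenomes="/path_to"                              # <iGenomesFolder>
gtf="$igenomes/Annotation/Genes/genes.gtf"       # annotation only, no sequence
index="$igenomes/Sequence/BowtieIndex/genome"    # index basename, not a single file
reads="S1.fastq"                                 # <SampleID>.fastq
outdir="Mouse_output"                            # <SampleOutputFolder>

build_tophat_cmd() {
    # Print the command line; the positional order is index basename, then reads.
    printf 'tophat --GTF %s --library-type %s --num-threads 1 --output-dir %s %s %s\n' \
        "$gtf" "fr-firststrand" "$outdir" "$index" "$reads"
}

build_tophat_cmd
```

If the printed line is missing either of the last two arguments, one of the variables is empty.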

    • #3
      Thank you for taking the time to help. I am familiar with Unix conventions. I am calling tophat from the directory that contains the fastq file, so the file name alone should be the correct path, and I don't think that is the issue. However, you raise a good point: I am using the newest version of TopHat, not the version listed in the tutorial, so maybe the syntax is outdated?

      • #4
        You omitted the path to the Bowtie genome index files (unless you did not copy the entire command into your original post).

        • #5
          Hmm, I guess I am not understanding, then. I am calling tophat from a terminal window on a cluster that I ssh into, from a folder in my account on the server. This folder is named WorkflowFolder, so I suppose the full path would be /home/gkuffel/WorkflowFolder/S1.fastq

          Do I also need to specify this for the reference genome, like this: /home/gkuffel/WorkflowFolder/Mus_musculus/UCSC/mm10/Annotation/Genes/genes.gtf?

          • #6
            TopHat, like other aligners, requires that the genome sequence be indexed in a binary form (a Burrows-Wheeler transform, or FM index). This index is what your reads are compared against.

            Since you are using the mouse genome, you can get pre-made index files and annotation from the iGenomes site: http://support.illumina.com/sequenci...e/igenome.html. Get the build you like (several are available for mouse).

            <iGenomesFolder>/Sequence/BowtieIndex/genome - This part of the example command is referring to the genome index files.

            The gtf file only has information about features/annotation for your genome. It has to be used in combination with the actual sequence/index files.
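
A quick way to see whether a given basename really has index files behind it is to test for the first file of the set. The sketch below assumes the usual suffixes (.1.ebwt for a Bowtie 1 index, .1.bt2 or .1.bt2l for a Bowtie 2 index); the function name index_kind is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: report which kind of Bowtie index, if any, exists for a basename.
# Assumes the usual suffixes: .1.ebwt (Bowtie 1), .1.bt2 / .1.bt2l (Bowtie 2).
index_kind() {
    local base="$1"
    if [ -e "$base.1.bt2" ] || [ -e "$base.1.bt2l" ]; then
        echo "bowtie2"
    elif [ -e "$base.1.ebwt" ]; then
        echo "bowtie1"
    else
        echo "none"
        return 1
    fi
}
```

For example, index_kind /path_to/Sequence/BowtieIndex/genome prints "none" when no such files sit next to that basename, which usually means the path is wrong or the index directory was not copied over.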

            • #7
              That is exactly where I got the genome that I am using. I used the FTP site to transfer the file to my computer and then I used FUGU to get the file to my account on our server that I use to run TopHat.

              • #8
                Perhaps you did not paste the entire command in the original post?

                It is unfortunate, but the pre-built genome index is named "genome" (that is the "basename" of the index; there should be multiple files with that basename in the BowtieIndex directory). That part (/path_to/Sequence/BowtieIndex/genome in the command below) is missing from your original command.

                $ tophat --GTF Mus_musculus/UCSC/mm10/Annotation/Genes/genes.gtf --library-type fr-firststrand --num-threads 1 --output-dir Mouse_output /path_to/Sequence/BowtieIndex/genome S1.fastq

                • #9
                  You need to transfer everything in the "BowtieIndex" directory over to the server.

                  • #10
                    The file from igenomes is:

                    Mus_musculus/UCSC/mm10/Annotation/Genes/genes.gtf

                    I have transferred the entire folder for Mus_musculus to our server.

                    S1.fastq is my data.

                    I don't think that is the issue. My command now is:

                    tophat --GTF home/gkuffel/WorkflowFolder/Mus_musculus/UCSC/mm10/Annotation/Genes/genes.gtf --library-type fr-firststrand --num-threads 1 --output-dir Mouse_output home/gkuffel/WorkflowFolder/S1.fastq

                    Tophat started running this time!!! But then it gave me this error: Expected bowtie2 to be in the same directory with bowtie2-align: /usr/local/share/bowtie2-2.1.0/

                    Exiting now...

                    • #11
                      Irrespective of what the error message says now, TopHat is not going to work right until you provide the location of the genome index files.

                      That is likely to be this: /home/gkuffel/WorkflowFolder/Mus_musculus/UCSC/mm10/Sequence/Bowtie2Index/genome

                      • #12
                        I finally understand what you mean. Thanks for your patience. Here is my new command:

                        tophat --GTF home/gkuffel/Workflowfolder/Mus_musculus/UCSC/mm10/Annotation/Genes/genes.gtf --library-type fr-firststrand --num-threads 1 --output-dir Mouse_output home/gkuffel/WorkflowFolder/Mus_musculus/UCSC/mm10/Sequence/Bowtie2Index/genome S1.fastq

                        Sorry about the confusion; at least I have that figured out. I am still getting the same error as before, though. Any thoughts?

                        • #13
                          Did you install the Tuxedo suite on this machine, or did someone else?

                          My feeling is that you are also missing a leading "/" before "home" in both places in your command line. The only way that command would work is if you were in the directory that has S1.fastq and that directory also contained a top-level directory called "home".
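
To make the difference concrete, here is a small sketch that shows where a path would be looked up from the current directory. The helper names is_absolute and resolve are made up for illustration; nothing here touches the file system.

```shell
#!/usr/bin/env bash
# Sketch: a path without a leading "/" is interpreted relative to $PWD.
is_absolute() {
    case "$1" in
        /*) return 0 ;;   # starts with "/": absolute
        *)  return 1 ;;   # anything else: relative to the current directory
    esac
}

resolve() {
    # Print the location the shell would actually look in for this path.
    if is_absolute "$1"; then
        printf '%s\n' "$1"
    else
        printf '%s/%s\n' "$PWD" "$1"
    fi
}
```

Run from /home/gkuffel/WorkflowFolder, resolve home/gkuffel/WorkflowFolder/S1.fastq gives /home/gkuffel/WorkflowFolder/home/gkuffel/WorkflowFolder/S1.fastq, which is almost certainly not where the file is.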

                          Do you see these two programs with these commands?

                          Code:
                          $ which tophat
                          $ which bowtie2
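
Since the error message complains that bowtie2 is expected in the same directory as bowtie2-align, it may also help to check what sits next to whichever bowtie2 is found on the PATH. This is a rough sketch: the function name same_dir_has is made up, and the file name bowtie2-align is taken from the error message above, so other Bowtie 2 installs may name that binary differently.

```shell
#!/usr/bin/env bash
# Sketch: check whether a file exists in the same directory as a tool on PATH.
# $1 = tool name to look up, $2 = file name expected beside it.
# Returns 2 if the tool itself is not found on PATH.
same_dir_has() {
    local tool_path
    tool_path="$(command -v "$1")" || return 2
    [ -e "$(dirname "$tool_path")/$2" ]
}

if same_dir_has bowtie2 bowtie2-align; then
    echo "bowtie2-align found next to bowtie2"
else
    echo "bowtie2-align not found next to bowtie2 (or bowtie2 is not on PATH)"
fi
```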
