  • need some help with Tophat

    Hello All,

    I am new to this site and also new to the field of next-generation sequencing.
    I am trying to use Tophat (version 1.0.14) to do RNA-seq mapping, and I found it pretty slow: it took ~15 hours to finish mapping ~17 million pairs of reads (75 bp) against a chr20 index. My reads are in FASTA format and the command I used is:
    "tophat -r 250 chr20 **.lane1_1.fa **.lane1_2.fa"

    I think I installed Tophat and Bowtie properly since they passed the test. But I must have done something silly. I can't imagine how long it's gonna take if I map 400 million reads against the whole human genome.

    Could anyone tell me what I might be doing wrong?

    Thank you so much,

    Iris
    Last edited by IrisZhu; 07-29-2010, 01:34 PM.

  • #2
    That is not slow. The time sounds about right.

    Also, yes, 400 million reads will need a powerful multi-core computer with a lot of memory.

    For example, it took me about 17 hours to map 130 million (100bp) reads with SpliceMap on 10 cores.
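To put the two runs side by side, here is a rough back-of-the-envelope calculation. It is only a sketch: it assumes run time scales linearly with the number of reads (an approximation at best), counts 17 million pairs as 34 million reads, and takes the 130 million figure as individual reads. The function names are made up for illustration.

```python
def reads_per_hour(n_reads, hours):
    """Average throughput of a finished run."""
    return n_reads / hours


def projected_hours(n_reads, rate):
    """Naive linear extrapolation: time to process n_reads at a given rate."""
    return n_reads / rate


# TopHat run from the first post: 17 million pairs = 34 million reads in ~15 h.
tophat_rate = reads_per_hour(2 * 17_000_000, 15)

# SpliceMap run from this post: 130 million reads in ~17 h on 10 cores.
splicemap_rate = reads_per_hour(130_000_000, 17)
splicemap_rate_per_core = splicemap_rate / 10

print(f"TopHat run:    {tophat_rate:,.0f} reads/hour")
print(f"SpliceMap run: {splicemap_rate:,.0f} reads/hour "
      f"({splicemap_rate_per_core:,.0f} per core)")

# Extrapolate the TopHat rate to 400 million reads (linear assumption).
hours_400m = projected_hours(400_000_000, tophat_rate)
print(f"400 million reads at the TopHat rate: {hours_400m:.1f} hours "
      f"({hours_400m / 24:.1f} days)")
```

At the rate reported in the first post, 400 million reads would take roughly 176 hours, about a week, which is why more cores and memory matter at that scale.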

    The current version of SpliceMap is also about the same speed as TopHat. However, the next version will be twice as fast.

    Edit: use the -p option in TopHat if you have a multi-core machine.
    Last edited by john_mu; 07-29-2010, 01:36 PM.
    SpliceMap: De novo detection of splice junctions from RNA-seq
    Download SpliceMap Comment here



    • #3
      John, thanks for your reply.

      Is 15 hours for mapping 17 million pairs against a single (short) chromosome (chr20) reasonable? I am asking just to make sure you were not thinking of the whole genome :-)



      • #4
        oh... a single chromosome??? That is uhhh... a bit slow.

        How did you build the index?

        EDIT: also how much free memory do you have? Maybe either bowtie or tophat is thrashing.
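One quick way to check for this on a Linux machine is to look at /proc/meminfo while the job runs. The sketch below is only illustrative: the helper names and the 5% threshold are assumptions of mine, not anything from TopHat or Bowtie, and it only applies to systems that expose /proc/meminfo.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> kB."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def looks_memory_starved(info, min_free_fraction=0.05):
    """Heuristic: very little reclaimable RAM left and swap already in use.

    The 5% default is an arbitrary illustrative threshold.
    """
    total = info.get("MemTotal", 0)
    available = (info.get("MemFree", 0)
                 + info.get("Buffers", 0)
                 + info.get("Cached", 0))
    swap_used = info.get("SwapTotal", 0) - info.get("SwapFree", 0)
    return total > 0 and available / total < min_free_fraction and swap_used > 0


# Typical use on Linux, while the mapping job is running:
#     with open("/proc/meminfo") as handle:
#         print(looks_memory_starved(parse_meminfo(handle.read())))
```

If this reports True during the run, the machine is probably swapping, and that alone could explain a very slow mapping.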
        Last edited by john_mu; 07-29-2010, 07:57 PM.


        • #5
          Now I am mapping the same reads against the whole-genome index, and it does not seem much slower than mapping against just the chr20 index ......

          Another thing: if I add the option "-p", it fails immediately:
          "
          [zhuz2@cbbdev1 mapping]$ tophat -p -r 100 hg19 mate1.fa mate2.fa
          Traceback (most recent call last):
          File "/usr/local/tophat/1.0.14/bin/tophat", line 1854, in <module>
          sys.exit(main())
          File "/usr/local/tophat/1.0.14/bin/tophat", line 1746, in main
          args = params.parse_options(argv)
          File "/usr/local/tophat/1.0.14/bin/tophat", line 474, in parse_options
          self.system_params.parse_options(opts)
          File "/usr/local/tophat/1.0.14/bin/tophat", line 171, in parse_options
          self.bowtie_threads = int(value)
          ValueError: invalid literal for int() with base 10: '-r'
          "
          The command "tophat -r 100 hg19 mate1.fa mate2.fa" works well.
          Do you know why? Is the problem with my machine or with my command?

          Thank you so much for your help,

          Iris



          • #6
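            You need to follow -p with the number of threads to use, for example "tophat -p 4 -r 100 hg19 mate1.fa mate2.fa" (the 4 is only an example; pick a value that suits your machine).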
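The traceback shows the mechanism: the option parser treats whatever token follows -p as its value, so "-r" gets passed to int(). Here is a minimal sketch that reproduces the same error with Python's standard getopt module. This is not TopHat's actual code; the function name and the settings dict are made up for illustration.

```python
import getopt


def parse_options(argv):
    """Parse -p (thread count) and -r (mate inner distance) from argv.

    Illustrative only: "p:" and "r:" tell getopt that both options
    require a value, so the token after -p is always consumed as
    that value, whatever it looks like.
    """
    opts, rest = getopt.getopt(argv, "p:r:")
    settings = {"threads": 1, "inner_dist": None}
    for flag, value in opts:
        if flag == "-p":
            settings["threads"] = int(value)  # int("-r") raises ValueError
        elif flag == "-r":
            settings["inner_dist"] = int(value)
    return settings, rest


# With a thread count after -p, parsing succeeds.
print(parse_options(["-p", "4", "-r", "100", "hg19", "mate1.fa", "mate2.fa"]))

# Without one, "-r" is swallowed as the value of -p and int() fails.
try:
    parse_options(["-p", "-r", "100", "hg19", "mate1.fa", "mate2.fa"])
except ValueError as err:
    print("ValueError:", err)
```

So neither the machine nor the installation is at fault; the command line is simply missing the value for -p.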



            • #7
              Yes I just realized this a few minutes ago :-) so silly of me ....
              Thanks again.



              • #8
                Hi,
                Just an observation: is there a reason to map only to chromosome 20? Some reads may map to chromosome 20 but map better (with fewer mismatches) to a different chromosome. Such reads would be falsely assigned to chromosome 20 if the other chromosomes are not present in the index. You also would not know whether a read mapped to more than one chromosome with the same stringency. Possibly something to bear in mind.
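A toy illustration of the point, using brute-force ungapped matching on made-up sequences. This is nothing like how Bowtie or TopHat work internally; it only shows how restricting the reference changes which hit looks "best". All sequences and names are hypothetical.

```python
def count_mismatches(read, ref, pos):
    """Number of mismatching bases when read is placed at ref[pos:]."""
    window = ref[pos:pos + len(read)]
    return sum(1 for a, b in zip(read, window) if a != b)


def best_hit(read, references, max_mismatches=2):
    """Return (name, position, mismatches) of the best placement, or None.

    Ties keep the first placement found; a real aligner would report
    such reads as multi-mapping instead.
    """
    best = None
    for name, seq in references.items():
        for pos in range(len(seq) - len(read) + 1):
            mm = count_mismatches(read, seq, pos)
            if mm <= max_mismatches and (best is None or mm < best[2]):
                best = (name, pos, mm)
    return best


# Hypothetical toy data: the read matches "chr1" exactly,
# and "chr20" with one mismatch.
read = "ACGTTGCA"
chr20_only = {"chr20": "GGGACGTAGCATTT"}
both = {"chr20": "GGGACGTAGCATTT", "chr1": "TTACGTTGCAGG"}

print(best_hit(read, chr20_only))  # best available hit is on chr20
print(best_hit(read, both))        # the exact match on chr1 now wins
```

With only chr20 in the index, the one-mismatch hit is accepted because nothing better is available; with both sequences present, the exact match elsewhere takes over.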



                • #9
                  Thanks for your comment. Of course it doesn't make sense to map to one chromosome.
                  That's just for a test to see if the software is working.



                  • #10
                    In that case, it makes perfect sense.
