  • need some help with Tophat

    Hello All,

    I am new to this site and also new to the field of next-generation sequencing.
    I am trying to use TopHat (version 1.0.14) for RNA-seq mapping, and I found it pretty slow: it took ~15 hours to map ~17 million read pairs (75 bp) against a chr20 index. My reads are in FASTA format, and the command I used is:
    "tophat -r 250 chr20 **.lane1_1.fa **.lane1_2.fa"

    I think I installed TopHat and Bowtie properly, since they passed the test. But I must have done something silly. I can't imagine how long it's going to take if I map 400 million reads against the whole human genome.

    Could anyone tell me what I might be doing wrong?

    Thank you so much,

    Iris
    Last edited by IrisZhu; 07-29-2010, 01:34 PM.
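A rough way to approach the "how long for 400 million reads" question is to extrapolate linearly from the chr20 run. This is only a sketch: it assumes runtime grows linearly with read count, treats the 400 million as individual reads, and ignores genome size, thread count and disk I/O, so it gives an order of magnitude rather than a prediction. The helper names are hypothetical.

```python
# Naive linear extrapolation of runtime from one observed run.
# Assumptions (not from TopHat itself): runtime scales linearly with the
# number of reads; genome size, threads and I/O are ignored.

def reads_per_hour(read_pairs: float, hours: float) -> float:
    """Throughput in individual reads per hour (each pair = 2 reads)."""
    return 2 * read_pairs / hours

def projected_hours(total_reads: float, throughput: float) -> float:
    """Hours needed to process total_reads at a fixed throughput."""
    return total_reads / throughput

# Numbers from the post: ~17 million pairs in ~15 hours against chr20.
rate = reads_per_hour(17e6, 15)
# Target from the post: 400 million reads against the whole genome.
estimate = projected_hours(400e6, rate)
print(f"{rate / 1e6:.2f} M reads/h -> ~{estimate:.0f} h for 400 M reads")
# prints: 2.27 M reads/h -> ~176 h for 400 M reads
```

In practice the whole-genome index and the -p option both change the rate, so a test run on a subset of reads is a better guide than this arithmetic.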

  • #2
    That is not slow. It sounds about the right time.

    Also, yes, 400 million reads will need a powerful multi-core computer with a lot of memory.

    For example, it took me about 17 hours to map 130 million (100 bp) reads with SpliceMap on 10 cores.

    The current version of SpliceMap is also about the same speed as TopHat. However, the next version will be twice as fast.
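Raw wall-clock times from the two runs are hard to compare directly, because they differ in read count and core count. Normalizing to reads per hour per core is one crude way to line them up. The sketch assumes the TopHat run used a single thread (TopHat's default when -p is not given) and that throughput scales linearly with cores; it ignores read length and index size, which may differ between the runs.

```python
# Crude per-core throughput comparison of the two runs in this thread.
# Assumptions: the TopHat run used 1 thread, and throughput scales
# linearly with the number of cores.

def per_core_rate(reads: float, hours: float, cores: int) -> float:
    """Reads processed per hour per core."""
    return reads / hours / cores

runs = {
    # ~17 million pairs = ~34 million 75 bp reads, chr20 index, 1 thread
    "TopHat, chr20": per_core_rate(34e6, 15, 1),
    # 130 million 100 bp reads with SpliceMap on 10 cores
    "SpliceMap, 10 cores": per_core_rate(130e6, 17, 10),
}
for name, rate in runs.items():
    print(f"{name}: {rate / 1e6:.2f} M reads/h/core")
# prints:
# TopHat, chr20: 2.27 M reads/h/core
# SpliceMap, 10 cores: 0.76 M reads/h/core
```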

    Edit: use the -p option in TopHat if you have a multi-core machine.
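If the aim is just to pass -p with the machine's core count, a small wrapper can build the command line. This is a sketch: the function name is hypothetical, the arguments mirror the commands used in this thread, and using every core may not be wise if memory is tight.

```python
import os
import shlex

def tophat_command(index, mate1, mate2, inner_dist, threads=None):
    """Build a TopHat argument list; -p takes the number of alignment threads."""
    if threads is None:
        # os.cpu_count() may return None, so fall back to 1 thread.
        threads = os.cpu_count() or 1
    return ["tophat", "-p", str(threads), "-r", str(inner_dist),
            index, mate1, mate2]

cmd = tophat_command("hg19", "mate1.fa", "mate2.fa", 100, threads=8)
print(shlex.join(cmd))
# prints: tophat -p 8 -r 100 hg19 mate1.fa mate2.fa
# To run it from Python instead: subprocess.run(cmd, check=True)
```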
    Last edited by john_mu; 07-29-2010, 01:36 PM.
    SpliceMap: De novo detection of splice junctions from RNA-seq
    Download SpliceMap Comment here

    • #3
      John, thanks for your reply.

      Is 15 hours for mapping 17 million pairs against a single (short) chromosome (chr20) reasonable? I am asking just to make sure you were not thinking of the whole genome :-)

      • #4
        Originally posted by IrisZhu
        John, thanks for your reply.

        Is 15 hours for mapping 17 million pairs against a single (short) chromosome (chr20) reasonable? I am asking just to make sure you were not thinking of the whole genome :-)

        Oh... a single chromosome??? That is, uhhh... a bit slow.

        How did you build the index?

        EDIT: Also, how much free memory do you have? Maybe either Bowtie or TopHat is thrashing (constantly swapping memory to disk).
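On Linux, one way to check free memory and swap is to read /proc/meminfo; swap use that keeps growing while Bowtie or TopHat runs would be consistent with thrashing. The sketch below parses meminfo-style text. The sample values are made up for illustration, and the MemAvailable field only exists on Linux 3.14 and later, hence the fallback to MemFree.

```python
import os

def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value in kB}."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info

def available_gib(info):
    """Available memory in GiB, preferring MemAvailable over MemFree."""
    kb = info.get("MemAvailable", info.get("MemFree", 0))
    return kb / 1024 / 1024

def swap_used_gib(info):
    """Swap currently in use, in GiB."""
    return (info.get("SwapTotal", 0) - info.get("SwapFree", 0)) / 1024 / 1024

# Hypothetical sample values, for illustration only.
sample = """MemTotal:       16384000 kB
MemFree:         1024000 kB
MemAvailable:    8192000 kB
SwapTotal:       2097152 kB
SwapFree:        1048576 kB"""

sample_info = parse_meminfo(sample)
print(f"sample: {available_gib(sample_info):.1f} GiB available, "
      f"{swap_used_gib(sample_info):.1f} GiB swap used")
# prints: sample: 7.8 GiB available, 1.0 GiB swap used

# On a real Linux machine, check the live values instead.
if os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        live = parse_meminfo(f.read())
    print(f"this machine: {available_gib(live):.1f} GiB available, "
          f"{swap_used_gib(live):.1f} GiB swap used")
```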
        Last edited by john_mu; 07-29-2010, 07:57 PM.

        • #5
          Now I am mapping the same data against the whole-genome index, and it seems not much slower than mapping against the chr20 index...

          Another thing: if I add the option "-p", it fails immediately:
          "
          [zhuz2@cbbdev1 mapping]$ tophat -p -r 100 hg19 mate1.fa mate2.fa
          Traceback (most recent call last):
            File "/usr/local/tophat/1.0.14/bin/tophat", line 1854, in <module>
              sys.exit(main())
            File "/usr/local/tophat/1.0.14/bin/tophat", line 1746, in main
              args = params.parse_options(argv)
            File "/usr/local/tophat/1.0.14/bin/tophat", line 474, in parse_options
              self.system_params.parse_options(opts)
            File "/usr/local/tophat/1.0.14/bin/tophat", line 171, in parse_options
              self.bowtie_threads = int(value)
          ValueError: invalid literal for int() with base 10: '-r'
          "
          The command "tophat -r 100 hg19 mate1.fa mate2.fa" works well.
          Do you know why? Is it a problem with my machine or with my command?

          Thank you so much for your help,

          Iris


          • #6
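            You need to follow -p with the number of threads to use, e.g. "tophat -p 8 -r 100 hg19 mate1.fa mate2.fa". Otherwise TopHat takes "-r" as the value of -p, which is exactly what the last line of the traceback shows.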
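The traceback is consistent with a getopt-style parser in which -p takes a required argument: the next command-line word is consumed as its value even if it looks like another option, so "-r" becomes the thread count and int("-r") fails. The sketch below reproduces that with Python's standard getopt module; the option string "p:r:" is a simplified stand-in, not TopHat's actual option list.

```python
import getopt

# Simplified stand-in for the two flags used in this thread;
# the colon after a letter means "this option takes an argument".
SHORT_OPTS = "p:r:"

def parse(argv):
    """Return (threads, options, positional args) for a tophat-like argv."""
    opts, args = getopt.getopt(argv, SHORT_OPTS)
    threads = 1
    for flag, value in opts:
        if flag == "-p":
            threads = int(value)  # raises ValueError if value is not a number
    return threads, opts, args

# Correct: -p is followed by a number.
print(parse(["-p", "8", "-r", "100", "hg19", "mate1.fa", "mate2.fa"]))
# prints: (8, [('-p', '8'), ('-r', '100')], ['hg19', 'mate1.fa', 'mate2.fa'])

# Incorrect: getopt hands "-r" to -p as its value, then int("-r") fails.
try:
    parse(["-p", "-r", "100", "hg19", "mate1.fa", "mate2.fa"])
except ValueError as err:
    print("ValueError:", err)
# prints: ValueError: invalid literal for int() with base 10: '-r'
```

Note that in the failing case "100" is no longer parsed as an option value at all; getopt stops at it and treats it as the first positional argument.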

            • #7
              Yes, I just realized this a few minutes ago :-) So silly of me...
              Thanks again.

              • #8
                Hi,
                Just an observation: is there a reason to map only to chromosome 20? Some reads may map to chromosome 20 but map better (with fewer mismatches) to a different chromosome, so they may be falsely assigned to chromosome 20 when the other chromosomes are not present in the index. You also would not know whether reads mapped to more than one chromosome with the same stringency. Possibly something to bear in mind.
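The point about a partial index can be illustrated with a toy brute-force search that reports the position with the fewest mismatches. The sequences and the helper names are invented, and real aligners such as Bowtie use an index and can report several hits, but the effect is the same: a read whose true origin is missing from the index can still land, with extra mismatches, on whatever chromosome is present.

```python
# Toy ungapped best-hit search (hypothetical sequences, not a real aligner).

def mismatches(read, window):
    """Number of positions where read and window differ."""
    return sum(a != b for a, b in zip(read, window))

def best_hit(read, genome):
    """Return (mismatches, chrom, offset) of the best ungapped hit."""
    best = None
    for chrom, seq in genome.items():
        for i in range(len(seq) - len(read) + 1):
            hit = (mismatches(read, seq[i:i + len(read)]), chrom, i)
            if best is None or hit[0] < best[0]:
                best = hit
    return best

# Hypothetical reference fragments.
genome = {
    "chr20": "TTACGTTGCATT",
    "chr1": "GGACGTAGCAGG",
}
read = "ACGTAGCA"  # exact copy of chr1 at offset 2

print(best_hit(read, genome))                      # full index
# prints: (0, 'chr1', 2)
print(best_hit(read, {"chr20": genome["chr20"]}))  # chr20-only index
# prints: (1, 'chr20', 2)
```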

                • #9
                  Originally posted by poisson200
                  Hi,
                  Just an observation: is there a reason to map only to chromosome 20? Some reads may map to chromosome 20 but map better (with fewer mismatches) to a different chromosome, so they may be falsely assigned to chromosome 20 when the other chromosomes are not present in the index. You also would not know whether reads mapped to more than one chromosome with the same stringency. Possibly something to bear in mind.

                  Thanks for your comment. Of course it doesn't make sense to map to only one chromosome.
                  That's just for a test to see if the software is working.

                  • #10
                    In that case, it makes perfect sense.
