  • Alternative to blast?

    Is it just me or does blast seem increasingly to be out of date and a major bottleneck for RNA-seq applications?

    The most popular aligners (e.g. bowtie, blast) trade speed for low memory use, but memory is now cheap. There are very fast, memory-intensive aligners for some problems (e.g. star, mummer), but I don't yet know of one that can replace blast for basic problems such as annotating transcripts against a protein database. This basic operation sometimes takes us two weeks using blastx on a 24-cpu machine, which isn't really sustainable for RNA-seq processing.

    So my question is, does anyone know of a better aligner for this problem, and does anyone else agree that someone *should* create an aligner that is more adapted to current hardware costs?

  • #2
    Have you used blat? http://genome.ucsc.edu/FAQ/FAQblat.html If you are looking for homologous matches, this may be an option (though not against a huge DB like GenPept; against a single proteome it would be fine).

    You should also specify which DB you are running blastx against for the 2-week (24 cores?) run. Are you using some kind of parallel method for that search, or is it a serial job?



    • #3
      Blat is definitely better, since it keeps the index in memory, but it doesn't have a blastx mode (does it??), and it could still be much faster, e.g. by using a suffix tree.

      What we usually do is blast, say, 100k or 200k transcripts against the UniProt taxonomic subsets and some smaller databases. The bacterial UniProt subset is the biggest and the one that takes the longest. Usually we allocate one CPU to each database target, so we could improve on that by also threading the larger targets using blast's own threading.
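Splitting the query set, rather than dedicating one CPU per database, lets every core work on the largest target at once. A minimal sketch in Python, assuming plain FASTA input that fits in memory; the function names are hypothetical, and each chunk would then be written to its own file and searched by a separate blastx process.

```python
from typing import Iterator


def read_fasta(text: str) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, seq_lines = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq_lines)
            header, seq_lines = line[1:], []
        else:
            seq_lines.append(line)
    if header is not None:
        yield header, "".join(seq_lines)


def split_round_robin(records, n_chunks: int) -> list[list[tuple[str, str]]]:
    """Deal records into n_chunks lists one at a time, so chunk sizes differ by at most one."""
    chunks: list[list[tuple[str, str]]] = [[] for _ in range(n_chunks)]
    for i, rec in enumerate(records):
        chunks[i % n_chunks].append(rec)
    return chunks


def to_fasta(records) -> str:
    """Serialize (header, sequence) pairs back to FASTA text."""
    return "".join(f">{h}\n{s}\n" for h, s in records)
```

Round-robin dealing keeps chunks balanced by count; if transcript lengths vary a lot, balancing by total residues would spread the work more evenly.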

      But it doesn't change the fact that blast is built on an indexing strategy which economizes memory more than necessary, with consequent reduction in speed. I would not be surprised if 100x speedup is easy to achieve with very practical memory use.
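To make the memory-for-speed trade concrete: a suffix tree, or its simpler cousin the suffix array (swapped in here because it is shorter to write), stores an entry for every position in the database, so all exact occurrences of a pattern are found by binary search rather than by scanning. A toy sketch in Python under that assumption; it is not how BLAST or any production aligner builds its index, and the naive construction is only suitable for small inputs.

```python
def build_suffix_array(text: str) -> list[int]:
    """Start positions of all suffixes of text, sorted lexicographically.
    Naive O(n^2 log n) construction; real tools use linear-time algorithms."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Binary-search the suffix array for every position where pattern occurs."""
    m = len(pattern)
    # Lower bound: first suffix whose m-length prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose m-length prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])
```

The array holds one integer per residue, which is exactly the memory cost being weighed against lookup speed in this thread.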



      • #4
        Originally posted by Will Nelson
        Blat is definitely better since it keeps the index in memory but it doesn't have the blastx mode (does it??) and it could still be much faster e.g. by using a suffix tree...
        Have a look at the blat options below:

        Code:
        options:
           -t=type     Database type.  Type is one of:
                         dna - DNA sequence
                         prot - protein sequence
                         dnax - DNA sequence translated in six frames to protein
                       The default is dna
           -q=type     Query type.  Type is one of:
                         dna - DNA sequence
                         rna - RNA sequence
                         prot - protein sequence
                         dnax - DNA sequence translated in six frames to protein
                         rnax - DNA sequence translated in three frames to protein
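The dnax type means the DNA is translated into protein in all six reading frames before matching. Six-frame translation (three forward frames plus three on the reverse complement) can be sketched in Python as below, assuming the standard genetic code (NCBI table 1) and mapping any codon with non-ACGT characters to 'X'; this illustrates the concept only and is not blat's own code.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered by first, second, third base in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate complete codons only; codons containing non-ACGT characters become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frames(dna: str) -> list[str]:
    """Return forward frames 0-2 followed by reverse-complement frames 0-2."""
    dna = dna.upper().replace("U", "T")
    rev = dna.translate(COMPLEMENT)[::-1]
    return [translate(dna[f:]) for f in range(3)] + [translate(rev[f:]) for f in range(3)]
```

For example, six_frames("ATGGCCTAA")[0] is "MA*", with '*' marking the stop codon.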



        • #5
          True, true...I haven't messed with blast for a while.

          But look: this is a 10+ year-old program which has not been updated in forever. It doesn't thread. If you want to use it in blastx mode, then the proteins have to be the *query*, meaning they are streamed and not indexed, which is extremely inefficient for searching a large protein DB. Moreover, blat uses a seed index rather than the more efficient suffix tree, again trading off time for memory.
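For contrast with the suffix-tree idea, a seed index stores only fixed-length k-mers of the database and looks up each k-mer of the query. A minimal Python sketch assuming exact k-mer seeds; the function names are hypothetical, and blat's real index differs in detail (it samples tiles of the database at a fixed spacing, which the `step` parameter only roughly imitates).

```python
from collections import defaultdict


def build_kmer_index(db: str, k: int, step: int = 1) -> dict[str, list[int]]:
    """Map each k-mer of db, sampled every `step` positions, to its start positions."""
    index: defaultdict[str, list[int]] = defaultdict(list)
    for i in range(0, len(db) - k + 1, step):
        index[db[i:i + k]].append(i)
    return dict(index)


def find_seeds(query: str, index: dict[str, list[int]], k: int) -> list[tuple[int, int]]:
    """Return (query_pos, db_pos) pairs for every exact k-mer hit."""
    hits = []
    for q in range(len(query) - k + 1):
        for d in index.get(query[q:q + k], []):
            hits.append((q, d))
    return hits
```

Only exact hits of length k are found; a real aligner would then extend and score these seeds. Unlike a suffix tree, the index cannot answer queries for matches longer or shorter than k directly.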

          Blat is more scalable than blast, or would be if the two problems above were addressed, but it is certainly nowhere near the best one can do, either for standalone use or, still less, as the engine of a large-scale cloud annotation service.

          Next-gen sequencing needs a next-gen alignment solution. One of the groups with serious experience at this needs to step up and build something better.



          • #6
            LAST is good for homologous sequences.



            • #7
              For a blast alternative, how about usearch?

              However, it sounds to me like the OP isn't setting up his blasts properly. 200k queries against some subset of UniProt (or even the whole thing) with 24 cores shouldn't take even one day, given sufficient RAM.
              Last edited by rhinoceros; 11-22-2013, 12:55 AM.
              savetherhino.org
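The one-day estimate is easy to check with back-of-envelope arithmetic. Assuming the work divides perfectly evenly across cores, 200k queries on 24 cores in one day leaves about 10 CPU-seconds per query, whereas a two-week run of the same 200k queries on all 24 cores would correspond to about 145 CPU-seconds per query. A trivial Python helper, with a hypothetical name:

```python
SECONDS_PER_DAY = 86_400


def per_query_budget(n_queries: int, cores: int, days: float) -> float:
    """CPU-seconds available per query, assuming perfectly even load across cores."""
    return cores * days * SECONDS_PER_DAY / n_queries


print(per_query_budget(200_000, 24, 1))   # 10.368
print(per_query_budget(200_000, 24, 14))  # 145.152
```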



              • #8
                For BLASTX use pauda



                • #9
                  Originally posted by Will Nelson
                  True, true...I haven't messed with blast for a while.

                  But look: this is a 10+ year-old program which has not been updated in forever. It doesn't thread.
                  Wrong on both counts. Blast does thread, and improvements to it are ongoing. Just because a program was created 10+ years ago does not make it obsolete.

                  Being current and multi-threaded doesn't necessarily make Blast the best solution; however, I agree with 'rhinoceros': 200K queries vs. UniProt using 24 cores should not take too long. I routinely annotate large RNA-seq results via Blast. This gives at least a 'first-pass' level of annotation. What I have given up on is Blast2Go; that program is way too slow for a large number of reads.
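In BLAST+, blastx threading is requested with the -num_threads option. A minimal Python sketch that assembles such a command; the file and database names (transcripts.fa, uniprot_bacteria, hits.tsv) are placeholders, and the e-value cutoff is an arbitrary example rather than a recommendation from this thread.

```python
import shlex


def blastx_command(query: str, db: str, out: str, threads: int, evalue: float = 1e-5) -> list[str]:
    """Build a BLAST+ blastx argument list that uses blastx's built-in multithreading."""
    return [
        "blastx",
        "-query", query,
        "-db", db,
        "-out", out,
        "-outfmt", "6",               # tabular output, one line per hit
        "-evalue", str(evalue),
        "-num_threads", str(threads),
    ]


cmd = blastx_command("transcripts.fa", "uniprot_bacteria", "hits.tsv", threads=24)
print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) would launch the search; splitting the queries into chunks and running several such processes is a complementary way to use all cores.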



                  • #10
                    Pauda might be very interesting for you, as suggested by jimmybee.

                    Otherwise, Gblast was just mentioned here.
