  • vinaydu
    Member
    • Dec 2010
    • 17

    Blast Program Memory Usage

    Dear Members,

    I have compiled and installed blastall and am running it on CentOS 6.0.

    When I run it, it uses very little physical memory, and BLAST takes a very long time.

    The free -g command shows that 725 GB of RAM is installed.

    ps aux | grep "blastall" gives me the process ID, which I used to check how much memory blastall is using:

    top -p psID

    This shows that blastall is using only 11 GB of RAM.

    I am using the -a option to set the number of processors to 48.

    What could be the reason that blastall is not using the available memory?
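
For anyone who wants to log the memory figure instead of reading it off top, here is a minimal Python sketch. It assumes a Linux system where /proc/<pid>/status contains a VmRSS line; the function names are made up for illustration and are not part of BLAST.

```python
import re
from pathlib import Path
from typing import Optional


def parse_vmrss_kb(status_text: str) -> Optional[int]:
    """Return the VmRSS value in kB from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def resident_memory_gib(pid: int) -> Optional[float]:
    """Read /proc/<pid>/status (Linux only) and report resident memory in GiB."""
    status_text = Path(f"/proc/{pid}/status").read_text()
    kb = parse_vmrss_kb(status_text)
    return None if kb is None else kb / (1024 ** 2)
```

Calling resident_memory_gib() with the process ID reported by ps gives the resident set size, which should roughly match the RES column in top.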
  • vinaydu
    Member
    • Dec 2010
    • 17

    #2
    Please reply members. My blast job is taking very long. Please help..

    Comment

    • amitbik
      Member
      • May 2013
      • 53

      #3
      Try running your BLAST job with mpirun. In some cases it worked out for me.

      mpirun [option] command

      Comment

      • vinaydu
        Member
        • Dec 2010
        • 17

        #4
        Thanks amit. Let me try it. I will post the result...

        Comment

        • GenoMax
          Senior Member
          • Feb 2008
          • 7142

          #5
          Originally posted by vinaydu
          Please reply members. My blast job is taking very long. Please help..
          Please provide some additional information about the size of the query and the database you are blasting against. What version of BLAST are you using? Are you running it on a cluster or on a really large server? (700+ GB of free memory seems suspect, since single servers with that much memory are not common.)

          Comment

          • vinaydu
            Member
            • Dec 2010
            • 17

            #6
            @Genomax,

            I am BLASTing against the nr database, which is presently ~15 GB.

            I am using blastall.

            I am running it on a large server that has this much (750 GB) physical memory.

            Please advise...

            Comment

            • sphil
              Senior Member
              • Apr 2010
              • 192

              #7
              How many sequences are in your input set?

              Comment

              • vinaydu
                Member
                • Dec 2010
                • 17

                #8
                Hi sphil,

                I have 9706 sequences in a file, counted with:

                grep ">" fastafile | wc -l

                The initial FASTA file was very big, so I had to split it into smaller files.
                Even after splitting, BLAST is still extremely slow.

                For some of the FASTA files the run terminates (bash message), which is another reason I split the input. But even after splitting, the run still terminates for some files and BLAST is very slow. I am forcing XML output.

                Any suggestions!!!!
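
The counting and splitting described in this post can be sketched in Python as follows. This is only an illustration: the function names and the chunk size are hypothetical, and, unlike grep ">", it counts only lines that start with ">".

```python
from typing import Iterable, Iterator, List


def count_records(lines: Iterable[str]) -> int:
    """Count FASTA records, i.e. lines that begin with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def split_fasta(lines: Iterable[str], records_per_chunk: int) -> Iterator[List[str]]:
    """Yield lists of lines, each holding at most records_per_chunk records."""
    if records_per_chunk < 1:
        raise ValueError("records_per_chunk must be at least 1")
    chunk: List[str] = []
    records_in_chunk = 0
    for line in lines:
        if line.startswith(">"):
            # Start a new chunk once the current one is full.
            if records_in_chunk == records_per_chunk:
                yield chunk
                chunk, records_in_chunk = [], 0
            records_in_chunk += 1
        chunk.append(line)
    if chunk:
        yield chunk
```

Each yielded chunk can be written to its own file with writelines() and submitted as a separate BLAST job.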

                Comment

                • sphil
                  Senior Member
                  • Apr 2010
                  • 192

                  #9
                  Can you be more specific on "it is slow"? Does it take days, weeks or just a few hours?

                  Comment

                  • vinaydu
                    Member
                    • Dec 2010
                    • 17

                    #10
                    Hi Sphil,

                    Yes, it takes 2-3 days to BLAST a FASTA file containing 10K sequences and generate XML output. I have been blasting these files for the last 10-12 days, and it is still not complete.

                    My other concern is why BLAST is not using the memory when it is available.

                    Comment

                    • sphil
                      Senior Member
                      • Apr 2010
                      • 192

                      #11
                      What comes to mind is that each of your 10k sequences might align to a large number of sequences in the nr database, so the output file is also very large (maybe compare...). Imagine that every sequence gets between 50 and 100 hits: you end up reporting something like 500k to 1000k alignments, which takes quite a lot of time. Try setting the "report max hits" flags...
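
To put rough numbers on this and show what capping the reported hits might look like, here is a small Python sketch that only builds a command line. It assumes the legacy blastall options -v (number of one-line descriptions) and -b (number of alignments) are the "report max hits" flags meant above; blastp, the file names, and the limit of 10 are placeholders, since the thread does not say which program or settings are in use.

```python
from typing import List


def expected_alignments(n_queries: int, hits_per_query: int) -> int:
    """Back-of-the-envelope count of reported alignments."""
    return n_queries * hits_per_query


def blastall_argv(program: str, database: str, query: str, output: str,
                  threads: int, max_hits: int) -> List[str]:
    """Build a legacy blastall command with XML output and capped hits."""
    return [
        "blastall",
        "-p", program,        # assumption: e.g. blastp; not stated in the thread
        "-d", database,
        "-i", query,
        "-o", output,
        "-m", "7",            # XML output
        "-a", str(threads),   # number of processors
        "-v", str(max_hits),  # one-line descriptions to report
        "-b", str(max_hits),  # alignments to report
    ]


if __name__ == "__main__":
    print(expected_alignments(9706, 50), expected_alignments(9706, 100))
    print(" ".join(blastall_argv("blastp", "nr", "chunk_001.fasta",
                                 "chunk_001.xml", 48, 10)))
```

The list can be passed to subprocess.run() once the placeholders are replaced with real values.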

                      Comment

                      • vinaydu
                        Member
                        • Dec 2010
                        • 17

                        #12
                        Thanks sphil for showing the interest.

                        It is true that the output files generated are ~3.5 GB each.

                        But I am more concerned about physical memory utilization. BLAST is running fine; the only issue is that the process is not using all the memory.

                        Comment

                        • sphil
                          Senior Member
                          • Apr 2010
                          • 192

                          #13
                          Why should it use all the memory? It can only process as many sequences at the same time as you have CPU cores available, and the memory used should roughly correlate with that.

                          Comment

                          • vinaydu
                            Member
                            • Dec 2010
                            • 17

                            #14
                            I am using 48 CPUs via the -a option of blastall. So you are saying that this is fine?

                            Comment

                            • GenoMax
                              Senior Member
                              • Feb 2008
                              • 7142

                              #15
                              Any well written program will use memory as needed. Having extra memory does not help per se (unless you start using a part of the RAM as a virtual drive to hold data, see below).

                              You never made it clear if this is a cluster or a single server. If you really have 768 GB of RAM on a single server and you are the only user actively using this system you could think about creating a virtual disk (if you do not have admin access you will need to ask the admins to see if they would be willing) and copy your db index into that space for the fastest possible access.

                              If you are using "blastall", that seems to indicate that you are not using a newer version of the BLAST suite. Is that the case? Depending on which BLAST search (n/p) you are running, you should optimize the search parameters according to what you are trying to get from this search. The BLAST command line manual is a must-read reference: http://www.ncbi.nlm.nih.gov/books/NBK1763/
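
As a rough illustration of the virtual-disk idea above, the following Python sketch copies the database volumes into a RAM-backed directory. It assumes the directory (for example /dev/shm on many Linux systems, or a tmpfs mount set up by an admin) already exists and has enough free space; the function name and the paths are hypothetical.

```python
import shutil
from pathlib import Path
from typing import List


def stage_database(src_dir: str, dest_dir: str, db_name: str) -> List[Path]:
    """Copy every file named <db_name>.* from src_dir into dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied: List[Path] = []
    for src in sorted(Path(src_dir).glob(f"{db_name}.*")):
        if src.is_file():
            # copy2 keeps timestamps; it returns the destination path.
            copied.append(Path(shutil.copy2(src, dest / src.name)))
    return copied
```

For example, stage_database("/data/blastdb", "/dev/shm/blastdb", "nr") would copy the nr volumes, after which the -d option can point at /dev/shm/blastdb/nr. Both paths are placeholders.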

                              Comment
