I have two clusters. One has 8 machines, each with 16 CPUs and 128 GB of memory, all connected to a fast disk. However, I can only run the command-line Bioscope on it. With that much machine power, I do not worry about running out of memory.
My other cluster also has 8 machines: 4 with 4 CPUs and 8 GB of memory each, and the other 4 with 8 CPUs and 32 GB of memory each. I have been trying to run WT-bioscope on these machines, but with less success: I am running out of memory and sometimes getting kernel warnings. My current parameters are:
mapping.np.per.node=4
mapping.number.of.nodes=10
mapping.memory.size=3
In other words, 4 CPUs per node and 10 nodes (I am treating each of my 8-CPU machines as 2 nodes, so in theory I should have 12 nodes, but I wanted to leave some processing power free).
The memory parameter is 3 GB, but I am unsure what this really means. Does Bioscope start one 4-CPU job on a node using only 3 GB? Or does it start four 1-CPU jobs on a node, using 4 x 3 GB = 12 GB in total? It appears to do the latter, since my 8 GB machines have to use virtual memory at times.
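To make the two readings concrete, here is a quick back-of-the-envelope sketch in Python. Neither reading is something I have confirmed in the Bioscope docs; this just shows the arithmetic behind my suspicion:

# Two possible readings of mapping.memory.size (both are my
# assumptions, not documented Bioscope behavior).
np_per_node = 4        # mapping.np.per.node
memory_size_gb = 3     # mapping.memory.size
small_machine_gb = 8   # RAM on my 4-CPU machines

# Reading 1: 3 GB is the budget for one 4-CPU job on the node.
per_node = memory_size_gb

# Reading 2: 3 GB is per process, so 4 processes need 4 x 3 = 12 GB.
per_process = np_per_node * memory_size_gb

print(f"per-node reading:    {per_node} GB of {small_machine_gb} GB RAM")
print(f"per-process reading: {per_process} GB of {small_machine_gb} GB RAM")

# Under the per-process reading, 12 GB > 8 GB of physical RAM, so the
# small machines get pushed into swap -- matching what I observe.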
I really hesitate to go below 3 GB, since my genome reference is ~2 Gbases. As far as I can tell, Bioscope is chopping the matching portion of its pipeline into many small chunks in order to accommodate this small memory allocation.
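Here is the rough arithmetic behind that hesitation (the bytes-per-base and index-overhead figures are my guesses, not documented Bioscope numbers):

ref_gbases = 2.0       # reference size, ~2 Gbases
bytes_per_base = 1.0   # ASSUMPTION: ~1 byte/base resident in memory
index_overhead = 0.5   # ASSUMPTION: ~50% extra for the mapping index

resident_gb = ref_gbases * bytes_per_base * (1.0 + index_overhead)
print(f"estimated reference + index footprint: ~{resident_gb:.1f} GB")

# At roughly 3 GB this already fills mapping.memory.size=3, which would
# explain why Bioscope splits the matching step into many small chunks.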
Anyway, I would say that the more memory you have, the better off you are. It makes more sense to run fewer jobs with lots of memory than many jobs each starved for memory.
Once I get Bioscope running on my small cluster using all 8 machines, I will then try it using just the 4 large-memory machines. Our small cluster is sort of a 'recycled' cluster (some of the machines were given to us), and we would like to use it if possible. I hate to think that a 4-CPU, 8 GB machine is just so much junk that we should re-gift it, but for Bioscope at least, it appears those machines may indeed be worthless.