HPC Slurm --ntasks and Matlab parcluster NumWorkers question

Hi,
I have a question regarding the number of tasks (--ntasks) in Slurm when executing a .m file that uses 'UseParallel' to run ONE genetic algorithm ('ga').
The maximum number of physical CPUs per node on our HPC cluster is 64.
In the Slurm batch file, this works:
#SBATCH --cpus-per-task=64
#SBATCH --nodes=1
#SBATCH --ntasks=1
But if I request the following instead:
#SBATCH --cpus-per-task=128
#SBATCH --nodes=2
#SBATCH --ntasks=1
it is not allowed: "sbatch: Warning: can't run 1 processes on 2 nodes, setting nnodes to 1"
My thinking is simple: get 64 CPUs from 1 node, 128 CPUs from 2 nodes, and so on, to run ONE TASK ONLY in the MATLAB .m file below.
But Slurm tells me it cannot use 2 nodes to run 1 task. Do I have to set ntasks=2 in the batch file to still request 64+64 CPUs, and then do some trick in the MATLAB .m file to make MATLAB treat them as 128 CPUs in total for 1 task?
In the MATLAB .m file, I did:
num_cpu = 64; % I want to increase to 128
parpool(parcluster, num_cpu)
options = optimoptions('ga', 'UseParallel', true, 'UseVectorized', false, ...
    'PopulationSize', num_cpu-1);
[x, fval] = ga(@(x) cost_fun(x), nvars, [], [], [], [], [], [], [], options); % nvars = number of design variables
Since Slurm does not allow multiple nodes for one task, I was previously advised to define a cluster profile in MATLAB instead so that the HPC accepts multiple nodes: https://www.mathworks.com/help/parallel-computing/discover-clusters-and-use-cluster-profiles.html
Is there a way to set NumWorkers to 128 by using 2 nodes and 1 task, in either the MATLAB .m file or the Slurm batch file?

Accepted Answer

Raymond Norris on 17 Feb 2021
In Slurm, a single task (i.e. MATLAB) cannot run across multiple nodes. Let's look at a couple of options.
  • MATLAB on a single node, using 64 cores for running linear algebra routines. In this case, there's only 1 task (MATLAB), but you want to assign 64 cores so that it can spawn threads on those 64 cores.
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=64
This would allow for 64 computation threads, running on 64 cores.
maxNumCompThreads
ans =
64
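For example, a BLAS-backed operation will then multithread automatically. A minimal sketch (the matrix size here is arbitrary):
A = rand(8000);
B = rand(8000);
C = A*B;  % matrix multiply runs on up to 64 computational threads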
  • MATLAB on a single node, using 64 cores to run a local pool. In this case, there's only 1 task (MATLAB), but you want to assign 64 cores so that it can spawn processes on those 64 cores.
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=64
You could then run the following to use parallel algorithms (parfor, spmd, etc.):
p = parpool('local',64);
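As a quick usage sketch (the loop body is just a stand-in for real per-iteration work), parfor then distributes iterations across the 64 workers:
% Each iteration runs on one of the 64 pool workers.
results = zeros(1, 1000);
parfor i = 1:1000
    results(i) = sum(svd(rand(100)));  % stand-in for your own computation
end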
In both examples, you're requesting the same resources from Slurm, but making use of them slightly differently. What you'd like is to start a pool of workers across two nodes. Therefore, you must spawn a MATLAB job that then spawns a MATLAB Parallel Server job. The "outer" job only requires a single task; it's the "inner" job that will request the 128 cores. For instance:
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
Then, from MATLAB:
num_cpu = 128;
parpool(parcluster('slurm'), num_cpu); % assumes a Slurm profile exists (see the sketch below)
options = optimoptions('ga', 'UseParallel', true, 'UseVectorized', false, ...
    'PopulationSize', num_cpu-1);
[x, fval] = ga(@(x) cost_fun(x), nvars, [], [], [], [], [], [], [], options); % nvars = number of design variables
Now, the parpool command will spawn an "inner" job, requesting Slurm for 128 cores (across 2+ nodes) to run your parallel pool.
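If no 'slurm' profile exists yet, one way to create it is with a generic cluster profile pointing at the MathWorks Slurm plugin scripts. This is only a hedged sketch: the paths are hypothetical, PluginScriptsLocation requires R2019b or later (earlier releases use IntegrationScriptsLocation), and your cluster admin may already provide a ready-made profile:
c = parallel.cluster.Generic;
c.NumWorkers = 128;                                   % total workers across nodes
c.OperatingSystem = 'unix';
c.HasSharedFilesystem = true;                         % assumes a shared filesystem
c.JobStorageLocation = '/home/user/matlabjobs';       % hypothetical path
c.PluginScriptsLocation = '/opt/matlab/slurm-plugin'; % hypothetical path to the plugin scripts
saveAsProfile(c, 'slurm');                            % now parcluster('slurm') works
The cluster-profiles page linked in the question covers discovering or importing a profile instead, which is often simpler if your site already ships one.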
