Strange core usage when running Slurm jobs
Show older comments
I'm trying to run jobs on an HPC cluster using Slurm, but I run into problems both when I'm running interactive jobs and when I'm submitting batch jobs.
- When I run interactive jobs and I book one node, then I manage to use all of the node's 20 cores. But when I book more than one node for an interactive job, then the cores on the extra nodes are just left unused.
- When I run a batch job, then the job uses only one core per node.
Do you have any idea what I might be doing wrong?
1. I book my interactive job from the command prompt using the following commands:
interactive -A myAccountName -p devel -n 40 -t 0:30:00
module load matlab/R2023a
matlab
to submit a 30-minute 40-core job to the "devel" partition using my account (not actually called "myAccountName"), load the Matlab module and launch Matlab as an X application. Once in Matlab, I first choose the "Processes" parallel profile and second run the "Setup" and "Interactive" sections in the silly little script at the bottom of this question. In two separate terminal sessions, I then use
ssh MYNODEID
htop
where MYNODEID is either of the two nodes assigned to the interactive job. Then I see that the job uses all of the cores on one of the nodes and none of the cores on the second node.
2. To book my batch job, I load and launch Matlab from the command prompt using the following commands
module load matlab/R2023a
matlab
and then run the "Setup" and "Batch" sections in the silly little script at the bottom of this question. Using the same procedure as above, htop lets me see that the job uses two cores (one on each node) and leaves the remaining 38 cores (19 on each node) unused.
Silly little script
%% Setup
clear;
close all;
clc;
N = 1000; % Length of fmincon vector
%% Interactive
x = solveMe(randn(1, N));
%% Batch
Cluster = parcluster('rackham R2023a');
Cluster.AdditionalProperties.AccountName = 'myAccountName';
Cluster.AdditionalProperties.QueueName = 'devel';
Cluster.AdditionalProperties.WallTime = '0:30:00';
Cluster.batch( ...
@solveMe, ...
0, ...
{}, ...
'pool', 39 ...
); % Submit a 30-minute 40-core job to the "devel" partition using my account (not actually called "myAccountName")
%% Helper functions
function A = slowDown()
A = randn(5e3);
A = A + randn(5e3);
end
function x = solveMe(x0)
opts = optimoptions( ...
"fmincon", ...
"MaxFunctionEvaluations", 1e6, ...
"UseParallel", true ...
);
x = fmincon( ...
@(x) 0, ...
x0, ...
[], [], ...
[], [], ...
[], [], ...
@(x) nonlinearConstraints(x), ...
opts ...
);
function [c, ceq] = nonlinearConstraints(x)
c = [];
A = slowDown();
ceq = 1 ./ (1:numel(x)) - cumsum(x);
end
end
Accepted Answer
More Answers (0)
Categories
Find more on Third-Party Cluster Configuration in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!