MATLAB Answers

Calling parpool with SpmdEnabled = False

13 views (last 30 days)
Jose Sanchez
Jose Sanchez on 29 Jan 2020
Answered: Edric Ellis on 30 Jan 2020
Hello,
I am having an issue with my University cluster. It is that when one worker crash, for a reason like "communication with the worker is lost probably due to a network problem", then all the remaining workers crash, and it may happen again and again!
Because I am using parfor and it doesn't require communication among the workers like spmd, then I would like all the other workers to finish their job!
I learnt online that you can solve this issue by calling your HPC cluster with the parameter "SpmdEnabled" equals False. What I did but MATLAB ignored my request as you can see in the Warning message at the end.
My question is, how can I solve this issue?
---------------------------------------------------------------------------------------------------------
parpool('HPCServerProfile1', 160, 'SpmdEnabled', false)
Starting parallel pool (parpool) using the 'HPCServerProfile1' profile ...
Warning: Disabling SPMD on parallel pools is not supported on this cluster type. Set 'SpmdEnabled' value to true.
Connected to the parallel pool (number of workers: 160).

  0 Comments

Sign in to comment.

Accepted Answer

Edric Ellis
Edric Ellis on 30 Jan 2020
Unfortunately, only MJS and Local cluster types support SpmdEnabled = false. You might be able to use the "cluster parfor" approach though - see the documentation. Basically, you would transform your main parfor loop like so:
% Important: do *not* create a parallel pool prior to running this!
% In fact, you may wish to call "delete(gcp('nocreate'))" to be sure.
% Get the cluster object
c = parcluster('HPCServerProfile1');
% Create the parforOptions structure
opts = parforOptions(c);
% Run the parfor loop directly on the cluster with no parallel pool
parfor (idx = 1:10000, opts)
% Body of the parfor loop
end
This should be more robust, however it does incur more overhead than the interactive parpool approach.

  0 Comments

Sign in to comment.

More Answers (0)

Sign in to answer this question.