parfor does not use the parallel pool when a parfeval process is still running

Dear all,
in my code I use a parallel pool on a local machine and run a long function with 'parfeval'. Shortly afterwards, the main code is supposed to use 'parfor', but judging by the CPU load, 'parfor' does not use any worker, since 'parfeval' is still occupying one worker when 'parfor' is executed.
Is this behaviour expected?
How can I check whether there are still processes running on the parallel pool?
How can I wait until all processes on the parallel pool have finished?
Thanks in advance
I am using Win10 64bit, Matlab R2020a, Parallel Computing Toolbox 7.2.

2 Comments

@Matt Thanks for the immediate reply.
But how do I find the future object F? So far I call parfeval in a subfunction and do not handle its return values.


 Accepted Answer

Yes, this behaviour is expected - the different parallel language features parfor, parfeval, and spmd do not share the pool. You can wait for all parfeval requests like so:
p = gcp('nocreate');
if ~isempty(p)
    q = p.FevalQueue;
    % First wait for any queued futures
    wait(q.QueuedFutures);
    % Then wait for any running futures
    wait(q.RunningFutures);
end
(The calls to wait need to be in that order - in the other order, any queued futures will start running after the first wait).
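For the "how to check" part of the question, the same queue can also be inspected without blocking. The following is only a minimal sketch: poolIsIdle is a hypothetical helper name, not part of the toolbox, and it relies solely on the FevalQueue properties used above.

```matlab
function idle = poolIsIdle()
% poolIsIdle  (hypothetical helper) true if no parfeval work is queued or running.
p = gcp('nocreate');            % do not start a pool just to ask
if isempty(p)
    idle = true;                % no pool, so nothing can be running on it
    return
end
q = p.FevalQueue;
% Check queued futures first, then running ones (same order as for wait).
idle = isempty(q.QueuedFutures) && isempty(q.RunningFutures);
end
```

A caller could then write "if ~poolIsIdle(), ... end" before entering the parfor loop. If the subfunction instead returns the future created by parfeval, the caller can query F.State or call wait(F) on that one request directly.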

8 Comments

Dear Edric, thanks for the answer, which solved the problem.
Following up: can I define two separate pools side by side, a small one dedicated to parfeval and a big one for parfor?
Background: I use parfeval to generate figures and movies of processed data in the background (see link). This is not time critical, and using parfeval means it does not delay further data processing. Data processing with parfor, however, is time critical and should not be delayed by slow, long-running video saving.
Any suggestion?
Unfortunately, today you cannot have two parallel pools open simultaneously. You could consider using batch instead of parfeval - that has roughly the same API - but it launches separate worker processes, rather than using workers from the parallel pool. (You would need to make sure that your parallel pool doesn't use all the workers otherwise the batch jobs will never start).
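As a rough sketch of that suggestion (not a definitive recipe): the code below keeps one worker out of the pool and runs the background task with batch. It assumes the 'local' profile allows at least 2 workers, and @pause is only a stand-in for the real, slow plotting function.

```matlab
c = parcluster('local');            % cluster object only; no workers start yet

% Keep one worker free for the batch job (assumes c.NumWorkers >= 2).
nPool = max(1, c.NumWorkers - 1);
if isempty(gcp('nocreate'))         % parpool errors if a pool is already open
    parpool(c, nPool);
end

% Arguments: cluster, function handle, number of outputs, cell array of inputs.
% @pause with input 5 is a placeholder for the background function.
job = batch(c, @pause, 0, {5});

% Time-critical work runs on the pool workers meanwhile.
parfor n = 1:100
    d = eig(rand(200, 200));
end

wait(job);                          % block until the batch job has finished
```

Once the results are no longer needed, delete(job) removes the job from the cluster's job storage.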
Excellent!
This also solves my memory issue I asked here (see link).
If I understand correctly (assuming I can have 8 workers on my local machine), I first need to define a cluster and then start the parallel pool.
c = parcluster('local2'); % configured with 2 workers
p = parpool('local', 4);
then I can run the job
job = batch(c,'batchtest');
Conversely, if I start the parpool first and then create the parcluster, batch says:
"Warning: This job will remain queued until the Parallel Pool is closed."
unless I could have more than 4 workers.
Is this correct?
You can't have more workers running either batch or parpool than the cluster you're using allows. Note that in the case of the "local" cluster, even if you create multiple profiles, there is still only a single underlying cluster instance behind the scenes. This means that you can't set up two different "local" cluster profiles with different values of NumWorkers and have them operate independently.
Dear Edric,
unfortunately, I have difficulties understanding your last reply.
"You can't have more workers running either batch or parpool than the cluster you're using allows."
Agreed. In the example above, 8 workers are allowed. I thought I had created 2 workers with
c = parcluster('local2');
but they did not show up in the ProcessExplorer. I then created 4 additional workers with
p = parpool('local', 4);
Those I saw in the ProcessExplorer. When executing 'batchtest.m' on the cluster c with
job = batch(c,'batchtest');
I saw an additional fifth worker popping up in the ProcessExplorer, which processed the batch. When I then execute a parfor loop, 5 workers are busy.
parfor n=1:100
    d = eig(rand(1000,1000));
end
When I execute batch again, I get a sixth worker, and so on, until I reach the maximum number of workers allowed, which is 8 in this case. So the profile 'local2', configured with 2 workers, was not respected.
In contrast, when I skip the "p = parpool('local', 4);" (line 2), the batch command starts the (default) parallel pool with 4 workers, one of which executes the batch command. When I then execute a parfor loop, it uses only the remaining 3 workers. So in total 4 workers are busy.
c = parcluster('local2'); fprintf('Cluster created\n')
%p = parpool('local', 4); fprintf('Pool created\n')
job1 = batch(c,'batchtest'); fprintf('Batch1 started\n')
%job2 = batch(c,'batchtest'); fprintf('Batch2 started\n')
% parfor test
tic
parfor n=1:100
    d = eig(rand(1000,1000));
end
toc
fprintf('DONE\n')
How does this observed behaviour fit with your next statement, "Note that in the case of the "local" cluster, ..."?
Doesn't it describe only the latter behaviour, but not the first?
The statement parcluster('local2') creates a cluster object, but does not launch any worker processes. Worker processes are launched only when you submit work to the cluster using batch or parpool.
The slightly confusing situation I was trying to explain is that even when you have two cluster profiles for the local cluster, there is still only a single underlying local cluster mechanism shared by all cluster objects. That means that if you have two "local" cluster profiles called local and local2, they both share the same pool of workers.
This means that any time you modify the NumWorkers property of any local cluster instance, all others get the same value. (Creating a cluster instance using parcluster where the profile has a specific value for NumWorkers also counts as "modifying NumWorkers").
So, I constructed two cluster profiles on my machine. local2 specifies NumWorkers = 2, and local specifies NumWorkers = 6. You can see the effect of this like so:
>> l2 = parcluster('local2'); l2.NumWorkers
ans =
2
>> l = parcluster('local'); l.NumWorkers
ans =
6
% Because 'local' specifies NumWorkers = 6, l2 reflects this value!
>> l2.NumWorkers
ans =
6
Interesting, but look at this:
l2 = parcluster('local2'); l2.NumWorkers
ans =
2
p = parpool('local', 6); p.NumWorkers
>> Starting parallel pool (parpool) using the 'local' profile ...
>> Connected to the parallel pool (number of workers: 6).
ans =
6
>> l2.NumWorkers
ans =
8
So is 'parpool' a subset of 'parcluster'? Yet creating the pool seems to extend the 'parcluster'.
parpool consumes workers from the appropriate cluster. When you say parpool('local', 6), this effectively does the following:
clus = parcluster('local');
parpool(clus, 6);
So, in your case, the 'local' cluster defines NumWorkers to be 8. Your parpool explicitly asks for only 6 of them. As I showed above, all local clusters are linked to a single mechanism behind the scenes. Therefore (and, I admit, somewhat surprisingly), calling parpool('local',6) has the side effect of changing the value of l2.NumWorkers to whatever NumWorkers is specified in your 'local' cluster.
