Failed to start parpool with 32 workers

I have been trying to start a parallel pool with multiple workers on my local machine. The pool launches normally with 12 workers, but I get an error when I increase the number to 32. Some reference details:
Local hardware configuration: CPU(s) = 336, available memory = 694 GB
MATLAB version: R2023b
The error message from the Parallel pool test with 32 workers is as follows:
Error Report: Failed to initialize the interactive session.
Caused by:
Error using parallel.internal.pool.AbstractInteractiveClient>iThrowIfBadParallelJobStatus
The interactive communicating job failed with no message.
Interactive client bound to URL: tcp://tcpnodelay=localhost:27370/protocol/catapult and port 27370
Session failed to start when creating InteractiveClient.
Error:
Error using parallel.internal.pool.AbstractInteractiveClient>iThrowWithCause
Failed to initialize the interactive session.
Has anyone encountered a similar issue? I don't believe local hardware resources are the limiting factor here, so why am I unable to use more than 12 workers?
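For readers reproducing this, a minimal sketch of how the pool size could be requested explicitly against the local profile. It assumes the default local profile is named 'Processes' (the name used in recent releases) and uses 32 as the target from the question. It only shows how to check the profile's worker limit before requesting the pool and is not a fix for the error above.

```matlab
% Sketch: inspect the local cluster profile and request a pool of a given size.
c = parcluster('Processes');     % assumed name of the default local profile
fprintf('Profile allows up to %d workers\n', c.NumWorkers);

nRequested = 32;                 % target pool size taken from the question
if nRequested > c.NumWorkers
    % Raise the limit on this cluster object only; the saved profile is unchanged.
    c.NumWorkers = nRequested;
end

delete(gcp('nocreate'));         % close any pool that is already open
pool = parpool(c, nRequested);   % errors here if the workers fail to start
fprintf('Pool started with %d workers\n', pool.NumWorkers);
```

If the profile limit is already 32 or higher and the pool still fails to start, the limit itself is unlikely to be the cause.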

Answers (1)

Hi Yanci,
To address the issue you are experiencing, I kindly suggest performing the following troubleshooting steps in the MATLAB Command Window:
>> restoredefaultpath
>> savepath
After completing these steps, I recommend restarting MATLAB. Once MATLAB has been restarted, attempt to initiate a parallel pool once more.
Should you encounter an error related to the inability to access the file located at C:\Program Files\MATLAB\R2023b\toolbox\local\pathdef.m, it may be necessary to remove the pathdef.m file.
Doing so will allow MATLAB to recreate it upon the next execution of the relevant command.
I trust that the provided solution will assist in resolving your query.
Thanks

4 Comments

I followed the solution you provided, but it did not resolve the issue. I then ran the validation tests. From the report, the failure occurred in the SPMD job test. The report details are as follows:
Description: Job errored or did not reach the state 'finished'.
Details:
Error Report: Job errored or did not reach the state 'finished'.
Command Line Output:
Debug Log:
LOG FILE OUTPUT:
[27]======BEGIN LICENSE MANAGER ERROR======
[27]Unexpected error during communication with services required to run MATLAB (error 5013).
[27]Troubleshoot this issue by visiting:
[27]https://www.mathworks.com/support/lme/5013
[27]======END LICENSE MANAGER ERROR======
[7]MPID_Init(371).......: PMI_Init returned -1
0~6: 127.0.0.1: -2 ... (remaining exit codes and error messages omitted)
7: 127.0.0.1: 1: process 7 exited without calling finalize ... (remaining exit codes and error messages omitted)
8~26: 127.0.0.1: -2 ... (remaining exit codes and error messages omitted)
27: 127.0.0.1: 1
28~31: 127.0.0.1: -2 ... (remaining exit codes and error messages omitted)
Across multiple test runs, the behavior is quite unstable. Sometimes all tests pass, sometimes the SPMD job test fails, and occasionally the Parallel pool test fails. This leads me to suspect that the problem is related to my MATLAB installation environment. I'm using Linux CentOS 7.9.2009, and I installed MATLAB in my personal directory instead of the default location.
It should also be noted that the CPU I am currently using is an AMD EPYC 9634 84-Core Processor.
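Because the failures are intermittent, it may help to record which pool sizes start and which fail over several attempts. The following is a rough sketch under stated assumptions: the profile name 'Processes' and the list of sizes are illustrative, not taken from the report above.

```matlab
% Sketch: try several pool sizes and log which ones start.
c = parcluster('Processes');     % assumed name of the default local profile
sizesToTry = [12 16 24 32];      % hypothetical sizes between the working and failing cases

for n = sizesToTry
    delete(gcp('nocreate'));     % make sure no pool is open before each attempt
    try
        pool = parpool(c, n);
        fprintf('%d workers: started (%d running)\n', n, pool.NumWorkers);
    catch err
        % Keep going so every size is attempted even after a failure.
        fprintf('%d workers: failed (%s)\n', n, err.message);
    end
end

delete(gcp('nocreate'));         % clean up the last pool
```

Running the loop a few times shows whether failures depend on the pool size or occur at random, which is useful detail to include in a support request.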
I was having the same issue in CentOS with a 64-core machine. This is how I fixed it:

Release: R2023b
Asked: on 26 Dec 2023
Edited: on 7 May 2024
