Using parfor eliminates use of multithreading

I am using fmincon to fit a simulation to some experimental data. I am running my program on a Linux machine that has 20 cores. Within my objective function I make 8 calls to a function that runs my Forward Time Centered Space finite difference simulation. This is how I get my set of 8 simulation data points for the current set of design variables within fmincon. My cost function is then just the mean squared error of the experimental and simulation data points.
The function that runs my FTCS simulation is somewhat computationally expensive, so to speed things up I am using parfor in a pool with 8 workers to run all 8 simulations at once. However, using the "htop" command in Linux, I can see that parfor limits each simulation to a single core, with each core running at 100%.
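For readers following along, the setup described above might look roughly like the sketch below. It is not the poster's actual code: `objectiveMSE`, `runFTCS`, the operating points in `V`, and the placeholder "experimental" data are all hypothetical stand-ins, and the `fmincon` call is shown only as a comment because it needs the Optimization Toolbox.

```matlab
% Sketch only: runFTCS and every numeric value here are hypothetical placeholders.
V       = [50 80 100 150 200 250 300 350];  % 8 illustrative operating points
expData = 2 * V(:);                          % placeholder "experimental" data
x0      = 1;                                 % placeholder design variable

% One evaluation of the cost function, as fmincon would request it.
cost = objectiveMSE(x0, expData, V);

% With the Optimization Toolbox, the fit itself would be something like:
% xOpt = fmincon(@(x) objectiveMSE(x, expData, V), x0, [], [], [], [], lb, ub);

function cost = objectiveMSE(x, expData, V)
    % Run all simulations for the current design variables, then return the MSE.
    nSim    = numel(V);
    simData = zeros(nSim, 1);
    parfor k = 1:nSim
        simData(k) = runFTCS(x, V(k));   % one simulation per loop iteration
    end
    cost = mean((simData - expData(:)).^2);
end

function y = runFTCS(x, v)
    % Placeholder for the expensive FTCS finite-difference simulation.
    y = x * v;
end
```

Each `parfor` iteration is handed to one worker, so how many cores a single simulation can use depends on how many computational threads that worker is allowed.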
Alternatively, if I just run the simulation outside of fmincon, running six different MATLAB jobs by entering the following into the Linux command line:
(
module load matlab; ulimit -u 8192; nohup matlab -nodisplay -nosplash -r "datestr(now), P = 20, V = 50, run('/folder/PolymerCode/FunctionSim_run_PIdiff.m');" > slot1.txt &
nohup matlab -nodisplay -nosplash -r "datestr(now), P = 20, V = 80, run('/folder/PolymerCode/FunctionSim_run_PIdiff.m');" > slot2.txt &
nohup matlab -nodisplay -nosplash -r "datestr(now), P = 20, V = 100, run('/folder/PolymerCode/FunctionSim_run_PIdiff.m');" > slot3.txt &
nohup matlab -nodisplay -nosplash -r "datestr(now), P = 20, V = 150, run('/folder/PolymerCode/FunctionSim_run_PIdiff.m');" > slot4.txt &
nohup matlab -nodisplay -nosplash -r "datestr(now), P = 20, V = 200, run('/folder/PolymerCode/FunctionSim_run_PIdiff.m');" > slot5.txt &
nohup matlab -nodisplay -nosplash -r "datestr(now), P = 20, V = 250, run('/folder/PolymerCode/FunctionSim_run_PIdiff.m');" > slot6.txt &
)
where FunctionSim_run_PIdiff.m is the same simulation I am calling 8 times within fmincon, I can see that all 20 cores of the machine are being used somewhat uniformly at around 80%.
Is there a way I can get these 8 calls to my simulation function within fmincon to use more multithreading while still using parfor? I have tried manually setting the max number of computational threads using
maxNumCompThreads(20)
but this had no effect.

6 Comments

Matt J on 11 Oct 2021
If you are already at 80% usage without parfor, it doesn't seem like there would be much to gain with parfor. You're hoping for a 20% speed-up? In any case, it might be worth experimenting with fewer workers (e.g. 4 instead of 8).
Matt J,
I am at 80% usage on every core when I am able to run multiple MATLAB processes at the same time from the Linux Command line. However, since I have to run all 8 simulations in the same MATLAB process when using it within my fmincon function, only one simulation will run at a time unless I use parfor.
So, let me try to rephrase. Normally, I use the Linux script above to run multiple MATLAB processes at the same time. When I do this, the multiple MATLAB processes all use multithreading and end up using 80% of each of the 20 cores. I'd like to replicate this using parfor within a single MATLAB process. However, when I use parfor, none of the 8 workers uses any multithreading, and I end up using 100% of only 8 cores.
Matt J on 11 Oct 2021
What about when you use an ordinary for loop, what is the CPU usage? What if you use parfor with 1<n<8 workers?
Other than that, the only thing I can suggest is that you try batch() or parfeval().
When I use an ordinary for loop the multithreading returns and each simulation runs slightly faster, but the overall program ends up running slower since it is running 1 of 8 simulations at a time. So the parfor is helpful in that sense, but it seems there is room for improvement if I can use a parfor and still have multithreading.
I have tried several different numbers of workers. For 1<n<8 workers, number of cores used = n. For n > 8, number of cores used = 8.
And with batch() and parfeval()? Same thing?
parfeval() results in the same thing. My use and knowledge of batch() is limited, but if I am using it correctly, it also results in the same thing. For all of these, it seems to boil down to each run of the simulation being allocated to one worker, with each worker using a single thread.
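One way to check the single-thread-per-worker observation directly, rather than inferring it from htop, is to ask each worker how many computational threads it is allowed to use. The following is a minimal sketch, assuming the Parallel Computing Toolbox is installed; the variable names are illustrative.

```matlab
% Reuse the current pool, or start one with the default profile settings.
pool = gcp;

% Thread limit in the client session (this is what maxNumCompThreads(20) changes).
clientThreads = maxNumCompThreads;

% Query the thread limit on every worker; one value is returned per worker.
f = parfevalOnAll(pool, @maxNumCompThreads, 1);
workerThreads = fetchOutputs(f);

fprintf('Client threads: %d\n', clientThreads);
fprintf('Worker threads: %s\n', mat2str(workerThreads(:)'));
```

Calling `maxNumCompThreads(20)` at the client prompt only affects the client session, which would explain why it had no visible effect on the workers.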


 Accepted Answer

Check the cluster object's NumThreads property. For instance:
local = parcluster('local');
local.NumThreads = <set-number-of-threads-for-each-worker-to-use>;
pool = local.parpool( <set-pool-size> );
Setting NumThreads will automatically set maxNumCompThreads on the workers.
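As a concrete illustration for the machine in the question (20 cores, 8 simulations), the placeholders above could be filled in as in the sketch below. The values 8 and 2 are assumptions chosen only so that workers times threads (16) stays within 20 cores; they are not tuned recommendations, and the best split depends on how well the simulation itself scales with threads.

```matlab
nWorkers         = 8;   % one worker per simulation (assumed from the question)
threadsPerWorker = 2;   % assumption: 8 * 2 = 16 threads, within the 20 cores

c = parcluster('local');           % same profile name as in the answer above
c.NumThreads = threadsPerWorker;   % threads each worker may use

delete(gcp('nocreate'));           % close any existing pool so the setting applies
pool = parpool(c, nWorkers);       % start the pool with multithreaded workers
```

The setting is made on the cluster object in the current session; calling `saveProfile(c)` would make it persist in the profile, if that is wanted.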

2 Comments

This did the trick! So simple... Sometimes the documentation for the Parallel Computing toolbox can be so convoluted. Thank you for your help.
Thanks for the feedback, Jason. I've passed your comment on to our documentation team.


More Answers (1)

Matt J on 11 Oct 2021
If you are getting the best performance from the Linux command line, perhaps a solution would be to invoke the Linux command line from within MATLAB. You could do that with the system() command in combination with parfeval().
