Unable to achieve desired speed up using parfor

Balachandra Suri on 30 Oct 2024 at 19:31
Answered: Matt J on 1 Nov 2024 at 10:58
Hi,
I am launching several instances of a MATLAB program (P-code) using parfor loops on two computers with the following configurations:
Comp A: 16 cores, 3.4 GHz, 8 GB per core @ 3200 MHz
Comp B: 32 cores, 3.6 GHz, 8 GB per core @ 3200 MHz
I am launching 16 instances on A and 32 on B. I find that all instances on B finish in about half the time of those on A. This baffles me, since the specs scale almost identically. All instances do the same thing, so the computational load per instance is identical. Is there any hardware optimization that should be done for better efficiency on A?
  5 Comments
Balachandra Suri on 31 Oct 2024 at 3:19
Please let me know what other information would be useful. The motherboard configuration?
Rik on 1 Nov 2024 at 7:47
My initial guess was that the CPU generations would be different, and hence the number of instructions per cycle might differ. That doesn't seem to be the case here.
Perhaps it is the cache? If everything fits in the CPU cache there is no need to go to RAM. I don't have any other plausible cause, unless the smaller chip doesn't actually reach the frequency you mentioned due to thermal and/or power throttling.
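One rough way to test the throttling or cache/memory contention idea is to compare how long a single instance takes when it runs alone with how long it takes when every worker is busy. The sketch below assumes the Parallel Computing Toolbox is available; `work` is a hypothetical stand-in for one instance of the real code, and the matrix size is arbitrary.

```matlab
% Hypothetical stand-in for one instance of the real workload.
work = @() sum(svd(rand(400)));

pool = gcp;                    % start or reuse the current parallel pool
n    = pool.NumWorkers;        % number of workers actually available

% Warm up every worker so first-call overhead does not skew the timings.
parfor k = 1:n
    feval(work);
end

% Per-instance time with only one worker busy.
tAlone = zeros(1, 1);
parfor k = 1:1
    t = tic;
    feval(work);
    tAlone(k) = toc(t);
end

% Per-instance time with all workers busy at the same time.
tLoaded = zeros(1, n);
parfor k = 1:n
    t = tic;
    feval(work);
    tLoaded(k) = toc(t);
end

slowdown = median(tLoaded) / tAlone;
fprintf('workers: %d, alone: %.3f s, loaded (median): %.3f s, slowdown: %.2fx\n', ...
        n, tAlone, median(tLoaded), slowdown);
```

A slowdown well above 1 on one machine but not the other would point towards shared resources (cache, memory bandwidth, or clock throttling under load) rather than the loop itself. Single measurements like these are noisy, so it is worth repeating the run a few times.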

Answers (1)

Matt J on 1 Nov 2024 at 10:58
"I find that all instances on B finish in about half the time as those on A. It baffles me..."
That is the expected result, assuming you are running the same loop on both computers. Assuming, for example, that it is a 32-iteration loop,
parfor i=1:32
...
end
then Comp A would be assigned 2 iterations per core, while Comp B would be assigned only 1. So it makes perfect sense that Comp B finishes in half the time.
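The arithmetic behind this can be sketched as follows. It is an idealized model that assumes one worker per core, identical iterations, and an even split of iterations across workers (in practice parfor hands out iterations in chunks, so the split need not be exactly even). The variable names and the unit time `tIter` are illustrative only.

```matlab
% Idealized model: identical iterations, evenly distributed over workers.
nIter   = 32;        % loop length from the example above
workers = [16 32];   % Comp A, Comp B (one worker per core assumed)
tIter   = 1;         % hypothetical time per iteration, arbitrary units

rounds   = ceil(nIter ./ workers);   % sequential iterations per worker
wallTime = rounds * tIter;           % estimated wall-clock time per machine

fprintf('Comp A: %d iteration(s) per worker, Comp B: %d, time ratio A/B = %g\n', ...
        rounds(1), rounds(2), wallTime(1) / wallTime(2));
```

Under these assumptions Comp A needs 2 iterations per worker and Comp B needs 1, giving a time ratio of 2. If the loop length were instead matched to the worker count on each machine (16 on A, 32 on B), the same model would predict equal times.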

Release

R2023a