Unable to achieve desired speed up using parfor
Hi,
I am launching several instances of a MATLAB (p)code using parfor loops on two computers with the following configurations.
Comp A: 16 core 3.4GHz, 8GB per core @ 3200MHz,
Comp B: 32 core 3.6GHz, 8GB per core @ 3200MHz,
I am launching 16 instances on A and 32 on B. I find that all instances on B finish in about half the time of those on A. This baffles me, since the specs scale almost identically. Also, all instances do the same thing, so the computational load per instance is identical. Is there any hardware optimization that should be done for better efficiency on A?
5 Comments
Rik
on 1 Nov 2024 at 7:47
My initial guess was that the CPU generations would be different and hence the number of instructions per cycle may be different. That doesn't seem to be the case here.
Perhaps it is the cache? If everything fits in the CPU cache there is no need to go to RAM. I don't have any other plausible cause, unless the smaller chip doesn't actually reach the frequency you mentioned due to thermal and/or power throttling.
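One way to test the cache/throttling idea is to time a single instance running alone on a worker and compare it with the time per instance when every worker is busy. The following is only a rough sketch under stated assumptions: `work` is a hypothetical placeholder for the real p-code, the matrix size is arbitrary, and it assumes the Parallel Computing Toolbox is available with a pool of one worker per core.

```matlab
% Hypothetical stand-in for the real p-code workload (assumption)
work = @() sum(sum(rand(500) * rand(500)));

pool = gcp;               % current pool; may start one if none is open
n    = pool.NumWorkers;   % number of workers in the pool

% Time one instance while the other workers are idle
tAlone = zeros(1, 1);
parfor k = 1:1
    t0 = tic;
    work();
    tAlone(k) = toc(t0);
end

% Time the same instance while all workers run it at once
tBusy = zeros(1, n);
parfor k = 1:n
    t0 = tic;
    work();
    tBusy(k) = toc(t0);
end

fprintf('alone: %.3f s, full load (median): %.3f s\n', ...
        tAlone, median(tBusy));
```

If the full-load time is much larger than the time alone on one machine but not on the other, that would point toward throttling or contention for cache and memory bandwidth rather than the loop itself. A single run is noisy, so repeating the measurement a few times is advisable.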
Answers (1)
Matt J
on 1 Nov 2024 at 10:58
I find that all instances on B finish in about half the time as those on A. It baffles me...
That is the expected result, assuming you are running the same loop on both computers. Assuming for example that it is a 32 iteration loop,
parfor i=1:32
...
end
then, with one worker per core, Comp A would be assigned 2 iterations per core, while Comp B would be assigned only 1. So it makes perfect sense that Comp B would finish in half the time.
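The arithmetic behind this can be sketched as follows. This is an idealized model, not a description of how parfor actually schedules work: it assumes one worker per core, equal time per iteration (`tIter`, a made-up value in arbitrary units), and no scheduling or communication overhead.

```matlab
% Illustrative numbers only; tIter is an assumed per-iteration time
nIter   = 32;        % iterations in the parfor loop from the example
tIter   = 1;         % assumed time per iteration (arbitrary units)
workers = [16 32];   % workers on Comp A and Comp B (one per core, assumed)

% Each worker runs at most ceil(nIter/workers) iterations back to back
rounds   = ceil(nIter ./ workers);   % [2 1]
wallTime = rounds * tIter;           % idealized wall-clock time per machine

fprintf('Comp A: %d round(s), Comp B: %d round(s), ratio A/B = %g\n', ...
        rounds(1), rounds(2), wallTime(1) / wallTime(2));
```

Under this model the 2:1 ratio only appears when both machines run the same 32 iterations. If Comp A really runs 16 iterations and Comp B runs 32, each worker gets one iteration on both machines and the model predicts equal times, so the difference would have to come from something else.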