Why is parfor slower than for when accessing a 3-D array?

I want to speed up a for loop in a program that uses a 3-D array. By experimenting, I found that accessing a 3-D array inside a parfor loop is slower than doing so in a plain for loop, whereas the same code with a 2-D array gives the opposite result.
A small example follows.
%Storing data on 3-D array with parfor
tic
parfor (i=1:100)
    for (j=2:2134)
        for (x=1:1000)
            k(i,j,x)=100;
        end
    end
end
toc
Elapsed time is 319.463914 seconds.
%storing data on 3-D array with for
tic
for (i=1:100)
    for (j=2:2134)
        for (x=1:1000)
            k(i,j,x)=100;
        end
    end
end
toc
Elapsed time is 25.601181 seconds.
%Storing Data in 2-D array with parfor
tic
parfor (i=1:1000)
    for (j=2:2134)
        for (x=1:1000)
            k(i,j)=100;
        end
    end
end
toc
Elapsed time is 0.774042 seconds.
%Storing Data in 2-D array with for
tic
for (i=1:1000)
    for (j=2:2134)
        for (x=1:1000)
            k(i,j)=100;
        end
    end
end
toc
Elapsed time is 3.060670 seconds.
I am using 4 workers for the parallel runs.
Why does this happen with 3-D arrays, and how can I solve it?

Accepted Answer

Raymond Norris on 9 May 2021
It's tough to say whether this is one script or whether you've cobbled together the outputs of separate runs. If it's all one script, the 2nd for-loop will benefit from the preallocation done by the parfor. And the 2nd parfor loop will balloon k to 15 GB, since k is 100x2134x1000 and then jumps to 1000x2134x1000, which probably isn't what you're really testing. For that reason, I'm going to modify your code a bit so that no loop is affected by another, using k1, k2, k3, and k4.
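The memory figure above can be checked with a little arithmetic. This is only a sketch, assuming k holds double-precision values (8 bytes per element, MATLAB's default numeric type); the variable names are made up for illustration. Under that assumption, the two sizes come to roughly 1.6 GiB and 15.9 GiB.

```matlab
% Assumption: k is of class double, i.e. 8 bytes per element.
bytesPerDouble = 8;

% Size of k after the 3-D loops (100 x 2134 x 1000).
bytesSmall = 100 * 2134 * 1000 * bytesPerDouble;

% Size of k once k(1000,j) is assigned and the first dimension grows to 1000.
bytesLarge = 1000 * 2134 * 1000 * bytesPerDouble;

fprintf("100 x 2134 x 1000:  %.1f GiB\n", bytesSmall / 2^30);
fprintf("1000 x 2134 x 1000: %.1f GiB\n", bytesLarge / 2^30);
```

Using a separate variable per loop, as in k1 through k4, avoids this accidental growth.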
Here are my initial runs (also with 4 workers):
Elapsed time is 388.933997 seconds.
Elapsed time is 38.926940 seconds.
Elapsed time is 1.138446 seconds.
Elapsed time is 2.974387 seconds.
Hands down, the best thing to do first is to preallocate each of the k matrices.
Elapsed time is 13.110676 seconds.
Elapsed time is 3.170457 seconds.
Elapsed time is 1.547085 seconds.
Elapsed time is 1.745380 seconds.
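The preallocation step is not shown in the thread, so the following is only a minimal sketch of what it might look like for the serial 3-D case. The sizes are deliberately smaller than the thread's 100 x 2134 x 1000 so the comparison runs quickly, and the names kGrow, kPre, tGrow, and tPre are hypothetical.

```matlab
% Hypothetical reduced sizes; the thread uses 100 x 2134 x 1000.
nI = 20; nJ = 200; nX = 100;

% Without preallocation: kGrow is resized whenever a new index exceeds its size.
clear kGrow
tic
for i = 1:nI
    for j = 2:nJ
        for x = 1:nX
            kGrow(i,j,x) = 100;
        end
    end
end
tGrow = toc;

% With preallocation: the full array is allocated once, then filled in place.
kPre = zeros(nI, nJ, nX);
tic
for i = 1:nI
    for j = 2:nJ
        for x = 1:nX
            kPre(i,j,x) = 100;
        end
    end
end
tPre = toc;

fprintf("growing: %.3f s, preallocated: %.3f s\n", tGrow, tPre);
```

Both loops produce the same array; only the allocation pattern differs. Actual timings depend on the machine and MATLAB release.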
As you can see, 3-D is still running slower with parfor. I'm going to rewrite the parfor loop as follows:
k1b = zeros(100,2134,1000);
tic
parfor i=1:100
    tmp = zeros(2134,1000);
    for j=2:2134
        for x=1:1000
            tmp(j,x)=100;
        end
    end
    k1b(i,:,:) = tmp;
end
toc
These are the times I'm getting (the 6.4s is k1b).
Elapsed time is 12.318290 seconds.
Elapsed time is 6.439760 seconds.
Elapsed time is 2.727125 seconds.
Elapsed time is 1.207010 seconds.
Elapsed time is 1.821987 seconds.
The 2-D case runs 1.5x faster with 4 workers, but the 3-D case (k1b) is 2.4x slower than the plain for loop. I'd attribute this to creating and passing around the tmp matrix. Therefore, I'll switch from a process pool to a threaded pool:
parpool("threads");
Elapsed time is 10.978899 seconds.
Elapsed time is 3.822644 seconds.
Elapsed time is 3.146364 seconds.
Elapsed time is 0.992767 seconds.
Elapsed time is 1.727735 seconds.
With the threaded pool, 2-D runs 1.7x faster and 3-D (k1b) is about 20% slower than the plain for loop.
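For readers who want to try the pool switch, here is a rough sketch. It assumes Parallel Computing Toolbox and R2020a or later, where thread-based pools were introduced; thread workers support only a subset of MATLAB functions, so real workloads may need checking. The sizes are hypothetical and much smaller than in the thread.

```matlab
% Shut down any pool that is already open (for example a process-based one).
delete(gcp("nocreate"));

% Open a thread-based pool; its workers share memory with the client session.
pool = parpool("threads");

% Hypothetical reduced sizes; the thread uses 100 x 2134 x 1000.
nI = 8; nJ = 200; nX = 100;

k1b = zeros(nI, nJ, nX);
tic
parfor i = 1:nI
    tmp = zeros(nJ, nX);        % temporary 2-D slab built on the worker
    for j = 2:nJ
        for x = 1:nX
            tmp(j,x) = 100;
        end
    end
    k1b(i,:,:) = tmp;           % one sliced assignment per iteration
end
toc
```

To go back to a process-based pool, delete this pool and call parpool with a process-based profile instead.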
Maybe others can trim this down further. But truthfully, if all you wanted to do was create a 100 x 2134 x 1000 matrix filled with the value 100, you could run the following. So I'm sure you have something much more interesting to measure that will give the workers a lot more work to do :)
tic, k5 = repmat(100,[100 2134 1000]); toc
Elapsed time is 0.211224 seconds.
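One detail worth noting: the loops in the question start at j=2, so they leave the first column of the second dimension at zero, while repmat fills every element. If that gap matters, a vectorized version that mirrors the loops could look like the sketch below. The sizes and the names kFull and kLoopLike are hypothetical.

```matlab
% Hypothetical reduced sizes; the thread uses 100 x 2134 x 1000.
nI = 10; nJ = 50; nX = 20;

% Every element set to 100, as with the repmat call above.
kFull = repmat(100, [nI nJ nX]);

% Same result as the nested loops: column j=1 stays zero, the rest is 100.
kLoopLike = zeros(nI, nJ, nX);
kLoopLike(:, 2:end, :) = 100;
```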

Release

R2020b