MATLAB Answers

How to properly store variables to use with parfor?

Maverick on 23 Jan 2020
Commented: Maverick on 27 Jan 2020
My calculations are based on a binary tree, which takes two previous instances of a block of variables (called Assemblies) and produces another one.
A new assembly is generated from two assemblies in the upper branch, so all variables must be stored. To this end, I use cell arrays with the following syntax: Assembly_ij = Tree{ithBranch}{jthAssembly}, where Assembly is an 18x3 double matrix. MATLAB accepts this approach, but using parfor does not speed up the code at all. I believe this is because of the way I pass variables to the workers. I get the following warning:
The entire array or structure 'Tree' is a broadcast variable. This might result in unnecessary communication overhead.
Most of the work is done in this part of the code, and it should show the mistake I am making.
initialBranch = initialize();
Tree{1} = initialBranch;
for i = 2 : Nbranches
    branch = cell(1, elmsInBranch(i));
    parfor j = 1 : elmsInBranch(i)
        branch{j} = assembleBlocks(Tree{i-1}{2*j-1}, Tree{i-1}{2*j});
    end
    Tree{i} = branch;
end
MATLAB presumably passes the whole Tree structure to each worker, which is a lot of useless copying. I have no idea how to rewrite this so that it works properly. Is there some clever way to extract just the variables each worker needs?


Answers (1)

Edric Ellis on 24 Jan 2020
Yes, your code is somewhat inefficient in that each parfor loop sends the whole of Tree to the workers. Here are a couple of things you can do about that. Firstly, your parfor loop depends only on Tree{i-1}, so you can easily arrange for only that piece to be sent to the workers:
%% Just some arbitrary setup code
Nbranches = 7;
Tree = cell(1, Nbranches);
Tree{1} = arrayfun(@rand, repmat(4, 1, 10), 'UniformOutput', false);
%% Option 1: Just pull out "Tree{i-1}"
for i = 2:Nbranches
    % Pull out the 'Tree' cell that the parfor loop needs
    treeIMinusOne = Tree{i-1};
    elmsInBranch = floor(numel(treeIMinusOne)/2);
    branch = cell(1, elmsInBranch);
    parfor j = 1:elmsInBranch
        branch{j} = treeIMinusOne{2*j-1} + treeIMinusOne{2*j};
    end
    Tree{i} = branch;
end
(I've adapted your code somewhat into something that is actually executable).
In this case, you can go further - treeIMinusOne can be converted into a pair of sliceable arrays by observing the way the indexing proceeds. So, here's a refinement:
%% Option 2: Attempt to slice "Tree{i-1}" too
Tree2 = [Tree(1), cell(1, Nbranches-1)];
for i = 2:Nbranches
    % Pull out 'Tree' cell as before
    treeIMinusOne = Tree2{i-1};
    % Make two sliceable versions
    sliceOne = treeIMinusOne(1:2:end);
    sliceTwo = treeIMinusOne(2:2:end);
    elmsInBranch = floor(numel(treeIMinusOne)/2);
    branch = cell(1, elmsInBranch);
    parfor j = 1:elmsInBranch
        % Uses only sliced access to the elements of 'Tree2'.
        branch{j} = sliceOne{j} + sliceTwo{j};
    end
    Tree2{i} = branch;
end
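As a variation on Option 2, the two slices can be folded into a single 2-by-N cell array, so that each iteration reads one column. This is only a sketch: the names Tree3, prev, pairs and nPairs are made up for illustration, and the "+" stands in for the real assembleBlocks call, as in the code above. Note that the loop reads pairs(:, j) in exactly one form, because indexing the same variable as pairs{1, j} and pairs{2, j} would most likely stop it from being classified as sliced.

```matlab
% Setup copied from the answer above
Nbranches = 7;
Tree = cell(1, Nbranches);
Tree{1} = arrayfun(@rand, repmat(4, 1, 10), 'UniformOutput', false);

%% Variation: one sliced 2-by-N cell array instead of two slices
Tree3 = [Tree(1), cell(1, Nbranches-1)];
for i = 2:Nbranches
    prev = Tree3{i-1};
    nPairs = floor(numel(prev)/2);
    % Column j holds the two inputs of output j:
    % row 1 is element 2*j-1, row 2 is element 2*j.
    % A trailing unpaired element (odd count) is dropped, as in Option 2.
    pairs = reshape(prev(1:2*nPairs), 2, nPairs);
    branch = cell(1, nPairs);
    parfor j = 1:nPairs
        p = pairs(:, j);          % sliced access: only column j is needed
        branch{j} = p{1} + p{2};  % placeholder for assembleBlocks(p{1}, p{2})
    end
    Tree3{i} = branch;
end
```

Whether this is any faster than two separate slices is not something I have measured; it mainly keeps the pairing logic in one place.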
Now, as to whether any of this gains you any real performance benefit, it's far from clear. It will only help if your parfor loop was previously dominated by data transfer costs.
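One way to check whether data transfer matters in your case is to count the bytes moved to and from the workers with ticBytes and tocBytes (Parallel Computing Toolbox) and compare that against the elapsed time. The snippet below is a rough sketch on toy data; the variable names data, out and elapsed are placeholders, and data is deliberately left as a broadcast variable so that its transfer shows up in the counts.

```matlab
pool = gcp;    % reuse the current pool (may start one, depending on preferences)
data = arrayfun(@rand, repmat(4, 1, 10), 'UniformOutput', false);
n = floor(numel(data)/2);
out = cell(1, n);

ticBytes(pool);            % start counting bytes sent to / received from workers
tStart = tic;
parfor j = 1:n
    % 'data' is indexed with 2*j-1 and 2*j, so it is broadcast, not sliced
    out{j} = data{2*j-1} + data{2*j};
end
elapsed = toc(tStart);
tocBytes(pool)             % display the byte counts for each worker
fprintf('parfor took %.3f s\n', elapsed);
```

If the byte counts are small but the loop is still slow, the time is probably going somewhere other than copying, for example into per-loop scheduling overhead when each iteration does very little work.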

  1 Comment

Maverick on 27 Jan 2020
Your remarks were very helpful and I was able to get a 3x increase in speed in my test case. However, this didn't fix the problem :( The sequential program runs in 0.9 seconds, whereas my initial parfor approach took 9 whole seconds. Now, with your fixes, it takes about 3 seconds.
It looks like MATLAB still performs too much copying. The strange thing is that I had implemented the same code in C++ with OpenMP and got a 2.5x boost in performance.
