Is there a way to speed up matrix calculations with the Parallel Computing Toolbox?
For example, I have the following matrix calculation:
for j = 1: 1000000
O(i,i,j) = M(i,i).* N(i,i).* J(i,i);
end
where i is, say, 200. To speed up the calculation, can I take advantage of the Parallel Computing Toolbox? I have three MacBooks; can I connect them together to execute the above code in parallel? Many thanks
Accepted Answer
More Answers (2)
It is wasteful to repeatedly compute the same fixed M(i,i).* N(i,i).* J(i,i) for every j. Also, elementwise matrix operations are already coded to take advantage of your system's multi-threading capabilities. I would just skip the for-loop, compute the block once, and replicate it:
O(i,i,1:1000000) = repmat(M(i,i) .* N(i,i) .* J(i,i), [1 1 1000000]);
Walter Roberson
on 3 Oct 2013
for j = 1: 1000000
O(i,i,j) = M(i,i).* N(i,i).* J(i,i);
end
can be rewritten as
MNJ = M(i,i) .* N(i,i) .* J(i,i);
for j = 1 : 1000000
    O(i,i,j) = MNJ;
end
which in turn can be rewritten as
joff = sub2ind( size(O), i, i, 1 );   % assumes i is a scalar index
jskip = size(O,1) .* size(O,2);       % linear-index stride from one page to the next
O(joff : jskip : end) = M(i,i) .* N(i,i) .* J(i,i);
It probably is not worthwhile to convert this for use with the parallel toolbox, but if you did then you would calculate M(i,i) .* N(i,i) .* J(i,i) ahead of time, and then use a parfor or distributed array to set every jskip'th element of the O array to be the precalculated value.
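A minimal sketch of that parfor variant, with stand-in data and a reduced page count so it is runnable (names follow the question; a parallel pool is assumed to be open):

```matlab
% Illustrative sketch only: precompute the constant page once, then let
% parfor distribute the independent page assignments across workers.
% The sizes here are reduced -- the question uses 1,000,000 pages.
n = 200; nPages = 1000;
M = rand(n); N = rand(n); J = rand(n);  % stand-ins for the real matrices
MNJ = M .* N .* J;                      % compute the constant page once
O = zeros(n, n, nPages);                % preallocate the output
parfor j = 1 : nPages
    O(:,:,j) = MNJ;                     % each iteration is independent
end
```

Since every page gets the same value, the repmat one-liner above will almost certainly beat this; the sketch only shows the parfor slicing pattern.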
In order to hook multiple computers up for use with the Parallel Computing Toolbox, you would also need MATLAB Distributed Computing Server, as by itself the Parallel Computing Toolbox can only use workers on the local machine.
If the above is not your pattern, then we will need to see your real pattern to advise you as to whether parallel computing will help.
5 Comments
Kyle
on 3 Oct 2013
Walter Roberson
on 3 Oct 2013
Your U_grid and V_grid are both incrementally updated based upon the previous iteration's result, and thus upon the history of the calculation right back to the initialization. Such calculations cannot be parallelized except insofar as you can parallelize the execution of the individual statement,
U_grid = U_grid + dt*(a-(b+1)*U_grid + U_grid.^2.*V_grid + ...
Dx*convolve2(U_grid, Laplacian, 'wrap'));
and then likewise for V_grid. (You cannot update V_grid in parallel with updating U_grid because the V_grid calculation depends upon the updated U_grid.)
Your U_grid and V_grid are only 60 x 60, so they are too small to benefit from multithreading or multiple processors for those two individual statements.
What would be possible (feasible) would be to run multiple independent simulations, each with a different starting random number seed. If the simulation is plausibly going to get trapped in a local minimum that could be worthwhile, but if the simulation is expected to robustly find a (single) global minimum no matter the starting point, then there would probably be no value in doing that.
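A rough sketch of that multiple-independent-runs idea (run_simulation is a hypothetical stand-in for the actual simulation code, and the run count and seeds are arbitrary):

```matlab
% Hypothetical sketch: run several independent copies of the simulation,
% each with its own random seed, on separate workers.
% run_simulation is a stand-in name for the user's actual simulation.
nRuns = 4;
results = cell(1, nRuns);
parfor k = 1 : nRuns
    rng(k);                        % a different seed for each run
    results{k} = run_simulation(); % runs are fully independent
end
```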
Matt J
on 3 Oct 2013
Your U_grid and V_grid are both incrementally updated based upon the previous iteration's result, and thus upon the history of the calculation right back to the initialization. Such calculations cannot be parallelized except insofar as you can parallelize the execution of the individual statement,
Not true, actually. See the documentation on parfor reduction variables.
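For reference, a minimal example of the reduction-variable pattern those docs describe (not the poster's code):

```matlab
% s is a reduction variable: it appears only in updates of the form
% s = s + (expression), so parfor may accumulate the partial sums
% in any order across workers.
s = 0;
parfor k = 1 : 100
    s = s + k^2;   % order of the partial sums does not matter
end
% s now equals sum((1:100).^2) = 338350
```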
Walter Roberson
on 3 Oct 2013
That is applicable only if the order of the partial results does not make any difference -- that is, if the reduction operation is associative and commutative. It is not immediately clear that that would be the case for the above code.
A 1D convolution can be written as a matrix multiplication involving a Toeplitz matrix, and a separable 2D convolution can be done as two 1D convolutions. That would seem to suggest that such a 2D convolution could be done as a single matrix multiplication by the product of the two Toeplitz matrices, but I have not yet found anything that says that explicitly, and I have not found anything that talks about the numeric stability if that were the case.
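A small illustration of the 1-D case, using only base MATLAB's toeplitz (the kernel and signal here are made up, and separability of the 2-D kernel is an extra assumption not checked by this sketch):

```matlab
% 1-D convolution expressed as multiplication by a Toeplitz matrix.
h = [1 -2 1];                  % example kernel (length 3)
x = (1:5)';                    % example signal (length 5)
% Full convolution matrix: first column holds the kernel padded with
% zeros, first row is zero after the kernel's leading element.
T = toeplitz([h(:); zeros(numel(x)-1, 1)], ...
             [h(1), zeros(1, numel(x)-1)]);   % size 7 x 5
y1 = T * x;                    % matrix form of the convolution
y2 = conv(h(:), x);            % built-in convolution, same result
```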
If X and Y represent the initial U_grid and V_grid respectively, it appears that each iteration goes something like
X = X + dt * (a - (b+1) * X + X^2 * Y + X*Conv)
Y = Y - (X^2 * Y + (Conv - b - 1) * X + a)^2 * Y * dt^3 - (2 * (X^2 * Y + (Conv - b - 1) * X + a)) * (-(1/2) * b + X * Y) * dt^2 + (X * b - X^2 * Y + Dy * Conv) * dt
where Conv is the hypothetical product of the two Toeplitz matrices as needed to represent the 2D convolution.
It does not at all appear to me that the reduction over addition would involve terms that are independent of the order of calculations, so I doubt the particular code here can be done as a parallel reduction.
Kyle
on 6 Oct 2013