Is there a way to speed up matrix calculations with the Parallel Computing Toolbox?
For example, I have the following matrix calculation:
for j = 1: 1000000
O(i,i,j) = M(i,i).* N(i,i).* J(i,i);
end
where i is, say, 200. To speed up the calculation, can I take advantage of the Parallel Computing Toolbox? I have three MacBooks; can I connect them together to execute the above code in parallel? Many thanks
Accepted Answer
More Answers (2)
It is wasteful to repeatedly compute the same fixed M(i,i).* N(i,i).* J(i,i) for every j. Also, elementwise matrix operations are already coded to take advantage of your system's multi-threading capabilities. I would just skip the for-loop, compute the block once, and replicate it:
O(i,i,1:1000000) = repmat(M(i,i) .* N(i,i) .* J(i,i), [1 1 1000000]);
Walter Roberson
on 3 Oct 2013
for j = 1: 1000000
O(i,i,j) = M(i,i).* N(i,i).* J(i,i);
end
can be rewritten as
MNJ = M(i,i) .* N(i,i) .* J(i,i);
for j = 1 : 1000000
    O(i,i,j) = MNJ;
end
which in turn can be rewritten as
joff = sub2ind( size(O), i, i, 1 );   % assumes i is a scalar index
jskip = size(O,1) .* size(O,2);       % linear-index stride from one page to the next
O(joff : jskip : end) = M(i,i) .* N(i,i) .* J(i,i);
It probably is not worthwhile to convert this for use with the parallel toolbox, but if you did then you would calculate M(i,i) .* N(i,i) .* J(i,i) ahead of time, and then use a parfor or distributed array to set every jskip'th element of the O array to be the precalculated value.
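A minimal sketch of that parfor variant, with stand-in data and a reduced page count so it is runnable (names follow the question; a parallel pool is assumed to be open):

```matlab
% Illustrative sketch only: precompute the constant page once, then let
% parfor distribute the independent page assignments across workers.
% The sizes here are reduced -- the question uses 1,000,000 pages.
n = 200; nPages = 1000;
M = rand(n); N = rand(n); J = rand(n);  % stand-ins for the real matrices
MNJ = M .* N .* J;                      % compute the constant page once
O = zeros(n, n, nPages);                % preallocate the output
parfor j = 1 : nPages
    O(:,:,j) = MNJ;                     % each iteration is independent
end
```

Since every page gets the same value, the repmat one-liner above will almost certainly beat this; the sketch only shows the parfor slicing pattern.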
In order to hook multiple computers up for use with the Parallel Computing Toolbox, you would also need MATLAB Distributed Computing Server, as by itself the Parallel Computing Toolbox can only use workers on the local machine.
If the above is not your pattern, then we will need to see your real pattern to advise you as to whether parallel computing will help.
5 Comments
Kyle
on 3 Oct 2013
Walter Roberson
on 3 Oct 2013
Your U_grid and V_grid are both incrementally updated based upon the previous iteration's result, and thus upon the history of the calculation right back to the initialization. Such calculations cannot be parallelized except insofar as you can parallelize the execution of the individual statement,
U_grid = U_grid + dt*(a-(b+1)*U_grid + U_grid.^2.*V_grid + ...
Dx*convolve2(U_grid, Laplacian, 'wrap'));
and then likewise for V_grid. (You cannot update V_grid in parallel with updating U_grid because the V_grid calculation depends upon the updated U_grid.)
Your U_grid and V_grid are only 60 x 60, so they are too small to benefit from multithreading or multiple processors for those two individual statements.
What would be possible (feasible) would be to run multiple independent simulations, each with a different starting random number seed. If the simulation is plausibly going to get trapped in a local minimum that could be worthwhile, but if the simulation is expected to robustly find a (single) global minimum no matter the starting point, then there would probably be no value in doing that.
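A rough sketch of that multiple-independent-runs idea (run_simulation is a hypothetical stand-in for the actual simulation code, and the run count and seeds are arbitrary):

```matlab
% Hypothetical sketch: run several independent copies of the simulation,
% each with its own random seed, on separate workers.
% run_simulation is a stand-in name for the user's actual simulation.
nRuns = 4;
results = cell(1, nRuns);
parfor k = 1 : nRuns
    rng(k);                        % a different seed for each run
    results{k} = run_simulation(); % runs are fully independent
end
```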
Matt J
on 3 Oct 2013
Your U_grid and V_grid are both incrementally updated based upon the previous iteration's result, and thus upon the history of the calculation right back to the initialization. Such calculations cannot be parallelized except insofar as you can parallelize the execution of the individual statement,
Not true, actually. See the documentation on parfor reduction variables.
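For reference, a minimal example of the reduction-variable pattern those docs describe (not the poster's code):

```matlab
% s is a reduction variable: it appears only in updates of the form
% s = s + (expression), so parfor may accumulate the partial sums
% in any order across workers.
s = 0;
parfor k = 1 : 100
    s = s + k^2;   % order of the partial sums does not matter
end
% s now equals sum((1:100).^2) = 338350
```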
Walter Roberson
on 3 Oct 2013
That is applicable only if the order of the partial results does not make any difference -- that is, if the reduction operation is associative and commutative. It is not immediately clear that that would be the case for the above code.
A 1D convolution can be written as a matrix multiplication involving a Toeplitz matrix, and a separable 2D convolution can be done as two 1D convolutions. That would seem to suggest that such a 2D convolution could be done as a single matrix multiplication by the product of the two Toeplitz matrices, but I have not yet found anything that says that explicitly, and I have not found anything that talks about the numeric stability if that were the case.
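A small illustration of the 1-D case, using only base MATLAB's toeplitz (the kernel and signal here are made up, and separability of the 2-D kernel is an extra assumption not checked by this sketch):

```matlab
% 1-D convolution expressed as multiplication by a Toeplitz matrix.
h = [1 -2 1];                  % example kernel (length 3)
x = (1:5)';                    % example signal (length 5)
% Full convolution matrix: first column holds the kernel padded with
% zeros, first row is zero after the kernel's leading element.
T = toeplitz([h(:); zeros(numel(x)-1, 1)], ...
             [h(1), zeros(1, numel(x)-1)]);   % size 7 x 5
y1 = T * x;                    % matrix form of the convolution
y2 = conv(h(:), x);            % built-in convolution, same result
```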
If X and Y represent the initial U_grid and V_grid respectively, it appears that each iteration goes something like
X = X + dt * (a - (b+1) * X + X^2 * Y + X*Conv)
Y = Y - (X^2 * Y + (Conv - b - 1) * X + a)^2 * Y * dt^3 - (2 * (X^2 * Y + (Conv - b - 1) * X + a)) * (-(1/2) * b + X * Y) * dt^2 + (X * b - X^2 * Y + Dy * Conv) * dt
where Conv is the hypothetical product of the two Toeplitz matrices as needed to represent the 2D convolution.
It does not at all appear to me that the reduction over addition would involve terms that are independent of the order of calculations, so I doubt the particular code here can be done as a parallel reduction.
Kyle
on 6 Oct 2013