Accelerating code wont work

Question

Marc Laub on 8 Nov 2023

0
Link

Direct link to this question

https://uk.mathworks.com/matlabcentral/answers/2044797-accelerating-code-wont-work

Answered: Thomas Ibbotson on 9 Nov 2023

Hey guys,

I got a piece of code, that has to run repetitive and I would be glad if I could reduce execution time, since I do not know how complex the whole thing might become in the future...

For those who care about what it does: it solves diffusion equation für spherical particals that additionaly shrink and grow whil some diffuses in or out.

Theoretically, I just thought parfor would already help, but parfor actually tikes like 4 times longer, even though I already reduced broadcast etc...

D = 1e-18;
r_p=200*10^-9;
xmesh = 0:10^-9:r_p;
N = numel(xmesh);
r = xmesh;
dr = xmesh(2)-xmesh(1);
dt = 0.5*dr^2/D;
tspan = [0:dt:100];
% helper
g = D*dt/(2*dr^2);
% vectorized sparse construction of matrices
offdiags = ...
    + sparse(2:(N-1),1:(N-2),+(dr./r(2:(end-1))-1)*g , N,N) ...
    + sparse(2:(N-1),3:(N  ),-(dr./r(2:(end-1))+1)*g, N,N);
ML = sparse(2:(N-1),2:(N-1), ones(N-2,1) + 2*g , N,N) + offdiags;
MR = sparse(2:(N-1),2:(N-1), ones(N-2,1) - 2*g , N,N) - offdiags;
% boundary conditions
% ML(1,  1:3  ) = [-3/2 ,  2 , -1/2]/dr;
% ML(N,(N-2):N) = [ 1/2 , -2 ,  3/2]/dr;
radi=round(15*lognrnd(0,0.4,2000,1));
radi(radi>100)=100;
u=cell(2000,1);
for i=1:2000
    u{i}= 0*ones(radi(i),1);
    u{i}(end)=1;
end
c_eq=1;
R=cellfun('length',u)*10^-9;
N=numel(u);
dx=zeros(2000,1);
close all
figure
time_id=1;
% ML = parallel.pool.Constant(ML);
% MR = parallel.pool.Constant(MR);
tic
while time_id<(numel(tspan)-1)
    time_id=time_id+1;
    c_eq=c_eq*0.99;
%  parfor k=1:numel(u)
    for k=1:numel(u)
        dx(k)=dx(k)+2*10^-17*dt*1/R(k);
        R(k)=R(k)+2*10^-17*dt*1/R(k);
        if dx(k)>=10^-9
            dx(k)=dx(k)-10^-9;
            u{k}(end+1)=c_eq;
            N=numel(u{k});
        elseif dx(k)<=-10^-9
            dx(k)=dx(k)+10^-9;
            u{k}(end)=[];
            u{k}(end-1)=c_eq;
            N=numel(u{k});
        else
            N=numel(u{k});
        end
%                       ML_n = spalloc(N,N,3*N-1);
%                       MR_n = spalloc(N,N,3*N-1);
        %               ML_n=ML.Value(1:N,1:N);
        %               MR_n=MR.Value(1:N,1:N);
        ML_n=ML(1:N,1:N);
        MR_n=MR(1:N,1:N);
        ML_n(1,  1:3  ) = [-3/2 ,  2 , -1/2]/dr;
        ML_n(N,(N-2):N) = [ 1/2 , -2 ,  3/2]/dr;
        up = u{k};
        d=MR_n*up;
        u{k}=ML_n\d;
        u{k}(end)=c_eq;
    end
end
toc

So up to the tic command, just hat to be setup once. Since the particles have different sizes, the matruces ML and MR have different sizes for each particle. To avoid storing many different matrices and changing the size of those matrices as the particles grow or shrink, I calculatet a large matrix (for a particle size that wont be reached within the simulation) nd then just take the matrix I need for each particle, as submatrices.

So that large matrix and the cell_array with the concentration profile for each particle are the onle broadcast variables, and they do not seem very big to me.... I tried to implement constant variables for all workes, try to preallocate the non zero values for the sparse sub-matrices... everything just made it worse...

So is there any idea left to get some more performance out of it?

Many thanks in advance

Best regards

0 Comments
Show -2 older commentsHide -2 older comments

Sign in to comment.

Sign in to answer this question.

Answer 1

Thomas Ibbotson on 9 Nov 2023

0
Link

Direct link to this answer

https://uk.mathworks.com/matlabcentral/answers/2044797-accelerating-code-wont-work#answer_1349517

The matrix operations you are using (mldivide and matrix multiplication) are already implicitly multi-threaded and will use all the CPUs on your system. This means parfor cannot improve on this on your local machine and in general will slow things down due to the extra communication and process overheads. See https://uk.mathworks.com/discovery/matlab-multicore.html

As for how to make this faster, you could scale up to multiple machines using MATLAB Parallel Server. However, whether or not this will speed things up depends on whether the computation time for each loop is long compared to communication overheads. Also with MATLAB Parallel Server, each worker is restricted to a single computational core so you lose the benefits of implicit multi-threading, which means you need to scale up enough to overcome this limitation.

For alternatives to improving this on your local machine you would probably need to look at algorithmic improvements, which unforunately I can't help with as I'm not a numerical algorithms expert. You could also profile the code using the MATLAB profiler to see where the bottlenecks are. However, I suspect it's unlikely that the bottlenecks will be outside the matrix computations, but you should check just in case.

0 Comments
Show -2 older commentsHide -2 older comments

Sign in to comment.

Accelerating code wont work

0 Comments
Show -2 older commentsHide -2 older comments

Answers (1)

0 Comments
Show -2 older commentsHide -2 older comments

See Also

Categories

Tags

Products

Release

Community Treasure Hunt

Accelerating code wont work

0 Comments Show -2 older commentsHide -2 older comments

Answers (1)

0 Comments Show -2 older commentsHide -2 older comments

See Also

Categories

Tags

Products

Release

Community Treasure Hunt

0 Comments
Show -2 older commentsHide -2 older comments

0 Comments
Show -2 older commentsHide -2 older comments