How can I efficiently add multiple arrays generated in a loop?

8 views (last 30 days)
Hi,
I am trying to visualize a lot of (point, intensity) values as disks and gaussians whose values add up when they overlap.
The following code does what I want, but is very slow, taking several minutes once there are a lot of points. I would like it to work for 1e5 or 1e6 points in a reasonable time:
tic;
Npixels_X = 4096; % image size X
Npixels_Y = 3537; % image size Y
N=10; % number of points
x = randi(Npixels_X,[1,N]); % x coordinates
y = randi(Npixels_Y,[1,N]); % y coordinates
intensity = randi(100, [1,N]); % intensity
r0=100; % radius of circles
[X,Y] = meshgrid(1:Npixels_X, 1:Npixels_Y);
gaussianPoints = zeros(size(X));
circlePoints = zeros(size(X));
for idx=1:N
r_squared = (X-x(idx)).^2 + (Y-y(idx)).^2;
gaussianPoints = gaussianPoints + intensity(idx) * exp(-r_squared./(r0^2));
circlePoints = circlePoints + intensity(idx) * ( r_squared <= r0.^2 );
end
figure;
imagesc(gaussianPoints);
figure;
imagesc(circlePoints);
toc;
Is there a more efficient way of doing this?
I sense that parfor, bsxfun or some array reshaping might help, but am not quite sure how to go about it yet.
  2 Comments
weikang zhao
weikang zhao on 4 Mar 2021
This problem is suitable for the use of GPU computing or parallel computing, but parallel computing can not bring a huge improvement, you can also try to use the C language for mixed compilation. If you are not in a hurry for a solution, you can provide me with an email address and I will help you optimize this code when I have free time.
zjw z
zjw z on 8 Mar 2021
I am not in a hurry, but would be happy if you could point me in the right direction to get started with one of the options you mentioned. I prefer getting help here rather than by mail for now.

Sign in to comment.

Answers (1)

Jan
Jan on 4 Mar 2021
Edited: Jan on 4 Mar 2021
EXP is very expensive. This is the bottleneck of your code.
Instead of applying it to matrices produced by MESHGRID, provide the vectors instead an use:
exp(a + b) = exp(a) * exp(b)
tic;
r02 = r0^2; % Move repeated squaring out of the loop
X = (1:Npixels_X); % Vectors instead of redundant matrices by MESHGRID
Y = (1:Npixels_Y).'; % Column vector
G = 0; % Slightly cheaper than ZEROS()
C = 0;
for idx = 1:N
RX = (X - x(idx)) .^ 2; % Row vector
RY = (Y - y(idx)) .^ 2; % Column vector
eRX = exp(-RX / r02); % EXP over vectors
eRY = exp(-RY / r02);
G = G + intensity(idx) * eRX .* eRY; % auto-expand: >= Matlab R2016b
C = C + intensity(idx) * ((RX + RY) <= r02); % auto-expand
end
toc
Except for rounding errors the results are equal. On my Core i5 mobile, Matlab 2018b the processing time shrinks from 2.48 seconds to 0.52 seconds.
If the code need 0.5 seconds for N=10, it takes about 14 hours for N=1e6.
You can split the job in parts and run them in parallel.
% Naive PARFOR approach:
N = 100; % Multiple of 10
... as above
r02 = r0^2;
X = (1:Npixels_X);
Y = (1:Npixels_Y).';
pG = cell(1, 10); % Let clients collect the data separately
pC = cell(1, 10);
parfor ip = 1:10
G = 0;
C = 0;
q = 1 + (ip - 1) * (N / 10);
for idx = q:q+(N / 10)-1
RX = (X - x(idx)) .^ 2;
RY = (Y - y(idx)) .^ 2;
eRX = exp(-RX / r02);
eRY = exp(-RY / r02);
G = G + intensity(idx) * eRX .* eRY;
C = C + intensity(idx) * ((RX + RY) <= r02);
end
pG{ip} = G;
pC{ip} = C;
end
G = 0; % Sum up results finally:
C = 0;
for ip = 1:10
G = G + pG{ip};
C = C + pC{ip};
end
This takes the double time on my 2 code i5 - not the half time. Matlab warns, that "intensity", "x" and "y" are expensive broadcast variables. But I do not know how to avoid this.
For a real problem, I'd start as many Matlab instances as I have cores and distribute the data with running the non-PARFOR approach.
To be honest: Maybe others have more success with a parallelization. I did not find any code, which runs faster with PARFOR yet, even on my 4-core i7 desktop. Either PARFOR is not sufficient or my approachs are to naive.
  1 Comment
zjw z
zjw z on 8 Mar 2021
Thanks for that. It does save time, although I don't think it will be sufficient for what I had in mind. (<5min rendering) Calculating RX and RY separately indeed makes a lot of sense. :)
I might have a look at GPU options or restrict it to look at a subset of the data.
I also don't need circles and gaussians at the same time (can be a user option), but even with just one of them, it takes too long for 1e5 points.

Sign in to comment.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!