Improving the speed of a double for loop through vectorization or changing summation indices

2 views (last 30 days)
Matthew Kehoe on 15 Jul 2021
Edited: Matthew Kehoe on 22 Jul 2021
I'm writing a double for loop that computes a Pade summation subject to specific coefficients and parameters. I'm curious if there is a more efficient way to write my double for loop. I think that I should avoid writing both cos(theta)^(p-q) and sin(theta)^q as the power operation is expensive.
Here is my current implementation (and three other implementations that are slightly faster).
clc; clear all;
% Test data for function
epsilon = 0.3;
delta = 0.1;
N = 500;
M = 500;
% My code passes through complex coefficients of size (N+1,M+1)
c = rand(N+1,M+1) + 1i*rand(N+1,M+1);
rho = sqrt(epsilon^2 + delta^2);
theta = atan2(delta,epsilon);
% Form the ctilde
N_M_min = min(N,M);
coeff = zeros(N_M_min+1,1);
coeff2 = zeros(N_M_min+1,1);
coeff3 = zeros(N_M_min+1,1);
coeff4 = zeros(N_M_min+1,1);
ntests = 100; % number of test iterations to average exec time
% Method 1 - Original Implementation
tic
for N = 1:ntests
for p=0:N_M_min
for q=0:p
coeff(p+1) = coeff(p+1) + c(p-q+1,q+1)*cos(theta)^(p-q)*sin(theta)^q;
end
end
end
toc/ntests
% Method 2
tic
for N = 1:ntests
for p=0:N_M_min
c2 = 1;
for q=0:p
if (q == 0)
c2 = 1;
else
c2 = c2*sin(theta);
end
coeff2(p+1) = coeff2(p+1) + c(p-q+1,q+1)*cos(theta)^(p-q)*c2;
end
end
end
toc/ntests
% Method 3
tic
c1 = cos(theta).^(0:N_M_min);
c2 = sin(theta).^(0:N_M_min);
for N = 1:ntests
for p=0:N_M_min
for q=0:p
coeff3(p+1) = coeff3(p+1) + c(p-q+1,q+1)*c1(p-q+1)*c2(q+1);
end
end
end
toc/ntests
% Method 4 - Fastest but slightly less accurate than method 3
tic
for N = 1:ntests
vander = (cos(theta).^(0:N_M_min).') .* (sin(theta).^(0:N_M_min));
for p=0:N_M_min
for q=0:p
coeff4(p+1) = coeff4(p+1) + c(p-q+1,q+1)*vander(p-q+1,q+1);
end
end
end
toc/ntests
diff = norm(coeff-coeff2,inf); % is tiny, 3.3e-21
diff2 = norm(coeff-coeff3,inf); % is zero
diff3 = norm(coeff-coeff4,inf); % is tiny 3.5e-15 (not zero as in method 3)
My second implementation runs almost twice as fast locally. The third and fourth implementation increase the speed by around a factor of 4.
The average executive time for method 1 is 0.0589 seconds.
The average executive time for method 2 is 0.0392 seconds.
The average executive time for method 3 is 0.0186 seconds.
The average executive time for method 4 is 0.0116 seconds.
However, I don't know how to speed up the calculation for the coefficients cos(theta)^(p-q). Is there a more efficient way of writing this double for loop through avoiding the power operation or using vectorization (my Matlab code will call this subroutine more than one hundred thousand times)?

David Hill on 15 Jul 2021
Edited: David Hill on 15 Jul 2021
What is rep doing? Seems to just be multiplying.
for rep = 1:10000
for p=0:N_M_min
for q=0:p
coeff(p+1) = coeff(p+1) + c(p-q+1,q+1)*cos(theta)^(p-q)*sin(theta)^q;%there is no rep in the equations
end
end
end
I believe all your nested loops can be consolidated as such:
epsilon = 0.3;
delta = 0.1;
N = 50;
M = 50;
c = rand(N+1,M+1) + 1i*rand(N+1,M+1);
rho = sqrt(epsilon^2 + delta^2);
theta = atan2(delta,epsilon);
N_M_min = min(N,M);
tic;
c=rot90(c);
coeff=10000*arrayfun(@(x)sum(diag(c,x-N_M_min)'.*cos(theta).^(x:-1:0).*sin(theta).^(0:x)),0:N_M_min);
toc;
This runs in about 0.003 s.
Matthew Kehoe on 15 Jul 2021
I only included rep for testing purposes (I shouldn't have included it and have edited my original post). The 51 coefficient values are different as
clear all;
clc;
epsilon = 0.3;
delta = 0.1;
N = 50;
M = 50;
c = rand(N+1,M+1) + 1i*rand(N+1,M+1);
rho = sqrt(epsilon^2 + delta^2);
theta = atan2(delta,epsilon);
N_M_min = min(N,M);
coeff = zeros(N_M_min+1,1);
% Method 1 - Original Matlab code
tic
for p=0:N_M_min
for q=0:p
coeff(p+1) = coeff(p+1) + c(p-q+1,q+1)*cos(theta)^(p-q)*sin(theta)^q;
end
end
toc
% Method 2
tic
c=rot90(c);
coeff2=arrayfun(@(x)sum(diag(c,x-N_M_min)'.*cos(theta).^(x:-1:0).*sin(theta).^(0:x)),0:N_M_min);
coeff2=coeff2';
toc
diff = norm(coeff - coeff2,inf); % returns 0.6371
% The average execution time for method 1 is 0.088334 seconds.
% The average execution time for method 1 is 0.305050 seconds.
It appears that calling arrayfun() is slower than writing for loops.

R2019a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!