2 views (last 30 days)

Show older comments

I'm writing a double for loop that computes a Pade summation subject to specific coefficients and parameters. I'm curious if there is a more efficient way to write my double for loop. I think that I should avoid writing both cos(theta)^(p-q) and sin(theta)^q as the power operation is expensive.

Here is my current implementation (and three other implementations that are slightly faster).

clc; clear all;

% Test data for function

epsilon = 0.3;

delta = 0.1;

N = 500;

M = 500;

% My code passes through complex coefficients of size (N+1,M+1)

c = rand(N+1,M+1) + 1i*rand(N+1,M+1);

rho = sqrt(epsilon^2 + delta^2);

theta = atan2(delta,epsilon);

% Form the ctilde

N_M_min = min(N,M);

coeff = zeros(N_M_min+1,1);

coeff2 = zeros(N_M_min+1,1);

coeff3 = zeros(N_M_min+1,1);

coeff4 = zeros(N_M_min+1,1);

ntests = 100; % number of test iterations to average exec time

% Method 1 - Original Implementation

tic

for N = 1:ntests

for p=0:N_M_min

for q=0:p

coeff(p+1) = coeff(p+1) + c(p-q+1,q+1)*cos(theta)^(p-q)*sin(theta)^q;

end

end

end

toc/ntests

% Method 2

tic

for N = 1:ntests

for p=0:N_M_min

c2 = 1;

for q=0:p

if (q == 0)

c2 = 1;

else

c2 = c2*sin(theta);

end

coeff2(p+1) = coeff2(p+1) + c(p-q+1,q+1)*cos(theta)^(p-q)*c2;

end

end

end

toc/ntests

% Method 3

tic

c1 = cos(theta).^(0:N_M_min);

c2 = sin(theta).^(0:N_M_min);

for N = 1:ntests

for p=0:N_M_min

for q=0:p

coeff3(p+1) = coeff3(p+1) + c(p-q+1,q+1)*c1(p-q+1)*c2(q+1);

end

end

end

toc/ntests

% Method 4 - Fastest but slightly less accurate than method 3

tic

for N = 1:ntests

vander = (cos(theta).^(0:N_M_min).') .* (sin(theta).^(0:N_M_min));

for p=0:N_M_min

for q=0:p

coeff4(p+1) = coeff4(p+1) + c(p-q+1,q+1)*vander(p-q+1,q+1);

end

end

end

toc/ntests

diff = norm(coeff-coeff2,inf); % is tiny, 3.3e-21

diff2 = norm(coeff-coeff3,inf); % is zero

diff3 = norm(coeff-coeff4,inf); % is tiny 3.5e-15 (not zero as in method 3)

My second implementation runs almost twice as fast locally. The third and fourth implementation increase the speed by around a factor of 4.

The average executive time for method 1 is 0.0589 seconds.

The average executive time for method 2 is 0.0392 seconds.

The average executive time for method 3 is 0.0186 seconds.

The average executive time for method 4 is 0.0116 seconds.

However, I don't know how to speed up the calculation for the coefficients cos(theta)^(p-q). Is there a more efficient way of writing this double for loop through avoiding the power operation or using vectorization (my Matlab code will call this subroutine more than one hundred thousand times)?

David Hill
on 15 Jul 2021

Edited: David Hill
on 15 Jul 2021

What is rep doing? Seems to just be multiplying.

for rep = 1:10000

for p=0:N_M_min

for q=0:p

coeff(p+1) = coeff(p+1) + c(p-q+1,q+1)*cos(theta)^(p-q)*sin(theta)^q;%there is no rep in the equations

end

end

end

I believe all your nested loops can be consolidated as such:

epsilon = 0.3;

delta = 0.1;

N = 50;

M = 50;

c = rand(N+1,M+1) + 1i*rand(N+1,M+1);

rho = sqrt(epsilon^2 + delta^2);

theta = atan2(delta,epsilon);

N_M_min = min(N,M);

tic;

c=rot90(c);

coeff=10000*arrayfun(@(x)sum(diag(c,x-N_M_min)'.*cos(theta).^(x:-1:0).*sin(theta).^(0:x)),0:N_M_min);

toc;

This runs in about 0.003 s.

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!