Vectorized code slower than loops?

Question


This question is something of an offshoot of another one. I have the following two versions of the code:

maxN = 100;
levels = maxN+1;
xElements = 101;
umn = complex(zeros(levels, levels)); % preallocate (all zeros)
bessels = ones(1201, 1201, 101);    % 1.09 GB
negMcontainer = ones(1201, 1201, 100);
posMcontainer = negMcontainer;

% Version 1: innermost loop over m replaced by vector indexing
tic
for j = 1 : xElements
    for i = 1 : xElements
        for n = 1 : 2 : maxN
            nn = n + 1;
            mm = 1;
            m = 1:2:n;                  % odd indices up to n
            numOfEl = ceil(n/2);        % equals numel(m)
            umn(nn, mm:mm+numOfEl-1) = bessels(i, j, nn) * posMcontainer(i, j, m);
        end
    end
end
toc

% Version 2: all loops
tic
for j = 1 : xElements
    for i = 1 : xElements
        for n = 1 : 2 : maxN
            nn = n + 1;
            mm = 1;
            for m = 1 : 2 : n
                umn(nn, mm) = bessels(i, j, nn) * posMcontainer(i, j, m);
                mm = mm + 1;
            end
        end
    end
end
toc

It turns out that the loop version is more than 2x faster. Why is that? I know this can happen when vectorization requires large temporary variables, but that does not seem to be the case here.

More generally, what (other than parfor) can I do to speed up this code?

Best regards, Alex

1 Comment

Alexandra Harkai on 2 Sep 2016

Not sure about the speedup possibilities just yet, but regarding the vectorisation, this may be helpful in seeing where the vector/loop implementations make a difference: http://www.matlabtips.com/matlab-is-no-longer-slow-at-for-loops/
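To see what the vectorized line in the question actually indexes, the sketch below uses a small hypothetical array P standing in for posMcontainer. Indexing with a vector m along the third dimension returns a 1-by-1-by-K array rather than a 1-by-K row, and in the question's code such a temporary (plus the product with bessels(i, j, nn) and the index vectors m and mm:mm+numOfEl-1) is created in each of the 101*101*50 iterations, with K at most 50. That this per-statement overhead accounts for the 2x gap is an assumption, not something measured in this thread.

```matlab
% P is a small, hypothetical stand-in for posMcontainer (each value = its linear index).
P = reshape(1:250, 5, 5, 10);

n = 5;
m = 1:2:n;                  % [1 3 5], as in the question's vectorized line

t = P(2, 3, m);             % vector index along the 3rd dimension
disp(size(t))               % a 1-by-1-by-3 array, not 1-by-3

% MATLAB accepts t directly in the question's assignment to umn(nn, ...);
% the reshape here only makes the row shape explicit.
r = reshape(t, 1, []);
disp(r)
```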


Accepted Answer

per isakson on 2 Sep 2016

Edited: per isakson on 3 Sep 2016


Note that MATLAB stores arrays in column-major order, and that bessels and posMcontainer are both large.
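Column-major order means that elements adjacent along the first dimension are adjacent in memory, while neighbours along the third dimension are a whole page apart. The sketch below checks this with sub2ind on a small hypothetical 4-by-4-by-6 array; for a 1201-by-1201-by-100 array such as posMcontainer, consecutive third-dimension indices are 1201*1201 doubles apart, about 11.5 MB.

```matlab
% Small hypothetical array size; only the shape matters here.
sz = [4, 4, 6];

% Linear (memory-order) positions of three neighbours along dimension 1 ...
alongDim1 = sub2ind(sz, [1 2 3], [1 1 1], [1 1 1]);
% ... and along dimension 3, like posMcontainer(i, j, m) in the question.
alongDim3 = sub2ind(sz, [1 1 1], [1 1 1], [1 2 3]);

stride1 = diff(alongDim1)   % neighbours are consecutive: stride 1
stride3 = diff(alongDim3)   % one full 4-by-4 page apart: stride 16

% Stride along dimension 3 of a 1201-by-1201-by-100 double array, in bytes.
pageBytes = 1201 * 1201 * 8
```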

Possibly the transfer of data between memory and the CPU would be more efficient (the caches would work better) if

umn(nn, mm:mm+numOfEl-1) = bessels(i, j, nn) * posMcontainer(i, j, m);

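were replaced by the following (this requires storing bessels and posMcontainer with the level index as the first dimension, and storing umn transposed):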

umn(mm:mm+numOfEl-1,nn) = bessels(nn, i, j) * posMcontainer(m, i, j);

The same should apply to the all-loops version.
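A minimal sketch of the suggested layout, on small hypothetical sizes so it runs quickly: the arrays are rearranged with permute so that the level index comes first, umn is stored transposed, and the result is compared with the original indexing for one (i, j) pair. Note that permute makes a full copy, so for arrays of about 1 GB it is probably better to create them in the new layout from the start. The names besselsP, posP and umnT are illustrative only.

```matlab
% Small hypothetical sizes standing in for 1201-by-1201 and maxN = 100.
nI = 3; nJ = 4; maxN = 9;
levels = maxN + 1;
bessels       = rand(nI, nJ, levels);
posMcontainer = rand(nI, nJ, maxN);
i = 2; j = 3;                                   % one (i, j) pair

% Original layout: umn(nn, :) is filled from the 3rd dimension (large stride).
umn = zeros(levels, levels);
for n = 1:2:maxN
    nn = n + 1;
    m = 1:2:n;
    umn(nn, 1:numel(m)) = bessels(i, j, nn) * reshape(posMcontainer(i, j, m), 1, []);
end

% Suggested layout: level index first, so posP(m, i, j) lies within one short
% stretch of memory and umnT(:, nn) is a column.
besselsP = permute(bessels, [3 1 2]);           % levels-by-nI-by-nJ
posP     = permute(posMcontainer, [3 1 2]);     % maxN-by-nI-by-nJ
umnT = zeros(levels, levels);
for n = 1:2:maxN
    nn = n + 1;
    m = 1:2:n;
    umnT(1:numel(m), nn) = besselsP(nn, i, j) * posP(m, i, j);
end

isequal(umn, umnT.')   % same values, transposed storage
```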

See "Columns and Rows are not the same".


Finally, here is the test from "Columns and Rows are not the same" with an additional case (R2016a):

result =runperf('NestedLoops.m');
fullTable = vertcat(result.Samples);   
varfun(@mean,fullTable,'InputVariables'         ...
      ,'MeasuredTime','GroupingVariables','Name')
ans = 
           Name           GroupCount    mean_MeasuredTime
    __________________    __________    _________________
    NestedLoops/test      4             1.3266          
    NestedLoops/test_1    4             0.88148          
    NestedLoops/test_2    4             0.49775

where NestedLoops.m contains

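%%
% Case 1: first index in the outer loop; X(ii,jj,:) is strided by 100*100
X=rand(100,100,2000);
for ii=1:100
    for jj=1:100
        X(ii,jj,:)=10*X(ii,jj,:);
    end
end
%%
% Case 2: loops swapped so the first index varies fastest
X=rand(100,100,2000);
for jj=1:100
    for ii=1:100
        X(ii,jj,:)=10*X(ii,jj,:);
    end
end
%%
% Case 3: long dimension first, so each X(:,ii,jj) is contiguous in memory
X=rand(2000,100,100);
for jj=1:100
    for ii=1:100
        X(:,ii,jj)=10*X(:,ii,jj);
    end
end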

The actual differences between the cases are larger than the table suggests, since each measured time also includes generating X with rand:

>> tic, X=rand(100,100,2000);, toc
Elapsed time is 0.355542 seconds.
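A rough correction, as a sketch: subtracting the single rand timing from each mean in the table estimates the loop time alone. This assumes rand(2000,100,100) costs about the same as rand(100,100,2000) (same number of elements), and it mixes a one-off timing with averaged ones, so the numbers are indicative only.

```matlab
% Mean times from the runperf table above, in seconds.
measured = [1.3266, 0.88148, 0.49775];
% One-off timing of X = rand(100,100,2000) reported above; assumed here to
% apply to all three cases.
randTime = 0.355542;

loopOnly = measured - randTime;             % estimated loop time per case
ratioRaw = measured(1) / measured(3);       % slowest vs fastest, as measured
ratioLoopOnly = loopOnly(1) / loopOnly(3);  % same ratio with rand removed

fprintf('loop only: %.3f %.3f %.3f s\n', loopOnly);
fprintf('ratio: %.2f measured, %.2f loop only\n', ratioRaw, ratioLoopOnly);
```

With these numbers the ratio between the slowest and the fastest case grows from about 2.7 to about 6.8.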

6 Comments (4 older comments not shown)

Alex Kurek on 3 Sep 2016

I do not know C, but if you want, the code is here: https://www.dropbox.com/s/69bf8fj7lc6cnbc/fisherComputer.c?dl=0

per isakson on 3 Sep 2016

Edited: per isakson on 5 Sep 2016

Thanks, but TL;DR.

Neither do I; however, I get the impression that Coder switches the order of the loops to account for the difference in storage order (column-major in MATLAB, row-major in C).

Regarding "slowed down a bit in .mex": I now believe that one should code for column-major order in MATLAB, and that Coder adapts the C code to row-major. However, it puzzles me that the difference in C is only "a bit", since in MATLAB it is significant.

