How to speed up this code consisting of 6 nested FOR loops?

Question

0 votes

Hi All,

I am trying to speed up the code below which consists of 6 nested FOR loops.

The purpose of this code is to fill a large, sparse matrix A with ones, and a large vector b with elements from a matrix dd of size TxTxN.

I have access to the Parallel Computing Toolbox, and ultimately this code will run on a cluster.

Any comments/suggestions would be greatly appreciated.

N= big number (e.g. 10000)
T=24;  
A=sparse(N*T*(T+1)/2,(N-1)*T+T); %matrix to be filled
b=sparse(N*T*(T+1)/2,1); %vector to be filled
tempT = (T+1)*T/2; % constant used in the loop
for j=1:N
for t1=1:T
   for t2=t1:T
       for t3=1:t2-t1+1
           rw = (j-1)*tempT + (t1-1)*(T-t1/2)+t2;
           cl = (j-1)*T+t1-1+t3;
           A(rw,cl) = 1;
             for t4=t1:t2
                 for t5=t4:t2
                 b(rw) = b(rw) + dd(t4,t5,j); % dd is given, size(dd)=TxTxN
                 end
             end
           end
       end
    end
    end

An earlier discussion on a simplified version of this with only 3 nested loops can be found here: http://www.mathworks.com/matlabcentral/answers/52005-how-to-transform-these-three-nested-for-loops-into-a-parfor-loop

8 Comments
Show 6 older comments Hide 6 older comments

Tanguy on 29 Oct 2012

Edited: Tanguy on 30 Oct 2012

The function of this code is to fill a sparse matrix A with ones, and a large vector b with elements from a matrix dd of size TxTxN.

Inputs: N, T, dd

Outputs: A, b

Any input would be appreciated, thanks!

Tanguy on 30 Oct 2012

That's right TxTxN

Sign in to comment.

Sign in to answer this question.

Follow Question

Answer 1

Matt J on 30 Oct 2012

Edited: Matt J on 30 Oct 2012

Open in MATLAB Online

0 votes

Here's one proposal,

    N=1e4;
    T=24; 
    %dd=rand(T,T,N); %fake
    tempT = (T+1)*T/2;
    [T1,T2,T3]=ndgrid(1:T,1:T,1:T);
    idx = (T2>=T1) & (T3<=T2-T1+1);
    T1=T1(idx);
    T2=T2(idx);
    T3=T3(idx);
    nt=numel(T1);
    %Make row/col data
    RW=(T1-1).*(T-T1/2)+T2;
    CL=T1-1+T3;
    RW=bsxfun(@plus,RW,(0:N-1)*tempT);
    CL=bsxfun(@plus,CL,(0:N-1)*T);
    %For building b
    Z=zeros(nt,T^2);
    for ii=1:nt % A very small loop, no need to optimize
           [T4,T5]=ndgrid(T1(ii):T2(ii),T1(ii):T2(ii));
           idx=T5>=T4;
           T4=T4(idx);
           T5=T5(idx);
           ind=sub2ind([T,T],T4,T5);
           Z(ii,ind)=1;
    end
    ddd=Z*reshape(dd,[],N);
    A=sparse(RW,CL,1, N*T*(T+1)/2,N*T);
    b=sparse(RW(:),1,ddd(:),N*T*(T+1)/2,1);

4 Comments
Show 2 older comments Hide 2 older comments

Tanguy on 1 Nov 2012

Edited: Tanguy on 1 Nov 2012

Open in MATLAB Online

Matt,

I have a follow-up question regarding the solution you proposed to build A and b only using loopless serial commands. My question only concerns vector b.

I have noticed discrepancies between the values I get for vector b when I use the nested FOR loops, and the values I get for b when I use the solution you proposed, and I can't explain why. The code below shows the differences I get between the two methods.

I would greatly appreciate any comments or suggestions.

N = 10;
T = 24;
dd = rand(T,T,N);
tempT = (T+1)*T/2; 
%-------------------------------
%b1: building b with FOR loops
%-------------------------------
b1=sparse(N*T*(T+1)/2,1);
for j=1:N
    for t1=1:T
       for t2=t1:T
           rw = (j-1)*tempT + (t1-1)*(T-t1/2)+t2;
           for t4=t1:t2
               for t5=t4:t2
                 b1(rw) = b1(rw) + dd(t4,t5,j);
               end
           end
       end
    end
end
%-------------------------------
%b2: building b with loopless serial commands
%-------------------------------
[T1,T2]=ndgrid(1:T,1:T);
idx = (T2>=T1);
T1=T1(idx);
T2=T2(idx);
nt=numel(T1);
RW=(T1-1).*(T-T1/2)+T2;
RW=bsxfun(@plus,RW,(0:N-1)*tempT);
Z=zeros(nt,T^2);
for ii=1:nt
         [T4,T5]=ndgrid(T1(ii):T2(ii),T1(ii):T2(ii));
         idx=T5>=T4;
         T4=T4(idx);
         T5=T5(idx);
         ind=sub2ind([T,T],T4,T5);
         Z(ii,ind)=1;
  end
  ddd=Z*reshape(dd,[],N);
b2=sparse(RW(:),1,ddd(:),N*T*(T+1)/2,1); 
%-------------------------------
%b1-b2
%-------------------------------
nnz(b1-b2) 
max(b1-b2) %should be zero

Matt J on 1 Nov 2012

Open in MATLAB Online

The values I get for max(b1-b2) are tiny ones on the order of 1e-13. This is easily attributable to floating point precision issues. Remember, the two approaches are not adding up the numbers in the same order. Are you not getting similarly small values?

A stronger check is to look at the percent error, for which I also get tiny values:

    >> percentError = full( sum(abs(b1-b2))/sum(abs(b1))*100   )
    percentError =
       2.5104e-14

Tanguy on 1 Nov 2012

Edited: Tanguy on 1 Nov 2012

Thanks for your reply. Yes I get similarly small values, but this leads the optimization algorithm which takes b as an input to return a different solution (corresponding to the same value of the objective function though, so that's fine). That's why I was asking. Thanks for your input.

Sign in to comment.

How to speed up this code consisting of 6 nested FOR loops?

8 Comments
Show 6 older comments Hide 6 older comments

Accepted Answer

4 Comments
Show 2 older comments Hide 2 older comments

More Answers (0)

Categories

Products

Tags

Community Treasure Hunt

How to speed up this code consisting of 6 nested FOR loops?

8 Comments Show 6 older comments Hide 6 older comments

Accepted Answer

4 Comments Show 2 older comments Hide 2 older comments

More Answers (0)

Categories

Products

Tags

See Also

Community Treasure Hunt

8 Comments
Show 6 older comments Hide 6 older comments

4 Comments
Show 2 older comments Hide 2 older comments