How to speed up reading from N-dimensional matrix in a nested loop?
    7 views (last 30 days)
  
       Show older comments
    
Hi,
I've got a piece of code that has to run in many nested loops. In the next-to-most inner loop I need to grab some elements from an N-dimensional matrix (N>=5) and perform some calculations on these values. When running the profiler I found that the bottle neck is reading elements from the N-dimensional matrix. The matrix size will typically be [9,9,9,9,9,100] so even though it is big, it's not extremely large.
I wonder if you guys have any good idea on how to speed up this code? Any ideas would be extremely welcome - I've spent days reading on google and trying different things in the profiler, and the quickest way I've found is to stack all the data into a 3D matrix, and looping through the 3rd dimension of this new matrix to perform the calculations.
See a code example below (in this code there might be ways to get rid of some of the loops, but that won't be the case in my real code since that will have a lot of logic on each level). It takes ~1.6sec to run on my laptop, and of this ~1.35 sec is spent reading from the N-dimensional matrix...
Kindly, Anders
% File: test.m  
clear
%%This section can not be changed in my real code).
%Set up some grids:
small_grid=3;
large_grid=4;
H=small_grid;
a=linspace(1,4,large_grid)'; A=length(a);
b=linspace(1,3,small_grid)'; B=length(b);
c=linspace(1,2,small_grid)';C=length(c);
d=linspace(1,4,small_grid)';D=length(d);
e=linspace(1,4,10)';E=length(e);
f=1:10;
[ga gb gc gd ge]=ndgrid(a,b,c,d,e);
I=32;%
fn1=ga./3+gb/.5 +gc./3 +gd./4 +ge/4;%Some function
fn2=ga./1+gb/.1 +gc./10 +gd./4 +ge/.1;%Some other function
soughtValue=zeros(length(f),8);
tic
for h=1:H
  for a_=1:A
    aa=rand(I,1).*(a(end)-a(1))+a(1);%In the real code each level will have a lot of logic...
    for b_=1:B
      bb=rand(I,1).*(b(end)-b(1))+b(1);
      for c_=1:C
        cc=rand(I,1).*(c(end)-c(1))+c(1);
        for d_=1:D
          dd=rand(I,1).*(d(end)-d(1))+d(1);
          for e_=1:E
            ee=rand(I,1).*(e(end)-e(1))+e(1);
              % Calculate which elements to pick out
              before1=floor(rand(length(f),1)*(length(a)-1)+1);
              after1=before1+1;
              before2=floor(rand(length(f),1)*(length(b)-1)+1);
              after2=before2+1;
              c_index=floor(rand(length(f),1)*(C-1)+1);
              d_index=floor(rand(length(f),1)*(D-1)+1);
              % Some parameters
              a1=floor(rand(length(f))*(length(f)-1)+1);
              a2=floor(rand(size(f))*(length(f)-1)+1);
              a3=floor(rand(size(f))*(length(f)-1)+1);
              a4=floor(rand(size(f))*(length(f)-1)+1);
              %%This is the section that can be changed in order to speed up calculations
              for ff=1:length(f)
                %I've found that it's quickest to store the data in a 3D
                %matrix and then loop over the last dimension of this matrix
                %to perform the calculations. But even this is slow!
                new_grid(:,:,1)=  (fn1(:,:,c_index(ff)  ,d_index(ff) ,ff));
                new_grid(:,:,2)=  (fn1(:,:,c_index(ff)+1,d_index(ff) ,ff));
                new_grid(:,:,3)=  (fn1(:,:,c_index(ff)  ,d_index(ff)+1,ff));
                new_grid(:,:,4)=  (fn1(:,:,c_index(ff)+1,d_index(ff)+1,ff));
                new_grid(:,:,5)=  (fn2(:,:,c_index(ff)  ,d_index(ff) ,ff));
                new_grid(:,:,6)=  (fn2(:,:,c_index(ff)+1,d_index(ff) ,ff));
                new_grid(:,:,7)=  (fn2(:,:,c_index(ff)  ,d_index(ff)+1,ff));
                new_grid(:,:,8)=  (fn2(:,:,c_index(ff)+1,d_index(ff)+1,ff));
                for gg=1:8
                  soughtValue(ff,gg) =...
                    ( a1(ff).*new_grid(before1(ff),before2(ff),gg) ) ...
                   +( a2(ff).*new_grid(before1(ff),after2(ff),gg)  ) ...
                   +( a3(ff).*new_grid(after1(ff) ,before2(ff),gg)  ) ...
                   +( a4(ff).*new_grid(after1(ff) ,after2(ff),gg)   );
                end
                % The first iteration of the loop will look like this. Notice
                % that this is many times slower than using the loop!
  %               soughtValue(ff,1)=...
  %                   ( a1(ff).*(fn1(before1(ff),before2(ff),c_index(ff)  ,d_index(ff) ,ff)) ) ...
  %                  +( a2(ff).*(fn1(before1(ff),after2(ff),c_index(ff)  ,d_index(ff) ,ff)) ) ...
  %                  +( a3(ff).*(fn1(after1(ff),before2(ff),c_index(ff)  ,d_index(ff) ,ff)) ) ...
  %                  +( a4(ff).*(fn1(after1(ff),after2(ff),c_index(ff)  ,d_index(ff) ,ff))  );
              end
              %%End of code that can be changed
            end
          end
        end
      end
    end
  end
  toc
0 Comments
Accepted Answer
  Cedric
      
      
 on 21 Jan 2013
        Hi Anders,
Just by replacing the 8 lines defining new_grid with
 d3index = [c_index(ff),c_index(ff)+1] ;
 new_grid = cat(3, ...
                fn1(:,:,d3index,d_index(ff),ff), ...
                fn1(:,:,d3index,d_index(ff)+1,ff), ...
                fn2(:,:,d3index,d_index(ff),ff), ...
                fn2(:,:,d3index,d_index(ff)+1,ff)) ;
you would make it slightly more efficient. About the loop that follows, are you sure that a1(ff),..,a4(ff) are correct indexing-wise? These are 2D arrays that you are accessing using linear indexing.. which seems weird in this context in the sense that you will get the first column of each matrix with ff in [1,..,numel(f)], which we usually index with e.g. a1(ff,1).
Cheers,
Cedric
7 Comments
  Matt J
      
      
 on 22 Jan 2013
				
      Edited: Matt J
      
      
 on 22 Jan 2013
  
			The a1 line should read: a1=floor(rand(size(f)) x (length(f)-1)+1);
If that's the case then all lines a1...a4 are computed with the exact same statement. It then makes more sense to compute a single A matrix rather than 4 separate "a" variables
A=floor(rand(length(f),4)*(length(f)-1)+1);
where now a1=A(:,1), a2=A(:,2), etc...
More Answers (2)
  Matt J
      
      
 on 21 Jan 2013
        
      Edited: Matt J
      
      
 on 21 Jan 2013
  
      You can get rid of the loop over gg
soughtValue(ff,:)= ...
                  ( a1(ff).*new_grid(before1(ff),before2(ff),:) ) ...
                 +( a2(ff).*new_grid(before1(ff),after2(ff),:)  ) ...
                 +( a3(ff).*new_grid(after1(ff) ,before2(ff),:)  ) ...
                 +( a4(ff).*new_grid(after1(ff) ,after2(ff),:)   );
Also, you should avoid repeated matrix indexing operations. If you're going to be reusing portions of an array, save them to a temporary variable, e.g.,
   cff=c_index(ff);
   dff=d_index(ff);
   tmp=[cff,cff+1];
                  new_grid(:,:,1:2)=  (fn1(:,:,tmp ,dff ,ff));
                  new_grid(:,:,3:4)=  (fn1(:,:,tmp ,dff+1,ff));
                  new_grid(:,:,5:6)=  (fn2(:,:,tmp ,dff ,ff));
                  new_grid(:,:,7:8)=  (fn2(:,:,tmp ,dff+1,ff));
4 Comments
  Matt J
      
      
 on 21 Jan 2013
				Not so surprisingly. PERMUTE is expensive. It wouldn't make sense to do it inside any of the loops. I'm also not sure what this
 if all([h,a_,b_,c_,d_,e_]==1)
is for.
  Cedric
      
      
 on 21 Jan 2013
				It is for permuting fn1 and fn2 only at the first overall iteration, just to fulfill the requirement not the bring modifications outside of the block that he define as being open to modifications.
The if all() doesn't slow down significantly the execution as far as I can see.
  Matt J
      
      
 on 21 Jan 2013
        
      Edited: Matt J
      
      
 on 21 Jan 2013
  
      Alright, well here's my fastest version. With the following, I get a 25% speed-up over the original code.
        fn=cat(6,fn1,fn2); %PRIOR TO LOOPS
             for ff=1:length(f)
               new_grid=  fn(before1(ff)+[0,1],before2(ff)+[0,1],...
                             c_index(ff)+[0,1] , d_index(ff)+[0,1],...
                             ff,:);
                AA(4)=a4(ff);
                AA(3)=a2(ff);
                AA(2)=a3(ff);
                AA(1)=a1(ff);
                soughtValue(ff,:)=AA*reshape(new_grid,4,8);
              end
6 Comments
  Matt J
      
      
 on 23 Jan 2013
				
      Edited: Matt J
      
      
 on 23 Jan 2013
  
			Hi Anders. Not sure what we're calling "the last part". I definitely find that
    soughtValue(ff,:)=AA*reshape(new_grid,4,8);
slows things down, as compared to a for-loop, but I haven't measured whether one is twice as fast as the other. It certainly doesn't make more than a 5% impact on the total code's execution time. My fastest version right now is below. By my timings, it's a few percent slower than Cedric's.
My best guess as to why the for-loop is faster is that, maybe, it doesn't allocate any temporary arrays. Possibly, a matrix-vector multiplication always puts its output in a temporary vector, whereas your for-loop might copy results into soughtValue(ff,gg) in a purely in-place fashion.
A few final remarks on all of this:
- Because you're jumping around to random locations in an array pulling out values, there are going to be serious limits on how fast you can get. Memory accesses are fastest when done sequentially.
- You're doing lots of redundant calculations throughout the code as whole. It doesn't seem to be a good idea to be calling length(f) and size(f) so repeatedly within a for-loop. Each execution by itself is obviously pretty trivial, but the overhead of all those function calls probably adds up.
           %Pre-loop computations
           fn=cat(6,fn1,fn2);
           new_grid=zeros(4,8);
           nf=length(f);
             for ff=1:nf
              new_grid(:) =  fn(before1(ff)+[0,1],before2(ff)+[0,1],...
                             c_index(ff)+[0,1] , d_index(ff)+[0,1],...
                             ff,:);
             %AA1=a1(ff); AA2=a2(ff); AA3=a3(ff); AA4=a4(ff);
                for gg=1:8
                  soughtValue(ff,gg) =...
                    a1(ff).*new_grid(1,gg)+ ...
                    a2(ff).*new_grid(2,gg)+ ...
                    a3(ff).*new_grid(3,gg)+ ...
                    a4(ff).*new_grid(4,gg);
                end
              end
See Also
Categories
				Find more on Response Computation and Visualization in Help Center and File Exchange
			
	Products
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!

