How can I use C++ to speed up my matlab code ?

Hi every one, I am working on image processing and I generated a matlab code with many for loops, it takes too long to display results, can any one please tell how can I speed it up using C++ ?
thank you !

5 Comments

There are many ways to speed up code using multiple for loops. C++ is not generally top of the list though. There are usually methods within Matlab itself that can produce initial speedups.
doc mex
will lead you to a bunch of links for how to integrate C++ code though if you really want.
We would need to see the code you are currently using before we could offer meaningful advice.
this is my function which takes a tensor of size MxNxV and returns a large matrix of correlations between all pairs of vectors.
function [H]=createHON(Mat)
v=[];
c=size(Mat,1);
for i = 1:c
for j =i:c
for d=1:c
for l =d:c
a = corrcoef(Mat(i,j,:),Mat(d,l,:));
v=[ v;a(2,1) ];
end
end
end
end
k=1;
for i = 1:nchoosek(c,2)+c
for j = 1:1:nchoosek(c,2)+c
H(i,j) = v(k);
k=k+1;
end
end
H(isnan(H))=0;
@mayssa: your is slow because you wrote it to expand the array of every iteration. Much easier than switching to C++ would be to simply write efficient MATLAB code:
that's great! thank you !

Sign in to comment.

 Accepted Answer

Jan
Jan on 7 Apr 2017
Edited: Jan on 7 Apr 2017
Sorry, I cannot resist :-)
MyCorrCoeff2 can still be improved by considering the columnwise storing of elements in arrays. The normalization happens repeatedly inside the inner loops, so it is cheaper to compute it once outside the loops:
function H = MyCorrCoeff3(Mat)
% Author: Jan Simon, License: CC BY-SA 3.0
c = size(Mat,1);
nc = (0.5 * c * (c + 1));
H = ones(nc, nc); % Pre-allocate!!!
MP = permute(Mat, [3, 2, 1]);
MP = MP - sum(MP, 1) / size(MP, 1);
% MP = bsxfun(@minus, MP, sum(MP, 1) / size(MP, 1));
CovN = squeeze(sqrt(sum(MP .* MP, 1)));
jH = 1;
for i = 1:c
for j = i:c
cov1 = MP(:, j, i);
cov1n = CovN(j, i);
iH = 1;
for d = 1:c
for l = d:c
if iH < jH % Use symmetry of H
% Emulate: r = cov([Mat(i, j, :), Mat(d.l,:)]);
r21 = (MP(:, l, d).' * cov1) / cov1n / CovN(l, d);
H(iH, jH) = r21;
H(jH, iH) = r21;
end
iH = iH + 1; % Update indices related to H:
end
end
jH = jH + 1;
end
end
H(isnan(H)) = 0;
0.085 sec, 38 times faster than the original version. :-)
[07-Apr-2017 16:34 UTC] Enough for today. Now the code is in a form which could profit from C-coding. The bottleneck is still with about 85% of the total time:
r21 = (MP(:, d, l).' * cov1) / cov1n / CovN(d, l);
The vector multiplication is calculated in an optimized BLAS-library already. Therefore I think you can reduce the time for the overhead of calling the library and some percent of the loops only, perhaps 10%. It will be more useful to try a parallelization with PARFOR.

4 Comments

Mayssa
Mayssa on 7 Apr 2017
Edited: Mayssa on 8 May 2017
thank you so much for this extra help you gave me, I really appreciate it, you made my day ! and most importantly, thanks for sharing your passion :-)
@mayssa: You are welcome. This is a very nice example to show how Matlab code can be accelerated, and that optimized code looks sometimes much more ugly and is much harder to maintain.
For some other test data like [50, 50, 100] I get a comparable speedup about factor 35. Can your "1 hour" problem be solved in a bearable time now?
One point looks suspicious to me: You want a "matrix of correlations between all pairs of vectors". But this code:
for i = 1:c
for j = i:c
for d = 1:c
for l = d:c
function of Mat(i,j,:) and Mat(d,l, :)
does not access e.g. Mat(2,1,:). Is this wanted? Is Mat symmteric already?
+1 very nice work, Jan Simon.
@Jan:Indeed, I must admit that this was an opportunity for me to learn a lot; still too much to learn though. For the time problem, it is solved, it takes about five minutes now and to answer your question, yes this is wanted, Mat is already symmetric.

Sign in to comment.

More Answers (3)

Almost always using compiled code will not give you great speedups, UNLESS your code was inefficiently written in MATLAB in the first place. Of course then almost anything you do will be an improvement.
But if your code is already well written in MATLAB, using a good choice of algorithms, then trying to go to C or C++ will probably not be worth the time and effort it will take to code. This is certainly true unless you have quite good programming skills in C/C++, AND you understand the essential algorithms well.
The point is almost always the most impact you can get for a given investment of time will come from a better understanding of MATLAB, learning how to improve your code. Often great gains can come from a completely different algorithmic approach, rethinking how you will solve the problem.

3 Comments

thank you for your answer, I surely agree with you but it is because I am dealing with "for loops " on a large size tensors (in a function) and when it comes to this, Matlab is really slow so may be it is worth going to C++ if this can solve the problem of course, to be frank, I am not that good in coding with C++, I tried Matlab coder to convert my function but it didn't work,
The slowness of your code is PURELY due to the fact that you never preallocated v !!!!!!!!! Writing c++ code here is a waste of time. Just write EFFICIENT MATLAB code. MATLAB is NOT "really slow". It is your code as written that is really slow!
When you append one element at a time to v, that forces MATLAB to reallocate the vector v in memory at EVERY iteration of the loop! It then needs to copy the entire array over. Small at first, but as v grows in size, this take serious time to do. This dynamic array growth is a bad idea. It is fine for tiny arrays, but as the arrays grow in size, not true. Instead, use the function zeros to fill v with elements of the final size of v.
You know (or can compute) what the final size of v will be, based on the value of c. Based on your loops, it will be:
c^2*(c+1)^2/4
So instead of setting v as an empty array to start with, use
v = zeros(1,c^2*(c+1)^2/4);
Inside your loop, maintain a counter that tells you where to put the next element of the vector. So your loops will look like this now:
c=size(Mat,1);
v = zeros(1,c^2*(c+1)^2/4);
ind = 0;
for i = 1:c
for j =i:c
for d=1:c
for l =d:c
ind = ind + 1;
a = corrcoef(Mat(i,j,:),Mat(d,l,:));
v(ind) = a(2,1);
end
end
end
end
Could you have written those loops themselves more efficiently, with fewer loops? This is probably also true, but not the immediate problem.
In fact, the editor will actually have warned you of this need to preallocate v. On the right hand side, you will see red and orange lines in the editor, indicating potential problems in your code. Read what they say, and what they suggest.
Note that the very same comments apply to the array H. You never bothered to preallocate H to the proper size, but then you stuff one element at a time into H. Even worse, all you are really doing is a simple RESHAPE!!!!!! Thus, you are creating a new array that has the same elements, in the same order, but the array is a different shape. So you can replace the entire set of loops that create H with one line of efficient code.
H = reshape(v,c*(c+1)/2,c*(c+1)/2);
As I said in my answer, look to your own code for the problem. Writing c++ code to fix these problems would be silly, and a complete waste of time.
for the loops, I tried several times and couldn't see a better way to do it to obtain what I need. It turned out that most of the time was spent when calling corrcoef, I replaced it by another built function, it reduced time to more than half, so yes as you said :) you convinced me and thank you for the reshape (didn't know about it),

Sign in to comment.

Jan
Jan on 7 Apr 2017
Edited: Jan on 7 Apr 2017
After your comment:
Letting an array grow iteratively is a massive waste of resources. You should see a corresponding MLint warning in the editor.
c = size(Mat,1);
nv = (0.5 * c * (c + 1)) ^ 2; % sum(1:c)^2
v = zeros(1, nv);
iv = 0;
for i = 1:c
for j = i:c
Matij = Mat(i, j, :);
for d = 1:c
for l = d:c
a = corrcoef(Matij, Mat(d,l,:));
iv = iv + 1;
v(iv) = a(2,1);
end
end
end
end
For Mat = [20 x 20 x 10] this reduces the runtim from 5.3 to 4.0 sec already, but the benefit will be higher for larger inputs.
This can be accelerated, if you do not compute all elements by corrcoef, but only the needed one:
function r21 = mycorrel(A, B)
r = cov([A(:), B(:)]);
r21 = r(2,1) / sqrt(r(1) * r(4));
% Exactly the same output as CORCOEFF without rounding differences:
% Slightly slower, but considers over/underflow:
% r21 = r(2, 1) / sqrt(r(1)) / sqrt(r(4));
and
... innermost loop:
for l = d:c
iv = iv + 1;
v(iv) = mycorrel(Matij, Mat(d,l,:));
end
2.5 seconds run time. :-)
The 2nd part: nchoosek is slow, so do not call it repeatedly:
k = 1;
nc = (0.5 * c * (c + 1)); % nchoosek(c,2)+c;
H = zeros(nc, nc); % Pre-allocate!!!
for i = 1:nc
for j = 1:nc
H(i,j) = v(k);
k = k+1;
end
end
H(isnan(H)) = 0;
What do you observe finally?
H(1:5, 1:5)
H is symmetric. This means that half of the computations are redundant. Then you can reduce the run time by 50% if you consider this in the first part already. Now the main diagonal can be ignored also. I leave this up to you, it is not hard to create the matrix H directly. Calculate the elements below the diagonal only and copy it above the diagonal. Something like:
for d = i:c % Instead of 1:c
...
H(x, y) = mycorrel...
H(y, x) = H(x, y);
Conclusion: With some reordering and avoiding of unnecessary computations you can accelerate the code by a factor of 4 without the need to create a C-Mex function. Because the main work is spent in COV, a straight translation to C would not have been faster.

5 Comments

I never bother myself to read the orange warning (my mistake),for corrcoef, it is solved. Thank you for all your help, I appreciate it
This:
k=1;
for i = 1:nchoosek(c,2)+c
for j = 1:1:nchoosek(c,2)+c
H(i,j) = v(k);
k=k+1;
end
end
can be replaced by:
nc = (0.5 * c * (c + 1))
H = reshape(v, nc, nc);
But see my next answer, which creates H directly and uses the symmetry of the output.
Mayssa
Mayssa on 7 Apr 2017
Edited: Mayssa on 7 Apr 2017
@Jan: I think reshape does the job better for H as mentioned above, don't you think ? Indeed it is faster but my problem is still with the main which takes near half an hour (well better than 1 hour yesterday), that's why I thought of creating separate C functions in the first place; I have seen this in a clustering program, although a very much larger data than mine is used, the code doesn't take more than 1 min.
@Stephen: thank you for the link, greatly appreciated

Sign in to comment.

Jan
Jan on 7 Apr 2017
Edited: Jan on 7 Apr 2017
function H = MyCorrCoeff1(Mat)
c = size(Mat,1);
nc = (0.5 * c * (c + 1));
H = zeros(nc, nc); % Pre-allocate!!!
iH = 1;
jH = 1;
for i = 1:c
for j = i:c
Matij = Mat(i, j, :); % Tiny speedup to copy this once only
for d = 1:c
for l = d:c
if iH == jH % Diagonal element is 1.0
H(iH, jH) = 1;
elseif iH > jH % Use symmetry of H
% Equivalent:
% r = corrcoef(Mat(i,j,:),Mat(d,l,:)); r21 = r(2, 1);
Matdl = Mat(d,l,:);
r = cov([Matij(:), Matdl(:)]);
r21 = r(2,1) / sqrt(r(1)) / sqrt(r(4));
H(iH, jH) = r21;
H(jH, iH) = r21;
end
% Update indices related to H:
iH = iH + 1;
if iH > nc
iH = 1;
jH = jH + 1;
end
end
end
end
end
H(isnan(H)) = 0;
This reduces the runtime from 3.26 to 0.63 sec (R2016b/64, Win7).
Now COV is still the bottleneck, but is called less often. Let's see if inlining the code of COV helps (in addition H is initialized with 1 and the diagonal is not set inside the loops):
function H = MyCorrCoeff2(Mat)
c = size(Mat,1);
nc = (0.5 * c * (c + 1));
H = ones(nc, nc); % Pre-allocate!!!
covM = size(Mat, 3);
iH = 1;
jH = 1;
for i = 1:c
for j = i:c
Matij = Mat(i, j, :);
cov1 = Matij(:) - sum(Matij(:)) / covM; % Remove mean
cov1n = sqrt(cov1.' * cov1);
for d = 1:c
for l = d:c
if iH > jH % Use symmetry of H
Matdl = Mat(d,l,:);
% Emulate: r = cov([Matij(:), Matdl(:)]);
cov2 = Matdl(:) - sum(Matdl(:)) / covM;
r21 = (cov2.' * cov1) / cov1n / sqrt(cov2.' * cov2);
H(iH, jH) = r21;
H(jH, iH) = r21;
end
iH = iH + 1; % Update indices related to H:
if iH > nc
iH = 1;
jH = jH + 1;
end
end
end
end
end
H(isnan(H)) = 0;
Yepp, it is faster: 0.24 sec. 13.6 times faster, and still no need to struggle with tricky C code (and as you can see from my FileExchange submissions, I do love C!).
But look at the code: It is ugly! The original code was much easier to understand. Here a clean version:
c = size(Mat,1);
nc = (0.5 * c * (c + 1)); % sum(1:c)
H = zeros(nc, nc)
iH = 0;
for i = 1:c
for j = i:c
for d = 1:c
for l = d:c
a = corrcoef(Mat(i, j, :), Mat(d,l,:));
iH = iH + 1;
H(iH) = a(2,1);
end
end
end
end
The results are equal except for rounding differences. This nicer version should be included as comment.

Categories

Asked:

on 5 Apr 2017

Edited:

on 8 May 2017

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!