Why is m-function overhead 100 times more than built-in overhead?

Consider the following empty m-function:

function Ignore(a)
end
Define A = rand(100000, 1), then run the following:
tic;
for i = 1 : 200000
Ignore(A);
end
toc;
Elapsed time is 0.945148 seconds. I've found on my system Matlab can do about 200,000 empty function calls per second. So function call overhead is about 5 microseconds.
A = rand(100000, 1);
tic;
for i = 1 : 200000
b = numel(A);
end
toc;
Elapsed time is 0.011971 seconds. So here the number of calls per second is around 20,000,000. This is almost 100 times faster even with a numel() lookup.
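As an aside, MATLAB's timeit function (available since R2013b) handles warm-up and repetition automatically, so per-call measurements come out less noisy than a hand-rolled tic/toc loop. A sketch, assuming the Ignore function from above is on the path:

```matlab
A = rand(100000, 1);

% timeit runs the handle many times and returns a robust median time.
% The second argument (0) says Ignore is called with no output arguments.
tMfun    = timeit(@() Ignore(A), 0);
tBuiltin = timeit(@() numel(A));

fprintf('m-function call: %8.3f us\n', tMfun * 1e6);
fprintf('builtin call:    %8.3f us\n', tBuiltin * 1e6);
```

Note that timeit invokes the handle through an anonymous function, which adds a small, equal overhead to both measurements.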
I'm curious as to why the m-function overhead is so high. By the way, I've also tried putting this code in an m-file and running it; the result is the same.
I've found that with these m-function timings, it is possible for code in a loop to be swamped by overhead time. For example, suppose a function is called 10,000,000 times, and it calls 10 other functions, each of which in turn calls an average of 5 functions. This gives roughly 500,000,000 function calls, which would take about 2500 seconds, or roughly 40 minutes. If on average each function does only 1 microsecond of work (not implausible at all), the actual computation time is only 500 seconds.
Thus: 8 minutes of computation plus 40 minutes of call overhead. Just to heighten the sense of the ridiculous, with a really massive computation this could translate to an 8-hour computation taking two days.
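Spelled out explicitly, the cost model above looks like this (the per-call figures are the rough ones measured earlier, not exact constants):

```matlab
topCalls     = 1e7;       % top-level calls
fanout       = 10 * 5;    % ~10 callees, each calling ~5 more
totalCalls   = topCalls * fanout;   % 5e8 calls
callOverhead = 5e-6;      % ~5 us per m-function call (measured above)
workPerCall  = 1e-6;      % ~1 us of useful work per call (assumed)

overheadTime = totalCalls * callOverhead   % about 2500 s (~40 min)
workTime     = totalCalls * workPerCall    % about 500 s  (~8 min)
```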
Could anyone comment on the reason for the high overhead, what can be done to mitigate it, or any other useful related information?
Two unpleasant surprises in my code within a few days prompted me to write this. The m-function overhead was the first.
The second is the nnz() function. For a long time I had implicitly assumed that it had constant time complexity. I just realized I was wrong: it is constant time only for sparse matrices.
If nnz() were constant time (Matlab arrays already store other useful information), it would also mean that all(A) and any(A) could be constant time, and considering how often these are used, that would be a significant improvement.
Thank you.
[Second loop was edited for typos.]

10 Comments

Elapsed time is 0.011971 seconds. So here the number of calls per second is around 20,000,000. This is almost 100 times faster even with a numel() lookup.
Maybe I'm missing something, but I don't see how you get that number. Your loop iterates 1000 times, for a total of 0.011971 seconds. Calls per second should therefore be 1000/0.011971 = 83535. Also, why the call to rand() inside the tic...toc?
Oops, the rand() should not be there and the loop limit is wrong. It was a typo. I have edited the post.
The actual code ran without the rand() so the timing is still correct. Sorry for the confusion.
Regards
I can really only speculate about the difference. Presumably the JIT can do optimizations on built-in functions in for-loops that it cannot do with user-supplied M-code. But I wouldn't know the underlying reason for that.
In any case, calling functions repeatedly in really long loops, whether m-functions or builtins, has always been contrary to MATLAB programming philosophy. That's why one needs to vectorize.
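To make the point concrete, here is the same elementwise work done call-by-call in a loop versus in a single vectorized call; the function and array size are just illustrative:

```matlab
A = rand(1e6, 1);

% Loop version: one call per element, so call overhead is paid 1e6 times.
b = zeros(size(A));
tic;
for i = 1:numel(A)
    b(i) = sin(A(i));
end
toc;

% Vectorized version: the call overhead is paid exactly once.
tic;
b = sin(A);
toc;
```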
If nnz() were constant time (Matlab arrays already store other useful information), it would also mean that all(A) and any(A) would be constant time and considering how often these are used, it would be a significant improvement.
That would add significant overhead to linear algebra and other operations, since an implicit nnz() would have to be called every time a new matrix was generated. In the elementwise multiplication example below, this adds about 50% to the execution time:
A = rand(5000);
tic; B = A.*A; toc;
Elapsed time is 0.058789 seconds.
tic; nnz(B); toc;
Elapsed time is 0.029959 seconds.
You are right about that (nnz()). I was thinking in terms of caching the value whenever possible - for example, right after a matrix of zeros is generated. Similarly, things like all(), any() and "is_symmetric()" could be easily cached when they are known without extra calculation - each of these properties requires only two bits of storage ("yes", "no" or "dontknow"). In other words, don't do anything special to calculate the value or keep track of it, but cache it when it is cheap to do so. Then, after any modifying operation, set the bit to "dontknow" if required. In addition, certain operations such as A(i) = x are already expensive enough (why, I don't know) that adding a small check after each assignment would barely make any difference.
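A minimal sketch of that caching idea, written as a hypothetical handle-class wrapper (MATLAB's own arrays expose no such mechanism): the count is computed lazily, cached, and invalidated on any write.

```matlab
classdef CachedArray < handle
    % Hypothetical wrapper that caches nnz and invalidates it on writes.
    properties (Access = private)
        data
        nnzCache = [];   % empty means "dontknow"
    end
    methods
        function obj = CachedArray(data)
            obj.data = data;
        end
        function set(obj, i, x)
            obj.data(i) = x;
            obj.nnzCache = [];                 % any write -> "dontknow"
        end
        function n = cachedNnz(obj)
            if isempty(obj.nnzCache)
                obj.nnzCache = nnz(obj.data);  % compute once, reuse after
            end
            n = obj.nnzCache;
        end
    end
end
```

A constructor for all-zero matrices could set the cache to 0 outright, and the same two-bit scheme would cover flags like symmetry.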
Of course you are right about Matlab being built for vectorization, but it would be interesting to know whether that would be the "official" answer given by Mathworks. There is lots of code that simply cannot be vectorized - perhaps even the majority. Of course one can use mex files and I have done that in C++ when required. However there are some issues with this. So please bear with me until I finish blabbering below.
Now, supposedly Matlab is a "prototyping" language - this is the "official" description I read somewhere. In my understanding, this means that you write code quickly, debug it, and then port parts of it to, say, C or C++ (or any other language of your choice). From my limited experience, what happens in practice is that you write a ton of Matlab code (using cell arrays and the works - since you didn't know any better when you started). Then you find that certain things are impossibly slow, so you run the profiler to find the bottlenecks. Many bottlenecks are resolved within Matlab itself using more efficient constructs, algorithms, and parallelization, but there is a small stubborn percentage. So you say, OK, no problem, I'm going to write a few mex files and that should do the trick.
You start writing the mex files. You realize a few things. Firstly, the mex API is fairly straightforward but a bit sparse, so you need to write a shallow wrapper to make things convenient. Secondly, you realize that if you want to, say, multiply two matrices, you'll have to call Matlab's matrix multiply from within the mex file. What about other less heavyweight operations - for example plus, elementwise multiply, etc.? You'll have to call out for these from within the mex file also. So you think, no problem, I'll just write out the matrix operations in C++ - it should be simple. You do that and find that your plain C++ matrix ops are much slower than Matlab's. So you optimize a bit, and much to your surprise your O(n^2) (memory-bound) operations end up several times faster than Matlab's. Encouraged, you try to speed up your C++ multiplication algorithm to be competitive with Matlab. You succeed only partially. And then there are a couple of other algorithms, like qr() for example, where Matlab is really, really fast.
Anyway, by now you've built up a small library of functions - several times faster than the Matlab versions - but you need a lot more: in fact, all the convenient operations that you used in Matlab without a second thought. You know you can implement them, but it's going to take time. In addition there are holdouts like matrix multiply, qr, Cholesky factorization, Gaussian elimination and a few others that are difficult to beat.
Ultimately you realize that:
1. There are just a few hard operations that you really need Matlab for.
2. There are lots of small operations that you have to implement because you don't want the overhead of calling out for them from within the mex file, and also because you can probably do them equally, if not more, efficiently.
3. However there is no time to implement all these small operations.
So you go back to your Matlab code and try to do your best. And that's where things like nnz(), all(), etc., among a host of other issues, come in - including the efficiency of non-vectorized code. This is when you wonder whether you should be "Programming" in Matlab or whether you should just be doing quick numerical experiments in it.
Perhaps the scientific users of Matlab have already figured this out instinctively, and that is why the typical m-file they write tends to look like ..., well, the less said the better. Then you wonder, "Am I the only stupid one who thinks that it is actually a programming language? Should I at all be dignifying my Matlab scribblings by calling it "Source Code" as some others do?"
Sorry for the overly long "comment". Thank you for reading.
There is lots of code that simply cannot be vectorized - perhaps even the majority.
There is some code that cannot be vectorized, but if that were the majority, I tend to think TMW would have gone out of business ages ago. If you have a routine which is not obviously vectorizable, maybe you should post it and draw upon more minds to see how it could be attacked.
I am not in the numerical software business, but to me it seems like TMW is in business because of the large set of toolboxes that it offers, Simulink and what have you. Of course the toolboxes are passably fast because of the partial vectorization that is possible. But even if they were horrendously slow, they would still sell, since there isn't a comparable alternative. I think it has little to do with vectorization or any special core capability of Matlab.
For example, as far as I know people write sellable GUIs in Matlab even though Matlab's graphics are utterly sluggish. So slowness is not preventing people from doing business.
Regarding the routines, note that any routine that requires successive iterations is typically impossible to vectorize because of the dependence between successive iterations. The content of each iteration can be vectorized or parallelized, but to get the full benefit, all the parts that go into making up the succession of iterations have to be fast. So you have to pull the entire routine, with all its loops and dependencies, out of Matlab and try to mexify it.
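One caveat: some loops that look inherently sequential do have vectorized equivalents. A first-order linear recurrence, for instance, maps onto filter(); the coefficient below is illustrative:

```matlab
x = rand(1e6, 1);
c = 0.5;                  % recurrence coefficient (illustrative)

% Sequential form: y(i) depends on y(i-1).
y = zeros(size(x));
y(1) = x(1);
for i = 2:numel(x)
    y(i) = x(i) + c * y(i-1);
end

% Same recurrence as a single IIR filter call:
% y(i) = x(i) + c*y(i-1)  <=>  filter(1, [1, -c], x)
y2 = filter(1, [1, -c], x);
```

Truly state-dependent or nonlinear recurrences, though, remain loop-bound, which is exactly the case described above.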
Well, okay, but then it's still unclear why you call it a "surprise" that mfunctions are slower than builtins. If they weren't, would you ever need mex files? Or if you still did need mex files, what problem is solved once mfunctions catch up to builtins in speed?
Thee "surprise" was that m-function call overhead was so high. I compared it to built-in to highlight how slow it is. Perhaps if I had an idea of what goes into an m-function call, I may have phrased it differently.
The function call overhead in C or C++ with, say, a single argument is a few CPU instructions, typically taking a couple of nanoseconds. I believe measuring function call overhead in Java is complicated due to various run-time optimizations applied by the JVM, but see for example this link. How does that square with the overhead of builtins in Matlab (around 50 nanoseconds on my machine) and m-function overhead (around 5 microseconds on my machine)?
Mex function calls also carry overhead:
tic;
for i = 1 : 200000
MexOverhead(); % Empty mex function
end
toc;
Elapsed time is 0.452725 seconds.
Calling a matlab function (built-in or m-file) from mex is even more expensive (maybe around 100 times more).


Answers (0)

Asked: SK on 19 Sep 2014
Commented: SK on 22 Sep 2014
