There is no (good) answer to your question, and whatever answer you settle on will subtly change next year too, with the next release of MATLAB, or the next computer you buy. I'm sorry, but that is how it is, even for something as simple as cumsum. It gets complicated. Blame your computer.
Effectively, for small vector lengths, the time complexity of cumsum will be linear, so O(n), where n is the length of the vector. But for those small vectors, there will also be calling overhead. Error checks, setup, etc. The basic algorithm will be...
- Set up. Error checks. initialize everything.
- A simple loop. Just add the numbers sequentially.
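That loop is simple enough to sketch. Here is a Python stand-in for the sequential pass (MATLAB's actual implementation is compiled and not public, so this only illustrates the O(n) part):

```python
def cumsum_loop(x):
    """Sequential cumulative sum: one pass, one addition per element, so O(n)."""
    out = []
    total = 0
    for v in x:           # the O(n) part: one add and one store per element
        total += v
        out.append(total)
    return out
```

For example, `cumsum_loop([1, 2, 3, 4])` gives `[1, 3, 6, 10]`, just as `cumsum` would in MATLAB.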
So the total time taken will be
c1 + c2*n
where c1 is the total of the initial overhead times. c1 will be pretty much fixed of course. And for really small values of n, c1 will dominate the time required. c2 is just the time for a loop to process each element.
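You can see the two regimes directly from the model. The constants below are made-up numbers purely for illustration, not measurements from any real machine; the point is the break-even length n = c1/c2, below which overhead dominates and above which the timing is essentially linear:

```python
def model_time(n, c1=1e-6, c2=1e-9):
    """Two-term run time model: fixed overhead c1 plus per-element cost c2*n.
    c1 and c2 are hypothetical constants, chosen only for illustration."""
    return c1 + c2 * n

# Overhead and loop cost break even at n = c1/c2 (here, 1000 elements).
crossover = 1e-6 / 1e-9

small = model_time(10)          # dominated by c1: nearly constant
large = model_time(10_000_000)  # dominated by c2*n: doubling n ~doubles time
```

Below the crossover, changing n barely moves the total; far above it, the time scales almost exactly linearly.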
The problem is, when n grows large, your computer does things in a smart way. It realizes that for really big problems involving sums, etc., it can sometimes gain by using multiple processors. So I tried out cumsum, on some pretty big vectors. (I've got a 16 core machine, and a fair amount of RAM, so I could go pretty large in these tests.)
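The reason a cumulative sum can use several cores at all is that it parallelizes as a two-pass "prefix sum": each core scans its own chunk independently, then a cheap second pass adds each chunk's running offset. Here is a serial Python sketch of that decomposition (the per-chunk scans in pass 1 are what a multithreaded implementation would farm out to separate cores; this is an illustration of the idea, not MATLAB's actual code):

```python
from itertools import accumulate

def chunked_cumsum(x, nchunks=4):
    """Two-pass prefix sum: the decomposition a multicore cumsum could use.
    Pass 1: cumsum each chunk independently (the parallelizable part).
    Pass 2: add each chunk's running offset (cheap: one value per chunk)."""
    size = max(1, -(-len(x) // nchunks))             # ceiling division
    chunks = [x[i:i + size] for i in range(0, len(x), size)]
    partial = [list(accumulate(c)) for c in chunks]  # pass 1
    out, offset = [], 0
    for p in partial:                                # pass 2
        out.extend(v + offset for v in p)
        if p:
            offset += p[-1]
    return out
```

The result matches a plain single-pass cumsum exactly; only the work schedule changes, which is why the core count affects speed but not the answer.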
I was watching the number of cores my computer used during those runs, though my activity monitor was probably not sampling very finely. It did get more than 2 cores running, though. So about 9 seconds to do a cumsum on a 3-billion-element vector.
Next, I cut the vector length in half, and in half again.
Hmm. Do you see anything interesting? Going from 0.75e9 to 1.5e9 elements, the time doubled. But doubling the vector length once more, the time went way up. Not just double, but by a factor of roughly 6.
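You can run this kind of doubling experiment yourself. In MATLAB you would use timeit on cumsum; here is a Python analogue using itertools.accumulate on much smaller vectors, since the exact numbers (and where the jumps appear) depend entirely on your machine:

```python
import time
from itertools import accumulate

def time_cumsum(n, reps=3):
    """Best-of-reps wall-clock time for a cumulative sum of n elements."""
    x = list(range(n))
    best = float("inf")
    for _ in range(reps):
        t0 = time.perf_counter()
        list(accumulate(x))
        best = min(best, time.perf_counter() - t0)
    return best

sizes = [100_000, 200_000, 400_000]
times = [time_cumsum(n) for n in sizes]
ratios = [b / a for a, b in zip(times, times[1:])]
# In the linear O(n) regime each ratio hovers near 2; once you hit a cache
# or RAM limit, a ratio will jump well above that.
```

Push the sizes up until the ratios stop being roughly 2, and you have found one of your own machine's breaks.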
What happened? Most likely, I was getting hung up, due to something like a RAM limit, or maybe cache size. I never got near making all 16 cores fire up. That takes some serious effort. (Believe me, I've tried. It takes some work to get this beast of mine fully awake.)
But effectively, cumsum will have multiple domains in its run time.
- For very small n, cumsum will be O(1): constant run time, independent of the vector length.
- For intermediate to moderately large n, cumsum will be O(n) in run time.
- For HUGE n, cumsum will be something completely, unpredictably different!
For really small vector lengths, the setup dominates EVERYTHING. For really large vector lengths, it gets complicated. For MOST problems in the intermediate domain, you should expect linear, O(n) timing.
And the large-scale run time behavior of cumsum will differ, in completely different ways, from computer to computer. It will depend on the number of cores in your CPU. It will depend on the amount of RAM you have, on cache sizes, on bus widths. This means we don't even know where the breaks will come, because your computer is different from mine.
And, NO. There are no documented references, papers, etc., on the time complexity of cumsum. It will depend completely on your computer for the end cases, for both small and large values of n. For intermediate-size problems, it will be O(n).

But nobody is going to write a paper on the behavior of cumsum. Feel free, if you want to do so, but you will probably need to get a job at MathWorks to learn the exact code, and that will matter. You will need to learn the depths of tools like the BLAS, to learn when your computer will start to automatically use multiple cores, because these things are not determined by cumsum itself. And you will need to learn a great deal about how your computer interacts with its RAM and cache limits, and how the system bus width impacts how fast data can be passed around internally. It is almost certainly not worth the effort, because every computer is different, and next year's computers will be different yet. But feel free to try.