Fastest way of computing standard errors / inverting X'X
Hello all,
I'm running a Monte Carlo study and need to evaluate many linear regressions y = X b + u very efficiently. Specifically, I need to estimate a regression and compute the standard errors of the estimates many times, so speed matters much more than accuracy. MATLAB's built-in functions such as lscov and regress take a fair amount of time to run, so I wrote a little function to do these tasks myself.
However, as I need to calculate the standard errors, the function is still quite slow. I use inv(X' * X) to get the inverse that appears in the standard-error formula (and it is noticeably faster than (X' * X) \ eye(size(X, 2))). Is there a faster way of doing this, e.g. via some smart factorization? I saw a suggestion to use a QR decomposition and then inv(R) to obtain inv(X' * X), but that was even slower.
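For concreteness, here is a minimal sketch of the computation I mean (standard homoskedastic OLS standard errors; variable names are illustrative):

```matlab
% X is N-by-k, y is N-by-1.
[N, k] = size(X);
b      = X \ y;                    % coefficient estimates
resid  = y - X*b;                  % residuals
s2     = (resid'*resid) / (N-k);   % error variance estimate
XtXinv = inv(X'*X);                % the expensive step in question
se     = sqrt(s2 * diag(XtXinv)); % standard errors of b
```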
Any help is greatly appreciated!
Accepted Answer
More Answers (1)
Matt Tearle
on 3 Jan 2012
inv is evil -- avoid! The quickest way to do a regression is
c = X\y;
then you can calculate errors yourself
resid = X*c - y;
% etc
But are you doing this numerous times for different y but the same X? If so, maybe an LU decomposition would help.
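If X really is fixed across replications, the factorization idea can be sketched as follows (using a Cholesky factorization of X'X rather than LU, since X itself is rectangular; nreps and Y are illustrative names):

```matlab
% Precompute once for a fixed N-by-k design matrix X.
XtX    = X' * X;
R      = chol(XtX);                      % X'X = R'*R, R upper triangular
XtXinv = R \ (R' \ eye(size(XtX)));      % reused for every replication

for rep = 1:nreps
    y = Y(:, rep);                       % one simulated outcome vector
    b = R \ (R' \ (X'*y));               % solve the normal equations via R
    % standard errors then reuse the precomputed XtXinv
end
```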
10 Comments
mickey
on 3 Jan 2012
Matt Tearle
on 3 Jan 2012
Oh, sorry. I need to read more carefully. I was thinking of the standard error of the regression, rather than the errors of the coefficients.
How big is X? (And I assume it's changing each time?) Unless k is large, I don't think the inverse calculation itself is going to consume that much time. If N is large, the calculation of X'*X may be the time sink. Have you tried running it in the Profiler?
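A quick way to separate the two candidate costs is a crude tic/toc timing (a sketch; the Profiler gives a finer breakdown):

```matlab
tic; G  = X'*X;   t_gram = toc;   % time for the Gram matrix
tic; Gi = inv(G); t_inv  = toc;   % time for the inversion
fprintf('X''*X: %.3g s, inv: %.3g s\n', t_gram, t_inv);
```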
mickey
on 3 Jan 2012
Matt Tearle
on 3 Jan 2012
But have you separated out the calculations of X'*X and inv(...)? Not that it matters much unless there's some other way to calculate the standard errors without X'*X...
Long shot, but... if it's only 3 variables, is there any chance there are simple, closed-form formulae for the SEs that don't involve going through X'X? I seem to recall you can rewrite the SE formulae for the slope/intercept of a simple linear regression, but I don't recall whether that actually saves any sum-of-products calculations, or whether you still need all the "cross term" dot products. Probably :(
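For the two-variable case alluded to above, the textbook closed forms can be sketched as (simple regression y = a + b*x + u):

```matlab
% Closed-form estimates and SEs for simple regression y = a + b*x + u.
n    = numel(x);
xbar = mean(x);  ybar = mean(y);
Sxx  = sum((x - xbar).^2);
b    = sum((x - xbar).*(y - ybar)) / Sxx;   % slope
a    = ybar - b*xbar;                        % intercept
s2   = sum((y - a - b*x).^2) / (n - 2);      % error variance estimate
se_b = sqrt(s2 / Sxx);                       % SE of the slope
se_a = sqrt(s2 * (1/n + xbar^2/Sxx));        % SE of the intercept
```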
mickey
on 3 Jan 2012
Matt Tearle
on 3 Jan 2012
Well, don't keep me in suspense -- what was the bottleneck?
Anyway, riffing off the approach in that paper, it seems like
[~,r] = qr(x);
rinv = r(1:3,:)\eye(3);
vcov2 = rinv*rinv';
works a bit faster than inv(x'*x). But you indicated in your initial question that this was slower for you. Odd.
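For completeness, the standard errors then follow from the diagonal of vcov2 (a sketch; s2 is the usual residual variance estimate, not defined in the snippet above):

```matlab
% Given vcov2 = inv(x'*x) from the QR route above:
resid = y - x*(x\y);
s2    = (resid'*resid) / (size(x,1) - size(x,2));
se    = sqrt(s2 * diag(vcov2));
```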
mickey
on 3 Jan 2012
mickey
on 3 Jan 2012
Matt Tearle
on 3 Jan 2012
When I test it, I get the same result for rinv*rinv' as inv(x'*x), to within roundoff (which is what the theory says should happen). I thought extracting the non-zero rows of r might dispose of a few redundant calculations. But if inv() is the bottleneck... maybe a brute-force inverse calculation would help? This seems to be a tiny bit faster than inv:
[~,r] = qr(x);
rinv = diag(1./diag(r));
rinv(2,3) = -r(2,3)*rinv(3,3)/r(2,2);
rinv(1,2) = -r(1,2)*rinv(2,2)/r(1,1);
rinv(1,3) = (-r(1,2)*rinv(2,3)-r(1,3)*rinv(3,3))/r(1,1);
vcov = rinv*rinv';
But, after all that, I must have been doing something wrong before, because I'm now getting that inv(x'*x) is faster after all.
mickey
on 4 Jan 2012