Fastest way of computing standard errors / inverting X'X

Hello all,
I'm running a Monte Carlo study in which I need to evaluate many linear regressions y = X b + u. Specifically, I need to estimate a regression and compute the standard errors of the estimates many times, so speed matters much more than accuracy. Matlab's built-in functions such as lscov and regress take a fair amount of time to run, so I wrote a little function to do these tasks myself.
However, since I need the standard errors, the function is still quite slow. I use inv(X' * X) to get the inverse, as it appears in the standard-error calculation (and it is evidently faster than (X' * X) \ eye(size(X, 2))). Is there a faster way of doing this, e.g. via some smart factorization? I saw a suggestion to use a QR decomposition and then compute inv(X' * X) from inv(R), but this was even slower.
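For concreteness, the core of what I'm computing looks roughly like this (the data here are just placeholders):

```matlab
% Minimal OLS with coefficient standard errors (placeholder data)
N = 100; K = 3;
X = [ones(N,1) randn(N,2)];          % design matrix
y = randn(N,1);                      % response

b     = X \ y;                       % coefficient estimates
resid = y - X*b;                     % residuals
s2    = (resid'*resid) / (N - K);    % error-variance estimate
se    = sqrt(s2 * diag(inv(X'*X)));  % standard errors -- the slow part
```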
Any help is greatly appreciated!

 Accepted Answer

You can use symbolic math to derive a closed-form expression for the diagonal of the inverse of a symmetric 3x3 matrix:
syms a b c d e f real
M = diag(inv([a b c; b d e; c e f]))
And then use that expression directly. I find that this is several times faster than calling INV.
X = rand(100,3);
XX = X'*X;   % symmetric 3x3; the linear indexing below relies on the symmetry
denom = (XX(9)*XX(2)^2 - 2*XX(2)*XX(3)*XX(6) + XX(5)*XX(3)^2 + XX(1)*XX(6)^2 - XX(1)*XX(5)*XX(9));
invXX = [(XX(6)^2 - XX(5)*XX(9))/denom;   % diag(inv(XX)), element by element
         (XX(3)^2 - XX(1)*XX(9))/denom;
         (XX(2)^2 - XX(1)*XX(5))/denom]
You can confirm that this is the same as diag(inv(X'*X)).

1 Comment

Hey! Thanks so much for the answer. This is indeed faster, although not by several times, at least on my machine. It may be worth noting that for the estimates themselves it still seems better to compute X \ y, rather than to use the full inverse of X'*X calculated as above, unless I have made a mistake somewhere.


More Answers (1)

inv is evil -- avoid! The quickest way to do a regression is
c = X\y;
then you can calculate errors yourself
resid = X*c - y;
% etc
but are you doing this numerous times for different y but the same X? If so, maybe an LU decomposition would help.
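For instance, if X really were fixed across draws, one factorization could be reused for every y -- a sketch along these lines (using a QR rather than an LU, since X is rectangular; the data are placeholders):

```matlab
% Factor X once, reuse it for many right-hand sides (sketch; placeholder data)
X = randn(100,3);
[Q, R] = qr(X, 0);            % economy-size QR: Q is 100x3, R is 3x3

Rinv = R \ eye(3);            % inv(R), computed once
dinv = sum(Rinv.^2, 2);       % diag(inv(X'*X)), since inv(X'*X) = Rinv*Rinv'

for trial = 1:1000
    y = randn(100,1);
    c = R \ (Q'*y);           % coefficients without refactorizing X
    % ... standard errors combine dinv with the residual variance
end
```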

10 Comments

Hey Matt,
Thanks! I know this. :) The thing is that I need the standard errors of the coefficient estimates, the standard formula for which is SE = sqrt( RSS / (N - K) * diag(inv(X' * X)) ), where N = size(X, 1), K = size(X, 2) and RSS is the sum of squared residuals. So I need the inverse of X' * X, or some smart way to avoid calculating it.
Oh, sorry. I need to read more carefully. I was thinking of the standard error of the regression, rather than the errors of the coefficients.
How big is X? (And I assume it's changing each time?) Unless k is large, I don't think the inverse calculation is actually going to consume that much time. If N is large, the calculation of X'*X may be the time sink. Have you tried running it in the Profiler?
X is not too big; at most it's 100 x 3, and it has no special structure as far as I can tell. I had tried profiling my function, and ~25% of the time is spent on the matrix inversion alone. The rest goes to some simple operations which I have already streamlined somewhat, e.g. by using norm to calculate the sum of squared residuals. Also, it seems that inv(A) * B is less efficient than A \ B, even when inv(A) is precalculated.
But have you separated out the calculations of X'*X and inv(...)? Not that it matters much unless there's some other way to calculate the standard errors without X'*X...
Long shot, but... if it's only 3 variables, is there any chance there are simple closed-form formulae for the SEs that don't involve going through X'X? I seem to recall you can rewrite the SE formulae for the slope and intercept of a simple linear regression, though I don't recall whether that actually saves on any sum-of-products calculations, or whether you still need all the "cross term" dot products. Probably :(
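For the one-regressor-plus-intercept case, at least, the textbook closed forms would be something like this (a sketch from memory, not benchmarked):

```matlab
% Closed-form SEs for simple regression y = a + b*x (textbook formulas)
n = 100;
x = randn(n,1);  y = 2 + 3*x + randn(n,1);   % placeholder data

xbar = mean(x);
Sxx  = sum((x - xbar).^2);
b = sum((x - xbar).*(y - mean(y))) / Sxx;    % slope
a = mean(y) - b*xbar;                        % intercept

resid = y - a - b*x;
s2 = sum(resid.^2) / (n - 2);                % error-variance estimate

se_b = sqrt(s2 / Sxx);                       % SE of the slope
se_a = sqrt(s2 * (1/n + xbar^2/Sxx));        % SE of the intercept
```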
Yes, I have separated them out... :) I have found a nice way to do this in R: http://cran.r-project.org/doc/contrib/Leisch-CreatingPackages.pdf, see the top of p. 3. However, I cannot really reproduce it in Matlab, as it uses chol2inv.
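For what it's worth, the chol2inv step can be imitated in Matlab with chol plus a triangular solve (a sketch, not benchmarked):

```matlab
% Imitating R's chol2inv: inv(X'*X) from a Cholesky factor
X = rand(100,3);
R = chol(X'*X);          % upper triangular, so X'*X = R'*R
Rinv = R \ eye(3);       % triangular solve for inv(R)
XtXinv = Rinv * Rinv';   % equals inv(X'*X)
```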
Well, don't keep me in suspense -- what was the bottleneck?
Anyway, riffing off the approach in that paper, it seems like
[~,r] = qr(x);
rinv = r(1:3,:)\eye(3);
vcov2 = rinv*rinv';
works a bit faster than inv(x'*x). But you indicated in your initial question that this was slower for you. Odd.
Hey Matt! Thanks so much for your kind help. It has been enormously helpful. :)
The bottleneck was the inv() function. Your code indeed seems to run a little faster, although I think it does not calculate quite what it is supposed to. (I had initially calculated the var-cov matrix a different way, by doing a QR on X'*X itself, and that was not terribly fast.)
Anyway, here is a small fix of your code showing how to do this (using Hungarian notation):
[mQ, mR] = qr(mX);
mA = (mR' * mR) \ eye(iK);
Then, multiplying mA by the estimated variance of the error terms gives the covariance matrix of the estimates (its diagonal, diag(mA), gives the variances). One could also use the QR factorization to compute the coefficient estimates as mZ = mQ' * vY; vBetaHat = mR \ mZ. I'm not yet sure whether that is faster than just using mX \ vY.
Anyway, thanks again!
Forgot to add that iK = size(mX, 2). :)
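Putting the pieces together, the whole estimator via one economy-size qr call might look like this (my own sketch; the data and sizes are placeholders):

```matlab
% Full estimator from a single economy-size QR (sketch; placeholder data)
iN = 100; iK = 3;
mX = randn(iN, iK);  vY = randn(iN, 1);

[mQ, mR] = qr(mX, 0);              % mQ is iN-by-iK, mR is iK-by-iK
vBetaHat = mR \ (mQ' * vY);        % coefficient estimates
mRinv    = mR \ eye(iK);
vDiag    = sum(mRinv.^2, 2);       % diag(inv(mX'*mX))
vResid   = vY - mX * vBetaHat;
dS2      = (vResid' * vResid) / (iN - iK);
vSE      = sqrt(dS2 * vDiag);      % standard errors
```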
When I test it, I get the same result for rinv*rinv' as for inv(x'*x), to within roundoff. (Which is what should happen, from theory.) I thought extracting the non-zero rows of r might dispose of a few redundant calculations. But if inv() is the bottleneck... maybe a brute-force inverse calculation would help? This seems to be a tiny bit faster than inv:
[~,r] = qr(x);
rinv = diag(1./diag(r));
rinv(2,3) = -r(2,3)*rinv(3,3)/r(2,2);
rinv(1,2) = -r(1,2)*rinv(2,2)/r(1,1);
rinv(1,3) = (-r(1,2)*rinv(2,3)-r(1,3)*rinv(3,3))/r(1,1);
vcov = rinv*rinv';
But, after all that, I must have been doing something wrong before, because I'm now getting that inv(x'*x) is faster after all.
Hey! Thanks again for this answer. In the end I opted for Teja's approach, as it seems faster for the smaller problem I have. For bigger problems, though, I think something like this should be preferred. :) Thanks!!

