all possible combinations of variables

Vitaly on 10 Mar 2011
Hi,
I need to run regressions with a dependent variable stored in Y (40x1) and independent variables stored in matrices X1, X2, and X3 (each 40x200). I want to find which combination of columns from X1, X2, and X3 best explains Y.

To do that, I want to pick one column from X1, one from X2, and one from X3, regress Y on these three independent variables, and store the R-squared in a vector R. Then I want to try another combination (let's say X1(:,1), X2(:,1), and X3(:,2)), run the regression, and store its R-squared. Eventually I want to try all possible combinations, with every R-squared stored in R, so that I can later sort R and see which combination of Xs gives the best fit. Is there also a way to see which combination of Xs gave the best R-squared?

Can anybody suggest a good way to code this? I know how to fit Y to the Xs once a combination is chosen, but I have no idea how to set up a loop that tries all possible combinations, how to store the R-squared values in R, or how to attach variable names or index numbers to each R-squared so that I can see which combination produced it.

I know R-squared is not the best measure of goodness of fit, but once I have code for this problem I can calculate any other regression statistic from the regression output.
I will greatly appreciate your help.
Vitaly

Answers (2)

Matt Tearle on 11 Mar 2011
I don't think there's a simple function that will do anything like this, so I suspect you'll have to brute-force it:
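R2 = -Inf;                     % best R-squared found so far
best = [0 0 0];                % columns of X1, X2, X3 that produced it
c = ones(size(Y));             % intercept column
sst = norm(Y - mean(Y))^2;     % total sum of squares (same for every fit)
for k1 = 1:200
    x1 = X1(:,k1);
    for k2 = 1:200
        x2 = X2(:,k2);
        for k3 = 1:200
            X = [c x1 x2 X3(:,k3)];
            res = Y - X*(X\Y);             % residuals of the least-squares fit
            newR2 = 1 - norm(res)^2/sst;
            if newR2 > R2
                R2 = newR2;                % save current info
                best = [k1 k2 k3];
            end
        end
    end
end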
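If every R-squared is wanted rather than only the best one, one option is to store the values in a 3-D array indexed by the three column numbers, then sort and map each value back to its columns with ind2sub. The following is a minimal sketch, not a definitive implementation. It assumes random data and only 5 columns per matrix (a hypothetical size chosen so that it runs quickly), and the names R, order and ranked are illustrative.

```matlab
% Illustrative data (assumed): 40 observations, n candidate columns per matrix
n  = 5;
X1 = rand(40,n);  X2 = rand(40,n);  X3 = rand(40,n);
Y  = rand(40,1);

c   = ones(size(Y));            % intercept column
sst = norm(Y - mean(Y))^2;      % total sum of squares
R   = zeros(n,n,n);             % R(k1,k2,k3) holds R-squared for that triple

for k1 = 1:n
    for k2 = 1:n
        for k3 = 1:n
            X   = [c X1(:,k1) X2(:,k2) X3(:,k3)];
            res = Y - X*(X\Y);                  % least-squares residuals
            R(k1,k2,k3) = 1 - norm(res)^2/sst;
        end
    end
end

% Sort all R-squared values, best first, and map each back to its columns
[Rsorted, order] = sort(R(:), 'descend');
[i1, i2, i3]     = ind2sub(size(R), order);
ranked = [Rsorted i1 i2 i3];    % columns: R-squared, col of X1, col of X2, col of X3
disp(ranked(1:5,:))             % five best combinations
```

With the full 200 columns per matrix, R would hold 8e6 doubles (about 64 MB), so whether this is practical depends on available memory.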

Oleg Komarov on 11 Mar 2011
200^3 = 8e6 regressions...
tic
comb = combinator(200,3,'p','r');
X1 = rand(40,200);
X2 = rand(40,200);
X3 = rand(40,200);
Y = rand(40,1);
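in = ones(40,1);                 % intercept column
ybar = mean(Y);                  % does not change between fits
sst = norm(Y - ybar)^2;          % total sum of squares, also constant
R = zeros(size(comb,1),1);       % one R-squared per combination
for c = 1:size(comb,1)
    X = [in X1(:,comb(c,1)) X2(:,comb(c,2)) X3(:,comb(c,3))];
    yhat = X*(X\Y);              % fitted values from least squares
    ssr = norm(yhat - ybar)^2;   % regression sum of squares
    R(c) = ssr/sst;
end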
toc
[~, pos] = max(R);   % pos is the row of comb with the highest R-squared
sprintf('Max R: %5.4f\ncolX1: %3d\ncolX2: %3d\ncolX3: %3d',...
R(pos), comb(pos,:))
It took 5 minutes to execute (Vista 32-bit, R2010b, Intel Core Duo 2.5 GHz).
You can find combinator on the File Exchange (FEX).
Oleg
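If combinator is not available, a similar table of index triples can be built with the built-in ndgrid. This is only a sketch: n is set to a small assumed value here (it would be 200 for the data in this thread), and the rows may come out in a different order than combinator returns them, which does not matter as long as the same comb is used both for fitting and for looking up the best row.

```matlab
n = 4;                                   % assumed small size for illustration
[a, b, c] = ndgrid(1:n, 1:n, 1:n);       % every (a,b,c) with entries in 1..n
comb = [a(:) b(:) c(:)];                 % n^3-by-3; each row is one triple of column indices
disp(size(comb))
```

Each row of comb can then be used as in the loop above, for example X1(:,comb(r,1)) for a row index r.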
