all possible combinations of variables
Hi,
I need to run regressions of a dependent variable, stored in the matrix Y (size 40x1), on independent variables stored in the matrices X1, X2, and X3 (each of size 40x200). I want to test which combination of variables from X1, X2, and X3 does the best job of explaining Y.
To do that, I want to pick one column from X1, one from X2, and one from X3, regress Y on these three independent variables, and store the R-squared in a vector R. Then I want to try another combination (say X1(:,1), X2(:,1), and X3(:,2)), run the regression, and store its R-squared as well. Eventually I want to try all possible combinations, with every R-squared stored in R, so that I can later sort the vector and see which combination of Xs gives the best fit between Y and X1, X2, X3. Is there also a way to see which combination of Xs gave the best R-squared?
Can anybody suggest a good way to code this problem? I know how to fit Y on the Xs once a combination is defined, but I have no idea how to set up a loop that tries all possible combinations, how to store the R-squared values in R, or how to attach variable names or index numbers to each R-squared so that I can see which combination produced it.
I know R-squared is not the best measure of goodness of fit, but once I have code for this problem I can calculate any other regression statistic from the regression output.
I will greatly appreciate your help.
Vitaly
Answers (2)
Matt Tearle
on 11 Mar 2011
I don't think there's a simple function that will do anything like this, so I suspect you'll have to brute-force it:
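bestR2 = -Inf;                        % best R-squared found so far
bestIdx = [0 0 0];                    % columns of X1, X2, X3 that produced it
c = ones(size(Y,1),1);                % intercept column
sst = norm(Y - mean(Y))^2;            % total sum of squares
for k1 = 1:size(X1,2)
    x1 = X1(:,k1);
    for k2 = 1:size(X2,2)
        x2 = X2(:,k2);
        for k3 = 1:size(X3,2)
            X = [c x1 x2 X3(:,k3)];
            res = Y - X*(X\Y);        % least-squares residuals
            newR2 = 1 - norm(res)^2/sst;
            if newR2 > bestR2
                bestR2 = newR2;       % save current info
                bestIdx = [k1 k2 k3];
            end
        end
    end
end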
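The nested loops can also be flattened into a single loop over a table of index triples, which makes it straightforward to keep every R-squared rather than only the best one. Here is a minimal sketch, assuming small random stand-in data so that it runs quickly; the names n, p, idx, and R are illustrative and not from the question (p would be 200 for the real problem).

```matlab
% Stand-in data (assumption); replace with the real Y, X1, X2, X3.
n = 40;  p = 5;
Y  = rand(n,1);
X1 = rand(n,p);  X2 = rand(n,p);  X3 = rand(n,p);

% Every (k1,k2,k3) triple, one per row of idx.
[k1, k2, k3] = ndgrid(1:p, 1:p, 1:p);
idx = [k1(:) k2(:) k3(:)];

c   = ones(n,1);                 % intercept column
sst = norm(Y - mean(Y))^2;       % total sum of squares
R   = zeros(size(idx,1),1);      % one R-squared per triple
for r = 1:size(idx,1)
    X    = [c X1(:,idx(r,1)) X2(:,idx(r,2)) X3(:,idx(r,3))];
    res  = Y - X*(X\Y);          % least-squares residuals
    R(r) = 1 - norm(res)^2/sst;  % R-squared for this triple
end

% Row r of idx tells you which columns produced R(r).
[bestR, best] = max(R);
fprintf('Best R2 = %.4f with columns %d, %d, %d\n', bestR, idx(best,:));
```

Because R and idx share row numbers, any R-squared can be traced back to its column triple without storing names separately.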
Oleg Komarov
on 11 Mar 2011
200^3 = 8e6 regressions...
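tic
% combinator is not built in (it is a MATLAB File Exchange submission);
% 'p','r' requests permutations with repetition, i.e. all 200^3 index triples.
comb = combinator(200,3,'p','r');
% Random example data; replace with the real X1, X2, X3, and Y.
X1 = rand(40,200);
X2 = rand(40,200);
X3 = rand(40,200);
Y = rand(40,1);
in = ones(40,1);                 % intercept column
R = zeros(size(comb,1),1);       % preallocate one R-squared per triple
for c = 1:size(comb,1)
    X = [in X1(:,comb(c,1)) X2(:,comb(c,2)) X3(:,comb(c,3))];
    yhat = X*(X\Y);              % fitted values from least squares
    ybar = mean(Y);
    ssr = norm(yhat - ybar)^2;   % regression sum of squares
    sst = norm(Y - ybar)^2;      % total sum of squares
    R(c) = ssr/sst;
end
toc
[maxR, pos] = max(R);            % pos is the row of comb with the best fit
sprintf('Max R: %5.4f\ncolX1: %3d\ncolX2: %3d\ncolX3: %3d',...
    maxR, comb(pos,:))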
It took 5 minutes to execute (Vista 32-bit, R2010b, Intel Core Duo 2.5 GHz).
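Since the question also asks about sorting R to compare combinations, one possible follow-up is to sort in descending order and use the sort permutation to reorder the rows of comb. This is only a sketch: the tiny comb and R below are made-up stand-in values so that the snippet runs on its own, and nTop and ranked are illustrative names.

```matlab
% Stand-in values (assumption); in practice R and comb come from the loop.
comb = [1 1 1; 1 1 2; 1 2 1; 2 1 1];
R    = [0.10; 0.35; 0.20; 0.05];

[Rsorted, order] = sort(R, 'descend');   % best fit first
nTop   = min(3, numel(R));               % how many rows to report
ranked = [comb(order(1:nTop),:) Rsorted(1:nTop)];

% One line per combination: the three column indices, then R-squared.
fprintf('colX1 %3d  colX2 %3d  colX3 %3d  R2 %.4f\n', ranked.');
```

Each row of ranked holds the X1, X2, and X3 column indices followed by the corresponding R-squared, so the best few combinations can be read off directly.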
Oleg