Multivariate GLMFIT and GLMVAL

5 views (last 30 days)
I have a problem with plotting the results of a glm model with several predictors.
I first fit the model with GLMFIT as:
[b,dev,stats]= glmfit(x, y, distr);
where 'x' is a matrix of N observations by M predictors, and 'y' a vector of observed responses.
Then I produce predictions as:
xpred= zeros(100,size(x,2));
for k= 1:size(x,2)
xpred(:,k)= linspace(min(x(:,k)),max(x(:,k)),100);
end
ypred= glmval(b,xpred,link,stats);
Then I want to plot the effect of each individual predictor on the response variable as:
for j= 1:size(x,2)
figure(j);
plot(x(:,j),y,'k.',xpred(:,j),ypred,'b-','LineWidth',1)
end
But the line (predicted values) for most of the predictors goes totally out of the cloud of observed values. Am I doing something totally wrong or is just that the fit is poor? Any help would be much appreciated. Thanks

Accepted Answer

Tom Lane
Tom Lane on 24 Jan 2012
Consider the code below. It fits a multi-predictor model, but plots the fit as a function of one predictor at a time, with the others held fixed at their mean values. In contrast, your code has each predictor increase along with the others.
This has its own issues. For one thing, the superimposed points represent data, and may or may not be comparable to the line that constrains predictors to their means. But it is one of the things you can do to visualize a multi-predictor fit as a function of one variable at a time.
x = mvnrnd([0 0],[1 .9;.9 1],200);
y = 5*x(:,1) + 5*x(:,2) + randn(200,1);
[b,dev,stats] = glmfit(x,y,'normal');
% Create an array with all predictors at their mean
meanx = mean(x);
xpred = repmat(meanx,100,1);
xplot = xpred;
ypred = zeros(100,size(x,2));
for k= 1:size(x,2)
% Allow the kth predictor to vary, others stay fixed
xplot(:,k)= linspace(min(x(:,k)),max(x(:,k)),100);
xpred(:,k) = xplot(:,k);
ypred(:,k) = glmval(b,xpred,'identity',stats);
xpred(:,k) = meanx(k); % put the mean back
end
for j= 1:size(x,2)
figure(j);
plot(x(:,j),y,'k.',xplot(:,j),ypred(:,k),'b-','LineWidth',1)
end
If you run this you'll see that the lines don't match the points very well. That's because I generated the x values with correlation 0.9, meaning that it's unlikely that one predictor will stay fixed while the other increases. If you adjust 0.9 down toward 0.0, you should see better agreement between the fit and the data points.

More Answers (2)

Peter Perkins
Peter Perkins on 19 Jan 2012
Francisco, it's pretty hard to say what's happening without knowing anything about the data or the model, but this may be an artifact of the way you've created xpred. Notice that the first row of xpred has the min values for all of the predictors, and the last row has the max values. You've created what is essentially a straight line through the design variable space that may or may not have anything to do with the actual cloud of points in your data. That could explain why ypred doesn't look much like y.
Hope this helps.

Francisco de Castro
Francisco de Castro on 20 Jan 2012
Peter, thanks for your comment. However, to plot the predicted values of the model I need the values of each predictor to span its observed values, so the line of predicted (in the x axis) can go through the cloud of observed vals. That's why I use 100 values from min(x(:,j)) to max(x(:,j)) for each.
A friend of mine suggested that the problem is that I need to re-fit a single-predictor model for each plot. Like this:
for j= 1:size(x,2)
XPRED= linspace(min(x(:,j)),max(x(:,j)),100);
[b,dev,stats]= glmfit(XPRED, y, distr);
ypred= glmval(b,XPRED,link,stats);
figure(j); hold on
plot(x(:,j),y,'k.',XPRED,ypred,'b-','LineWidth',1)
end
This works, of course, however, then each plot represents a different 1-predictor model, not the contribution of each predictor to the multi-predictor model.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!