Residual values for a linear regression fit

I have these points:
x = [1,1,2,2,3,4,4,6]';
y = [8,1,1,2,2,3,4,1]';
I want to remove the point from the set above that makes the residual largest.
This is the code I use:
d=zeros(length(x),1);
for i=1:length(x)
    x_bk = x;
    y_bk = y;
    x(i) = [];
    y(i) = [];
    X = [ones(length(x),1) x];
    b = X\y;
    yhat = X*b;
    d(i) = abs(sum(y - yhat));
    x = x_bk;
    y = y_bk;
end
index = find(min(d)==d);
x(index) = [];
y(index) = [];
X = [ones(length(x),1) x];
b = X\y;
yhat_r = X*b;
plot(x,y,'o')
hold on
plot(x,yhat_r,'--')
I think the result should be the black line (see the attached file), but I get the red dashed line.

 Accepted Answer

I would do something like this:
x = [1,1,2,2,3,4,4,6]';
y = [8,1,1,2,2,3,4,1]';
xv = x;
yv = y;
for k = 1:numel(x)
    X = [xv(:), ones(size(xv(:)))];
    b = X \ yv(:);
    yhat = X*b;
    rsdn(k) = norm(yv - X*b);
    xv = x;
    yv = y;
    xv(k) = [];
    yv(k) = [];
end
figure
plot((1:numel(x)), rsdn)
grid
[rsdnmin,idxn] = min(rsdn(2:end));
[rsdnmax,idxx] = max(rsdn(2:end));
lowest = idxn+1
hihest = idxx+1
idxv = [lowest; hihest];
figure
for k = 1:2
    subplot(2,1,k)
    xv = x;
    yv = y;
    xv(idxv(k)) = [];
    yv(idxv(k)) = [];
    plot(xv,yv,'ob')
    yhat = [xv(:), ones(size(xv(:)))]*bmtx(:,idxv(k));
    hold on
    plot(xv, yhat, '--r')
    hold off
    title(sprintf('Eliminating Set %d', idxv(k)))
end
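Here, the norm of residuals (the usual metric) is least at rsdn(2) and greatest at rsdn(6). Because rsdn(1) is computed on the full data set before anything is deleted, these correspond to eliminating row 1, the point (1,8), and row 5, the point (3,2).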
Experiment to get the result you want.
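As a rough check on why the loop in the question misbehaves: for a least-squares line with an intercept term, the residuals sum to zero up to rounding error, so abs(sum(y - yhat)) is essentially the same tiny number for every deleted point, and find(min(d)==d) then picks a point more or less arbitrarily. The sketch below compares that metric with the residual norm on the same data. The names sumAbs, resNorm, keep, and drop are mine, not from the question.

```matlab
% Sketch: compare two leave-one-out metrics on the data from the question.
x = [1,1,2,2,3,4,4,6]';
y = [8,1,1,2,2,3,4,1]';
n = numel(x);
sumAbs  = zeros(n,1);   % abs(sum(residuals)), the metric used in the question
resNorm = zeros(n,1);   % norm(residuals), the metric used in this answer
for k = 1:n
    keep = true(n,1);
    keep(k) = false;                  % leave out row k
    X = [ones(n-1,1), x(keep)];       % intercept column first, as in the question
    r = y(keep) - X*(X\y(keep));      % residuals of the fit without row k
    sumAbs(k)  = abs(sum(r));
    resNorm(k) = norm(r);
end
disp([sumAbs, resNorm])               % first column is rounding noise only
[~, drop] = min(resNorm);             % row whose removal gives the tightest fit
```

On this data the residual norm is smallest when row 1, the point (1,8), is left out, which is the outlier the question is trying to remove.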

6 Comments

Thank you for your time. But, I am not following what bmtx is.
My pleasure!
It is the matrix of the ‘b’ parameters. (That somehow got left out of the code I posted. I proofread these before I post them. I have no idea how I missed that.)
The correct first loop:
for k = 1:numel(x)
    X = [xv(:), ones(size(xv(:)))];
    b = X \ yv(:);
    yhat = X*b;
    rsdn(k) = norm(yv - X*b);
    xv = x;
    yv = y;
    xv(k) = [];
    yv(k) = [];
    bmtx(:,k) = b;
end
Clarifying the plots and second loop:
figure
plot((0:numel(x)-1), rsdn)
grid
xlabel('Deleted Row')
ylabel('Residual Norm')
[rsdnmin,idxn] = min(rsdn(2:end));
[rsdnmax,idxx] = max(rsdn(2:end));
lowest = idxn
hihest = idxx
idxv = [lowest; hihest];
so those now make sense.
Thank you. I think the last point (6,1) was not removed in the first loop for the residual calculation, so the loop should run until numel(x)+1:
for k = 1:numel(x)+1
end
As always, my pleasure!
I agree. However, the loop (and the first plot) also need to be tweaked:
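for k = 1:numel(x)+1
    xv = x;
    yv = y;
    if k > 1
        xv(k-1) = [];   % k = 1 keeps the full set; k > 1 deletes row k-1
        yv(k-1) = [];
    end
    X = [xv(:), ones(size(xv(:)))];
    b = X \ yv(:);
    yhat = X*b;
    rsdn(k) = norm(yv - yhat);
    bmtx(:,k) = b;      % column k holds [slope; intercept] for this fit
end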
figure
plot((0:numel(x)), rsdn)
grid
xlabel('Deleted Row')
ylabel('Residual Norm')
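The first result (k = 1) is for the entire set, and the following eight are the results of deleting each (x, y) pair in turn, so rsdn(k) and bmtx(:,k) correspond to deleting row k-1. The rest of the code is unchanged, except that with lowest = idxn and hihest = idxx the plotting loop should index the coefficients as bmtx(:,idxv(k)+1) to skip the full-set column.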
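Putting the pieces from this thread together, a consolidated version might look like the sketch below. It is only one way to line up the indices: column 1 of rsdn and bmtx is the full-set fit, and column k+1 is the fit with row k deleted. The preallocation and the combined plot call are my additions, not part of the code posted above.

```matlab
% Sketch: full-set fit in column 1, leave-one-out fits in columns 2..n+1.
x = [1,1,2,2,3,4,4,6]';
y = [8,1,1,2,2,3,4,1]';
n = numel(x);
rsdn = zeros(1,n+1);
bmtx = zeros(2,n+1);
for k = 1:n+1
    xv = x;  yv = y;
    if k > 1
        xv(k-1) = [];  yv(k-1) = [];   % delete row k-1
    end
    X = [xv, ones(size(xv))];
    b = X \ yv;                        % b = [slope; intercept]
    rsdn(k) = norm(yv - X*b);
    bmtx(:,k) = b;
end
[~, idxn] = min(rsdn(2:end));          % deleted row giving the smallest norm
[~, idxx] = max(rsdn(2:end));          % deleted row giving the largest norm
idxv = [idxn; idxx];
figure
for k = 1:2
    subplot(2,1,k)
    xv = x;  yv = y;
    xv(idxv(k)) = [];  yv(idxv(k)) = [];
    yhat = [xv, ones(size(xv))] * bmtx(:, idxv(k)+1);   % +1 skips the full-set column
    plot(xv, yv, 'ob', xv, yhat, '--r')
    title(sprintf('Eliminating Row %d', idxv(k)))
end
```

With the data from the question, idxn comes out as 1 and idxx as 5, matching the residual norms of about 2.73 and 5.97 quoted in this thread.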
I want to show that removing a single data point can change the regression line a lot (though I do not know whether this is true in practice).
For this reason, I generated this data set:
a0 = 4.5882;
a1 = 0.2353;
x = (0:1:8)';
y = a0+a1*x+randn(size(x));
But it does not show any difference (please see the attachment). I think the way I am producing the data set is not correct.
In that simulation, you are defining a particular slope and intercept and adding a normally-distributed random vector to it. Because no single point lies far from that line, the slopes and intercepts of the fitted lines will not change much when one point is deleted.
You can see that most easily if you add this text call to each plot in the loop (index bmtx with the same column that was used for that plot's yhat):
text(1.1*min(xlim),0.9*max(ylim), sprintf('Y = %.3f\\cdotX%+.3f',bmtx(:,k)), 'HorizontalAlignment','left')
That will print the regression equation in the upper-left corner of each one. You can then compare them.
Note that the residual norms do not change much, either. In the original data set, they varied between 2.73 and 5.97. In this data set, they are within about ±0.5 of each other.
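If the goal is a simulated set where deleting one point visibly moves the line, one option is to plant a single outlier in the simulated data, much like the point (1,8) in the original set. The sketch below is an illustration under that assumption: the offset of 20, its position at the last x value, the fixed seed, and the names slopeChange and mostInfluential are all hypothetical choices of mine.

```matlab
% Sketch: simulated line plus noise, with one deliberately planted outlier.
rng(1)                                  % fixed seed, only for repeatability
a0 = 4.5882;
a1 = 0.2353;
x = (0:1:8)';
y = a0 + a1*x + randn(size(x));
y(end) = y(end) + 20;                   % hypothetical outlier at the largest x
n = numel(x);
bmtx = zeros(2,n+1);
for k = 1:n+1
    xv = x;  yv = y;
    if k > 1
        xv(k-1) = [];  yv(k-1) = [];    % delete row k-1
    end
    bmtx(:,k) = [xv, ones(size(xv))] \ yv;   % [slope; intercept]
end
slopeChange = abs(bmtx(1,2:end) - bmtx(1,1));  % slope shift for each deleted row
[~, mostInfluential] = max(slopeChange);
fprintf('Deleting row %d changes the slope by %.3f\n', ...
    mostInfluential, slopeChange(mostInfluential))
```

Deleting the planted point should shift the slope far more than deleting any of the other rows, whereas without the outlier line the shifts stay small, as described above.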

Asked: NA on 16 Oct 2020

Commented: on 17 Oct 2020