How to compute correlation between estimated residuals and input data x?

Dear community!
A (maybe) stupid question, but I really would appreciate any kind of help:
I want to compute the correlation between the estimated residuals of my regression model and each of the factors of my input data x. In other words, I want to test the assumption that there is no correlation between the residuals and these input factors.
Thanks a lot! Maybe one of you has an idea or some code available; I would really appreciate it.

Answers (1)

Raghava S N on 26 Apr 2024
Hi,
I understand that you want to compute the correlation between the estimated residuals of your regression model and each of the factors in your input data (x), and to test the assumption that there is no correlation between the residuals and these input factors. MATLAB’s “corrcoef” function can be used for this purpose. You can follow these steps:
1. Run your regression model to obtain the residuals. Let us assume you have a dataset with multiple input factors (columns of “x”) and an output variable “y”. You would fit a regression model to “y” using “x” and obtain the residuals. Here's a simple linear regression example:
% Assuming “x” is your matrix of input factors and “y” is your output variable
mdl = fitlm(x, y); % Fits a linear regression model
residuals = mdl.Residuals.Raw; % Extracts the raw residuals from the model
2. Next, compute the correlation between these residuals and each factor in “x”:
% Initialize a vector to store correlation coefficients
correlations = zeros(1, size(x, 2));
pValues = zeros(1, size(x, 2)); % For storing probability values(p-values)
for i = 1:size(x, 2) % Loop through each factor in x
    [R, P] = corrcoef(residuals, x(:, i)); % Compute Pearson's correlation
    correlations(i) = R(1, 2); % Store the correlation coefficient
    pValues(i) = P(1, 2); % Store the p-value
end
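Here, “R” is the 2-by-2 matrix of correlation coefficients returned by “corrcoef”; the off-diagonal element “R(1, 2)” is the correlation between the residuals and the current factor. “P” has the same layout and holds the corresponding p-values.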
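The loop above can also be written as a single call. The following is a minimal sketch, assuming the “corr” function from the Statistics and Machine Learning Toolbox is available (“fitlm” belongs to the same toolbox). The data are synthetic and purely illustrative; they are not taken from the question.

```matlab
% Synthetic data for illustration only (hypothetical values)
rng(0);                                   % fix the random seed for reproducibility
x = randn(200, 3);                        % 200 observations, 3 input factors
y = 1 + x * [2; -1; 0.5] + 0.3 * randn(200, 1);

mdl = fitlm(x, y);                        % same model call as in step 1
residuals = mdl.Residuals.Raw;

% corr(X, Y) correlates every column of X with every column of Y,
% so one call replaces the loop over the columns of x.
[correlations, pValues] = corr(x, residuals);

disp(table(correlations, pValues));       % one row per input factor
```

Note that “correlations” and “pValues” are column vectors here (one row per factor), whereas the loop version stores them as row vectors.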
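3. To test the significance of each correlation, use the p-values that the “corrcoef” function also returns (“P” in the code). Each p-value tests the null hypothesis that the true correlation is zero. A commonly used significance level is 0.05; a p-value below this threshold indicates that the correlation is statistically significant, in which case you would reject the assumption of no correlation between the residuals and that input factor.
Correlations Close to 0: If the correlation coefficients are close to 0 and the p-values are above your significance threshold (0.05 is a common choice), the data give no evidence of correlation between the residuals and the input factors, which is consistent with your assumption.
Significant Correlations: If any correlation coefficients are significantly different from 0 (based on the p-values), those input factors are correlated with the residuals, which might indicate a problem with the model, such as omitted variable bias or an incorrect functional form.
Note: For an ordinary least-squares fit that includes an intercept, which is what “fitlm(x, y)” produces by default, the raw residuals are uncorrelated with every predictor in the model by construction, so the correlations computed above will be zero up to rounding error. The check is therefore informative mainly for factors that were left out of the model, or for residuals obtained with a different estimation method.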
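The omitted-variable case can be illustrated with a small sketch on synthetic data. All names and numbers below are made up for illustration; “x2” is a hypothetical factor that influences “y” but is deliberately left out of the fit. For a least-squares fit with an intercept, the residuals are uncorrelated with the included predictor by construction, so only the omitted factor should show a clearly nonzero correlation.

```matlab
% Synthetic example (hypothetical data): y depends on x1 and x2,
% but the model is fitted on x1 only.
rng(1);
n  = 500;
x1 = randn(n, 1);
x2 = randn(n, 1);
y  = 1 + 2*x1 + 1.5*x2 + 0.5*randn(n, 1);

mdl = fitlm(x1, y);                  % x2 is intentionally omitted
res = mdl.Residuals.Raw;

[Rin,  Pin]  = corrcoef(res, x1);    % predictor that is in the model
[Rout, Pout] = corrcoef(res, x2);    % factor that was left out

fprintf('included x1: r = %.3g, p = %.3g\n', Rin(1, 2),  Pin(1, 2));
fprintf('omitted  x2: r = %.3g, p = %.3g\n', Rout(1, 2), Pout(1, 2));
```

With data generated this way, the correlation with “x1” should be zero up to rounding error, while the correlation with “x2” should be large with a very small p-value, pointing to the missing factor.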
For more details, please refer to the MATLAB R2024a documentation for the functions used in the code, “fitlm” and “corrcoef”.
Hope this helps.

Release

R2020a
