Why is fitlm giving weird results?

I am using the following code:
% PCA
[coeff, score, ~, ~, explained] = pca(X);
X_pca = score(:, 1:10);
% Split data
cv = cvpartition(size(X_pca, 1), 'HoldOut', 0.2);
idxTrain = training(cv);
idxTest = test(cv);
X_train = X_pca(idxTrain, :);
X_test = X_pca(idxTest, :);
Y_train = Y(idxTrain);
Y_test = Y(idxTest);
reg = fitlm(X_train, Y_train);
However, the results from fitlm look weird. Please suggest how I can get correct results.
Deva
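A minimal sketch of how to check whether the fit is really "weird", assuming the Statistics and Machine Learning Toolbox and using synthetic stand-in data (n, p, k and the generated X and Y are illustrative assumptions, not from the question). It fits PCA on the training rows only, so the test rows do not leak into the components, and then compares the in-sample fit with the error on the held-out rows.

```matlab
% Synthetic stand-in data (assumption: replace with your own X and Y)
rng(0);                                   % reproducible example
n = 200; p = 12;
X = randn(n, p);
Y = X(:, 1:3) * [2; -1; 0.5] + 0.1 * randn(n, 1);

% Hold out 20% of the rows, as in the question
cv = cvpartition(n, 'HoldOut', 0.2);
X_train = X(training(cv), :);  Y_train = Y(training(cv));
X_test  = X(test(cv), :);      Y_test  = Y(test(cv));

% Fit PCA on the training rows only, then project the test rows
[coeff, scoreTrain, ~, ~, ~, mu] = pca(X_train);
k = 10;                                   % number of components kept
scoreTest = (X_test - mu) * coeff(:, 1:k);

reg = fitlm(scoreTrain(:, 1:k), Y_train);

% Compare the in-sample fit with the out-of-sample error
Y_pred = predict(reg, scoreTest);
rmseTest = sqrt(mean((Y_test - Y_pred).^2));
fprintf('Training R^2: %.3f\n', reg.Rsquared.Ordinary);
fprintf('Training RMSE: %.3f, test RMSE: %.3f\n', reg.RMSE, rmseTest);
disp(reg.Coefficients)                    % estimates, SEs, t-stats, p-values
```

If both the training R² and the test RMSE are poor, keep in mind that PCA ranks components by variance in X, not by relevance to Y, so the first 10 components are not guaranteed to carry the information needed to predict Y.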


@Devendra - when you edit your question away, you make the answer meaningless, since it no longer has any context. That hurts the site, as the answer no longer has any value. You insult the person who wasted their time helping you. And it reduces the chances that others (and certainly that person) will be willing to help you in the future.
As @John D'Errico mentioned, even if your issue has been resolved, there is no harm in keeping the question as it is. If someone else faces a similar issue, they can refer to the answer; deleting or editing the question makes that answer meaningless to them. So please consider not editing or deleting a question after posting. Thanks.
Devendra on 13 Apr 2024
Edited: Devendra on 14 Apr 2024
@Manikanta Aditya
I am very sorry for my mistake. It will not happen again. I actually deleted the post before realizing that my friends had already commented on it. My real intention was not to waste my friends' time on this issue, since the problem has been resolved. I have restored the deleted post.
Devendra
@Aditya I hope this post finds you well. I am new to the MATLAB community, and I would like your guidance on predicting a time series using the following code:
mdl = fitlm(scoreTrain95,Y_train,'y ~ x1*x2*x3-x1:x2:x3');
Y_pred = predict(mdl,scoreTest95);
Here scoreTrain95 and scoreTest95 hold 80% and 20% of the data, respectively, including both dependent and independent variables.
I have trained the model on the past five years of data, and I now wish to use its regression coefficients with next year's independent variables to predict the dependent variable for next year. Could you please suggest how to do this in MATLAB?
Devendra
You can use the predict function in MATLAB to predict the dependent variable for next year from the regression coefficients of your model. You just need to ensure that next year's independent variables are in the same format as your scoreTest95 data, that is, the same number and order of columns.
% Train the model
mdl = fitlm(scoreTrain95, Y_train, 'y ~ x1*x2*x3-x1:x2:x3');
% Obtain the independent variable data for the next year
nextYearData = [...];
% Create a dataset array from nextYearData
nextYearDataset = array2table(nextYearData, 'VariableNames', {'x1', 'x2', 'x3'});
% Predict the dependent variable for the next year
nextYearPredictions = predict(mdl, nextYearDataset);
Hope it helps.
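One detail worth adding: if scoreTrain95 holds principal-component scores, next year's raw predictors have to be projected with the mean and coefficients from the training PCA before calling predict, rather than with a new PCA fitted on next year's data. A minimal sketch with synthetic data (X_hist, Y_hist, X_next and k are hypothetical names); it also checks that passing a numeric matrix and passing a table with variables x1, x2, x3 give the same predictions.

```matlab
% Synthetic stand-in data (assumption: X_hist/Y_hist are the historical
% predictors and response, X_next holds next year's raw predictors)
rng(1);
X_hist = randn(60, 8);
Y_hist = X_hist(:, 1) - 2 * X_hist(:, 2) + 0.2 * randn(60, 1);
X_next = randn(12, 8);

% PCA on the historical predictors; keep 3 components as in the thread
[coeff, scoreHist, ~, ~, ~, mu] = pca(X_hist);
k = 3;
mdl = fitlm(scoreHist(:, 1:k), Y_hist, 'y ~ x1*x2*x3-x1:x2:x3');

% Project next year's raw predictors with the SAME mu and coeff
scoreNext = (X_next - mu) * coeff(:, 1:k);

% predict accepts a numeric matrix whose columns map to x1, x2, x3 ...
Y_next = predict(mdl, scoreNext);

% ... or a table whose variable names match the model's predictor names
Y_nextTbl = predict(mdl, array2table(scoreNext, 'VariableNames', {'x1', 'x2', 'x3'}));
```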
Thanks a lot for your kind guidance. It has certainly helped me understand the basics of prediction.🙏🙏 Devendra
Thank you! Good to know.


 Accepted Answer

Hope you are doing great!
The error message you’re seeing occurs because the predict function expects input with the same number of columns as the data used to train the model. In your case, the model was trained on scoreTrain, which has more than 3 columns, but you’re trying to predict with scoreTest, which has only 3 columns (the first 3 principal components).
The issue arises from this line of code:
scoreTest = (X_test - mu)*coeff(:,1:idx)
Here, you’re reducing the dimensionality of your test set to 3 principal components, but your model was trained on the full set of principal components in scoreTrain.
To fix this, you should also limit the number of principal components in scoreTrain to 3. Here’s how you can do it:
scoreTrain = scoreTrain(:,1:idx);
reg = fitlm(scoreTrain, Y_train,'y ~ x1*x2*x3-x1:x2:x3');
Now, scoreTrain and scoreTest have the same number of columns, and you should be able to use the predict function without errors. Remember, the dimensions of the input for training and prediction must always match.
I hope this helps, let me know.
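To make the column requirement concrete, here is a small sketch with synthetic scores (the sizes are assumptions chosen to mimic the mismatch). The formula 'y ~ x1*x2*x3-x1:x2:x3' expands to an intercept, three main effects and the three two-way interactions, so the model needs exactly three predictor columns at prediction time; an explicit size check gives a clearer message than the error from predict.

```matlab
% Synthetic scores: 5 columns for training, 3 for testing (assumption)
rng(2);
scoreTrain = randn(50, 5);
Y_train = scoreTrain(:, 1) + 0.1 * randn(50, 1);
scoreTest = randn(10, 3);
idx = 3;

% Fit on the same number of components the test set has
X_fit = scoreTrain(:, 1:idx);
reg = fitlm(X_fit, Y_train, 'y ~ x1*x2*x3-x1:x2:x3');

% Lists (Intercept), x1, x2, x3 and the pairwise terms x1:x2, x1:x3, x2:x3
disp(reg.CoefficientNames)

% Fail early with a readable message if the column counts differ
assert(size(scoreTest, 2) == size(X_fit, 2), ...
    'scoreTest has %d columns but the model was fitted on %d.', ...
    size(scoreTest, 2), size(X_fit, 2));
Y_pred = predict(reg, scoreTest);
```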


Here is the entire code. If you find it helpful, please accept the answer:
% Split data
cv = cvpartition(size(X, 1), 'HoldOut', 0.2);
idxTrain = training(cv);
idxTest = test(cv);
X_train = X(idxTrain, :);
X_test = X(idxTest, :);
Y_train = Y(idxTrain);
Y_test = Y(idxTest);
% PCA
[coeff, scoreTrain, ~, ~, explained, mu] = pca(X_train);
idx = 3; % keep 3 principal components
% Limit the number of principal components in scoreTrain
scoreTrain = scoreTrain(:,1:idx);
% Apply PCA transformation to test data
scoreTest = (X_test - mu)*coeff(:,1:idx);
% Train the model
reg = fitlm(scoreTrain, Y_train,'y ~ x1*x2*x3-x1:x2:x3');
% Predict
Y_pred = predict(reg, scoreTest);
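The code above fixes idx = 3. The names scoreTrain95 and scoreTest95 used earlier in the thread suggest, though this is only an assumption, that the number of components may instead have been chosen to explain 95% of the variance. A minimal sketch of picking idx that way from the explained output of pca, on synthetic data:

```matlab
% Synthetic stand-in data with 3 underlying factors (assumption)
rng(3);
latent = randn(100, 3);
X_train = latent * randn(3, 10) + 0.05 * randn(100, 10);

[~, scoreTrain, ~, ~, explained] = pca(X_train);

% Smallest number of components whose cumulative explained variance reaches 95%
idx = find(cumsum(explained) >= 95, 1);
fprintf('Keeping %d components (%.1f%% of variance)\n', idx, sum(explained(1:idx)));

scoreTrain95 = scoreTrain(:, 1:idx);
```

Note that the formula 'y ~ x1*x2*x3-x1:x2:x3' names exactly three predictors. If idx comes out different from 3, the model specification 'interactions', as in fitlm(scoreTrain95, Y_train, 'interactions'), gives the same structure (intercept, main effects and all pairwise interactions) for any number of columns.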
Hello Aditya,
I have a different query.
How can I print act_yield_ha on a plot? That is, how do I display the underscore character, using some MATLAB symbol or function, so that the plot shows the variable name exactly as act_yield_ha instead of a distorted version? Please suggest. Thanks for your valuable support.
Devendra
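By default MATLAB interprets text in titles, labels and legends as TeX, where _ turns the next character into a subscript, which is why act_yield_ha appears distorted. A minimal sketch of two common fixes, using placeholder random data for the variable act_yield_ha from the question:

```matlab
% Placeholder data; act_yield_ha stands in for the variable from the question
act_yield_ha = rand(1, 10);
plot(act_yield_ha, '-o');

% Option 1: disable the TeX interpreter so '_' is shown literally
title('act_yield_ha', 'Interpreter', 'none');

% Option 2: keep TeX and escape each underscore with a backslash
ylabel('act\_yield\_ha');

% The escaping can also be done programmatically, e.g. for a legend entry
legend(strrep('act_yield_ha', '_', '\_'));
```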
Because you're deleting your posts, you will probably have difficulty finding people willing to help you.
I am very sorry for my mistake. I actually deleted the post before realizing that my friends had already commented on it. My real intention was not to waste my friends' time on this issue, since the problem is resolved. It will not happen again. Devendra
