Formatting input data for a linear regression model in leave-one-out validation testing

Hello there, I have data from 10 trials stored in a 10x1 cell array (Predictors) and the corresponding response variables stored in a 10x1 cell array (Response). I am trying to train a simple linear regression model and evaluate it with leave-one-out validation: leave one trial out, train the model on the other 9 trials, and test it on the held-out trial by computing RMSE values. I am unsure how to format my input to the "fitlm" function, as I keep getting the following error:
% Train the model
for i = 1:length(Predictors) % iterate over all trials
    validationdataX = Predictors(i);
    validationdataY = Response(i);
    % Exclude the current index (i) for training
    trainingIndices = setdiff(1:length(Predictors), i);
    trainingdataX = Predictors(trainingIndices)
    trainingdataY = Response(trainingIndices)
    net = fitlm(trainingdataX, trainingdataY)
    ypred = predict(net, validationdataX);
    TrueValue = cell2mat(validationdataY);
    PredictedValue = ypred;
    RMSE = rmse(PredictedValue, TrueValue)
end
Error using classreg.regr.TermsRegression/handleDataArgs (line 589)
Predictor variables must be numeric vectors, numeric matrices, or categorical vectors.
Error in LinearModel.fit (line 1000)
[X,y,haveDataset,otherArgs] = LinearModel.handleDataArgs(X,paramNames,varargin{:});
Error in fitlm (line 134)
model = LinearModel.fit(X,varargin{:});
Any suggestions on how to fix this, get the model to work correctly, and make predictions using a leave-one-out validation approach would be greatly appreciated!
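The error comes from indexing: `Predictors(i)` with parentheses returns a 1x1 cell, not the numeric matrix stored inside it, and `fitlm` accepts only numeric or categorical predictors. Curly braces, `Predictors{i}`, return the contents. A minimal sketch, using random placeholder data shaped like the trials described in the comments (63 features x 541 timesteps per trial):

```matlab
% Placeholder stand-in for the 10x1 cell arrays: each trial holds a
% 63x541 predictor matrix and a 1x541 response row (random data)
Predictors = arrayfun(@(k) randn(63, 541), (1:10)', 'UniformOutput', false);
Response   = arrayfun(@(k) randn(1, 541),  (1:10)', 'UniformOutput', false);

i = 1;
asCell    = Predictors(i);   % parentheses: a 1x1 cell array (fitlm rejects this)
asNumeric = Predictors{i};   % braces: the 63x541 double matrix inside the cell

fprintf('Predictors(i) is a %s, Predictors{i} is a %s\n', ...
    class(asCell), class(asNumeric));
```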
Comments
Isabelle Museck on 29 Jul 2024
Hi Umar, I appreciate your response. I am interested in comparing a simple linear regression model to other models I have built, and to compare them fairly I want to keep the number of features the same, regardless of the challenges of overfitting, computational complexity, and noise. Could you guide me on how to achieve this in my code? How can I input the predictor data (63 features x 541 timesteps per trial, from the 9 trials) and the response data (1 response variable x 541 timesteps per trial, from the 9 trials) into a linear model without getting errors from mismatched dimensions?
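`fitlm(X, y)` expects one observation per row: an n-by-p numeric matrix `X` and an n-by-1 response `y`. If every timestep of every training trial is treated as one observation, the nine 63x541 predictor matrices can each be transposed to 541x63 and stacked vertically, giving a 4869x63 matrix (9 x 541 = 4869 observations) that keeps all 63 features. A sketch, assuming the training data sit in 9x1 cell arrays under the hypothetical names `trainX` and `trainY`, filled here with random numbers:

```matlab
% Hypothetical training cells: 9 trials, each 63 features x 541 timesteps,
% with a 1x541 response row per trial (random placeholder data)
trainX = arrayfun(@(k) randn(63, 541), (1:9)', 'UniformOutput', false);
trainY = arrayfun(@(k) randn(1, 541),  (1:9)', 'UniformOutput', false);

% Transpose each trial so rows are timesteps (observations) and columns
% are features, then stack the trials vertically
Xtrain = cell2mat(cellfun(@(c) c', trainX, 'UniformOutput', false)); % 4869x63
ytrain = cell2mat(cellfun(@(c) c', trainY, 'UniformOutput', false)); % 4869x1

mdl = fitlm(Xtrain, ytrain);   % 63 predictors plus an intercept
disp(size(Xtrain))
```

Stacking treats timesteps as independent observations; that is a modeling assumption, but it lets the linear model use exactly the same 63 features as the other models being compared.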
Umar on 29 Jul 2024

Hi @Isabelle Museck,

To input the predictor and response data into a linear model without dimension mismatch errors, make sure the dimensions align: "fitlm" expects each row of the predictor matrix to be one observation, and the response must have the same number of rows. In the provided code snippet, you can modify the data handling part as follows:

% Leave-one-out over timesteps (columns), assuming Predictors is a 63x541
% numeric matrix and Response is a 1x541 numeric vector
nSteps = size(Predictors, 2);
RMSE = zeros(1, nSteps);
for i = 1:nSteps % iterate over all timesteps
    validationdataX = Predictors(:, i); % all features for the current timestep
    validationdataY = Response(:, i); % response for the current timestep
    % Exclude the current index (i) for training
    trainingIndices = setdiff(1:nSteps, i);
    trainingdataX = Predictors(:, trainingIndices); % all features for the training data
    trainingdataY = Response(:, trainingIndices); % response for the training data
    net = fitlm(trainingdataX', trainingdataY'); % fit linear model (rows = observations)
    ypred = predict(net, validationdataX'); % predict the held-out timestep
    RMSE(i) = rmse(ypred, validationdataY'); % with one point, this equals the absolute error
end

Please bear in mind that this is an example code snippet and you will need to adapt it to your data. Please let me know if you have any further questions.
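The loop above leaves out one column (one timestep) at a time. To leave out one whole trial, as the original question asks, the loop can work on the 10x1 cell arrays directly: stack the remaining trials as rows of observations, index the held-out trial with braces, and keep one RMSE per fold. A minimal sketch, assuming `Predictors` holds 63x541 matrices and `Response` holds 1x541 rows; random data stand in for the real trials, and `rmse` needs R2022b or later:

```matlab
% Placeholder data shaped like the question: 10 trials, each with a
% 63x541 predictor matrix and a 1x541 response row
nTrials = 10;
Predictors = arrayfun(@(k) randn(63, 541), (1:nTrials)', 'UniformOutput', false);
Response   = arrayfun(@(k) randn(1, 541),  (1:nTrials)', 'UniformOutput', false);

% Transpose each trial to timesteps-by-features and stack vertically
stack = @(C) cell2mat(cellfun(@(c) c', C, 'UniformOutput', false));

RMSE = zeros(nTrials, 1);
for i = 1:nTrials
    trainIdx = setdiff(1:nTrials, i);       % all trials except trial i
    Xtrain = stack(Predictors(trainIdx));   % (9*541)x63
    ytrain = stack(Response(trainIdx));     % (9*541)x1
    mdl = fitlm(Xtrain, ytrain);

    Xval = Predictors{i}';                  % held-out trial, 541x63
    yval = Response{i}';                    % 541x1
    ypred = predict(mdl, Xval);
    RMSE(i) = rmse(ypred, yval);            % one RMSE per held-out trial
end
fprintf('Mean RMSE over %d folds: %.4f\n', nTrials, mean(RMSE));
```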


Answers (1)

Gayathri on 8 Aug 2024
I have implemented the code in MATLAB R2024a. The issue of passing a cell array as input has been resolved in the comment section. From the comments, the remaining issue is passing the data into the “fitlm” function without reducing the number of features through dimensionality reduction.
I understand that your data consist of nine 63x541 predictor arrays with corresponding 1x541 responses. Since the cell-array issue has already been solved in the comments, I use two random matrices drawn from a normal distribution, “Predictors” and “Response”, as input data: the nine predictor arrays are stacked row-wise into a 567x541 matrix and the nine responses into a 9x541 matrix. Each 63x541 trial block is transposed to 541x63 before it is passed to “fitlm”, so that rows are observations (timesteps) and columns are features, matching the 541x1 response. This fits the data as it is, without any dimensionality reduction.
Please see the below code for your reference.
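Predictors = randn(567, 541); % 9 trials x 63 features, stacked row-wise; 541 timesteps
Response = randn(9, 541); % one 1x541 response row per trial
RMSE = zeros(9, 1);
for i = 0:8 % leave trial i+1 out
    validationdataX = Predictors(63*i+1:63*(i+1), :);
    validationdataY = Response(i+1, :);
    Predictors1 = Predictors;
    Predictors1(63*i+1:63*(i+1), :) = [];
    Response1 = Response;
    Response1(i+1, :) = [];
    % Stack the 8 remaining trials: each timestep of each trial is one
    % observation (row) with 63 features (columns)
    trainingdataX = zeros(8*541, 63);
    trainingdataY = zeros(8*541, 1);
    for j = 0:7
        trainingdataX(541*j+1:541*(j+1), :) = Predictors1(63*j+1:63*(j+1), :)';
        trainingdataY(541*j+1:541*(j+1)) = Response1(j+1, :)';
    end
    model = fitlm(trainingdataX, trainingdataY); % one model per fold, trained on all 8 trials
    ypred = predict(model, validationdataX');
    TrueValue = validationdataY';
    RMSE(i+1) = rmse(ypred, TrueValue);
end
disp(RMSE)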
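With this layout (trial k in rows 63*(k-1)+1 to 63*k of a 567x541 matrix), the stacking can also be done once, without the inner loop, using `reshape` and `permute`; each fold then just selects rows by trial label. A sketch under that layout assumption, again on random placeholder data (`rmse` needs R2022b or later):

```matlab
% Layout as above: 9 trials x 63 features stacked row-wise,
% 541 timesteps as columns; one 1x541 response row per trial
nFeat = 63; nTrials = 9; nT = 541;
Predictors = randn(nFeat*nTrials, nT);
Response   = randn(nTrials, nT);

% A(f, k, t) is feature f of trial k at timestep t
A = reshape(Predictors, nFeat, nTrials, nT);
% Reorder to (t, k, f) and merge (t, k) so each row is one timestep of one trial
X = reshape(permute(A, [3 2 1]), nT*nTrials, nFeat);   % 4869x63
y = reshape(Response', [], 1);                          % 4869x1, same row order
trialOf = repelem((1:nTrials)', nT);                    % trial label of each row

RMSE = zeros(nTrials, 1);
for k = 1:nTrials
    test = (trialOf == k);                              % rows of the held-out trial
    mdl = fitlm(X(~test, :), y(~test));
    RMSE(k) = rmse(predict(mdl, X(test, :)), y(test));
end
disp(RMSE)
```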
