How to partition data stored in cells for validation in a machine learning model?
Hello there, I have training data for 4 trials stored in 4x1 cell arrays named "trainingdataX" and "trainingdataY". I am trying to pull out 15 percent of all this data for validation purposes and store it in variables "Xval" and "Yval". How can I do this when the data is stored in cells corresponding to the trials, while making sure the corresponding Y values are partitioned out for validation too? Any help is greatly appreciated!
% Exclude data for validation
rng('default')
n = % I'm not sure what to put here so that it pulls data from each of the 4 trials
partition = cvpartition(n,'Holdout',0.15);
idxTrain = training(partition);
FinalTrainX = trainingdataX(idxTrain,:)
FinalTrainY = trainingdataY(idxTrain,:)
idxNew = test(partition);
Xval = trainingdataX(idxNew,:)
Yval = trainingdataY(idxNew,:)
Answers (2)
YERRAMADAS
on 1 Aug 2024
Use cross-validation to make the most of the available data for both the training and validation sets.
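For reference, here is a minimal sketch of 5-fold cross-validation with cvpartition. It assumes the trials have already been concatenated into allX (one observation per row) and allY, as in the other answer. The fold count of 5 and the random data are illustrative choices only, and the model-fitting step is left as a comment because no model was specified:

```matlab
% Illustrative data: 4 trials x 541 observations, 63 features (assumed sizes)
rng('default');                          % for reproducibility
allX = rand(4*541, 63);
allY = rand(4*541, 1);

k = 5;                                   % number of folds (illustrative choice)
cv = cvpartition(size(allX, 1), 'KFold', k);

foldSizes = zeros(k, 1);
for i = 1:cv.NumTestSets
    idxTrain = training(cv, i);          % logical index of training rows for fold i
    idxVal   = test(cv, i);              % logical index of validation rows for fold i
    XtrainK = allX(idxTrain, :);  YtrainK = allY(idxTrain);
    XvalK   = allX(idxVal, :);    YvalK   = allY(idxVal);
    % ... train your model on (XtrainK, YtrainK) and evaluate on (XvalK, YvalK) ...
    foldSizes(i) = nnz(idxVal);          % number of validation rows in this fold
end
fprintf('Validation rows per fold: %s\n', mat2str(foldSizes'));
```

Each observation lands in exactly one validation fold, so every row is used for both training and validation across the k runs. The per-fold scores can then be averaged.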
Aditya
on 1 Aug 2024
To partition data stored in cells for validation, first concatenate the data from all trials into single matrices. You can then partition the combined rows into training and validation sets.
Before moving forward, you need to transpose your X and Y data so that each row of X corresponds to the matching row of Y.
Here's some sample code for this:
% sample data
trainingdataX = cell(4, 1);
trainingdataY = cell(4, 1);
for i = 1:4
trainingdataX{i} = rand(541, 63);
trainingdataY{i} = rand(541, 1);
end
% Concatenate data
allX = vertcat(trainingdataX{:});
allY = vertcat(trainingdataY{:});
% Partition data (15% holdout for validation)
rng('default'); % For reproducibility
partition = cvpartition(size(allX, 1), 'Holdout', 0.15);
idxTrain = training(partition);
idxVal = test(partition);
% Split into training and validation sets
FinalTrainX = allX(idxTrain, :);
FinalTrainY = allY(idxTrain, :);
Xval = allX(idxVal, :);
Yval = allY(idxVal, :);
% Display results
fprintf('Training data X size: %dx%d\n', size(FinalTrainX, 1), size(FinalTrainX, 2));
fprintf('Training data Y size: %dx%d\n', size(FinalTrainY, 1), size(FinalTrainY, 2));
fprintf('Validation data X size: %dx%d\n', size(Xval, 1), size(Xval, 2));
fprintf('Validation data Y size: %dx%d\n', size(Yval, 1), size(Yval, 2));
I hope this helps!
Aditya
on 1 Aug 2024
Edited: Aditya
on 1 Aug 2024
As mentioned in my post, your initial data in each cell has size 63x541 (X) and 1x541 (Y), which is the wrong orientation for vertical concatenation. You need to transpose each cell first.
To transpose it, you can use the following lines of code:
% Transpose each cell using cellfun
trainingdataX = cellfun(@transpose, trainingdataX, 'UniformOutput', false);
trainingdataY = cellfun(@transpose, trainingdataY, 'UniformOutput', false);
Alternatively, you can do it manually with a for loop.
I hope this clears things up!
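A minimal sketch of the for-loop version, assuming the original orientation described above (63x541 X and 1x541 Y per cell). The random data only stands in for the real trials:

```matlab
% Stand-in data in the original orientation: features x observations
trainingdataX = cell(4, 1);
trainingdataY = cell(4, 1);
for i = 1:4
    trainingdataX{i} = rand(63, 541);
    trainingdataY{i} = rand(1, 541);
end

% Transpose each cell in place so that rows become observations
for i = 1:numel(trainingdataX)
    trainingdataX{i} = trainingdataX{i}.';   % 63x541 -> 541x63
    trainingdataY{i} = trainingdataY{i}.';   % 1x541  -> 541x1
end
```

The .' operator is the same non-conjugate transpose that @transpose applies in the cellfun version, so both approaches give identical results.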