Split dataset into three different size sets without overlapping
I am working on image processing using MATLAB. I need to split a large dataset into three non-overlapping subsets (25%, 25% and 50%). The dataset (say 1,000 images) has 10 classes of 100 images each. From each class (class 1, for example), 25% of the images should go to the training set, another 25% to the validation set, and the remaining 50% to the test set. There should be no repetition: if an image from a class has been stored in one subset, it must not be stored in any other subset. How do I do that in MATLAB?
My code is as follows:
load ('data.mat')
for i = 1:size(data, 1)
    for j = 1:78
        if mod(i,2)==0
            trainingset(i/2,j) = data(i,j);
        else
            remainset((i-1)/2+1,j) = data(i,j);
        end
    end
end
for i = 1:size(remainset, 1)
    for j = 1:78
        if mod(i,2)==0
            testset(i/2,j) = remainset(i,j);
        else
            validationset((i-1)/2+1,j) = remainset(i,j);
        end
    end
end
Although it somehow works, I am looking for a better algorithm as some parts of data are lost.
Answers (1)
Frank B.
on 8 May 2018
Here is a quick answer using datasample, for a single vector named data. Loop over your classes, or work with the sampled indices instead of the values if the same split has to be shared between several variables.
load('data.mat')
% Data division ratio:
% 25% for training, 25% for validation, 50% for test
dataset_div = [0.25 0.25 0.5];
% Number of samples in each set (rounded, since datasample needs integer counts)
nb_train = round((dataset_div(1)/sum(dataset_div))*length(data));
nb_valid = round((dataset_div(2)/sum(dataset_div))*length(data));
nb_test  = length(data) - nb_train - nb_valid;   % the test set takes the remainder
% Splitting data into 3 non-overlapping vectors
% Training data
[data_train,idx_sample] = datasample(data,nb_train,'Replace',false);
% Removing used values
idx_left = 1:length(data);
idx_left(idx_sample) = [];
val_left = data(idx_left);
% Validation data
[data_valid,idx_sample] = datasample(val_left,nb_valid,'Replace',false);
% Removing used values (index into val_left here, not into the original data,
% otherwise already-used samples come back and the sets overlap)
idx_left = 1:length(val_left);
idx_left(idx_sample) = [];
val_left = val_left(idx_left);
% Test data (all remaining samples, in random order)
[data_test,idx_sample] = datasample(val_left,nb_test,'Replace',false);
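The "loop over your classes" idea could look roughly like the sketch below. It uses randperm on row indices rather than datasample, so that whole rows (images) are moved and the same indices can be reused for other arrays. The variable labels (an N-by-1 vector of class ids) and the synthetic data matrix are assumptions made for illustration; they are not part of the original question, so replace them with your own variables.

```matlab
% Hypothetical stand-in data: 1000 samples, 78 features, 10 classes of 100.
% Replace these lines with your own data matrix and label vector.
rng(0);                                  % fixed seed so the split is repeatable
data   = rand(1000, 78);
labels = repelem((1:10)', 100);          % assumed N-by-1 vector of class ids

ratios  = [0.25 0.25 0.50];              % train, validation, test
classes = unique(labels);

idx_train = [];
idx_valid = [];
idx_test  = [];

for c = 1:numel(classes)
    % Row indices of this class, shuffled
    idx_c = find(labels == classes(c));
    idx_c = idx_c(randperm(numel(idx_c)));

    % Integer set sizes; the test set takes whatever is left over
    n_train = round(ratios(1) * numel(idx_c));
    n_valid = round(ratios(2) * numel(idx_c));

    % Consecutive, disjoint slices of the shuffled indices
    idx_train = [idx_train; idx_c(1:n_train)];
    idx_valid = [idx_valid; idx_c(n_train+1 : n_train+n_valid)];
    idx_test  = [idx_test;  idx_c(n_train+n_valid+1 : end)];
end

trainingset   = data(idx_train, :);
validationset = data(idx_valid, :);
testset       = data(idx_test,  :);
```

Because each class is shuffled once and then cut into consecutive slices, a row cannot land in two sets, and every row ends up in exactly one of them. The index vectors can also be applied to labels to get the matching class ids for each set.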
Cheers