Why doesn't using imageDataAugmenter increase the size of my training data set?
I am trying to use imageDataAugmenter to increase the size of my training dataset (the number of training images), but it seems to have no effect at all. To explain: I used a simple CNN to classify images from three categories. Each category has 200 images (120 training, 40 validation, and 40 testing). Creating the imageDatastores:
[TrainDataStore,ValDataStore,TestDatastore] = splitEachLabel(imds,0.6,0.2,'randomize');
Training the network:
net = trainNetwork(TrainDataStore,mynet_1,options);
So, with the number of epochs (5) and the mini-batch size (60) the same in all cases, I got 6 iterations per epoch and 30 iterations in total: 6 iterations × 60 images per mini-batch = 360 images per epoch (120 per label).
I then tried to use data augmentation, as follows:
augmenter = imageDataAugmenter('RandRotation',[0 30]);
[TrainDataStore,ValDataStore,TestDatastore] = splitEachLabel(imds,0.6,0.2,'randomize');
Traindatasource = augmentedImageSource([200 200 3],TrainDataStore,'DataAugmentation',augmenter);
net = trainNetwork(Traindatasource,mynet_1,options);
And again I ended up with 6 iterations per epoch over 5 epochs, which means the total number of images is the same (360), even though it should have increased because of the rotation property.
I don't know exactly how large the augmented dataset should be, but it should definitely be larger than the original one. If something is missing in my approach, please let me know.
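For reference, the iteration counts described above follow directly from the datastore size and the mini-batch size; a quick sanity check (plain arithmetic, using the numbers from this question):

```matlab
% Iterations per epoch = floor(training images / mini-batch size)
numTrain      = 3*120;                         % 360 training images (120 per label)
miniBatchSize = 60;
numEpochs     = 5;
itersPerEpoch = floor(numTrain/miniBatchSize)  % = 6
totalIters    = itersPerEpoch*numEpochs        % = 30
% Augmentation does not change numTrain, so these counts stay the same.
```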
More Answers (2)
Xu MingJie
on 8 Aug 2018
1 vote
The data augmentation performed by imageDataAugmenter is not the traditional approach of materializing extra data in memory. It assumes your dataset may be too big to fit in memory all at once, so MATLAB applies the augmentation on the fly within the available memory; see https://ww2.mathworks.cn/help/nnet/ug/preprocess-images-for-deep-learning.html#mw_ef499675-d7a0-4e77-8741-ea5801695193.
In more detail: after you configure the image transformation options, the size of the training dataset stays the same in every epoch. However, at each training iteration, the augmented image datastore applies a random combination of transformations to the mini-batch of training data. Thus the amount of training data per epoch never changes, but every training image differs slightly each time it is seen, because of transformation operations such as rotation.
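You can see this directly from the datastore itself; a minimal sketch (assuming TrainDataStore holds the 360 training images from the question):

```matlab
% The augmented datastore reports the SAME number of observations as the
% underlying datastore -- augmentation does not add images.
augmenter = imageDataAugmenter('RandRotation',[0 30]);
augimds   = augmentedImageDatastore([200 200 3],TrainDataStore, ...
                'DataAugmentation',augmenter);
disp(augimds.NumObservations)   % still 360, same as TrainDataStore

% Each call to read() applies a fresh random rotation to that mini-batch,
% so the network sees slightly different pixels every epoch even though
% the observation count never grows.
batch1 = read(augimds);
reset(augimds);
batch2 = read(augimds);         % same images, different random rotations
```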
Guy Reading
on 27 Sep 2019
0 votes
For all those still reading this: there is a solution!
I was making the same assumption as you, caesar. However, given J's answer, there's a workaround. Since the network rarely sees exactly the same training example twice, given what the augmenter does, we can simply increase the number of epochs in trainingOptions.
That way, although we don't present the whole augmented dataset within one epoch, we present something like it over N epochs, where N is the multiple by which we assumed the augmenter had multiplied our sample size. If we increase the epoch count by a factor of N, we get roughly what we expected in the first place, I believe (correct me if I'm wrong!)
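A sketch of that workaround, reusing the names from the question (N = 5 is a hypothetical multiplier you choose, not something the toolbox computes):

```matlab
% Assume we wanted the effect of a 5x larger dataset: train 5x as many
% epochs so the network sees roughly that many distinct augmented images.
N = 5;                                 % assumed augmentation "multiplier"
options = trainingOptions('sgdm', ...
    'MaxEpochs', 5*N, ...              % was 5 in the original question
    'MiniBatchSize', 60);

net = trainNetwork(Traindatasource, mynet_1, options);
```

Because the augmenter draws new random rotations every iteration, each extra epoch effectively presents a fresh batch of variations rather than exact repeats.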