How can I use validation data together with an image datastore?

I'm using a datastore of 50,000 breast cancer images. I want to check whether my training is susceptible to overfitting, but I don't understand how to use validation data. This is my script without validation data:
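The original script was not included in the post. For context, a generic sketch of training from an image datastore without validation might look like the following (the folder name, variable names, and the `layers` array are all assumptions, not the asker's actual code):

```matlab
% Sketch only: load labeled images from subfolders (names assumed).
imds = imageDatastore("breastCancerImages", ...
    "IncludeSubfolders", true, ...
    "LabelSource", "foldernames");

% Training options with no validation data.
opts = trainingOptions("sgdm", ...
    "InitialLearnRate", 0.0001, ...
    "MaxEpochs", 10, ...
    "Plots", "training-progress");

% 'layers' would be a layer array defined elsewhere in the script.
net = trainNetwork(imds, layers, opts);
```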

 Accepted Answer

You can specify validation data in the training options with the ValidationData option. Below I've used your test data set, but you will probably want to create a separate validation set from your original data:
opts = trainingOptions("sgdm", ...
    "InitialLearnRate", 0.0001, ...
    "MaxEpochs", 10, ...
    "Plots", "training-progress", ...
    "ValidationData", resizeBreastCancerTest);

4 Comments

Thank you for your assistance. I understand that you suggested creating a new datastore. I did create one named ValData using the splitEachLabel function. However, I am unsure how to choose the number of validation images.
In addition, my script takes a long time to run on the CPU.
There are no definite rules for choosing the number of validation images, but if you have thousands of observations in your data, a common split is 70% for training, 15% for validation, and 15% for testing.
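A split like the one above can be produced directly with splitEachLabel, for example (the datastore variable name `imds` is an assumption):

```matlab
% Split one labeled imageDatastore into 70% train / 15% validation /
% 15% test; "randomized" shuffles within each label before splitting,
% and the unassigned remainder after 0.70 + 0.15 goes to the last output.
[imdsTrain, imdsVal, imdsTest] = splitEachLabel(imds, 0.70, 0.15, "randomized");
```

Splitting per label keeps the class proportions roughly the same in all three subsets, which matters for an imbalanced medical-imaging dataset.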
If you have validation data, at certain points during the training the accuracy will be measured on the entire validation set. This can take a very long time (because the entire validation set is typically much larger than each mini-batch that is used for training). You can speed things up by reducing the number of evaluations on the validation set. This can be done by increasing the ValidationFrequency option (the default value is 50):
opts = trainingOptions("sgdm", ...
    "InitialLearnRate", 0.0001, ...
    "MaxEpochs", 10, ...
    "Plots", "training-progress", ...
    "ValidationData", resizeBreastCancerTest, ...
    "ValidationFrequency", 500);
Thanks for your assistance. I understand your point of view.
I have another question. I tried my script, but the training progress is not good. I would like to know whether this is a case of overfitting. Could you please give me some advice on how to improve the results?
There are a few things you could try.
  • The initial learning rate of 0.0001 is quite low. You could try increasing it.
  • Alternatively, you could try using the ADAM algorithm for training, which tends to be less sensitive to hyperparameters. You can do this by changing "sgdm" in the call to trainingOptions to "adam".
  • You could train for more epochs. 10 epochs is a fairly low number.

