How to use trainlm with L2 regularization
As a simple test problem, I am training a neural network with 1 hidden layer for function fitting. The training method "trainlm" works well when I set
net.performParam.regularization=0.
To prevent over-fitting (and for other purposes), I would like to introduce L2 regularization. However, when I set
net.performParam.regularization=1e-6 (or any other positive number),
the training stops at iteration 3 with "Maximum Mu reached".
Can trainlm be used with L2 regularization at all?
3 Comments
Shivansh
on 10 Sep 2023
Hi Hongyun,
Regularization can certainly be used with trainlm as the training function. The code snippet below verifies this.
% Generate random training data
inputSize = 10; % Number of input features
outputSize = 1; % Number of output targets
numSamples = 100; % Number of training samples
X = rand(numSamples, inputSize); % Random input data
Y = rand(numSamples, outputSize); % Random output targets
% Create a feedforward neural network
hiddenSize = 5; % Number of hidden units
net = feedforwardnet(hiddenSize);
% Use Levenberg-Marquardt training with a small regularization ratio
net.trainFcn = 'trainlm';
net.performParam.regularization = 1e-6;
% Set up the training parameters
net.trainParam.epochs = 100; % Maximum number of epochs
net.trainParam.showCommandLine = true; % Display training progress in command window
net.trainParam.showWindow = false; % Do not show training GUI
% Train the network
net = train(net, X', Y');
% Evaluate the trained network on the training data
Y_pred = net(X');
mseTrain = mean((Y_pred - Y').^2); % named mseTrain to avoid shadowing the built-in mse function
% Display the mean squared error
disp(['Mean Squared Error: ', num2str(mseTrain)]);
The output of the above code:
Calculation mode: MEX
Training Feed-Forward Neural Network with TRAINLM.
Epoch 0/100, Time 0.009, Performance 0.20376/0, Gradient 0.25607/1e-07, Mu 0.001/10000000000, Validation Checks 0/6
Epoch 7/100, Time 0.022, Performance 0.023849/0, Gradient 0.029916/1e-07, Mu 0.0001/10000000000, Validation Checks 6/6
Training with TRAINLM completed: Training finished: Met validation criterion
Mean Squared Error: 0.081813
Note that net.performParam.regularization does not add a separately scaled L2 term: it is a ratio between 0 and 1 that blends the mean squared error with the mean of the squared weights and biases. The "Maximum Mu reached" message means that the Levenberg-Marquardt damping parameter mu grew past its upper limit (net.trainParam.mu_max) before trainlm found a step that reduced the performance. This most likely indicates that the hyperparameters are poorly matched, so consider tuning them for your model. For details, see the trainlm documentation page "Levenberg-Marquardt backpropagation - MATLAB trainlm - MathWorks India".
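To make the mu mechanism concrete, here is a minimal from-scratch Levenberg-Marquardt loop in base MATLAB (no toolbox) on a hypothetical four-parameter toy model. It is a sketch under stated assumptions, not trainlm's implementation: the L2 term is folded into the least-squares problem by stacking scaled weights under the scaled errors, and the mu settings (0.001, 0.1, 10, 1e10) are assumed to mirror trainlm's mu, mu_dec, mu_inc and mu_max defaults. A step is accepted only if it lowers the regularized performance; each rejected step multiplies mu by mu_inc, and once mu exceeds mu_max the loop stops with "Maximum mu reached".

```matlab
% Sketch: Levenberg-Marquardt with an L2 (weight-decay) term, base MATLAB only.
% Hypothetical toy model with 4 parameters: y = w1*tanh(w2*x + w3) + w4
rng(0);                                   % reproducible toy data
x = linspace(-2, 2, 50)';
t = 0.8*tanh(1.5*x - 0.2) + 0.1 + 0.05*randn(size(x));

r = 1e-3;                                 % regularization ratio in [0, 1] (assumed analogue of net.performParam.regularization)
N = numel(x);  P = 4;
a = sqrt((1 - r)/N);  b = sqrt(r/P);      % scalings so that sum(res.^2) = (1-r)*mse + r*mean(w.^2)

model = @(w) w(1)*tanh(w(2)*x + w(3)) + w(4);
jac   = @(w) [tanh(w(2)*x + w(3)), ...                  % dy/dw1
              w(1)*(1 - tanh(w(2)*x + w(3)).^2).*x, ... % dy/dw2
              w(1)*(1 - tanh(w(2)*x + w(3)).^2), ...    % dy/dw3
              ones(N, 1)];                              % dy/dw4
res  = @(w) [a*(model(w) - t); b*w];      % augmented residual: scaled errors stacked on scaled weights
perf = @(w) sum(res(w).^2);               % regularized performance

w0 = [0.1; 0.1; 0; 0];                    % initial weights
w  = w0;
mu = 1e-3; mu_dec = 0.1; mu_inc = 10; mu_max = 1e10;    % assumed to match trainlm defaults
for epoch = 1:100
    Jr = [a*jac(w); b*eye(P)];            % Jacobian of the augmented residual
    g  = Jr' * res(w);                    % half the gradient of perf
    if norm(2*g) < 1e-7
        disp('Minimum gradient reached'); break
    end
    accepted = false;
    while mu <= mu_max
        dw = -(Jr'*Jr + mu*eye(P)) \ g;   % damped Gauss-Newton step
        if perf(w + dw) < perf(w)
            w = w + dw;  mu = mu*mu_dec;  accepted = true;
            break
        end
        mu = mu*mu_inc;                   % rejected step: increase damping
    end
    if ~accepted
        disp('Maximum mu reached'); break
    end
end
fprintf('epochs: %d, performance: %.4g, mu: %.3g\n', epoch, perf(w), mu);
```

Because the weight term is part of the residual, its Jacobian (b*eye(P)) enters the step computation, so the proposed step accounts for the penalty as well as the data errors.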
Hongyun Wang
on 10 Sep 2023
Shivansh
on 11 Sep 2023
Hi Hongyun,
The code above can be run with regularization, but the parameters need to be compatible with one another. Some possible workarounds are:
- Decrease the complexity of the model (it works with a hidden size of 32).
- Decrease the strength of the regularization (it works with net.performParam.regularization = 1e-7).
These changes let the code run but may not lead to optimal results.
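One way to explore these two workarounds is a small sweep over hidden sizes and regularization values. This is a sketch assuming Deep Learning Toolbox and hypothetical random data shaped like the earlier snippet (features in rows, samples in columns); it assumes the training record returned by train includes the per-epoch mu history for trainlm, and results will vary with the data and the random initialization.

```matlab
% Sketch: try the two workarounds on hypothetical random data.
% Requires Deep Learning Toolbox.
rng(0);                                   % make the data and initial weights repeatable
X = rand(10, 100);                        % 10 features x 100 samples (columns are samples)
T = rand(1, 100);                         % 1 target per sample
hiddenSizes = [5 32];
regValues   = [1e-5 1e-6 1e-7];
for h = hiddenSizes
    for r = regValues
        net = feedforwardnet(h, 'trainlm');
        net.performParam.regularization = r;
        net.trainParam.showWindow = false;
        [net, tr] = train(net, X, T);
        % tr.mu holds mu at each epoch; a final value near mu_max signals the "Maximum Mu" stop
        fprintf('hidden = %2d, regularization = %g: %d epochs, final mu = %g\n', ...
                h, r, tr.epoch(end), tr.mu(end));
    end
end
```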
You get results when you train first without regularization and then with regularization because the first training brings the weights closer to a good solution, and the second training refines that solution.

With the same parameters, you can also try the opposite order and run the regularized training first. As you observed, that training reaches the maximum mu value and terminates early. When you then train the network without regularization, it starts from the weights obtained in the previous run and continues the optimization. Because the network already starts from weights close to a good solution, the unregularized training can further improve the performance and reach the desired goal.
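The two-stage approach can be sketched as below, assuming Deep Learning Toolbox and hypothetical random data. It relies on train continuing from the network's current weights when it is called a second time; swapping the two regularization values gives the other order described above.

```matlab
% Sketch: two-stage training that reuses the weights from the first stage.
% Assumes Deep Learning Toolbox; the data are hypothetical random values.
rng(0);
X = rand(10, 100);                        % 10 features x 100 samples
T = rand(1, 100);
net = feedforwardnet(5, 'trainlm');
net.trainParam.showWindow = false;

% Stage 1: regularized training (may stop early with "Maximum mu reached")
net.performParam.regularization = 1e-6;
[net, tr1] = train(net, X, T);

% Stage 2: unregularized training, starting from the stage-1 weights
net.performParam.regularization = 0;
[net, tr2] = train(net, X, T);

% The stage-1 value includes the (tiny) weight penalty, so the comparison is approximate
fprintf('Stage 1 best performance: %g, stage 2 best performance: %g\n', ...
        tr1.best_perf, tr2.best_perf);
```

Note that the default random data division is redrawn in the second call, so the two stages may use different training and validation subsets.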
Answers (1)
Ashu
on 6 Sep 2023
Hey Wang,
I understand that you are training a network with "trainlm" and that training stops with "Maximum Mu reached". The difficulty with regularization is choosing the optimum value of the regularization (performance ratio) parameter. If the value is too small, the network can overfit; if it is too large, the network does not adequately fit the training data.
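As a worked example of this trade-off, the snippet below evaluates the regularized performance for a few regularization values. It assumes the formula perf = (1 - r)*mse + r*msw, where msw is the mean of the squared weights and biases, as I understand the regularization option of the mse performance function; the error and weight vectors are made-up numbers, not outputs of a real network.

```matlab
% Worked example of the assumed regularized performance formula
% perf = (1 - r)*mse + r*msw, using made-up errors and weights.
e = [1; -1; 2];            % hypothetical errors (targets minus outputs)
w = [0.5; -0.5];           % hypothetical weights and biases, stacked into one vector
mseVal = mean(e.^2);       % = 2
mswVal = mean(w.^2);       % = 0.25
for r = [0 1e-6 0.1 0.5]
    perf = (1 - r)*mseVal + r*mswVal;
    fprintf('regularization = %-6g performance = %.8g\n', r, perf);
end
```

As r grows, more of the performance comes from the weight term, so minimizing it pushes the weights toward zero; with r = 1e-6 the penalty barely changes the objective.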
The following suggestions might help you in training the network better.
1. To resolve this issue, you can experiment with the training parameters of "trainlm", for example by increasing "net.trainParam.mu_max" or "net.trainParam.mu_dec".
For the full list of parameters, please refer to the documentation of "trainlm".
2. Use automated regularization (trainbr): the weights and biases of the network are assumed to be random variables with specified distributions, and the regularization parameters are related to the unknown variances of these distributions. These parameters are then estimated using statistical techniques.
Please refer to the following page to learn more about this.
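Both suggestions can be sketched as follows, assuming Deep Learning Toolbox and hypothetical random data. The values 1e12 and 0.5 are illustrative, not recommendations; trainlm's defaults are mu_max = 1e10 and mu_dec = 0.1, and mu_dec should stay between 0 and 1 so that mu still decreases after a successful step.

```matlab
% Sketch: relaxed mu settings for trainlm versus Bayesian regularization with trainbr.
% Assumes Deep Learning Toolbox; the data are hypothetical random values.
rng(0);
X = rand(10, 100);                        % 10 features x 100 samples
T = rand(1, 100);                         % 1 target per sample

% Suggestion 1: trainlm with regularization and relaxed mu settings
net1 = feedforwardnet(5, 'trainlm');
net1.performParam.regularization = 1e-6;
net1.trainParam.mu_max = 1e12;            % default 1e10; allows more rejected steps before stopping
net1.trainParam.mu_dec = 0.5;             % default 0.1; mu shrinks more slowly after a successful step
net1.trainParam.showWindow = false;
[net1, tr1] = train(net1, X, T);

% Suggestion 2: Bayesian regularization, which adapts the weight penalty during training
net2 = feedforwardnet(5, 'trainbr');
net2.trainParam.showWindow = false;
[net2, tr2] = train(net2, X, T);

fprintf('trainlm: %d epochs, trainbr: %d epochs\n', tr1.epoch(end), tr2.epoch(end));
```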
I hope this was helpful.