Neural Network Toolbox - Backpropagation stopping criteria

I am using the Neural Network Toolbox to classify data of 12 alarms into 9 classes, with one hidden layer containing 8 neurons. I wanted to know:
  1. What equations does the training algorithm traingdm use to update the weights and biases? Are they the same as those given below (eta is the learning rate, i.e. 0.7, and alpha is the momentum coefficient, i.e. 0.9)?
delta_w_ji(n) = eta * delta_j * o_i + alpha * delta_w_ji(n-1)
where delta_j for the output layer is:
delta_j = (t_j - o_j) * o_j * (1 - o_j)
while for the hidden layer it is:
delta_j = o_j * (1 - o_j) * sum_k( delta_k * w_kj )
These equations are taken directly from the paper attached.
2. What does the stopping criterion net.trainParam.goal mean? Which field should I update if I want my stopping criterion to be a mean squared error of 0.0001? Do I need to set net.trainParam.min_grad to 0.0001 for this?
3. How are the weights updated in traingdm? Is it batch updating (i.e. once after every epoch), or an update after every input pattern within each epoch?
4. I have 41 training input patterns. How many of them are used for the training process and how many for the recall process? What if I want all 41 of them to be used only for training?
5. I have tried the following code, but the outputs are not being classified accurately.
clear all; close all; clc;
p = [
1 0 0 0 0 0 0 0 0 0 0 0; ... %c1
1 0 1 0 0 0 0 0 0 0 0 0; ...
1 0 1 1 0 0 0 0 0 0 0 0; ...
1 0 1 0 1 0 0 0 0 0 0 0; ...
1 0 1 0 0 0 0 0 0 1 0 0; ...
1 0 1 1 1 0 0 0 0 0 0 0; ...
1 0 1 0 1 1 0 0 0 1 0 0; ...
1 0 1 0 1 0 0 0 0 1 0 0; ...
1 0 1 1 0 0 0 0 0 1 0 0; ...
1 0 1 0 1 1 1 0 0 0 0 0; ...
1 0 1 0 1 1 0 1 0 0 0 0; ...
1 0 1 1 1 0 0 0 0 1 0 0; ...
0 1 0 0 0 0 0 0 0 0 0 0; ... %c2
0 0 0 0 0 0 0 0 0 0 0 0; ...
0 0 0 1 0 0 0 0 0 0 0 0; ...
0 0 0 0 1 0 0 0 0 0 0 0; ...
0 0 0 0 0 0 0 0 0 1 0 0; ...
0 0 0 1 1 0 0 0 0 0 0 0; ...
0 0 0 0 1 1 0 0 0 1 0 0; ...
0 0 0 0 1 0 0 0 0 1 0 0; ...
0 0 0 1 0 0 0 0 0 1 0 0; ...
0 0 0 0 1 1 1 0 0 0 0 0; ...
0 0 0 0 1 1 0 1 0 0 0 0; ...
0 0 0 1 1 0 0 0 0 1 0 0; ...
0 0 0 1 0 0 0 0 0 0 0 0; ... %c3
0 0 0 0 1 0 0 0 0 0 0 0; ... %c4 or c5
0 0 0 0 1 1 0 0 0 0 0 0; ...
0 0 0 0 1 1 1 0 0 0 0 0; ...
0 0 0 0 1 1 0 1 0 0 0 0; ...
0 0 0 0 0 1 0 0 0 0 0 0; ... %c6
0 0 0 0 0 1 1 0 0 0 0 0; ...
0 0 0 0 0 1 0 1 0 0 0 0; ...
0 0 0 0 0 0 0 1 0 0 0 0; ... %c7
0 0 0 0 0 0 0 0 1 0 0 0; ... %c8
0 0 0 0 0 0 0 0 0 0 1 0; ...
0 0 0 0 0 0 0 0 1 1 0 0; ...
0 0 0 0 0 0 0 0 0 0 1 1; ...
0 0 0 0 0 0 0 0 1 0 1 0; ...
0 0 0 0 0 0 0 0 0 0 0 1; ... %c9
0 0 1 0 0 0 0 0 0 0 0 0; ... %c1 or c2
0 0 0 0 0 0 0 0 0 1 0 0; ... %c1 or c2 or c3
]';
t = [
1 0 0 0 0 0 0 0 0; ...
1 0 0 0 0 0 0 0 0; ...
1 0 0 0 0 0 0 0 0; ...
1 0 0 0 0 0 0 0 0; ...
1 0 0 0 0 0 0 0 0; ...
1 0 0 0 0 0 0 0 0; ...
1 0 0 0 0 0 0 0 0; ...
1 0 0 0 0 0 0 0 0; ...
1 0 0 0 0 0 0 0 0; ...
1 0 0 0 0 0 0 0 0;...
1 0 0 0 0 0 0 0 0; ...
1 0 0 0 0 0 0 0 0; ...
0 1 0 0 0 0 0 0 0; ... %c2
0 1 0 0 0 0 0 0 0; ...
0 1 0 0 0 0 0 0 0; ...
0 1 0 0 0 0 0 0 0; ...
0 1 0 0 0 0 0 0 0; ...
0 1 0 0 0 0 0 0 0; ...
0 1 0 0 0 0 0 0 0; ...
0 1 0 0 0 0 0 0 0; ...
0 1 0 0 0 0 0 0 0; ...
0 1 0 0 0 0 0 0 0; ...
0 1 0 0 0 0 0 0 0; ...
0 1 0 0 0 0 0 0 0; ...
0 0 1 0 0 0 0 0 0; ... %c3
0 0 0 1 1 0 0 0 0; ... %c4 or c5
0 0 0 1 1 0 0 0 0; ...
0 0 0 1 1 0 0 0 0; ...
0 0 0 1 1 0 0 0 0; ...
0 0 0 0 0 1 0 0 0; ... %c6
0 0 0 0 0 1 0 0 0; ...
0 0 0 0 0 1 0 0 0; ...
0 0 0 0 0 0 1 0 0; ... %c7
0 0 0 0 0 0 0 1 0; ... %c8
0 0 0 0 0 0 0 1 0; ...
0 0 0 0 0 0 0 1 0; ...
0 0 0 0 0 0 0 1 0; ...
0 0 0 0 0 0 0 1 0; ...
0 0 0 0 0 0 0 0 1; ... %c9
1 1 0 0 0 0 0 0 0; ... %c1 or c2
1 1 1 0 0 0 0 0 0; ... %c1 or c2 or c3
]';
net = feedforwardnet(8,'traingdm'); %one hidden layer with 8 neurons, and the training algorithm
net = configure(net,p,t);
net.layers{2}.transferFcn = 'logsig'; %sigmoid function in output layer
net.layers{1}.transferFcn = 'logsig'; %sigmoid function in hidden layer
net.performFcn = 'mse';
net = init(net);
net.trainParam.epochs = 100000; %no. of epochs are not my concern hence a large number
net.trainParam.lr = 0.7; %obtained from the paper attached
net.trainParam.mc = 0.9; %obtained from the paper attached
net.trainParam.max_fail = 100000;
net.trainParam.min_grad = 0.00015; %is this stopping criterion the same as mse?
net = train(net,p,t);
view(net);
Let me know if something else needs to be specified. Regards.

1 Comment

% Target columns should sum to 1
% If targets are mutually exclusive there is only one "1"
% init(net) unnecessary because of configure
NO MITIGATION FOR OVERTRAINING AN OVERFIT NET
1. max_epoch is HUGE
2. msegoal not specified ==> default of 0
3. no validation stopping
4. no regularization (trainbr)
Hope this helps.
Greg


 Accepted Answer

If you are going to use MATLAB, I suggest using as many defaults as possible.
1. Use PATTERNNET for classification
2. To see the default settings, type into the command line WITHOUT AN ENDING SEMICOLON
net = patternnet % default H = 10
3. If a vector can belong to m of c classes,
a. The c-dimensional unit target vector should contain
i. m positive components that sum to 1
ii. c-m components of value 0
4. Typically, the only things that need to be varied are
a. H, the number of hidden nodes
b. The initial weights
5. The best way to do this is
a. Initialize the RNG
b. Use an outer loop over number of hidden nodes
c. Use an inner loop over random weight initializations
d. For example
Ntrials = 10
rng('default')
j=0
for h = Hmin:dH:Hmax
j=j+1
...
for i = 1:Ntrials
...
end
end
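A filled-in version of the sketch above might look like the following (Hmin/dH/Hmax, Ntrials and the NMSE bookkeeping are my own illustrative choices, not prescriptions):

```matlab
% Hypothetical fleshed-out double-loop design search.
MSE00   = mean(var(t',1));           % reference MSE of a constant model
Ntrials = 10;
rng('default')                       % reproducible weight initializations
Hmin = 2; dH = 2; Hmax = 20;         % illustrative search range
j = 0;
for h = Hmin:dH:Hmax
    j = j + 1;
    for i = 1:Ntrials
        net = patternnet(h);         % new random weights each trial
        [net, tr] = train(net, p, t);
        y = net(p);
        NMSE(i,j) = mse(t - y)/MSE00; % normalized MSE; smaller is better
    end
end
```

The best (h, trial) pair is then picked from the NMSE matrix, typically on validation or test performance from the training record tr rather than on the whole-set NMSE shown here.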
6. See the NEWSGROUP and ANSWERS for examples. Search with
greg patternnet Ntrials
Hope this helps
Thank you for formally accepting my answer
Greg

8 Comments

Hi Greg, Thank you for your response but I am afraid I am not following your answer as it does not seem to answer any of my questions.
Can you please explain it a little?
What I was trying to say is that your approach to classification doesn't seem promising. My recommendation was to use a more standard, default-heavy approach with patternnet, as I have demonstrated in numerous examples. However, first check out the documentation and trivial examples in
help patternnet
doc patternnet
Nonetheless, I did run your code. However, I obtained gruesome results. When I looked at your data a little more closely, I see a major part of the problem (which will exist regardless of whether a classification function like patternnet or a regression function like fitnet or feedforwardnet is used):
Your class representations are severely unbalanced
For example
sum(t') = [ 14 14 2 4 4 3 1 5 1 ]
sum(sum(t')) = 48
The easiest way to deal with this is to add replicas of the 7 smaller classes so that the number of samples per class is approximately equal. My posted examples of the BIOID data classification are good examples. In fact, adding a SMALL amount of 9-dimensional random noise (jitter) might even help.
In the past 30+ years I have never dealt with a classification target matrix with columns containing more than 1 nonzero entry. I think the best way to deal with those is also to make replicas instead of having fractional targets.
Then, the classifier has a clearer definition of the 9 classes.
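A minimal sketch of the replication idea (assuming single-label target columns; the balance-to-the-largest-class rule is my own simple choice):

```matlab
% Sketch: replicate samples of under-represented classes so each class
% has roughly the same count as the largest one.
[~, class] = max(t, [], 1);            % class index of each column
counts = histc(class, 1:size(t,1));    % samples per class
Nmax = max(counts);
pbal = p; tbal = t;
for c = 1:size(t,1)
    idx  = find(class == c);
    nrep = Nmax - counts(c);           % how many replicas to add
    if nrep > 0 && ~isempty(idx)
        pick = idx(randi(numel(idx), 1, nrep));
        pbal = [pbal p(:,pick)];       % append replicated inputs
        tbal = [tbal t(:,pick)];       % ... and their targets
    end
end
```

Adding a small amount of random noise (jitter) to the replicated input columns, as suggested above, would keep the replicas from being exact duplicates.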
I will answer some of your questions in the next comment.
Hi Greg, Thanks for your comment. That clarifies things a little bit. But the authors in the paper attached HAVE classified this same data (41 input samples) using BPN and I was just hoping to reproduce the results using Neural Network Toolbox. Would it help to reproduce the results if I wrote my own MATLAB script implementing the weight equations or would it produce the same results as MATLAB's Neural Network Toolbox?
% Neural Network Toolbox - Backpropagation stopping criteria % % Asked by Haider Ali about 3 hours ago % % I am using Neural Network Toolbox to classify a data of 12 alarms % into 9 classes with one hidden layer containing 8 neurons. I wanted % to know:
How did you determine H = 8?
% 1. What equations does training algorithm traingdm use to update % the weights and bias? Are these the same as given below (etta is % learning rate i.e. 0.7 and alpha is momentum coefficient i.e. 0.9): % where delta_j for output layer is: % while for hidden layer it is: % These equations are taken directly from the paper attached.
help traingdm
doc traingdm
type traingdm
% 2. What does the stopping criteria net.trainParam.goal mean?
Training stops if the error function is <= goal.
% Which field to update if I want my stopping criteria to be mean square % error equal to 0.0001?
net.trainParam.goal = 0.0001
%Do I need to update net.trainParam.min_grad to 0.0001 for this?
No. However, I use
MSEgoal = 0.01*mean(var(t',1)) % or 0.005
net.trainParam.goal = MSEgoal
net.trainParam.min_grad = MSEgoal/100
% 3. How are the weights being updated in traingdm? Is it batch % updation (like after every epoch) or is it updation after every % input pattern of every epoch?
The default is batch. However, you can use adaptation if you wish. See the documentation:
help/doc train
help/doc adapt
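For per-pattern (incremental) updating, a sketch using adapt with the inputs converted to a cell sequence (the epoch count and the learning-function assignments are illustrative; lr and mc are then set per weight via learnParam):

```matlab
% Sketch: incremental updating with ADAPT. Presenting the data as a
% sequence (con2seq) makes the weights update after every pattern.
net = feedforwardnet(8);
net = configure(net, p, t);
net.inputWeights{1,1}.learnFcn = 'learngdm'; % gradient descent + momentum
net.layerWeights{2,1}.learnFcn = 'learngdm';
net.biases{1}.learnFcn = 'learngdm';
net.biases{2}.learnFcn = 'learngdm';
pseq = con2seq(p);  tseq = con2seq(t);
for epoch = 1:200                            % arbitrary pass count
    [net, y, e] = adapt(net, pseq, tseq);
end
```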
% 4. I have 41 training input patterns. How many of those are use % for training process and how many for recall process.
total = design + test
design = training +validation
The default ratios are Ntrn/Nval/Ntst = 0.7/0.15/0.15
% What if % I want all 41 of them to be used only for training process?
Typically, not a great idea. Search the NN literature for overfitting,
overtraining and generalization
net = patternnet; % For classification
net.trainFcn = 'trainbr'; % If Ntrn = N
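To use all 41 patterns for training only, the data division can be switched off (shown here as a sketch; note that without a validation subset there is no validation stopping, so something like trainbr or an explicit MSE goal matters more):

```matlab
% Use all samples for training; no validation or test subsets.
net.divideFcn = 'dividetrain';
% -- or equivalently, keep 'dividerand' and set the ratios:
% net.divideParam.trainRatio = 1;
% net.divideParam.valRatio   = 0;
% net.divideParam.testRatio  = 0;
```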
Thanks for your comment Greg.
All the values (no. of layers and no. of neurons, learning rate, momentum coefficient etc.) are taken from the paper attached and I am just trying to reproduce the results.
I used the parameters you suggested: MSEgoal comes out to be approximately 0.001 using the formula you provided, and I have also set net.trainParam.min_grad as you suggested:
MSEgoal = 0.01*mean(var(t',1)) % or 0.005
net.trainParam.goal = MSEgoal
net.trainParam.min_grad = MSEgoal/100
After running the code with these parameters, I noticed that the error (goal) does not get below 0.23, and the algorithm stops because of either the number of epochs or the validation checks. Is this because of the nature of the data set, as you said earlier? If so, then how have the authors of the paper obtained pretty good results using the same algorithm and data set?
Regards.
NMSE = mse(t-y)/mean(var(t',1)) >= 0.23 for all 100 of your designs?
numel(Hmin:dH:Hmax) = 10 ?
Ntrials = 10 ?
Did the paper use trn/val/tst division? If so, what ratios?
How many hidden nodes?
Hi Greg,
There is only one hidden layer containing 8 neurons. The author has not mentioned the train/validate/test ratio.
I am now using the Iris Data Set to train my NN using Back Propagation (just for my own understanding and testing). The code is below:
clear all;
close all;
clc;
p = [
5.1,3.5,1.4,0.2; %iris data set
4.9,3.0,1.4,0.2;
4.7,3.2,1.3,0.2;
4.6,3.1,1.5,0.2;
5.0,3.6,1.4,0.2;
5.4,3.9,1.7,0.4;
4.6,3.4,1.4,0.3;
5.0,3.4,1.5,0.2;
4.4,2.9,1.4,0.2;
4.9,3.1,1.5,0.1;
5.4,3.7,1.5,0.2;
4.8,3.4,1.6,0.2;
4.8,3.0,1.4,0.1;
4.3,3.0,1.1,0.1;
5.8,4.0,1.2,0.2;
5.7,4.4,1.5,0.4;
5.4,3.9,1.3,0.4;
5.1,3.5,1.4,0.3;
5.7,3.8,1.7,0.3;
5.1,3.8,1.5,0.3;
5.4,3.4,1.7,0.2;
5.1,3.7,1.5,0.4;
4.6,3.6,1.0,0.2;
5.1,3.3,1.7,0.5;
4.8,3.4,1.9,0.2;
5.0,3.0,1.6,0.2;
5.0,3.4,1.6,0.4;
5.2,3.5,1.5,0.2;
5.2,3.4,1.4,0.2;
4.7,3.2,1.6,0.2;
4.8,3.1,1.6,0.2;
5.4,3.4,1.5,0.4;
5.2,4.1,1.5,0.1;
5.5,4.2,1.4,0.2;
4.9,3.1,1.5,0.1;
5.0,3.2,1.2,0.2;
5.5,3.5,1.3,0.2;
4.9,3.1,1.5,0.1;
4.4,3.0,1.3,0.2;
5.1,3.4,1.5,0.2;
5.0,3.5,1.3,0.3;
4.5,2.3,1.3,0.3;
4.4,3.2,1.3,0.2;
5.0,3.5,1.6,0.6;
5.1,3.8,1.9,0.4;
4.8,3.0,1.4,0.3;
5.1,3.8,1.6,0.2;
4.6,3.2,1.4,0.2;
5.3,3.7,1.5,0.2;
5.0,3.3,1.4,0.2;
7.0,3.2,4.7,1.4;
6.4,3.2,4.5,1.5;
6.9,3.1,4.9,1.5;
5.5,2.3,4.0,1.3;
6.5,2.8,4.6,1.5;
5.7,2.8,4.5,1.3;
6.3,3.3,4.7,1.6;
4.9,2.4,3.3,1.0;
6.6,2.9,4.6,1.3;
5.2,2.7,3.9,1.4;
5.0,2.0,3.5,1.0;
5.9,3.0,4.2,1.5;
6.0,2.2,4.0,1.0;
6.1,2.9,4.7,1.4;
5.6,2.9,3.6,1.3;
6.7,3.1,4.4,1.4;
5.6,3.0,4.5,1.5;
5.8,2.7,4.1,1.0;
6.2,2.2,4.5,1.5;
5.6,2.5,3.9,1.1;
5.9,3.2,4.8,1.8;
6.1,2.8,4.0,1.3;
6.3,2.5,4.9,1.5;
6.1,2.8,4.7,1.2;
6.4,2.9,4.3,1.3;
6.6,3.0,4.4,1.4;
6.8,2.8,4.8,1.4;
6.7,3.0,5.0,1.7;
6.0,2.9,4.5,1.5;
5.7,2.6,3.5,1.0;
5.5,2.4,3.8,1.1;
5.5,2.4,3.7,1.0;
5.8,2.7,3.9,1.2;
6.0,2.7,5.1,1.6;
5.4,3.0,4.5,1.5;
6.0,3.4,4.5,1.6;
6.7,3.1,4.7,1.5;
6.3,2.3,4.4,1.3;
5.6,3.0,4.1,1.3;
5.5,2.5,4.0,1.3;
5.5,2.6,4.4,1.2;
6.1,3.0,4.6,1.4;
5.8,2.6,4.0,1.2;
5.0,2.3,3.3,1.0;
5.6,2.7,4.2,1.3;
5.7,3.0,4.2,1.2;
5.7,2.9,4.2,1.3;
6.2,2.9,4.3,1.3;
5.1,2.5,3.0,1.1;
5.7,2.8,4.1,1.3;
6.3,3.3,6.0,2.5;
5.8,2.7,5.1,1.9;
7.1,3.0,5.9,2.1;
6.3,2.9,5.6,1.8;
6.5,3.0,5.8,2.2;
7.6,3.0,6.6,2.1;
4.9,2.5,4.5,1.7;
7.3,2.9,6.3,1.8;
6.7,2.5,5.8,1.8;
7.2,3.6,6.1,2.5;
6.5,3.2,5.1,2.0;
6.4,2.7,5.3,1.9;
6.8,3.0,5.5,2.1;
5.7,2.5,5.0,2.0;
5.8,2.8,5.1,2.4;
6.4,3.2,5.3,2.3;
6.5,3.0,5.5,1.8;
7.7,3.8,6.7,2.2;
7.7,2.6,6.9,2.3;
6.0,2.2,5.0,1.5;
6.9,3.2,5.7,2.3;
5.6,2.8,4.9,2.0;
7.7,2.8,6.7,2.0;
6.3,2.7,4.9,1.8;
6.7,3.3,5.7,2.1;
7.2,3.2,6.0,1.8;
6.2,2.8,4.8,1.8;
6.1,3.0,4.9,1.8;
6.4,2.8,5.6,2.1;
7.2,3.0,5.8,1.6;
7.4,2.8,6.1,1.9;
7.9,3.8,6.4,2.0;
6.4,2.8,5.6,2.2;
6.3,2.8,5.1,1.5;
6.1,2.6,5.6,1.4;
7.7,3.0,6.1,2.3;
6.3,3.4,5.6,2.4;
6.4,3.1,5.5,1.8;
6.0,3.0,4.8,1.8;
6.9,3.1,5.4,2.1;
6.7,3.1,5.6,2.4;
6.9,3.1,5.1,2.3;
5.8,2.7,5.1,1.9;
6.8,3.2,5.9,2.3;
6.7,3.3,5.7,2.5;
6.7,3.0,5.2,2.3;
6.3,2.5,5.0,1.9;
6.5,3.0,5.2,2.0;
6.2,3.4,5.4,2.3;
5.9,3.0,5.1,1.8;
]';
t = [
0; %assign 0 to output neuron for Iris-setosa
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0.5; %assign 0.5 to output neuron for Iris-versicolor
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
1; %assign 1 to output neuron for Iris-virginica
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
]';
net = feedforwardnet(3,'traingd'); %one hidden layer with 3 neurons, and the training algorithm
net = configure(net,p,t);
net.layers{2}.transferFcn = 'logsig'; %sigmoid function in output layer
net.layers{1}.transferFcn = 'logsig'; %sigmoid function in hidden layer
net.performFcn = 'mse';
net = init(net);
net.trainParam.epochs = 10000;
net.trainParam.lr = 0.7; %learning rate
net.trainParam.goal = 0.01; %mse
net = train(net,p,t);
view(net);
The problem is that I am not getting the desired output for the first class (for which the output should be close to zero). When I input a vector from the first class to the trained net, the output is close to 0.5 (but it should be close to zero).
This is the output for the first vector of the first class:
output = net([5.1,3.5,1.4,0.2]')
output =
0.5003
This output should be close to zero (because I have assigned 0 to first class), but it is coming out to be 0.5. This is the case for all the inputs of first class. For the second and third class, the outputs are fine i.e. close to 0.5 for class 2 and close to 1.0 for class 3.
Can you please run this code and tell me what I am doing wrong?
(I think it might be issue of the bias input because all the outputs for class 1 are being offset by 0.5.)
Regards.
%GEH1: LOUSY TARGET CODING
%GEH2: traingd instead of traingdm
% GEH3: Logsig output INVALID for default mapminmax [-1 1] scaling
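A sketch of one way to fix GEH3 (assuming the same two-layer net as in the posted code): by default the targets are preprocessed with mapminmax into [-1 1], which a logsig output (range [0 1]) can never reach.

```matlab
% Either drop the output mapping so logsig can match the raw targets ...
net.outputs{2}.processFcns = {};     % train against raw 0 / 0.5 / 1 targets
net.layers{2}.transferFcn  = 'logsig';
% ... or keep mapminmax and use an output whose range matches [-1 1]:
% net.layers{2}.transferFcn = 'tansig';
```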
Hope this helps
Greg




Asked: 21 Mar 2015
Commented: 25 Apr 2015
