Neural network backpropagation problem

I'm using 2 inputs and a single output. Then I apply the same network structure to 3 inputs and two outputs. However, the outputs I get are not close to the targets. What's wrong with this network? Or do I need to change to another type of structure?
clear all; clc;
% load data (commented 2-input XOR case kept for reference)
% p = [0 0 1 1; 0 1 0 1];
% t = [0 1 1 0];
p = [0 0 0 0 1 1 1 1; 0 0 1 1 0 0 1 1; 0 1 0 1 0 1 0 1];
t = [0 1 0 0 0 1 1 1; 0 1 0 0 1 1 0 0];
net = newff(p,t,[15,15],{'logsig','logsig'},'traingd');
net.trainParam.perf = 'mse';
net.trainParam.epochs = 100;
net.trainParam.goal = 0;
net.trainParam.lr = 0.9;
net.trainParam.mc = 0.95;
net.trainParam.min_grad = 0;
[net,tr] = train(net,p,t);
y = sim(net,p)'

Accepted Answer

Greg Heath on 16 Jun 2013
% Ntrn/Nval/Ntest = 7/0/1
close all, clear all, clc
tic
ptrn = [ 0 0 0 0 1 1 1 ; 0 0 1 1 0 0 1 ; 0 1 0 1 0 1 0 ]
ttrn = [ 0 1 0 0 0 1 1 ; 0 1 0 0 1 1 0 ]
ptst = [ 1; 1; 1 ]
ttst = [ 1; 0 ]
[I Ntrn] = size(ptrn)              % [ 3 7 ]
[O Ntrn] = size(ttrn)              % [ 2 7 ]
Ntrneq = prod(size(ttrn))          % 14
MSEtrn00 = mean(var(ttrn',1))      % 0.2449
[I Ntst] = size(ptst)              % [ 3 1 ]
% Nw = (I+1)*H+(H+1)*O = O+(I+O+1)*H < Ntrneq
Hub = -1 + ceil( (Ntrneq-O) / (I+O+1) )   % 1
Nwub = O + (I+O+1)*Hub             % 8 < 14
Hmax = 3
dH = 1
Hmin = 0
Ntrials = 20
MSEgoal = 0.01*MSEtrn00            % 2.4e-3 => R2trn >= 0.99
MinGrad = MSEgoal/10               % 2.4e-4
rng(0)
j = 0
for h = Hmin:dH:Hmax
    j = j+1
    if h == 0
        net = newff(ptrn,ttrn,[]);
        Nw = (I+1)*O
    else
        net = newff(ptrn,ttrn,h);
        Nw = (I+1)*h + (h+1)*O
    end
    Ndof = Ntrneq - Nw
    net.divideFcn = 'dividetrain';
    net.trainParam.goal = MSEgoal;
    net.trainParam.min_grad = MinGrad;
    for i = 1:Ntrials
        h = h
        ntrial = i
        net = configure(net,ptrn,ttrn);
        [ net tr Ytrn ] = train(net,ptrn,ttrn);
        ytrn = round(Ytrn)
        MSEtrn = mse(ttrn-ytrn)
        R2trn(i,j) = 1 - MSEtrn/MSEtrn00;
        Ytst = net(ptst)
        ytst1(i,j) = round(Ytst(1));
        ytst2(i,j) = round(Ytst(2));
    end
end
H = Hmin:dH:Hmax
R2trn = R2trn
ytst1 = ytst1
ytst2 = ytst2
toc % 26 sec
% Training Summary:
1. R2trn > 0.71 only if the net is overfit (H = 2, 3)
2. When R2trn > 0.71, R^2 = 1 (MEMORIZATION)
3. R2trn = 1 50% of the time when H = 2 and 90% of the time when H = 3
4. When H = 0 (linear), max(R2trn) = 0.71 25% of the time
5. When H = 1, max(R2trn) = 0.42 60% of the time
% Generalization Summary:
1. ytst(1) vs ttst(1) = 1: for H = 0:3, the corresponding numbers of errors (out of Ntrials = 20) are [ 0 5 10 13 ]
2. ytst(2) vs ttst(2) = 0: for H = 0:3, the corresponding numbers of errors (out of Ntrials = 20) are [ 11 17 14 12 ]
  3 Comments
Greg Heath on 10 Jul 2013
You don't seem to understand the following basic assumptions needed to expect a net to generalize to the complete data set:
1. The training set must adequately characterize the complete data set.
2. If overfitting precautions (Ntrneq >> Nw, or Nval >> 1, or trainFcn = 'trainbr') are not taken, the net can memorize the training set but perform poorly on nontraining data.
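For illustration, a minimal sketch of the "Nval >> 1" precaution via validation stopping; p and t are assumed to be your full input and target matrices:
% Sketch: overtraining mitigation via validation stopping (Nval >> 1).
net = newff(p, t, 2);                % one small hidden layer
net.divideFcn = 'dividerand';        % random train/val/test split
net.divideParam.trainRatio = 0.70;
net.divideParam.valRatio   = 0.15;   % rising validation error stops training
net.divideParam.testRatio  = 0.15;
[net, tr] = train(net, p, t);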
Go to the comp.ai.neural-nets FAQ and search on
Generalization
Overfitting
Hope this helps.
If not, please respond with more questions.
Greg
azie on 19 Jul 2013
You mean that my data set is either incomplete or the network is overfitting, and therefore I don't get good results on non-training data, is that it? But I have taken all the steps to prevent overfitting; I just don't know whether the experimental data is enough to cover everything or not.


More Answers (3)

Greg Heath on 13 Jun 2013
Why don't you just use the code in help newff?
Note that you have a 3-15-15-2 node topology with
Nw = (3+1)*15+(15+1)*15+(15+1)*2 = 332 Unknown weights
Ntrn = 8 - 2*round(0.15*8) = 6 training patterns
Ntrneq = Ntrn*2 = 12 training equations
If 12 equations for 332 unknowns makes you uneasy, remove one of the hidden layers and remove some of the hidden nodes from the remaining hidden layer.
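A sketch of that arithmetic in MATLAB, using the topology numbers above:
% Weight count for the 3-15-15-2 topology: every node gets one weight
% per input plus a bias, i.e. (fan_in + 1) weights per node.
I = 3; H1 = 15; H2 = 15; O = 2; N = 8;
Nw = (I+1)*H1 + (H1+1)*H2 + (H2+1)*O   % 332 unknown weights
Ntrn = N - 2*round(0.15*N)             % 6 (default 0.7/0.15/0.15 split)
Ntrneq = Ntrn*O                        % 12 training equations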
Hope this helps.
Thank you for formally accepting my answer
Greg
  6 Comments
Greg Heath on 13 Jun 2013
P.S. If you have patternnet, then newfit, newpr and newff are obsolete.
They should be replaced by fitnet, patternnet and feedforwardnet, respectively.
Use fitnet for regression and curve-fitting.
Use patternnet for classification and pattern recognition.
There is no reason to call feedforwardnet directly. It is called automatically by fitnet and patternnet.
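For example (H here is a hypothetical hidden-layer size, not a value from this thread):
H = 10;
net = fitnet(H);        % regression / curve-fitting
net = patternnet(H);    % classification / pattern recognition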
azie on 14 Jun 2013 (edited 14 Jun 2013)
Dear Greg,
%%modified code
p = [0 0 0 0 1 1 1 ; 0 0 1 1 0 0 1 ; 0 1 0 1 0 1 0 ];
t = [0 1 0 0 0 1 1 ; 0 1 0 0 1 1 0 ];
[I N] = size(p);
[O N] = size(t);
net = newff(p,t,[3,3],{'logsig','logsig'},'trainlm');
net.divideFcn = '';
net.trainParam.perf = 'mse';
net.trainParam.epochs = 500;
net.trainParam.goal = 0;
net.trainParam.lr = 0.9;
net.trainParam.mc = 0.95;
net.trainParam.min_grad = 0;
net = init(net);
[net,tr] = train(net,p,t);
y = sim(net,p)'
j = [ 1; 1; 1 ];   % new input; suppose result = [ 1; 0 ]
y = sim(net,j)'
a) I tried changing from two hidden layers to one, but I get a strange result for the output: all the lowest outputs become no less than 0.5. With 2 hidden layers the result matches the target exactly, which I thought was correct, so that is why I stayed with 2 layers.
b) Yes, I reduced the number of hidden neurons as you told me to. I was surprised that even 2 neurons in both layers can reproduce the target exactly. Is that acceptable or not?
c) As I said before, I am trying to predict the output for the new input j. Even though the training session went well, with almost zero error, the network is still poor at predicting new inputs. What should I do?



Greg Heath on 15 Jun 2013
1. You mean a net with 2 HIDDEN layers. The unmodified term "layers" means hidden AND output layers.
In the last 30 years of designing NNs, I have never encountered a net that needed 2 hidden layers. Nets with 1 hidden layer can be universal approximators if they have enough hidden nodes. Universal approximators tend to interpolate well at the expense of extrapolating badly, especially if they have too many hidden nodes.
2. If you look at the code in help newff and doc newff, you will see that you don't need to specify a long list of net properties. Always try the defaults first. They are usually sufficient.
3. Since the default and alternative input normalizations (mapminmax and mapstd) tend to center the data, 'tansig', NOT 'logsig' is the best choice for a MLP hidden layer transfer function.
4. Overfitting/Overtraining/Generalization
Ntrneq = prod(size(t)) =7*2 = 14 % Training Equations
Nw = (3+1)*3+(3+1)*3+(3+1)*2 = 32 % Unknown weights
Nw > Ntrneq % OVERFITTING
None of the following conditions are satisfied
Ntrneq >> Nw % Overfitting mitigation
Nval >> 1 % Overtraining mitigation via validation stopping
net.trainFcn = 'trainbr' % Overtraining mitigation via regularization
Consequently you have an over-trained over-fit net that is not expected to generalize well.
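As a minimal sketch of these remedies combined (a small 'tansig' hidden layer, defaults otherwise, and 'trainbr' regularization; p and t are assumed to be the 3-input/2-output arrays from your modified code):
% Sketch: small net plus Bayesian regularization to mitigate overtraining.
net = newff(p, t, 2, {'tansig','logsig'}, 'trainbr');
net.divideFcn = 'dividetrain';   % trainbr does not use a validation set
[net, tr] = train(net, p, t);
y = net(p)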
I have no idea what deterministic transformation the data is supposed to represent. Therefore, it is difficult to evaluate a single net with non-design data to see if it is any good (i.e., can generalize ).
The original data represented the 8 corners of a 3-D cube. If the target for all 8 corners is known, the generalization capability could be tested via leave-one-out cross-validation, where eight nets are designed with 7 corners and each is tested on the eighth corner.
However, if you visualize a 3-D cube, notice that any corner can be considered to be an OUTLIER with respect to the other 7. Therefore, it would not be surprising if a net designed with 7 corners could not extrapolate well to the eighth corner.
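A sketch of that test, assuming p (3x8) and t (2x8) hold all 8 corners and their targets:
% Leave-one-out cross-validation over the 8 cube corners:
% design 8 nets on 7 corners each, test on the held-out corner.
ytst = zeros(2,8);
for k = 1:8
    trn = true(1,8);  trn(k) = false;      % leave out corner k
    net = newff(p(:,trn), t(:,trn), 1);    % H = 1 hidden node
    net.divideFcn = 'dividetrain';         % train on all 7 corners
    net = train(net, p(:,trn), t(:,trn));
    ytst(:,k) = net(p(:,k));               % predict the held-out corner
end
errors = round(ytst) - t    % nonzero entries => failed extrapolation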
An interesting demonstration would be to vary the number of hidden nodes from H = 0 to a value BEYOND the upper bound value H=Hub, where the number of unknown weights is greater than the number of training equations.
To mitigate the existence of bad random weight configurations, design Ntrials = 10 nets for each value of H from 0 to Hmax (numH = Hmax+1). Since N=8 and H are small, the N*numH*Ntrials = 80* numH designs can probably be designed in less than 5 or 10 minutes.
[ I Ntrn ] = size(ptrn)      % [ 3 7 ]
[ O Ntrn ] = size(ttrn)      % [ 2 7 ]
Ntrneq = prod(size(ttrn))    % 14
[ I Ntst ] = size(ptst)      % [ 3 1 ]
[ O Ntst ] = size(ttst)      % [ 2 1 ]
% Nw = (I+1)*H+(H+1)*O = O+(I+O+1)*H
Hub = -1 + ceil( (Ntrneq-O) / (I+O+1) )   % 1
Hmin = 0, dH = 1, Hmax = 3   % Choose numH = 4, Ndesigns = 320
More Later

azie on 10 Jul 2013
Still searching for an answer. I accepted your code and ran it. However, it seems that:
1. Training usually runs for no more than 30 epochs. Is that okay? The performance goal is met and sometimes Mu is exceeded. Will this produce good results later on?
2. The predicted values are far from the targets, with large errors.
So, any suggestions?
  1 Comment
Greg Heath on 22 Jul 2013
Your problem is not suitable as a regression or a classification problem where a model designed with a subset of the data can generalize to the rest of the data.
All you have to do is visualize the cube in 3 dimensions. None of the points are characterized by the other 7 points.

