# The best ANN configuration

Rita on 9 Feb 2016
Commented: Greg Heath on 14 Feb 2016
I have run an ANN for prediction with hidden nodes from 2 to 17, about 50 times. My question is: which criterion should I rely on to select the best ANN? Should I choose the R-squared on the test set, the MSE of the ANN, or the validation performance?

Greg Heath on 11 Feb 2016
My favorite technique:
1. Accept all default parameters except for the number of hidden nodes, H.
2. Minimize H subject to the constraint that the degree-of-freedom-adjusted mean-square error of the training data is less than 1% of the average training target variance.
3. Design and test Ntrials >= 10 nets for each value of H in a range below the upper bound Hub (determined by not having more unknown weights Nw than training equations Ntrneq). The untrained nets differ only by the random trn/val/tst data division AND the random initial weights.
4. Rank the nets via their slightly biased performance on validation data.
5. Obtain unbiased performance estimates on the nets using test data.
6. Statistically significant differences in performance can be estimated using the standard deviation of the performance estimates.
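The weight-counting behind steps 2 and 3 can be made concrete with a short sketch. This is Python rather than MATLAB, the formulas follow the usual bookkeeping for an I-input, H-hidden, O-output single-hidden-layer net with biases, and all function names here are mine:

```python
def n_weights(I, H, O):
    # Unknown weights Nw in an I-H-O net with biases:
    # (I+1)*H input-to-hidden weights and biases,
    # plus (H+1)*O hidden-to-output weights and biases.
    return (I + 1) * H + (H + 1) * O

def n_train_equations(Ntrn, O):
    # Ntrneq: one equation per training target component.
    return Ntrn * O

def hidden_upper_bound(I, O, Ntrn):
    # Hub: the largest H for which Nw does not exceed Ntrneq.
    H = 0
    while n_weights(I, H + 1, O) <= n_train_equations(Ntrn, O):
        H += 1
    return H

def adjusted_mse(sse_trn, Ntrneq, Nw):
    # Degree-of-freedom-adjusted training MSE; step 2 compares this
    # against 1% of the average training target variance.
    return sse_trn / (Ntrneq - Nw)
```

For a 1-input, 1-output net with Ntrn = 343 training points, `hidden_upper_bound(1, 1, 343)` gives 114, matching the arithmetic Greg works through later in the thread.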
I have posted zillions of examples in both the NEWSGROUP and ANSWERS using the same notation. Therefore searching with
greg fitnet Ntrials
should dig up enough references to clarify what I have written. If not, just post a comment.
Hope this helps.
Thank you for formally accepting my answer
Greg

#### 1 Comment

Greg Heath on 14 Feb 2016
% 1. By saying "divide and conquer" do you mean I don't need to train on all 1525 data points at the same time, and I need to create subsets of the data?
No. You have determined Hub = 106. It is ridiculous to search using H = 1:106, Ntrials = 1000. It is better to design no more than ~100 nets at a time. For example, start with h = 6:10:106, Ntrials = 10, and print NMSE in a 10 x 11 matrix to see how few hidden nodes are needed to obtain NMSE <= 0.01. Say it is 46. Then search h = 37:45, Ntrials = 10.
Just think how much more sleep you can get by designing 118 nets instead of 106,000!
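The coarse pass described above (h = 6:10:106, Ntrials = 10, NMSE collected in a 10 x 11 matrix) can be sketched as follows. This is Python, not MATLAB, and `train_and_score` is a placeholder of my own standing in for one fitnet-style design that returns its NMSE:

```python
import numpy as np

def coarse_search(Hgrid, Ntrials, train_and_score, goal=0.01):
    # Fill an Ntrials x len(Hgrid) matrix of NMSE values, then return
    # the smallest H whose best trial meets the goal (None if none does).
    nmse = np.empty((Ntrials, len(Hgrid)))
    for j, H in enumerate(Hgrid):
        for t in range(Ntrials):
            nmse[t, j] = train_and_score(H, t)  # one random-init design
    for j, H in enumerate(Hgrid):
        if nmse[:, j].min() <= goal:
            return H, nmse
    return None, nmse
```

A second, finer pass (e.g. h = 37:45 around a coarse winner of 46) reuses the same routine with a narrower Hgrid.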
% My goal is to investigate the ability of an ANN to predict the missing values of emission gas. I used 6 years of daily variables (6 years = 2191 data points, but 666 of them were not measured in the field, so I had 1525 data points and 666 real gaps).
How long were the gaps?
% Also, 8 variables as the input layer and one variable as the output layer (which is the emission gas). One important thing about my data is the inherent variability of gas emission (in a year there are some events during which gas emission is very high), which makes it difficult to gap-fill by the usual methods.
% To compare the performance of ANN to other methods such as the linear method,
? the ANN with H =0 is a linear model
% I create some artificial gaps scenarios( with different gap length)
Please be less vague in explaining " an artificial gap scenario". How many days
min, median, mean, std and max?
You know the 8 inputs but not the output for how long? OR you don't know the inputs either ???
% into the emission gas data and run the ANN for each artificial gap scenario. For each artificial gap scenario, I ran different networks and selected the best network based on lower RMSE and higher coefficient of determination (R2) between the measured data (the artificial gaps) and the values calculated by the ANN. Therefore, hidden neurons and Ntrials were optimized based on getting the net with higher R2 and lower RMSE. I tried dividrand and dividind
Spelling: you forgot the e.
% to divide the data into trn/val/test sets. For dividind I used 4 years for training, one year for validation, and one year for testing. For dividrand I used 70/15/15. Using the artificial gap scenarios, dividind showed good performance (low RMSE and high R-squared) compared to dividrand. For filling the real gaps (666 data points), since I did not have the real measured values to calculate RMSE and the other statistical metrics, I used NMSE and validation performance to compare. There, dividrand showed better performance than dividind (low NMSE and low validation performance).
Not clear.
% 2- For the real gap values I am trying to optimize H by trial and error, and I took your advice. Here are the results of running the ANN for H = 10:10:100 and Ntrials = 10
% H   NMSE   R test   Rsquared   val performance   Performance
% 70  0.31   0.65     0.69       106.32            79.90
% 20  0.33   0.61     0.67       126.13            86.31
I don't understand the last two columns. I would just use a 10x10 matrix of NMSE.
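For reference, the NMSE being tabulated is just the mean-square error normalized by the target variance, so NMSE = 1 corresponds to the naive predictor that always outputs the target mean. A Python sketch (the function name is mine):

```python
import numpy as np

def nmse(target, output):
    # Normalized mean-square error: MSE divided by the target variance.
    target = np.asarray(target, dtype=float)
    output = np.asarray(output, dtype=float)
    return np.mean((target - output) ** 2) / np.var(target)
```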
%what is the next step to get the optimal net based on H?
See my comments above regarding finding the smallest satisfactory H. Then you could train a net with all of the data.
% 2- should I keep using "dividind" or switch to "dividrand"? "dividind" for the artificial gap scenarios works well. "dividind" shows better performance for real gaps.
I don't know. It is not clear to me what you did.
Greg

Walter Roberson on 9 Feb 2016
What is the best way of getting around "inner London" (United Kingdom)? Is it taxi, personal automobile, Tube (subway) -- or even bicycle (ha ha ha)?
Did you decide yet? So did you measure "best" by convenience, cost, health benefits, or speed? Or did you construct a carefully weighted measure of all of those, such as being willing to trade 1 minute longer travel time for each 2000 Calories of fat burned? Or should it be 1.5 minutes and 1000 Calories? What scientific study did you use to decide the trade-offs?
Oh, by the way: multiple tests have shown that during a typical work-day afternoon, the lowest cost way of getting around inner London is by bicycle, but the healthiest way of getting around inner London is instead by bicycle; and on the third hand, the fastest way of getting around inner London is... by bicycle.
So when you are selecting an ANN, what do you mean by "best" ?

Rita on 9 Feb 2016
Thanks Walter. Actually, I want to use the best ANN to predict and fill the missing values, and also to examine the importance of the input variables by some methods such as stepwise and weighting methods. I have a net with hidden nodes = 4 where the test R2 = 0.39 and the MSE is the highest. On the other hand, I have a net with hidden nodes = 17 where the test R2 = 0.15 and the MSE is the lowest. I also have a net with test R2 = 0.35 and an average MSE. So which net should I use for the analysis?
Walter Roberson on 9 Feb 2016
Did you need the lowest false-positive rate? the lowest false-negative rate? Are you predicting values or predicting class?
Rita on 9 Feb 2016
predicting values.

Greg Heath on 11 Feb 2016
Insufficient information and explanation:
size(input) ? size(target) ?
If I guess both are [1 N], Ntrn ~ 0.7*N, and Hub = 114, then
(Ntrn-1)/(1+1+1) = 114
Ntrn = 3*114 + 1 % 343
N = Ntrn/0.7 % 490
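Those back-of-the-envelope numbers can be checked directly (Python here; same I = O = 1 weight count, Nw = 3H + 1):

```python
I, O, Hub = 1, 1, 114
# Setting the weight count Nw = (I+1)*H + (H+1)*O equal to the number
# of training equations (Ntrn, since O = 1) gives Ntrn = 3*Hub + 1.
Ntrn = (I + 1) * Hub + (Hub + 1) * O   # 343
N = round(Ntrn / 0.7)                  # 490, since Ntrn ~ 0.7*N
```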
Assuming
NMSEgoal <= 0.01 with Hub = 114 % Probably don't need 0.005
why in the world are you even considering
NMSE = 0.25 @ H = 16
and
NMSE = 0.51 @ H= 17
Puzzled,
Greg

#### 1 Comment

Greg Heath on 13 Feb 2016
Be serious:
1. Have you ever seen me use anything more than numH = numel(Hmin:dH:Hmax) ~ 10 and Ntrials > 15 in the zillions of examples that I have posted in the NEWSGROUP and ANSWERS?
2. Have you ever heard of the saying "DIVIDE AND CONQUER"?