fitcnet

Train neural network classification model

    Description

    Use fitcnet to train a feedforward, fully connected neural network for classification. The first fully connected layer of the neural network has a connection from the network input (predictor data), and each subsequent layer has a connection from the previous layer. Each fully connected layer multiplies the input by a weight matrix and then adds a bias vector. An activation function follows each fully connected layer. The final fully connected layer and the subsequent softmax activation function produce the network's output, namely classification scores (posterior probabilities) and predicted labels. For more information, see Neural Network Structure.
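The layer arithmetic described above can be sketched in base MATLAB for a single observation. This is only an illustration under stated assumptions: the sizes, the random weights, and the helper names relu and softmaxFcn are made up here and are not taken from a trained fitcnet model.

```matlab
% Sketch of a forward pass through one hidden layer and the final layer.
% All sizes and values are hypothetical.
rng(0)                                   % reproducible random weights
x  = randn(4,1);                         % one observation with 4 predictors
W1 = randn(10,4);  b1 = zeros(10,1);     % hidden fully connected layer, 10 outputs
W2 = randn(3,10);  b2 = zeros(3,1);      % final fully connected layer, 3 classes

relu       = @(z) max(z,0);                                   % hidden activation
softmaxFcn = @(z) exp(z - max(z)) ./ sum(exp(z - max(z)));    % stable softmax

h      = relu(W1*x + b1);                % weights times input, plus bias, then activation
scores = softmaxFcn(W2*h + b2);          % classification scores that sum to 1
[~,predictedClass] = max(scores);        % index of the class with the highest score
```

Each weight matrix has one row per layer output and one column per layer input, which matches the orientation of the LayerWeights property shown in the examples.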

    Mdl = fitcnet(Tbl,ResponseVarName) returns a neural network classification model Mdl trained using the predictors in the table Tbl and the class labels in the ResponseVarName table variable.

    Mdl = fitcnet(Tbl,formula) returns a neural network classification model trained using the sample data in the table Tbl. The input argument formula is an explanatory model of the response and a subset of the predictor variables in Tbl used to fit Mdl.

    Mdl = fitcnet(Tbl,Y) returns a neural network classification model using the predictor variables in the table Tbl and the class labels in vector Y.

    Mdl = fitcnet(X,Y) returns a neural network classification model trained using the predictors in the matrix X and the class labels in vector Y.

    Mdl = fitcnet(___,Name,Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. For example, you can adjust the number of outputs and the activation functions for the fully connected layers by specifying the LayerSizes and Activations name-value arguments.

    Examples

    Train a neural network classifier, and assess the performance of the classifier on a test set.

    Read the sample file CreditRating_Historical.dat into a table. The predictor data consists of financial ratios and industry sector information for a list of corporate customers. The response variable consists of credit ratings assigned by a rating agency. Preview the first few rows of the data set.

    creditrating = readtable("CreditRating_Historical.dat");
    head(creditrating)
    ans=8×8 table
         ID      WC_TA     RE_TA     EBIT_TA    MVE_BVTD    S_TA     Industry    Rating 
        _____    ______    ______    _______    ________    _____    ________    _______
    
        62394     0.013     0.104     0.036      0.447      0.142        3       {'BB' }
        48608     0.232     0.335     0.062      1.969      0.281        8       {'A'  }
        42444     0.311     0.367     0.074      1.935      0.366        1       {'A'  }
        48631     0.194     0.263     0.062      1.017      0.228        4       {'BBB'}
        43768     0.121     0.413     0.057      3.647      0.466       12       {'AAA'}
        39255    -0.117    -0.799      0.01      0.179      0.082        4       {'CCC'}
        62236     0.087     0.158     0.049      0.816      0.324        2       {'BBB'}
        39354     0.005     0.181     0.034      2.597      0.388        7       {'AA' }
    
    

    Because each value in the ID variable is a unique customer ID, that is, length(unique(creditrating.ID)) is equal to the number of observations in creditrating, the ID variable is a poor predictor. Remove the ID variable from the table, and convert the Industry variable to a categorical variable.

    creditrating = removevars(creditrating,"ID");
    creditrating.Industry = categorical(creditrating.Industry);

    Convert the Rating response variable to an ordinal categorical variable.

    creditrating.Rating = categorical(creditrating.Rating, ...
        ["AAA","AA","A","BBB","BB","B","CCC"],"Ordinal",true);

    Partition the data into training and test sets. Use approximately 80% of the observations to train a neural network model, and 20% of the observations to test the performance of the trained model on new data. Use cvpartition to partition the data.

    rng("default") % For reproducibility of the partition
    c = cvpartition(creditrating.Rating,"Holdout",0.20);
    trainingIndices = training(c); % Indices for the training set
    testIndices = test(c); % Indices for the test set
    creditTrain = creditrating(trainingIndices,:);
    creditTest = creditrating(testIndices,:);

    Train a neural network classifier by passing the training data creditTrain to the fitcnet function.

    Mdl = fitcnet(creditTrain,"Rating")
    Mdl = 
      ClassificationNeuralNetwork
               PredictorNames: {'WC_TA'  'RE_TA'  'EBIT_TA'  'MVE_BVTD'  'S_TA'  'Industry'}
                 ResponseName: 'Rating'
        CategoricalPredictors: 6
                   ClassNames: [AAA    AA    A    BBB    BB    B    CCC]
               ScoreTransform: 'none'
              NumObservations: 3146
                   LayerSizes: 10
                  Activations: 'relu'
        OutputLayerActivation: 'softmax'
                       Solver: 'LBFGS'
              ConvergenceInfo: [1×1 struct]
              TrainingHistory: [1000×7 table]
    
    
      Properties, Methods
    
    

    Mdl is a trained ClassificationNeuralNetwork classifier. You can use dot notation to access the properties of Mdl. For example, you can specify Mdl.TrainingHistory to get more information about the training history of the neural network model.

    Evaluate the performance of the classifier on the test set by computing the test set classification accuracy, that is, 1 minus the test set classification error. Visualize the results by using a confusion matrix.

    testAccuracy = 1 - loss(Mdl,creditTest,"Rating", ...
        "LossFun","classiferror")
    testAccuracy = 0.8003
    
    confusionchart(creditTest.Rating,predict(Mdl,creditTest))

    Specify the structure of a neural network classifier, including the size of the fully connected layers.

    Load the ionosphere data set, which includes radar signal data. X contains the predictor data, and Y is the response variable, whose values represent either good ("g") or bad ("b") radar signals.

    load ionosphere

    Separate the data into training data (XTrain and YTrain) and test data (XTest and YTest) by using a stratified holdout partition. Reserve approximately 30% of the observations for testing, and use the rest of the observations for training.

    rng("default") % For reproducibility of the partition
    cvp = cvpartition(Y,"Holdout",0.3);
    XTrain = X(training(cvp),:);
    YTrain = Y(training(cvp));
    XTest = X(test(cvp),:);
    YTest = Y(test(cvp));

    Train a neural network classifier. Specify to have 35 outputs in the first fully connected layer and 20 outputs in the second fully connected layer. By default, both layers use a rectified linear unit (ReLU) activation function. You can change the activation functions for the fully connected layers by using the Activations name-value argument.

    Mdl = fitcnet(XTrain,YTrain, ...
        "LayerSizes",[35 20])
    Mdl = 
      ClassificationNeuralNetwork
                 ResponseName: 'Y'
        CategoricalPredictors: []
                   ClassNames: {'b'  'g'}
               ScoreTransform: 'none'
              NumObservations: 246
                   LayerSizes: [35 20]
                  Activations: 'relu'
        OutputLayerActivation: 'softmax'
                       Solver: 'LBFGS'
              ConvergenceInfo: [1×1 struct]
              TrainingHistory: [47×7 table]
    
    
      Properties, Methods
    
    

    Access the weights and biases for the fully connected layers of the trained classifier by using the LayerWeights and LayerBiases properties of Mdl. The first two elements of each property correspond to the values for the first two fully connected layers, and the third element corresponds to the values for the final fully connected layer with a softmax activation function for classification. For example, display the weights and biases for the second fully connected layer.

    Mdl.LayerWeights{2}
    ans = 20×35
    
        0.0481    0.2501   -0.1535   -0.0934    0.0760   -0.0579   -0.2465    1.0411    0.3712   -1.2007    1.1162    0.4296    0.4045    0.5005    0.8839    0.4624   -0.3154    0.3454   -0.0487    0.2648    0.0732    0.5773    0.4286    0.0881    0.9468    0.2981    0.5534    1.0518   -0.0224    0.6894    0.5527    0.7045   -0.6124    0.2145   -0.0790
       -0.9489   -1.8343    0.5510   -0.5751   -0.8726    0.8815    0.0203   -1.6379    2.0315    1.7599   -1.4153   -1.4335   -1.1638   -0.1715    1.1439   -0.7661    1.1230   -1.1982   -0.5409   -0.5821   -0.0627   -0.7038   -0.0817   -1.5773   -1.4671    0.2053   -0.7931   -1.6201   -0.1737   -0.7762   -0.3063   -0.8771    1.5134   -0.4611   -0.0649
       -0.1910    0.0246   -0.3511    0.0097    0.3160   -0.0693    0.2270   -0.0783   -0.1626   -0.3478    0.2765    0.4179    0.0727   -0.0314   -0.1798   -0.0583    0.1375   -0.1876    0.2518    0.2137    0.1497    0.0395    0.2859   -0.0905    0.4325   -0.2012    0.0388   -0.1441   -0.1431   -0.0249   -0.2200    0.0860   -0.2076    0.0132    0.1737
       -0.0415   -0.0059   -0.0753   -0.1477   -0.1621   -0.1762    0.2164    0.1710   -0.0610   -0.1402    0.1452    0.2890    0.2872   -0.2616   -0.4204   -0.2831   -0.1901    0.0036    0.0781   -0.0826    0.1588   -0.2782    0.2510   -0.1069   -0.2692    0.2306    0.2521    0.0306    0.2524   -0.4218    0.2478    0.2343   -0.1031    0.1037    0.1598
        1.1848    1.6142   -0.1352    0.5774    0.5491    0.0103    0.0209    0.7219   -0.8643   -0.5578    1.3595    1.5385    1.0015    0.7416   -0.4342    0.2279    0.5667    1.1589    0.7100    0.1823    0.4171    0.7051    0.0794    1.3267    1.2659    0.3197    0.3947    0.3436   -0.1415    0.6607    1.0071    0.7726   -0.2840    0.8801    0.0848
        0.2486   -0.2920   -0.0004    0.2806    0.2987   -0.2709    0.1473   -0.2580   -0.0499   -0.0755    0.2000    0.1535   -0.0285   -0.0520   -0.2523   -0.2505   -0.0437   -0.2323    0.2023    0.2061   -0.1365    0.0744    0.0344   -0.2891    0.2341   -0.1556    0.1459    0.2533   -0.0583    0.0243   -0.2949   -0.1530    0.1546   -0.0340   -0.1562
       -0.0516    0.0640    0.1824   -0.0675   -0.2065   -0.0052   -0.1682   -0.1520    0.0060    0.0450    0.0813   -0.0234    0.0657    0.3219   -0.1871    0.0658   -0.2103    0.0060   -0.2831   -0.1811   -0.0988    0.2378   -0.0761    0.1714   -0.1596   -0.0011    0.0609    0.4003    0.3687   -0.2879    0.0910    0.0604   -0.2222   -0.2735   -0.1155
       -0.6192   -0.7804   -0.0506   -0.4205   -0.2584   -0.2020   -0.0008    0.0534    1.0185   -0.0307   -0.0539   -0.2020    0.0368   -0.1847    0.0886   -0.4086   -0.4648   -0.3785    0.1542   -0.5176   -0.3207    0.1893   -0.0313   -0.5297   -0.1261   -0.2749   -0.6152   -0.5914   -0.3089    0.2432   -0.3955   -0.1711    0.1710   -0.4477    0.0718
        0.5049   -0.1362   -0.2218    0.1637   -0.1282   -0.1008    0.1445    0.4527   -0.4887    0.0503    0.1453    0.1316   -0.3311   -0.1081   -0.7699    0.4062   -0.1105   -0.0855    0.0630   -0.1469   -0.2533    0.3976    0.0418    0.5294    0.3982    0.1027   -0.0973   -0.1282    0.2491    0.0425    0.0533    0.1578   -0.8403   -0.0535   -0.0048
        1.1109   -0.0466    0.4044    0.6366    0.1863    0.5660    0.2839    0.8793   -0.5497    0.0057    0.3468    0.0980    0.3364    0.4669    0.1466    0.7883   -0.1743    0.4444    0.4535    0.1521    0.7476    0.2246    0.4473    0.2829    0.8881    0.4666    0.6334    0.3105    0.9571    0.2808    0.6483    0.1180   -0.4558    1.2486    0.2453
          ⋮
    
    
    Mdl.LayerBiases{2}
    ans = 20×1
    
        0.6147
        0.1891
       -0.2767
       -0.2977
        1.3655
        0.0347
        0.1509
       -0.4839
       -0.3960
        0.9248
          ⋮
    
    

    The final fully connected layer has two outputs, one for each class in the response variable. The number of layer outputs corresponds to the first dimension of the layer weights and layer biases.

    size(Mdl.LayerWeights{end})
    ans = 1×2
    
         2    20
    
    
    size(Mdl.LayerBiases{end})
    ans = 1×2
    
         2     1
    
    

    To estimate the performance of the trained classifier, compute the test set classification error for Mdl.

    testError = loss(Mdl,XTest,YTest, ...
        "LossFun","classiferror")
    testError = 0.0774
    
    accuracy = 1 - testError
    accuracy = 0.9226
    

    Mdl accurately classifies approximately 92% of the observations in the test set.

    At each iteration of the training process, compute the validation loss of the neural network. Stop the training process early if the validation loss reaches a reasonable minimum.

    Load the patients data set. Create a table from the data set. Each row corresponds to one patient, and each column corresponds to a diagnostic variable. Use the Smoker variable as the response variable, and the rest of the variables as predictors.

    load patients
    tbl = table(Diastolic,Systolic,Gender,Height,Weight,Age,Smoker);

    Separate the data into a training set tblTrain and a validation set tblValidation by using a stratified holdout partition. The software reserves approximately 30% of the observations for the validation data set and uses the rest of the observations for the training data set.

    rng("default") % For reproducibility of the partition
    c = cvpartition(tbl.Smoker,"Holdout",0.30);
    trainingIndices = training(c);
    validationIndices = test(c);
    tblTrain = tbl(trainingIndices,:);
    tblValidation = tbl(validationIndices,:);

    Train a neural network classifier by using the training set. Specify the Smoker column of tblTrain as the response variable. Evaluate the model at each iteration by using the validation set. Specify to display the training information at each iteration by using the Verbose name-value argument. By default, the training process ends early if the validation cross-entropy loss is greater than or equal to the minimum validation cross-entropy loss computed so far, six times in a row. To change the number of times the validation loss is allowed to be greater than or equal to the minimum, specify the ValidationPatience name-value argument.

    Mdl = fitcnet(tblTrain,"Smoker", ...
        "ValidationData",tblValidation, ...
        "Verbose",1);
    |==========================================================================================|
    | Iteration  | Train Loss | Gradient   | Step       | Iteration  | Validation | Validation |
    |            |            |            |            | Time (sec) | Loss       | Checks     |
    |==========================================================================================|
    |           1|    2.602935|   26.866935|    0.262009|    0.001800|    2.793048|           0|
    |           2|    1.470816|   42.594723|    0.058323|    0.001460|    1.247046|           0|
    |           3|    1.299292|   25.854432|    0.034910|    0.000456|    1.507857|           1|
    |           4|    0.710465|   11.629107|    0.013616|    0.000617|    0.889157|           0|
    |           5|    0.647783|    2.561740|    0.005753|    0.000957|    0.766728|           0|
    |           6|    0.645541|    0.681579|    0.001000|    0.000706|    0.776072|           1|
    |           7|    0.639611|    1.544692|    0.007013|    0.005517|    0.776320|           2|
    |           8|    0.604189|    5.045676|    0.064190|    0.000534|    0.744919|           0|
    |           9|    0.565364|    5.851552|    0.068845|    0.000504|    0.694226|           0|
    |          10|    0.391994|    8.377717|    0.560480|    0.000370|    0.425466|           0|
    |==========================================================================================|
    | Iteration  | Train Loss | Gradient   | Step       | Iteration  | Validation | Validation |
    |            |            |            |            | Time (sec) | Loss       | Checks     |
    |==========================================================================================|
    |          11|    0.383843|    0.630246|    0.110270|    0.000749|    0.428487|           1|
    |          12|    0.369289|    2.404750|    0.084395|    0.000531|    0.405728|           0|
    |          13|    0.357839|    6.220679|    0.199197|    0.000353|    0.378480|           0|
    |          14|    0.344974|    2.752717|    0.029013|    0.000330|    0.367279|           0|
    |          15|    0.333747|    0.711398|    0.074513|    0.000328|    0.348499|           0|
    |          16|    0.327763|    0.804818|    0.122178|    0.000348|    0.330237|           0|
    |          17|    0.327702|    0.778169|    0.009810|    0.000365|    0.329095|           0|
    |          18|    0.327277|    0.020615|    0.004377|    0.000380|    0.329141|           1|
    |          19|    0.327273|    0.010018|    0.003313|    0.000432|    0.328773|           0|
    |          20|    0.327268|    0.019497|    0.000805|    0.000776|    0.328831|           1|
    |==========================================================================================|
    | Iteration  | Train Loss | Gradient   | Step       | Iteration  | Validation | Validation |
    |            |            |            |            | Time (sec) | Loss       | Checks     |
    |==========================================================================================|
    |          21|    0.327228|    0.113983|    0.005397|    0.000509|    0.329085|           2|
    |          22|    0.327138|    0.240166|    0.012159|    0.000333|    0.329406|           3|
    |          23|    0.326865|    0.428912|    0.036841|    0.000381|    0.329952|           4|
    |          24|    0.325797|    0.255227|    0.139585|    0.000339|    0.331246|           5|
    |          25|    0.325181|    0.758050|    0.135868|    0.000890|    0.332035|           6|
    |==========================================================================================|
    

    Create a plot that compares the training cross-entropy loss and the validation cross-entropy loss at each iteration. By default, fitcnet stores the loss information inside the TrainingHistory property of the object Mdl. You can access this information by using dot notation.

    iteration = Mdl.TrainingHistory.Iteration;
    trainLosses = Mdl.TrainingHistory.TrainingLoss;
    valLosses = Mdl.TrainingHistory.ValidationLoss;
    
    plot(iteration,trainLosses,iteration,valLosses)
    legend(["Training","Validation"])
    xlabel("Iteration")
    ylabel("Cross-Entropy Loss")

    Check the iteration that corresponds to the minimum validation loss. The final returned model Mdl is the model trained at this iteration.

    [~,minIdx] = min(valLosses);
    iteration(minIdx)
    ans = 19
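The validation-check rule used in this example can be replayed in base MATLAB. The sketch below copies the Validation Loss column from the verbose output and assumes the default patience of six; the variable names are illustrative, and the loop is a reconstruction of the rule as described, not the code that fitcnet runs.

```matlab
% Validation losses copied from the verbose output (iterations 1 to 25).
valLoss = [2.793048 1.247046 1.507857 0.889157 0.766728 ...
           0.776072 0.776320 0.744919 0.694226 0.425466 ...
           0.428487 0.405728 0.378480 0.367279 0.348499 ...
           0.330237 0.329095 0.329141 0.328773 0.328831 ...
           0.329085 0.329406 0.329952 0.331246 0.332035];
patience = 6;              % default ValidationPatience described in this example

bestLoss      = Inf;       % minimum validation loss seen so far
bestIteration = 0;         % iteration at which that minimum occurred
checks        = 0;         % consecutive iterations with loss >= minimum
stopIteration = NaN;       % iteration at which training would stop early

for t = 1:numel(valLoss)
    if valLoss(t) < bestLoss
        bestLoss = valLoss(t);     % new minimum resets the counter
        bestIteration = t;
        checks = 0;
    else
        checks = checks + 1;       % loss is greater than or equal to the minimum
    end
    if checks >= patience
        stopIteration = t;
        break
    end
end
```

With these values the counter reaches six at iteration 25, and the minimum validation loss occurs at iteration 19, which agrees with the Validation Checks column and with the iteration returned above.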
    

    Assess the cross-validation loss of neural network models with different regularization strengths, and choose the regularization strength corresponding to the best performing model.

    Read the sample file CreditRating_Historical.dat into a table. The predictor data consists of financial ratios and industry sector information for a list of corporate customers. The response variable consists of credit ratings assigned by a rating agency. Preview the first few rows of the data set.

    creditrating = readtable("CreditRating_Historical.dat");
    head(creditrating)
    ans=8×8 table
         ID      WC_TA     RE_TA     EBIT_TA    MVE_BVTD    S_TA     Industry    Rating 
        _____    ______    ______    _______    ________    _____    ________    _______
    
        62394     0.013     0.104     0.036      0.447      0.142        3       {'BB' }
        48608     0.232     0.335     0.062      1.969      0.281        8       {'A'  }
        42444     0.311     0.367     0.074      1.935      0.366        1       {'A'  }
        48631     0.194     0.263     0.062      1.017      0.228        4       {'BBB'}
        43768     0.121     0.413     0.057      3.647      0.466       12       {'AAA'}
        39255    -0.117    -0.799      0.01      0.179      0.082        4       {'CCC'}
        62236     0.087     0.158     0.049      0.816      0.324        2       {'BBB'}
        39354     0.005     0.181     0.034      2.597      0.388        7       {'AA' }
    
    

    Because each value in the ID variable is a unique customer ID, that is, length(unique(creditrating.ID)) is equal to the number of observations in creditrating, the ID variable is a poor predictor. Remove the ID variable from the table, and convert the Industry variable to a categorical variable.

    creditrating = removevars(creditrating,"ID");
    creditrating.Industry = categorical(creditrating.Industry);

    Convert the Rating response variable to an ordinal categorical variable.

    creditrating.Rating = categorical(creditrating.Rating, ...
        ["AAA","AA","A","BBB","BB","B","CCC"],"Ordinal",true);

    Create a cvpartition object for stratified 5-fold cross-validation. cvp partitions the data into five folds, where each fold has roughly the same proportions of different credit ratings. Set the random seed to the default value for reproducibility of the partition.

    rng("default")
    cvp = cvpartition(creditrating.Rating,"KFold",5);

    Compute the cross-validation classification error for neural network classifiers with different regularization strengths. Try regularization strengths on the order of 1/n, where n is the number of observations. Specify to standardize the data before training the neural network models.

    1/size(creditrating,1)
    ans = 2.5432e-04
    
    lambda = (0:0.5:5)*1e-4;
    cvloss = zeros(length(lambda),1);
    
    for i = 1:length(lambda)
        cvMdl = fitcnet(creditrating,"Rating","Lambda",lambda(i), ...
            "CVPartition",cvp,"Standardize",true);
        cvloss(i) = kfoldLoss(cvMdl,"LossFun","classiferror");
    end

    Plot the results. Find the regularization strength corresponding to the lowest cross-validation classification error.

    plot(lambda,cvloss)
    xlabel("Regularization Strength")
    ylabel("Cross-Validation Loss")

    [~,idx] = min(cvloss);
    bestLambda = lambda(idx)
    bestLambda = 5.0000e-05
    

    Train a neural network classifier using the bestLambda regularization strength.

    Mdl = fitcnet(creditrating,"Rating","Lambda",bestLambda, ...
        "Standardize",true)
    Mdl = 
      ClassificationNeuralNetwork
               PredictorNames: {'WC_TA'  'RE_TA'  'EBIT_TA'  'MVE_BVTD'  'S_TA'  'Industry'}
                 ResponseName: 'Rating'
        CategoricalPredictors: 6
                   ClassNames: [AAA    AA    A    BBB    BB    B    CCC]
               ScoreTransform: 'none'
              NumObservations: 3932
                   LayerSizes: 10
                  Activations: 'relu'
        OutputLayerActivation: 'softmax'
                       Solver: 'LBFGS'
              ConvergenceInfo: [1×1 struct]
              TrainingHistory: [1000×7 table]
    
    
      Properties, Methods
    
    

    Input Arguments

    Sample data used to train the model, specified as a table. Each row of Tbl corresponds to one observation, and each column corresponds to one predictor variable. Optionally, Tbl can contain one additional column for the response variable. Multicolumn variables and cell arrays other than cell arrays of character vectors are not allowed.

    • If Tbl contains the response variable, and you want to use all remaining variables in Tbl as predictors, then specify the response variable by using ResponseVarName.

    • If Tbl contains the response variable, and you want to use only a subset of the remaining variables in Tbl as predictors, then specify a formula by using formula.

    • If Tbl does not contain the response variable, then specify a response variable by using Y. The length of the response variable and the number of rows in Tbl must be equal.

    Data Types: table

    Response variable name, specified as the name of a variable in Tbl.

    You must specify ResponseVarName as a character vector or string scalar. For example, if the response variable Y is stored as Tbl.Y, then specify it as 'Y'. Otherwise, the software treats all columns of Tbl, including Y, as predictors when training the model.

    The response variable must be a categorical, character, or string array; a logical or numeric vector; or a cell array of character vectors. If Y is a character array, then each element of the response variable must correspond to one row of the array.

    A good practice is to specify the order of the classes by using the ClassNames name-value argument.

    Data Types: char | string

    Explanatory model of the response variable and a subset of the predictor variables, specified as a character vector or string scalar in the form 'Y~x1+x2+x3'. In this form, Y represents the response variable, and x1, x2, and x3 represent the predictor variables.

    To specify a subset of variables in Tbl as predictors for training the model, use a formula. If you specify a formula, then the software does not use any variables in Tbl that do not appear in formula.

    The variable names in the formula must be both variable names in Tbl (Tbl.Properties.VariableNames) and valid MATLAB® identifiers. You can verify the variable names in Tbl by using the isvarname function. If the variable names are not valid, then you can convert them by using the matlab.lang.makeValidName function.

    Data Types: char | string
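As a small illustration of the name check described above, the following sketch tests a few candidate names with isvarname and converts them with matlab.lang.makeValidName. The names and the response name Rating are hypothetical and do not come from a specific table.

```matlab
% Hypothetical variable names; only the first is a valid MATLAB identifier.
names = {'WC_TA','2ndRatio','S TA'};

isValid    = cellfun(@isvarname,names);          % one logical flag per name
validNames = matlab.lang.makeValidName(names);   % names converted to valid identifiers

% Build a formula string from the converted names.
% 'Rating' is a hypothetical response variable name.
formula = ['Rating ~ ' strjoin(validNames,' + ')];
```

For the formula to work, the table variables must carry the same converted names, so rename them in Tbl before calling fitcnet.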

    Class labels used to train the model, specified as a numeric, categorical, or logical vector; a character or string array; or a cell array of character vectors.

    • If Y is a character array, then each element of the class labels must correspond to one row of the array.

    • The length of Y must be equal to the number of rows in Tbl or X.

    • A good practice is to specify the class order by using the ClassNames name-value argument.

    Data Types: single | double | categorical | logical | char | string | cell

    Predictor data used to train the model, specified as a numeric matrix.

    By default, the software treats each row of X as one observation, and each column as one predictor.

    The length of Y and the number of observations in X must be equal.

    To specify the names of the predictors in the order of their appearance in X, use the PredictorNames name-value argument.

    Note

    If you orient your predictor matrix so that observations correspond to columns and specify 'ObservationsIn','columns', then you might experience a significant reduction in computation time.

    Data Types: single | double

    Note

    The software treats NaN, empty character vector (''), empty string (""), <missing>, and <undefined> elements as missing values, and removes observations with any of these characteristics:

    • Missing value in the response variable (for example, Y or ValidationData{2})

    • At least one missing value in a predictor observation (for example, row in X or ValidationData{1})

    • NaN value or 0 weight (for example, value in Weights or ValidationData{3})
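The removal rule in this note can be approximated in base MATLAB as follows. The data are made up, and the sketch only mirrors the three listed conditions; it is not the preprocessing code that the software runs.

```matlab
% Hypothetical data: 5 observations, 2 predictors, class labels, and weights.
X = [1 2; NaN 3; 4 5; 6 7; 8 9];
Y = categorical({'a','b','','a','b'});   % the empty label becomes <undefined>
w = [1; 1; 1; 0; NaN];

badPredictor = any(isnan(X),2);          % at least one missing predictor value
badResponse  = ismissing(Y(:));          % missing class label
badWeight    = isnan(w) | w == 0;        % NaN or zero observation weight

keep   = ~(badPredictor | badResponse | badWeight);   % observations that remain
Xclean = X(keep,:);
Yclean = Y(keep);
wclean = w(keep);
```

In this made-up data set only the first observation satisfies none of the removal conditions, so only that row remains.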

    Name-Value Pair Arguments

    Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

    Example: fitcnet(X,Y,'LayerSizes',[10 10],'Activations',["relu","tanh"]) specifies to create a neural network with two fully connected layers, each with 10 outputs. The first layer uses a rectified linear unit (ReLU) activation function, and the second uses a hyperbolic tangent activation function.
    Neural Network Options

    Sizes of the fully connected layers in the neural network model, specified as a positive integer vector. The ith element of LayerSizes is the number of outputs in the ith fully connected layer of the neural network model.

    LayerSizes does not include the size of the final fully connected layer that uses a softmax activation function. For more information, see Neural Network Structure.

    Example: 'LayerSizes',[100 25 10]

    Activation functions for the fully connected layers of the neural network model, specified as a character vector, string scalar, string array, or cell array of character vectors with values from this table.

    Value         Description

    'relu'        Rectified linear unit (ReLU) function: performs a threshold operation on each element of the input, where any value less than zero is set to zero, that is,
                  $f(x) = \begin{cases} x, & x \ge 0 \\ 0, & x < 0 \end{cases}$

    'tanh'        Hyperbolic tangent (tanh) function: applies the tanh function to each input element

    'sigmoid'     Sigmoid function: performs the following operation on each input element,
                  $f(x) = \dfrac{1}{1 + e^{-x}}$

    'none'        Identity function: returns each input element without performing any transformation, that is, f(x) = x

    • If you specify one activation function only, then Activations is the activation function for every fully connected layer of the neural network model, excluding the final fully connected layer. The activation function for the final fully connected layer is always softmax (see Neural Network Structure).

    • If you specify an array of activation functions, then the ith element of Activations is the activation function for the ith layer of the neural network model.

    Example: 'Activations','sigmoid'
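For reference, the four activation values can be written as anonymous functions in base MATLAB. This is a sketch of the mathematical definitions only; the handle names are illustrative and are not part of the fitcnet interface.

```matlab
relu     = @(x) max(x,0);             % 'relu': negative values are set to zero
tanhAct  = @(x) tanh(x);              % 'tanh': hyperbolic tangent
sigmoid  = @(x) 1./(1 + exp(-x));     % 'sigmoid': 1/(1 + e^(-x))
identity = @(x) x;                    % 'none': no transformation

x = [-2 -0.5 0 0.5 2];                % sample inputs
reluOut     = relu(x);
tanhOut     = tanhAct(x);
sigmoidOut  = sigmoid(x);
identityOut = identity(x);
```

Each function operates element-wise, so the same handles apply to a whole vector of layer outputs.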

    Function to initialize the fully connected layer weights, specified as 'glorot' or 'he'.

    Value         Description
    'glorot'      Initialize the weights with the Glorot initializer [1] (also known as the Xavier initializer). For each layer, the Glorot initializer independently samples from a uniform distribution with zero mean and variance 2/(I+O), where I is the input size and O is the output size for the layer.
    'he'          Initialize the weights with the He initializer [2]. For each layer, the He initializer samples from a normal distribution with zero mean and variance 2/I, where I is the input size for the layer.

    Example: 'LayerWeightsInitializer','he'
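The two sampling distributions can be illustrated in base MATLAB. The layer sizes are hypothetical, and the sketch only reproduces the stated means and variances; it does not claim to match the random number stream or internal code that fitcnet uses.

```matlab
rng(0)                          % for reproducibility
I = 35;  O = 20;                % hypothetical layer input and output sizes

% Glorot: a uniform distribution on [-r, r] has variance r^2/3,
% so r = sqrt(6/(I+O)) gives the variance 2/(I+O).
r = sqrt(6/(I + O));
WGlorot = (2*rand(O,I) - 1)*r;

% He: normal distribution with zero mean and variance 2/I.
WHe = sqrt(2/I)*randn(O,I);

glorotVariance = var(WGlorot(:));   % sample variance, close to 2/(I+O)
heVariance     = var(WHe(:));       % sample variance, close to 2/I
```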

    Type of initial fully connected layer biases, specified as 'zeros' or 'ones'.

    • If you specify the value 'zeros', then each fully connected layer has an initial bias of 0.

    • If you specify the value 'ones', then each fully connected layer has an initial bias of 1.

    Example: 'LayerBiasesInitializer','ones'

    Data Types: char | string

    Predictor data observation dimension, specified as 'rows' or 'columns'.

    Note

    If you orient your predictor matrix so that observations correspond to columns and specify 'ObservationsIn','columns', then you might experience a significant reduction in computation time. You cannot specify 'ObservationsIn','columns' for predictor data in a table.

    Example: 'ObservationsIn','columns'

    Data Types: char | string
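A minimal sketch of column-oriented training, assuming the ionosphere sample data set used in the examples. The variable name Xcols is illustrative.

```matlab
load ionosphere                 % X has observations in rows, Y holds the class labels
rng("default")                  % for reproducibility of the training
Xcols = X';                     % transpose so that each column is one observation

% Tell fitcnet that observations are stored in columns.
Mdl = fitcnet(Xcols,Y,"ObservationsIn","columns");

% Use the same orientation when predicting on column-oriented data.
labels = predict(Mdl,Xcols,"ObservationsIn","columns");
```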

    Regularization term strength, specified as a nonnegative scalar. The software composes the objective function for minimization from the cross-entropy loss function and the ridge (L2) penalty term.

    Example: 'Lambda',1e-4

    Data Types: single | double
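The composition of the objective can be sketched with made-up numbers. The exact scaling of the penalty and which parameters it covers are not stated here, so the form below (Lambda times the sum of squared weights, biases excluded) is an assumption for illustration only.

```matlab
% Hypothetical probabilities and one-hot labels: 3 observations, 2 classes.
P = [0.9 0.1; 0.2 0.8; 0.6 0.4];      % predicted class probabilities, rows sum to 1
T = [1 0; 0 1; 1 0];                  % true classes, one-hot encoded
W = {[0.5 -1; 2 0.1], [1 -1]};        % hypothetical layer weight matrices
lambda = 1e-4;                        % regularization strength

crossEntropy = -mean(sum(T.*log(P),2));               % average cross-entropy loss
ridgePenalty = sum(cellfun(@(w) sum(w(:).^2),W));     % sum of squared weights

% Assumed form of the regularized objective (scaling is an assumption).
objective = crossEntropy + lambda*ridgePenalty;
```

Setting lambda to 0 leaves only the cross-entropy term, and larger values pull the weights toward zero.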

    Flag to standardize the predictor data, specified as a numeric or logical 0 (false) or 1 (true). If you set Standardize to true, then the software centers and scales each numeric predictor variable by the corresponding column mean and standard deviation. The software does not standardize the categorical predictors.

    Example: 'Standardize',true

    Data Types: single | double | logical
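The centering and scaling step can be sketched in base MATLAB for numeric predictors. The data are hypothetical, and the sketch covers only the arithmetic, not how the software handles categorical predictors.

```matlab
% Hypothetical numeric predictors: 4 observations, 2 variables.
X = [1 10; 2 20; 3 30; 4 40];

mu    = mean(X,1);              % column means
sigma = std(X,0,1);             % column standard deviations
Z     = (X - mu)./sigma;        % centered and scaled predictors
```

After this step each column of Z has mean 0 and standard deviation 1, so predictors measured on very different scales contribute comparably during training.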

    Convergence Control Options

    Verbosity level, specified as 0 or 1. The 'Verbose' name-value argument controls the amount of diagnostic information that fitcnet displays at the command line.

    Value    Description
    0        fitcnet does not display diagnostic information.
    1        fitcnet periodically displays diagnostic information.

    By default, StoreHistory is set to true and fitcnet stores the diagnostic information inside of Mdl. Use Mdl.TrainingHistory to access the diagnostic information.

    Example: 'Verbose',1

    Data Types: single | double

    Frequency of verbose printing, which is the number of iterations between printing to the command window, specified as a positive integer scalar. A value of 1 indicates to print diagnostic information at every iteration.

    Note

    To use this name-value argument, set Verbose to 1.

    Example: 'VerboseFrequency',5

    Data Types: single | double

    Flag to store the training history, specified as a numeric or logical 0 (false) or 1 (true). If StoreHistory is set to true, then the software stores diagnostic information inside of Mdl, which you can access by using Mdl.TrainingHistory.

    Example: 'StoreHistory',false

    Data Types: single | double | logical
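
    A short sketch of how the verbosity settings and the stored history fit together, using the Fisher iris data as an arbitrary example. The printing frequency of 10 is an illustrative choice.

```matlab
load fisheriris
rng(0)

% Print diagnostics every 10 iterations; StoreHistory is true by default.
Mdl = fitcnet(meas, species, 'Verbose', 1, 'VerboseFrequency', 10);

history = Mdl.TrainingHistory;  % table of stored diagnostic information
disp(history(1:min(3, height(history)), :))
```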

    Maximum number of training iterations, specified as a positive integer scalar.

    The software returns a trained model regardless of whether the training routine successfully converges. Mdl.ConvergenceInfo contains convergence information.

    Example: 'IterationLimit',1e8

    Data Types: single | double

    Relative gradient tolerance, specified as a nonnegative scalar.

    Let $\mathcal{L}_t$ be the loss function at training iteration $t$, $\nabla\mathcal{L}_t$ be the gradient of the loss function with respect to the weights and biases at iteration $t$, and $\nabla\mathcal{L}_0$ be the gradient of the loss function at an initial point. If $\max|\nabla\mathcal{L}_t| \le a \cdot \text{GradientTolerance}$, where $a = \max(1, \min|\mathcal{L}_t|, \max|\nabla\mathcal{L}_0|)$, then the training process terminates.

    Example: 'GradientTolerance',1e-5

    Data Types: single | double
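
    The stopping test can be sketched as a one-line check. This assumes the criterion reads "stop when the largest absolute gradient entry at iteration t is at most a*GradientTolerance, with a = max(1, min|loss at t|, max|gradient at the initial point|)". The helper shouldStop and its argument names are hypothetical, and the numbers are made up.

```matlab
% Hypothetical helper; the argument names are illustrative only.
% lossT: loss at iteration t, gradT: gradient at iteration t,
% grad0: gradient at the initial point, tol: GradientTolerance.
shouldStop = @(lossT, gradT, grad0, tol) ...
    max(abs(gradT)) <= max([1, min(abs(lossT)), max(abs(grad0))]) * tol;

grad0 = [10 -20];               % made-up initial gradient, so a = 20
tol   = 1e-3;                   % the threshold becomes 20*1e-3 = 0.02

stopSmall = shouldStop(0.5, [0.01 -0.015], grad0, tol);   % 0.015 <= 0.02
stopLarge = shouldStop(0.5, [0.05 -0.015], grad0, tol);   % 0.05  >  0.02
```

    Because a is at least 1, the tolerance is relative to the initial gradient only when that gradient is large; otherwise it acts as an absolute threshold.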

    Loss tolerance, specified as a nonnegative scalar.

    If the value of the loss function at some iteration is smaller than LossTolerance, then the training process terminates.

    Example: 'LossTolerance',1e-8

    Data Types: single | double

    Step size tolerance, specified as a nonnegative scalar.

    If the step size at some iteration is smaller than StepTolerance, then the training process terminates.

    Example: 'StepTolerance',1e-4

    Data Types: single | double
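
    The iteration limit and the tolerances can be combined in one call, and Mdl.ConvergenceInfo then reports how training ended. The following is a minimal sketch on the Fisher iris data; the specific values are arbitrary.

```matlab
load fisheriris
rng(0)

Mdl = fitcnet(meas, species, ...
    'IterationLimit', 50, ...
    'LossTolerance', 1e-8, ...
    'StepTolerance', 1e-4);

info = Mdl.ConvergenceInfo;     % structure describing how training ended
disp(info)
```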

    Validation data for training convergence detection, specified as a cell array or table.

    During the training process, the software periodically estimates the validation loss by using ValidationData. If the validation loss is greater than or equal to the minimum validation loss computed so far ValidationPatience times in a row, then the software terminates the training.

    You can specify ValidationData as a table if you use a table Tbl of predictor data that contains the response variable. In this case, ValidationData must contain the same predictors and response contained in Tbl. The software does not apply weights to observations, even if Tbl contains a vector of weights. To specify weights, you must specify ValidationData as a cell array.

    If you specify ValidationData as a cell array, then it must have the following format:

    • ValidationData{1} must have the same data type and orientation as the predictor data. That is, if you use a predictor matrix X, then ValidationData{1} must be an m-by-p or p-by-m matrix of predictor data that has the same orientation as X. The predictor variables in the training data X and ValidationData{1} must correspond. Similarly, if you use a table Tbl of predictor data, then ValidationData{1} must be a table containing the same predictor variables contained in Tbl. The number of observations in ValidationData{1} and the predictor data can vary.

    • ValidationData{2} must match the data type and format of the response variable, either Y or ResponseVarName. If ValidationData{2} is an array of class labels, then it must have the same number of elements as the number of observations in ValidationData{1}. The set of all distinct labels of ValidationData{2} must be a subset of all distinct labels of Y. If ValidationData{1} is a table, then ValidationData{2} can be the name of the response variable in the table. If you want to use the same ResponseVarName or formula, you can specify ValidationData{2} as [].

    • Optionally, you can specify ValidationData{3} as an m-dimensional numeric vector of observation weights or the name of a variable in the table ValidationData{1} that contains observation weights. The software normalizes the weights with the validation data so that they sum to 1.

    If you specify ValidationData and want to display the validation loss at the command line, set Verbose to 1.
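
    A minimal sketch of the cell array form, assuming matrix predictors. The 30% holdout split and the variable names (XTrain, YTrain, XVal, YVal) are choices made for this illustration.

```matlab
load fisheriris
rng(0)

% Hold out 30% of the observations for validation.
cvp = cvpartition(species, 'Holdout', 0.3);
XTrain = meas(training(cvp), :);   YTrain = species(training(cvp));
XVal   = meas(test(cvp), :);       YVal   = species(test(cvp));

% ValidationData{1} holds the predictors, ValidationData{2} the class labels.
Mdl = fitcnet(XTrain, YTrain, 'ValidationData', {XVal, YVal}, 'Verbose', 1);

valLoss = loss(Mdl, XVal, YVal);   % loss on the validation set (default loss function)
```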

    Number of iterations between validation evaluations, specified as a positive integer scalar. A value of 1 indicates to evaluate validation metrics at every iteration.

    Note

    To use this name-value argument, you must specify ValidationData.

    Example: 'ValidationFrequency',5

    Data Types: single | double

    Stopping condition for validation evaluations, specified as a nonnegative integer scalar. The training process stops if the validation loss is greater than or equal to the minimum validation loss computed so far, ValidationPatience times in a row. You can check the Mdl.TrainingHistory table to see the running total of times that the validation loss is greater than or equal to the minimum (Validation Checks).

    Example: 'ValidationPatience',10

    Data Types: single | double
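
    The patience rule can be sketched as a simple counter in base MATLAB. The validation losses below are made up, and the loop illustrates only the rule as described here, not the software's implementation.

```matlab
valLoss  = [0.9 0.7 0.6 0.65 0.61 0.60 0.62];   % made-up validation losses
patience = 3;                                   % plays the role of ValidationPatience

best = Inf;  checks = 0;  stopAt = NaN;
for t = 1:numel(valLoss)
    if valLoss(t) >= best
        checks = checks + 1;    % no improvement over the minimum so far
    else
        best   = valLoss(t);    % new minimum: reset the counter
        checks = 0;
    end
    if checks >= patience
        stopAt = t;             % training would stop at this evaluation
        break
    end
end
fprintf('Stopped at evaluation %d with minimum validation loss %.2f\n', stopAt, best);
```

    In this run the counter reaches 3 at the sixth evaluation. A loss equal to the current minimum (0.60) counts as a check, because the rule uses "greater than or equal to".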

    Other Classification Options

    Categorical predictors list, specified as one of the values in this table. The descriptions assume that the predictor data has observations in rows and predictors in columns.

    Value    Description

    Vector of positive integers

    Each entry in the vector is an index value corresponding to the column of the predictor data that contains a categorical variable. The index values are between 1 and p, where p is the number of predictors used to train the model.

    If fitcnet uses a subset of input variables as predictors, then the function indexes the predictors using only the subset. The 'CategoricalPredictors' values do not count the response variable, the observation weight variable, and any other variables that the function does not use.

    Logical vector

    A true entry means that the corresponding column of predictor data is a categorical variable. The length of the vector is p.

    Character matrix

    Each row of the matrix is the name of a predictor variable. The names must match the entries in PredictorNames. Pad the names with extra blanks so each row of the character matrix has the same length.

    String array or cell array of character vectors

    Each element in the array is the name of a predictor variable. The names must match the entries in PredictorNames.

    'all'

    All predictors are categorical.

    By default, if the predictor data is in a table (Tbl), fitcnet assumes that a variable is categorical if it is a logical vector, categorical vector, character array, string array, or cell array of character vectors. If the predictor data is a matrix (X), fitcnet assumes that all predictors are continuous. To identify any other predictors as categorical predictors, specify them by using the 'CategoricalPredictors' name-value argument.

    For the identified categorical predictors, fitcnet creates dummy variables using two different schemes, depending on whether a categorical variable is unordered or ordered. For an unordered categorical variable, fitcnet creates one dummy variable for each level of the categorical variable. For an ordered categorical variable, fitcnet creates one less dummy variable than the number of categories. For details, see Automatic Creation of Dummy Variables.

    Example: 'CategoricalPredictors','all'

    Data Types: single | double | logical | char | string | cell
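
    For an unordered categorical variable, the "one dummy variable per level" scheme can be sketched in base MATLAB as follows. The predictor colors is made up, and the code shows only the encoding idea, not what fitcnet does internally; the scheme for ordered variables is not shown.

```matlab
% Made-up unordered categorical predictor with three levels.
colors = {'red'; 'green'; 'blue'; 'green'};

[levels, ~, idx] = unique(colors);         % levels are sorted: blue, green, red
D = double(idx(:) == 1:numel(levels));     % one column (dummy variable) per level

disp(levels')
disp(D)
```

    Each row of D contains exactly one 1, in the column of the level observed for that row.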

    Names of classes to use for training, specified as a categorical, character, or string array; a logical or numeric vector; or a cell array of character vectors. ClassNames must have the same data type as the response variable in Tbl or Y.

    If ClassNames is a character array, then each element must correspond to one row of the array.

    Use ClassNames to:

    • Specify the order of the classes during training.

    • Specify the order of any input or output argument dimension that corresponds to the class order. For example, use ClassNames to specify the order of the dimensions of Cost or the column order of classification scores returned by predict.

    • Select a subset of classes for training. For example, suppose that the set of all distinct class names in Y is {'a','b','c'}. To train the model using observations from classes 'a' and 'c' only, specify 'ClassNames',{'a','c'}.

    The default value for ClassNames is the set of all distinct class names in the response variable in Tbl or Y.

    Example: 'ClassNames',{'b','g'}

    Data Types: categorical | char | string | logical | single | double | cell
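
    A minimal sketch of training on a subset of classes, using two of the three species in the Fisher iris data. The choice of data set and classes is for illustration only.

```matlab
load fisheriris
rng(0)

% Train on two of the three iris species, in this order.
Mdl = fitcnet(meas, species, 'ClassNames', {'setosa', 'virginica'});

[labels, scores] = predict(Mdl, meas(1:3, :));
disp(Mdl.ClassNames)            % class order used by the model
disp(size(scores))              % one score column per class in ClassNames
```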

    Predictor variable names, specified as a string array of unique names or cell array of unique character vectors. The functionality of 'PredictorNames' depends on the way you supply the training data.

    • If you supply X and Y, then you can use 'PredictorNames' to assign names to the predictor variables in X.

      • The order of the names in PredictorNames must correspond to the predictor order in X. Assuming that X has the default orientation, with observations in rows and predictors in columns, PredictorNames{1} is the name of X(:,1), PredictorNames{2} is the name of X(:,2), and so on. Also, size(X,2) and numel(PredictorNames) must be equal.

      • By default, PredictorNames is {'x1','x2',...}.

    • If you supply Tbl, then you can use 'PredictorNames' to choose which predictor variables to use in training. That is, fitcnet uses only the predictor variables in PredictorNames and the response variable during training.

      • PredictorNames must be a subset of Tbl.Properties.VariableNames and cannot include the name of the response variable.

      • By default, PredictorNames contains the names of all predictor variables.

      • A good practice is to specify the predictors for training using either 'PredictorNames' or formula, but not both.

    Example: 'PredictorNames',{'SepalLength','SepalWidth','PetalLength','PetalWidth'}

    Data Types: string | cell

    Response variable name, specified as a character vector or string scalar.

    • If you supply Y, then you can use 'ResponseName' to specify a name for the response variable.

    • If you supply ResponseVarName or formula, then you cannot use 'ResponseName'.

    Example: 'ResponseName','response'

    Data Types: char | string
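
    When the training data is a matrix and a label vector, both names can be supplied together. The following is a minimal sketch on the Fisher iris data; the name 'Species' is an arbitrary choice.

```matlab
load fisheriris
rng(0)

Mdl = fitcnet(meas, species, ...
    'PredictorNames', {'SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth'}, ...
    'ResponseName', 'Species');

disp(Mdl.PredictorNames)
disp(Mdl.ResponseName)
```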

    Score transformation, specified as a character vector, string scalar, or function handle.

    This table summarizes the available character vectors and string scalars.

    Value                   Description
    'doublelogit'           $1/(1 + e^{-2x})$
    'invlogit'              $\log(x / (1 - x))$
    'ismax'                 Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0
    'logit'                 $1/(1 + e^{-x})$
    'none' or 'identity'    $x$ (no transformation)
    'sign'                  $-1$ for $x < 0$, $0$ for $x = 0$, $1$ for $x > 0$
    'symmetric'             $2x - 1$
    'symmetricismax'        Sets the score for the class with the largest score to 1, and sets the scores for all other classes to $-1$
    'symmetriclogit'        $2/(1 + e^{-x}) - 1$

    For a MATLAB function or a function you define, use its function handle for the score transform. The function handle must accept a matrix (the original scores) and return a matrix of the same size (the transformed scores).

    Example: 'ScoreTransform','logit'

    Data Types: char | string | function_handle
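
    Some of the built-in transformations can be written as anonymous functions, which also shows the shape a custom function handle must have. This sketch assumes the standard logistic forms, with negative exponents; clipScores is a hypothetical custom transform invented for the example.

```matlab
% Base MATLAB versions of some transforms, written as anonymous functions.
logit          = @(x) 1 ./ (1 + exp(-x));
doublelogit    = @(x) 1 ./ (1 + exp(-2*x));
symmetric      = @(x) 2*x - 1;
symmetriclogit = @(x) 2 ./ (1 + exp(-x)) - 1;

scores = [0 1; -2 3];           % made-up raw scores, one row per observation
disp(logit(scores))

% A custom transform must map a matrix to a matrix of the same size.
clipScores = @(x) min(max(x, 0), 1);    % hypothetical example: clip to [0,1]
disp(clipScores(scores))
```

    A handle such as clipScores could then be supplied as the value of 'ScoreTransform'.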

    Observation weights, specified as a nonnegative numeric vector or the name of a variable in Tbl. The software weights each observation in X or Tbl with the corresponding value in Weights. The length of Weights must equal the number of observations in X or Tbl.

    If you specify the input data as a table Tbl, then Weights can be the name of a variable in Tbl that contains a numeric vector. In this case, you must specify Weights as a character vector or string scalar. For example, if the weights vector W is stored as Tbl.W, then specify it as 'W'. Otherwise, the software treats all columns of Tbl, including W, as predictors or the response variable when training the model.

    By default, Weights is ones(n,1), where n is the number of observations in X or Tbl.

    The software normalizes Weights to sum to the value of the prior probability in the respective class.

    Data Types: single | double | char | string
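
    The normalization can be sketched as follows: within each class, the weights are rescaled so that they sum to that class's prior probability. The labels, the weights, and in particular the prior values are assumptions made up for this illustration.

```matlab
% Made-up labels, raw weights, and class priors (illustration only).
y     = [1; 1; 2; 2; 2];
w     = [1; 3; 2; 2; 4];
prior = [0.4 0.6];              % assumed prior probabilities of classes 1 and 2

wNorm = zeros(size(w));
for k = 1:numel(prior)
    inClass = (y == k);
    % Rescale so that the weights in class k sum to prior(k).
    wNorm(inClass) = w(inClass) / sum(w(inClass)) * prior(k);
end

disp(wNorm')
```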

    Cross-Validation Options

    Flag to train a cross-validated classifier, specified as 'on' or 'off'.

    If you specify 'on', then the software trains a cross-validated classifier with 10 folds.

    You can override this cross-validation setting using the CVPartition, Holdout, KFold, or Leaveout name-value argument. You can use only one cross-validation name-value argument at a time to create a cross-validated model.

    Alternatively, cross-validate later by passing Mdl to crossval.

    Example: 'CrossVal','on'

    Data Types: char | string

    Cross-validation partition, specified as a cvpartition partition object created by cvpartition. The partition object specifies the type of cross-validation and the indexing for the training and validation sets.

    To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout.

    Example: Suppose you create a random partition for 5-fold cross-validation on 500 observations by using cvp = cvpartition(500,'KFold',5). Then, you can specify the cross-validated model by using 'CVPartition',cvp.
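
    A minimal sketch of passing a partition object, here a 5-fold partition of the Fisher iris data built from the class labels. The data set and the number of folds are arbitrary choices.

```matlab
load fisheriris
rng(0)

cvp   = cvpartition(species, 'KFold', 5);          % 5-fold partition of the labels
CVMdl = fitcnet(meas, species, 'CVPartition', cvp);

cvError = kfoldLoss(CVMdl);     % cross-validated loss averaged over the folds
```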

    Fraction of the data used for holdout validation, specified as a scalar value in the range (0,1). If you specify 'Holdout',p, then the software completes these steps:

    1. Randomly select and reserve p*100% of the data as validation data, and train the model using the rest of the data.

    2. Store the compact, trained model in the Trained property of the cross-validated model.

    To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout.

    Example: 'Holdout',0.1

    Data Types: double | single
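
    The holdout case can be sketched on the Fisher iris data as follows. The fraction 0.1 matches the example above, and the variable names are illustrative.

```matlab
load fisheriris
rng(0)

CVMdl = fitcnet(meas, species, 'Holdout', 0.1);    % reserve 10% for validation

compactMdl   = CVMdl.Trained{1};   % the single compact model, trained on the other 90%
holdoutError = kfoldLoss(CVMdl);   % loss on the reserved 10%
```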

    Number of folds to use in a cross-validated model, specified as a positive integer value greater than 1. If you specify 'KFold',k, then the software completes these steps:

    1. Randomly partition the data into k sets.

    2. For each set, reserve the set as validation data, and train the model using the other k – 1 sets.

    3. Store the k compact, trained models in a k-by-1 cell vector in the Trained property of the cross-validated model.

    To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout.

    Example: 'KFold',5

    Data Types: single | double
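
    The first two steps can be sketched in base MATLAB with index bookkeeping only and no model training. The sizes are made up, and the random assignment below is one simple way to form k sets, not necessarily the one the software uses.

```matlab
n = 10;                         % made-up number of observations
k = 5;                          % number of folds

% Step 1: randomly partition the observation indices into k sets.
foldOf = zeros(n, 1);
foldOf(randperm(n)) = mod(0:n-1, k) + 1;   % foldOf(i) is the fold of observation i

for fold = 1:k
    % Step 2: reserve one set for validation, train on the other k-1 sets.
    valIdx   = find(foldOf == fold);
    trainIdx = find(foldOf ~= fold);
    fprintf('Fold %d: %d training, %d validation observations\n', ...
        fold, numel(trainIdx), numel(valIdx));
end
```

    Every observation appears in exactly one validation set and in the training set of the other k - 1 folds.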

    Leave-one-out cross-validation flag, specified as 'on' or 'off'. If you specify 'Leaveout','on', then for each of the n observations (where n is the number of observations, excluding missing observations, specified in the NumObservations property of the model), the software completes these steps:

    1. Reserve the one observation as validation data, and train the model using the other n – 1 observations.

    2. Store the n compact, trained models in an n-by-1 cell vector in the Trained property of the cross-validated model.

    To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout.

    Example: 'Leaveout','on'

    Output Arguments

    Trained neural network classifier, returned as a ClassificationNeuralNetwork or ClassificationPartitionedModel object.

    If you set any of the name-value arguments CrossVal, CVPartition, Holdout, KFold, or Leaveout, then Mdl is a ClassificationPartitionedModel object. Otherwise, Mdl is a ClassificationNeuralNetwork model.

    To reference properties of Mdl, use dot notation.

    More About

    Neural Network Structure

    The default neural network classifier has the following layer structure.

    Structure    Description

    [Figure: Default neural network classifier structure, with one customizable fully connected layer with a ReLU activation]

    Input — This layer corresponds to the predictor data in Tbl or X.

    First fully connected layer — This layer has 10 outputs by default.

    • You can widen the layer or add more fully connected layers to the network by specifying the LayerSizes name-value argument.

    • You can find the weights and biases for this layer in the Mdl.LayerWeights{1} and Mdl.LayerBiases{1} properties of Mdl, respectively.

    ReLU activation function — fitcnet applies this activation function to the first fully connected layer.

    • You can change the activation function by specifying the Activations name-value argument.

    Final fully connected layer — This layer has K outputs, where K is the number of classes in the response variable.

    • You can find the weights and biases for this layer in the Mdl.LayerWeights{end} and Mdl.LayerBiases{end} properties of Mdl, respectively.

    Softmax function (for both binary and multiclass classification) — fitcnet applies this activation function to the final fully connected layer. The function takes each input xi and returns the following, where K is the number of classes in the response variable:

    $$f(x_i) = \frac{\exp(x_i)}{\sum_{j=1}^{K} \exp(x_j)}.$$

    The results correspond to the predicted classification scores (or posterior probabilities).

    Output — This layer corresponds to the predicted class labels.

    For an example that shows how a neural network classifier with this layer structure returns predictions, see Predict Using Layer Structure of Neural Network Classifier.
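
    The default layer structure can be traced by hand for a single observation. In this sketch the weights are random and the sizes are made up, and each weight matrix is assumed to be stored as outputs-by-inputs; for a trained model, Mdl.LayerWeights and Mdl.LayerBiases would supply the parameters instead.

```matlab
% Made-up sizes and random parameters, for illustration only.
p = 4;  K = 3;  h = 10;                 % predictors, classes, first-layer outputs
W1 = randn(h, p);  b1 = zeros(h, 1);    % first fully connected layer
W2 = randn(K, h);  b2 = zeros(K, 1);    % final fully connected layer

x = [5.1; 3.5; 1.4; 0.2];               % one observation as a column vector

z1 = W1*x + b1;                         % first fully connected layer
a1 = max(z1, 0);                        % ReLU activation
z2 = W2*a1 + b2;                        % final fully connected layer

% Softmax; subtracting max(z2) avoids overflow and leaves the result unchanged.
scores = exp(z2 - max(z2)) / sum(exp(z2 - max(z2)));
[~, predictedClass] = max(scores);      % index of the class with the largest score
```

    The vector scores has K nonnegative entries that sum to 1, which is why they can be read as posterior probabilities.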

    Tips

    • Always try to standardize the numeric predictors (see Standardize). Standardization makes predictors insensitive to the scales on which they are measured.

    Algorithms

    Training Solver

    fitcnet uses a limited-memory Broyden-Fletcher-Goldfarb-Shanno quasi-Newton algorithm (LBFGS) [3] as its loss function minimization technique, where the software minimizes the cross-entropy loss.

    References

    [1] Glorot, Xavier, and Yoshua Bengio. “Understanding the difficulty of training deep feedforward neural networks.” In Proceedings of the thirteenth international conference on artificial intelligence and statistics, pp. 249–256. 2010.

    [2] He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. “Delving deep into rectifiers: Surpassing human-level performance on imagenet classification.” In Proceedings of the IEEE international conference on computer vision, pp. 1026–1034. 2015.

    [3] Nocedal, J. and S. J. Wright. Numerical Optimization, 2nd ed., New York: Springer, 2006.

    Introduced in R2021a