FeatureSelectionNCAClassification

Feature selection for classification using neighborhood component analysis (NCA)

Description

A FeatureSelectionNCAClassification object contains the data, fitting information, feature weights, and other parameters of a neighborhood component analysis (NCA) model. fscnca learns the feature weights using a diagonal adaptation of NCA and returns a FeatureSelectionNCAClassification object. The function achieves feature selection by regularizing the feature weights.

Creation

Create a FeatureSelectionNCAClassification object using fscnca.

Properties


NCA Properties

`ModelParameters` — Model parameters
Read-only: structure

This property is read-only.

Model parameters used for training the model, specified as a structure.

You can access the fields of ModelParameters using dot notation.

For example, for a FeatureSelectionNCAClassification object named mdl, you can access the LossFunction value using mdl.ModelParameters.LossFunction.

Data Types: struct

`Lambda` — Regularization parameter
Read-only: scalar

This property is read-only.

Regularization parameter used for training this model, specified as a scalar. For n observations, the best Lambda value that minimizes the generalization error of the NCA model is expected to be a multiple of 1/n.

Data Types: double
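The 1/n guidance suggests searching over candidate Lambda values of the form c/n. The following is a minimal sketch of such a search, assuming the ionosphere sample data; the grid of multipliers, the five-fold partition, and the variable names are illustrative choices, not values recommended by this page.

```matlab
load ionosphere                        % X is the predictor matrix, Y holds the labels
rng(1)                                 % make the random partition reproducible
n = size(X,1);
cvp = cvpartition(Y,KFold=5);          % illustrative five-fold partition
lambdavals = (0:5:20)/n;               % candidate values, all multiples of 1/n
lossvals = zeros(numel(lambdavals),cvp.NumTestSets);

for i = 1:numel(lambdavals)
    for k = 1:cvp.NumTestSets
        trainIdx = training(cvp,k);
        testIdx = test(cvp,k);
        % Fit on the training fold with the current candidate value.
        nca = fscnca(X(trainIdx,:),Y(trainIdx), ...
            Lambda=lambdavals(i),Verbose=0);
        % Evaluate the classification loss on the held-out fold.
        lossvals(i,k) = loss(nca,X(testIdx,:),Y(testIdx));
    end
end

% Pick the candidate with the smallest mean loss across folds.
[~,best] = min(mean(lossvals,2));
bestLambda = lambdavals(best);
```

You can then pass bestLambda as the Lambda name-value argument in a final call to fscnca on all of the training data.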

`FitMethod` — Name of fitting method
Read-only: `'exact'` | `'none'` | `'average'`

This property is read-only.

Name of the fitting method used to fit this model, specified as one of the following:

'exact' — Perform fitting using all of the data.
'none' — No fitting. Use this option to evaluate the generalization error of the NCA model using the initial feature weights supplied in the call to fscnca.
'average' — Divide the data into partitions (subsets), fit each partition using the exact method, and return the average of the feature weights. You can specify the number of partitions using the NumPartitions name-value argument.
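As a brief illustration of the 'average' option, the sketch below fits the ionosphere sample data with five partitions. The data set and the partition count are assumptions made for the example. According to the FeatureWeights property description, the stored weights then have one column per partition.

```matlab
load ionosphere
rng(1)                                    % partition assignment is random
mdl = fscnca(X,Y,FitMethod="average",NumPartitions=5);

mdl.FitMethod                             % fitting method stored in the object
size(mdl.FeatureWeights)                  % one column of weights per partition
numel(mdl.FitInfo)                        % one FitInfo element per partition
```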

`Solver` — Name of the solver used to fit this model
Read-only: `'lbfgs'` | `'sgd'` | `'minibatch-lbfgs'`

This property is read-only.

Name of the solver used to fit this model, specified as one of the following:

'lbfgs' — Limited memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) algorithm
'sgd' — Stochastic gradient descent (SGD) algorithm
'minibatch-lbfgs' — Stochastic gradient descent with the LBFGS algorithm applied to mini-batches

`GradientTolerance` — Relative convergence tolerance on gradient norm
Read-only: positive scalar

This property is read-only.

Relative convergence tolerance on the gradient norm for the 'lbfgs' and 'minibatch-lbfgs' solvers, specified as a positive scalar value.

Data Types: double

`IterationLimit` — Maximum number of iterations for optimization
Read-only: positive integer

This property is read-only.

Maximum number of iterations for optimization, specified as a positive integer value.

Data Types: double

`PassLimit` — Maximum number of passes
Read-only: positive integer

This property is read-only.

Maximum number of passes for 'sgd' and 'minibatch-lbfgs' solvers, specified as a positive integer. Every pass processes all of the observations in the data.

Data Types: double

`InitialLearningRate` — Initial learning rate
Read-only: positive real scalar

This property is read-only.

Initial learning rate for the 'sgd' and 'minibatch-lbfgs' solvers, specified as a positive real scalar. The learning rate decays over iterations starting at the value specified for InitialLearningRate.

Use the NumTuningIterations and TuningSubsetSize name-value arguments in the call to fscnca to control the automatic tuning of the initial learning rate.

Data Types: double

`Verbose` — Verbosity level indicator
Read-only: nonnegative integer

This property is read-only.

Verbosity level indicator, specified as a nonnegative integer. Possible values are:

0 — No convergence summary
1 — Convergence summary, including norm of gradient and objective function value
>1 — More convergence information, depending on the fitting algorithm. When you use the 'minibatch-lbfgs' solver and verbosity level > 1, the convergence information includes the iteration log from intermediate mini-batch LBFGS fits.

Data Types: double

`InitialFeatureWeights` — Initial feature weights
Read-only: p-by-1 vector of positive real scalars

This property is read-only.

Initial feature weights, specified as a p-by-1 vector of positive real scalars, where p is the number of predictors in X. For more information about feature weights, see Neighborhood Component Analysis (NCA) Feature Selection.

Data Types: double

`FeatureWeights` — Feature weights
Read-only: numeric vector | numeric matrix

This property is read-only.

Feature weights, specified as a p-by-1 numeric vector or a p-by-m numeric matrix, where p is the number of predictor variables after dummy variables are created for categorical variables (for more details, see ExpandedPredictorNames).

If FitMethod is 'average', then FeatureWeights is a p-by-m matrix. m is the number of partitions specified via the NumPartitions name-value argument in the call to fscnca.

The absolute value of FeatureWeights(k) is a measure of the importance of predictor k. A FeatureWeights(k) value that is close to 0 indicates that predictor k does not influence the response in Y.

For more information about feature weights, see Neighborhood Component Analysis (NCA) Feature Selection.

Data Types: double
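Because a weight close to 0 marks a predictor as uninformative, one simple way to use FeatureWeights is to keep the predictors whose weights exceed a small relative threshold. The sketch below assumes the ionosphere sample data; the tolerance value 0.02 and the variable names are hypothetical choices for illustration.

```matlab
load ionosphere
mdl = fscnca(X,Y);

w = abs(mdl.FeatureWeights);               % importance measure for each predictor
tol = 0.02;                                % hypothetical relative threshold
selected = find(w > tol*max(1,max(w)));    % indices of predictors with non-negligible weight

Xreduced = X(:,selected);                  % predictor matrix restricted to those columns
```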

`FitInfo` — Fit information
Read-only: structure

This property is read-only.

Fit information, specified as a structure with the following fields.

Field Name	Meaning
`Iteration`	Iteration index
`Objective`	Regularized objective function for minimization
`UnregularizedObjective`	Unregularized objective function for minimization
`Gradient`	Gradient of regularized objective function for minimization

For classification, UnregularizedObjective represents the negative of the leave-one-out accuracy of the NCA classifier on the training data.
For regression, UnregularizedObjective represents the leave-one-out loss between the true response and the predicted response when using the NCA regression model.
For the 'lbfgs' solver, Gradient is the final gradient. For the 'sgd' and 'minibatch-lbfgs' solvers, Gradient is the final mini-batch gradient.
If FitMethod is 'average', then FitInfo is an m-by-1 structure array, where m is the number of partitions specified via the NumPartitions name-value argument.

You can access the fields of FitInfo using dot notation. For example, for a FeatureSelectionNCAClassification object named mdl, you can access the Objective field using mdl.FitInfo.Objective.

Data Types: struct
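The following sketch reads a few FitInfo fields after a default fit, assuming the ionosphere sample data and the default FitMethod, so that FitInfo is a single structure. It relies on the statement above that UnregularizedObjective is the negative of the leave-one-out accuracy; the variable names are illustrative.

```matlab
load ionosphere
mdl = fscnca(X,Y);
info = mdl.FitInfo;

finalObjective = info.Objective(end);                   % regularized objective at the last recorded iteration
finalAccuracy = -info.UnregularizedObjective(end);      % negate to recover the leave-one-out accuracy
gradNorm = norm(info.Gradient);                         % size of the stored gradient
```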

Other Classification Properties

`NumObservations` — Number of observations in the training data
Read-only: scalar

This property is read-only.

Number of observations in the training data (X and Y) after removing NaN or Inf values, specified as a scalar.

Data Types: double

`Mu` — Predictor means
Read-only: p-by-1 vector | `[]`

This property is read-only.

Predictor means, specified as a p-by-1 vector for standardized training data. In this case, the predict method centers predictor matrix X by subtracting the respective element of Mu from every column.

If data is not standardized during training, then Mu is empty.

Data Types: double

`Sigma` — Predictor standard deviations
Read-only: p-by-1 vector | `[]`

This property is read-only.

Predictor standard deviations, specified as a p-by-1 vector for standardized training data. In this case, the predict method scales predictor matrix X by dividing every column by the respective element of Sigma after centering the data using Mu.

If data is not standardized during training, then Sigma is empty.

Data Types: double
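The centering and scaling that the Mu and Sigma descriptions attribute to predict can be reproduced by hand. This is a sketch only, assuming the fisheriris sample data and a model trained with Standardize=true; it illustrates the arithmetic and is not the internal implementation.

```matlab
load fisheriris                           % meas is the predictor matrix, species holds the labels
mdl = fscnca(meas,species,Standardize=true);

% Mu and Sigma are p-by-1, so transpose them to rows before applying
% them across the columns of the n-by-p predictor matrix.
Xcentered = meas - mdl.Mu';
Xstandardized = Xcentered ./ mdl.Sigma';

% Without standardization, both properties are empty.
mdlRaw = fscnca(meas,species);
isempty(mdlRaw.Mu) && isempty(mdlRaw.Sigma)
```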

`X` — Predictor values
Read-only: matrix | table

This property is read-only.

Predictor values used to train this model, specified as a matrix or a table. Each column of X represents one predictor (variable), and each row represents one observation.

Data Types: single | double | table

`Y` — Response values
Read-only: numeric vector of size n

This property is read-only.

Response values used to train this model, specified as a numeric vector of size n, where n is the number of observations.

Data Types: double

`W` — Observation weights
Read-only: numeric vector of size n

This property is read-only.

Observation weights used to train this model, specified as a numeric vector of size n. The sum of observation weights is n.

Data Types: double
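To see the normalization described above, you can pass arbitrary positive weights and inspect the stored W. The sketch assumes the fisheriris sample data and randomly generated raw weights, both chosen only for illustration.

```matlab
load fisheriris
rng(1)
n = size(meas,1);
rawWeights = rand(n,1) + 0.5;             % hypothetical positive observation weights
mdl = fscnca(meas,species,Weights=rawWeights);

% Compare the sum of the stored weights with the number of observations.
weightSum = sum(mdl.W);
difference = abs(weightSum - n);
```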

`CategoricalPredictors` — Categorical predictor indices
Read-only: vector of positive integers | `[]`

This property is read-only.

Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty ([]).

Data Types: single | double

`ResponseName` — Response variable name
Read-only: character vector

This property is read-only.

Response variable name, specified as a character vector.

Data Types: char

`PredictorNames` — Predictor variable names
Read-only: cell array of unique character vectors

This property is read-only.

Predictor variable names in order of their appearance in the predictor data, specified as a cell array of unique character vectors. The length of PredictorNames is equal to the number of variables in the training data X used as predictor variables.

Data Types: cell

`ExpandedPredictorNames` — Expanded predictor names
Read-only: cell array of unique character vectors

This property is read-only.

Expanded predictor names, specified as a cell array of unique character vectors.

If the model uses encoding for categorical variables, then ExpandedPredictorNames includes the names that describe the expanded variables. Otherwise, ExpandedPredictorNames is the same as PredictorNames.

Data Types: cell

`ClassNames` — Unique class labels
Read-only: cell array of unique character vectors

This property is read-only.

Unique class labels used in training, specified as a cell array of unique character vectors.

Data Types: cell

Object Functions

`loss`	Evaluate accuracy of learned feature weights on test data
`predict`	Predict responses using neighborhood component analysis (NCA) classifier
`refit`	Refit neighborhood component analysis (NCA) model for classification
`selectFeatures`	Select important features for NCA classification or regression
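The object functions are typically used together on held-out data. The sketch below assumes the ionosphere sample data, a hypothetical 70/30 holdout split, an illustrative Lambda value for the refit, and a release that provides selectFeatures.

```matlab
load ionosphere
rng(1)                                          % make the random split reproducible
cvp = cvpartition(Y,Holdout=0.3);               % hypothetical 70/30 split
Xtrain = X(training(cvp),:);
Ytrain = Y(training(cvp));
Xtest = X(test(cvp),:);
Ytest = Y(test(cvp));

mdl = fscnca(Xtrain,Ytrain);
labels = predict(mdl,Xtest);                    % predicted class labels for the test set
err = loss(mdl,Xtest,Ytest);                    % classification loss on the test set

mdl2 = refit(mdl,Lambda=2/numel(Ytrain));       % refit with a different regularization value
idx = selectFeatures(mdl2);                     % indices of the features selected from the refit model
```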

Examples


Explore `FeatureSelectionNCAClassification` Object


Load the sample data.

load ionosphere

The data set has 34 continuous predictors. The response variable is the radar returns, labeled as b (bad) or g (good).

Fit a neighborhood component analysis (NCA) model for classification to detect the relevant features.

mdl = fscnca(X,Y);

The returned NCA model, mdl, is a FeatureSelectionNCAClassification object. This object stores information about the training data, model, and optimization. You can access the object properties, such as the feature weights, using dot notation.

Plot the feature weights.

plot(mdl.FeatureWeights,"o")
xlabel("Feature Index")
ylabel("Feature Weight")
grid on

[Figure: feature weight versus feature index, plotted with markers only.]

The weights of the irrelevant features are zero. To display the optimization information on the command line, specify Verbose=1 in the call to fscnca. You can also visualize the optimization process by plotting the objective function versus the iteration number.

plot(mdl.FitInfo.Iteration,mdl.FitInfo.Objective,"o-")
grid on
xlabel("Iteration Number")
ylabel("Objective")

[Figure: objective function value versus iteration number, plotted as a line.]

The ModelParameters property is a struct that contains more information about the model. You can access the fields of this property using dot notation. For example, see if the data was standardized or not.

mdl.ModelParameters.Standardize

ans = logical
   0

0 means that the data was not standardized before fitting the NCA model. You can standardize the predictors when they are on very different scales using the Standardize=true name-value argument in the call to fscnca.

Version History

Introduced in R2016b
