FeatureSelectionNCAClassification
Feature selection for classification using neighborhood component analysis (NCA)
Description
A FeatureSelectionNCAClassification object contains the data, fitting information, feature weights, and other parameters of a neighborhood component analysis (NCA) model. fscnca learns the feature weights using a diagonal adaptation of NCA and returns an instance of a FeatureSelectionNCAClassification object. The function achieves feature selection by regularizing the feature weights.
Creation
Create a FeatureSelectionNCAClassification object using fscnca.
Properties
NCA Properties
ModelParameters
— Model parameters
structure
This property is read-only.
Model parameters used for training the model, specified as a structure.
You can access the fields of ModelParameters using dot notation.
For example, for a FeatureSelectionNCAClassification object named mdl, you can access the LossFunction value using mdl.ModelParameters.LossFunction.
Data Types: struct
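As a minimal sketch of the dot-notation access described above, the following fits a model with default settings on the ionosphere sample data and reads two fields of ModelParameters. The variable name params is only illustrative.

```matlab
% Sketch: inspect the training settings stored in ModelParameters.
load ionosphere                      % X: 351-by-34 predictors, Y: labels 'b'/'g'
mdl = fscnca(X,Y);                   % fit with default settings

params = mdl.ModelParameters;        % training parameters (illustrative variable name)
disp(params.LossFunction)            % loss function used during training
disp(params.Standardize)             % false when the predictors were not standardized
```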
Lambda
— Regularization parameter
scalar
This property is read-only.
Regularization parameter used for training this model, specified as a scalar. For n observations, the best Lambda value that minimizes the generalization error of the NCA model is expected to be a multiple of 1/n.
Data Types: double
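Because the best Lambda is expected to be a multiple of 1/n, one way to choose it is to try several multiples and compare the error on held-out data. The sketch below is only an illustration: the grid of multipliers, the 30% hold-out fraction, and the variable names are assumptions, not recommended settings.

```matlab
% Sketch: compare a few multiples of 1/n as Lambda using a hold-out set.
load ionosphere                          % X: 351-by-34 predictors, Y: labels 'b'/'g'
rng(1)                                   % for reproducibility
cvp = cvpartition(Y,'HoldOut',0.3);      % hypothetical 70/30 split
Xtrain = X(training(cvp),:);  Ytrain = Y(training(cvp));
Xtest  = X(test(cvp),:);      Ytest  = Y(test(cvp));

n = size(Xtrain,1);                      % number of training observations
multipliers = [0.5 1 2 5 10];            % hypothetical grid of multiples of 1/n
holdoutLoss = zeros(size(multipliers));

for k = 1:numel(multipliers)
    lambda = multipliers(k)/n;
    mdl = fscnca(Xtrain,Ytrain,'Lambda',lambda);
    holdoutLoss(k) = loss(mdl,Xtest,Ytest);   % loss on the held-out data
end

[~,best] = min(holdoutLoss);             % smallest hold-out loss
bestLambda = multipliers(best)/n;
fprintf('Best multiplier: %g (Lambda = %g)\n',multipliers(best),bestLambda);
```

A single hold-out split gives a noisy estimate; averaging over several folds is a natural refinement of the same idea.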
FitMethod
— Name of fitting method
'exact' | 'none' | 'average'
This property is read-only.
Name of the fitting method used to fit this model, specified as one of the following:
- 'exact': Perform fitting using all of the data.
- 'none': No fitting. Use this option to evaluate the generalization error of the NCA model using the initial feature weights supplied in the call to fscnca.
- 'average': Divide the data into partitions (subsets), fit each partition using the exact method, and return the average of the feature weights. You can specify the number of partitions using the NumPartitions name-value argument.
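The 'average' option can be sketched as follows. The choice of five partitions is arbitrary, and the row-wise mean is one assumed way to combine the per-partition weights into a single score per predictor.

```matlab
% Sketch: fit on partitions of the data and combine the per-partition weights.
load ionosphere                          % X: 351-by-34 predictors, Y: labels 'b'/'g'
rng(1)                                   % for reproducibility

numParts = 5;                            % hypothetical number of partitions
mdl = fscnca(X,Y,'FitMethod','average','NumPartitions',numParts);

disp(mdl.FitMethod)                      % fitting method stored in the model
disp(size(mdl.FeatureWeights))           % one column of weights per partition
avgWeights = mean(mdl.FeatureWeights,2); % one combined weight per predictor
```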
Solver
— Name of the solver used to fit this model
'lbfgs' | 'sgd' | 'minibatch-lbfgs'
This property is read-only.
Name of the solver used to fit this model, specified as one of the following:
- 'lbfgs': Limited memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) algorithm
- 'sgd': Stochastic gradient descent (SGD) algorithm
- 'minibatch-lbfgs': Stochastic gradient descent with LBFGS algorithm applied to mini-batches
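A short sketch of how the solver choice might be compared in practice: the loop below fits one model per solver and reports the loss on the training data. The variable names are illustrative, and the resubstitution loss is used only for brevity; it is not a substitute for a held-out estimate.

```matlab
% Sketch: fit the same data with each solver and compare the training loss.
load ionosphere                          % X: 351-by-34 predictors, Y: labels 'b'/'g'
rng(1)                                   % the stochastic solvers depend on the random seed

solvers = {'lbfgs','sgd','minibatch-lbfgs'};
trainLoss = zeros(size(solvers));

for k = 1:numel(solvers)
    mdl = fscnca(X,Y,'Solver',solvers{k});
    trainLoss(k) = loss(mdl,X,Y);        % loss evaluated on the training data
    fprintf('%-16s training loss = %.4f\n',solvers{k},trainLoss(k));
end
```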
GradientTolerance
— Relative convergence tolerance on gradient norm
positive scalar
This property is read-only.
Relative convergence tolerance on the gradient norm for the 'lbfgs' and 'minibatch-lbfgs' solvers, specified as a positive scalar value.
Data Types: double
IterationLimit
— Maximum number of iterations for optimization
positive integer
This property is read-only.
Maximum number of iterations for optimization, specified as a positive integer value.
Data Types: double
PassLimit
— Maximum number of passes
positive integer
This property is read-only.
Maximum number of passes for the 'sgd' and 'minibatch-lbfgs' solvers, specified as a positive integer. Every pass processes all of the observations in the data.
Data Types: double
InitialLearningRate
— Initial learning rate
positive real scalar
This property is read-only.
Initial learning rate for the 'sgd' and 'minibatch-lbfgs' solvers, specified as a positive real scalar. The learning rate decays over iterations starting at the value specified for InitialLearningRate.
Use the NumTuningIterations and TuningSubsetSize name-value arguments to control the automatic tuning of the initial learning rate in the call to fscnca.
Data Types: double
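The two ways of setting the learning rate can be sketched as below: one fit passes an explicit value, and the other leaves the rate to automatic tuning while adjusting the tuning controls. All numeric values here are hypothetical and chosen only for illustration.

```matlab
% Sketch: explicit versus automatically tuned initial learning rate for SGD.
load ionosphere                          % X: 351-by-34 predictors, Y: labels 'b'/'g'
rng(1)                                   % for reproducibility

% Explicit initial learning rate and a cap on passes through the data.
mdlFixed = fscnca(X,Y,'Solver','sgd', ...
    'InitialLearningRate',0.1, ...       % hypothetical value
    'PassLimit',10);                     % hypothetical value

% Automatic tuning, controlled by the two tuning arguments.
mdlTuned = fscnca(X,Y,'Solver','sgd', ...
    'NumTuningIterations',10, ...        % hypothetical value
    'TuningSubsetSize',50);              % hypothetical value

disp(mdlFixed.InitialLearningRate)       % learning rate stored in the model
disp(mdlFixed.PassLimit)                 % pass limit stored in the model
```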
Verbose
— Verbosity level indicator
nonnegative integer
This property is read-only.
Verbosity level indicator, specified as a nonnegative integer. Possible values are:
- 0: No convergence summary
- 1: Convergence summary, including norm of gradient and objective function value
- >1: More convergence information, depending on the fitting algorithm. When you use the 'minibatch-lbfgs' solver and verbosity level > 1, the convergence information includes the iteration log from intermediate mini-batch LBFGS fits.
Data Types: double
InitialFeatureWeights
— Initial feature weights
p-by-1 vector of positive real scalars
This property is read-only.
Initial feature weights, specified as a p-by-1 vector of positive real scalars, where p is the number of predictors in X. For more information about feature weights, see Neighborhood Component Analysis (NCA) Feature Selection.
Data Types: double
FeatureWeights
— Feature weights
numeric vector | numeric matrix
This property is read-only.
Feature weights, specified as a p-by-1 numeric vector or a p-by-m numeric matrix, where p is the number of predictor variables after dummy variables are created for categorical variables (for more details, see ExpandedPredictorNames).
If FitMethod is 'average', then FeatureWeights is a p-by-m matrix, where m is the number of partitions specified via the NumPartitions name-value argument in the call to fscnca.
The absolute value of FeatureWeights(k) is a measure of the importance of predictor k. A FeatureWeights(k) value that is close to 0 indicates that predictor k does not influence the response in Y.
For more information about feature weights, see Neighborhood Component Analysis (NCA) Feature Selection.
Data Types: double
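Since the magnitude of each weight measures predictor importance, a simple way to pick features is to keep the predictors whose weights exceed a small fraction of the largest weight. The sketch below assumes a relative threshold of 2%, which is an arbitrary illustrative choice.

```matlab
% Sketch: rank predictors by weight magnitude and keep those above a threshold.
load ionosphere                          % X: 351-by-34 predictors, Y: labels 'b'/'g'
mdl = fscnca(X,Y);

w = abs(mdl.FeatureWeights);             % importance of each predictor
tol = 0.02;                              % hypothetical relative threshold
selidx = find(w > tol*max(w));           % indices of the retained predictors

[~,order] = sort(w,'descend');           % predictors from most to least important
fprintf('Kept %d of %d predictors\n',numel(selidx),numel(w));
Xselected = X(:,selidx);                 % reduced predictor matrix
```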
FitInfo
— Fit information
structure
This property is read-only.
Fit information, specified as a structure with the following fields.
Field Name | Meaning |
---|---|
Iteration | Iteration index |
Objective | Regularized objective function for minimization |
UnregularizedObjective | Unregularized objective function for minimization |
Gradient | Gradient of regularized objective function for minimization |
- For classification, UnregularizedObjective represents the negative of the leave-one-out accuracy of the NCA classifier on the training data.
- For regression, UnregularizedObjective represents the leave-one-out loss between the true response and the predicted response when using the NCA regression model.
- For the 'lbfgs' solver, Gradient is the final gradient. For the 'sgd' and 'minibatch-lbfgs' solvers, Gradient is the final mini-batch gradient.
- If FitMethod is 'average', then FitInfo is an m-by-1 structure array, where m is the number of partitions specified via the NumPartitions name-value argument.
You can access the fields of FitInfo using dot notation. For example, for a FeatureSelectionNCAClassification object named mdl, you can access the Objective field using mdl.FitInfo.Objective.
Data Types: struct
Other Classification Properties
NumObservations
— Number of observations in the training data
scalar
This property is read-only.
Number of observations in the training data (X and Y) after removing NaN or Inf values, specified as a scalar.
Data Types: double
Mu
— Predictor means
p-by-1 vector | []
This property is read-only.
Predictor means, specified as a p-by-1 vector for standardized training data. In this case, the predict method centers predictor matrix X by subtracting the respective element of Mu from every column.
If data is not standardized during training, then Mu is empty.
Data Types: double
Sigma
— Predictor standard deviations
p-by-1 vector | []
This property is read-only.
Predictor standard deviations, specified as a p-by-1 vector for standardized training data. In this case, the predict method scales predictor matrix X by dividing every column by the respective element of Sigma after centering the data using Mu.
If data is not standardized during training, then Sigma is empty.
Data Types: double
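The centering and scaling that Mu and Sigma describe can be reproduced by hand, as in the sketch below. It uses the fisheriris sample data rather than ionosphere, an assumption made here because none of its predictors is constant, so every scaling factor is nonzero. The variable names are illustrative.

```matlab
% Sketch: apply the stored means and standard deviations to a predictor matrix.
load fisheriris                          % meas: 150-by-4 predictors, species: labels
mdl = fscnca(meas,species,'Standardize',true);

mu    = mdl.Mu(:)';                      % row vector, one mean per predictor
sigma = mdl.Sigma(:)';                   % row vector, one scale per predictor
Xstd  = (meas - mu)./sigma;              % center each column, then scale it

% Without standardization, both properties are empty.
mdlRaw = fscnca(meas,species);
disp(isempty(mdlRaw.Mu) && isempty(mdlRaw.Sigma))
```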
X
— Predictor values
matrix | table
This property is read-only.
Predictor values used to train this model, specified as a matrix or a table. Each column of X represents one predictor (variable), and each row represents one observation.
Data Types: single | double | table
Y
— Response values
numeric vector of size n
This property is read-only.
Response values used to train this model, specified as a numeric vector of size n, where n is the number of observations.
Data Types: double
W
— Observation weights
numeric vector of size n
This property is read-only.
Observation weights used to train this model, specified as a numeric vector of size n. The sum of observation weights is n.
Data Types: double
CategoricalPredictors
— Categorical predictor indices
vector of positive integers | []
This property is read-only.
Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty ([]).
Data Types: single | double
ResponseName
— Response variable name
character vector
This property is read-only.
Response variable name, specified as a character vector.
Data Types: char
PredictorNames
— Predictor variable names
cell array of unique character vectors
This property is read-only.
Predictor variable names in order of their appearance in the predictor data, specified as a cell array of unique character vectors. The length of PredictorNames is equal to the number of variables in the training data X used as predictor variables.
Data Types: cell
ExpandedPredictorNames
— Expanded predictor names
cell array of unique character vectors
This property is read-only.
Expanded predictor names, specified as a cell array of unique character vectors.
If the model uses encoding for categorical variables, then ExpandedPredictorNames includes the names that describe the expanded variables. Otherwise, ExpandedPredictorNames is the same as PredictorNames.
Data Types: cell
ClassNames
— Unique class labels
cell array of unique character vectors
This property is read-only.
Unique class labels used in training, specified as a cell array of unique character vectors.
Data Types: cell
Object Functions
loss | Evaluate accuracy of learned feature weights on test data |
predict | Predict responses using neighborhood component analysis (NCA) classifier |
refit | Refit neighborhood component analysis (NCA) model for classification |
selectFeatures | Select important features for NCA classification or regression |
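A brief sketch of how predict, loss, and refit might be used together on a hold-out split follows. The 30% hold-out fraction and the new Lambda value are hypothetical. selectFeatures is omitted from the sketch.

```matlab
% Sketch: predict labels, measure test loss, and refit with a different Lambda.
load ionosphere                          % X: 351-by-34 predictors, Y: labels 'b'/'g'
rng(1)                                   % for reproducibility
cvp = cvpartition(Y,'HoldOut',0.3);      % hypothetical 70/30 split
Xtrain = X(training(cvp),:);  Ytrain = Y(training(cvp));
Xtest  = X(test(cvp),:);      Ytest  = Y(test(cvp));

mdl = fscnca(Xtrain,Ytrain);             % initial fit with default settings

labels   = predict(mdl,Xtest);           % predicted class labels for the test set
testLoss = loss(mdl,Xtest,Ytest);        % loss on the test set

newLambda = 0.01;                        % hypothetical regularization value
mdl2 = refit(mdl,'Lambda',newLambda);    % refit the model with modified settings
testLoss2 = loss(mdl2,Xtest,Ytest);

fprintf('Test loss: %.4f before refit, %.4f after\n',testLoss,testLoss2);
```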
Examples
Explore FeatureSelectionNCAClassification Object
Load the sample data.
load ionosphere
The data set has 34 continuous predictors. The response variable is the radar returns, labeled as b (bad) or g (good).
Fit a neighborhood component analysis (NCA) model for classification to detect the relevant features.
mdl = fscnca(X,Y);
The returned NCA model, mdl, is a FeatureSelectionNCAClassification object. This object stores information about the training data, model, and optimization. You can access the object properties, such as the feature weights, using dot notation.
Plot the feature weights.
plot(mdl.FeatureWeights,"o")
xlabel("Feature Index")
ylabel("Feature Weight")
grid on
The weights of the irrelevant features are close to zero. Specifying Verbose=1 in the call to fscnca displays the optimization information on the command line. You can also visualize the optimization process by plotting the objective function versus the iteration number.
plot(mdl.FitInfo.Iteration,mdl.FitInfo.Objective,"o-")
grid on
xlabel("Iteration Number")
ylabel("Objective")
The ModelParameters property is a struct that contains more information about the model. You can access the fields of this property using dot notation. For example, see if the data was standardized or not.
mdl.ModelParameters.Standardize
ans = logical
   0

The value 0 means that the data was not standardized before fitting the NCA model. You can standardize the predictors when they are on very different scales using the Standardize=true name-value argument in the call to fscnca.
Version History
Introduced in R2016b
See Also
predict | fscnca | refit | loss | selectFeatures