lime
Local interpretable model-agnostic explanations (LIME)
Description
LIME explains a prediction of a machine learning model (classification or regression) for a query point by finding important predictors and fitting a simple interpretable model.
You can create a lime object for a machine learning model with a
specified query point (queryPoint) and a specified number of important
predictors (numImportantPredictors). The software generates a synthetic
data set and fits a simple interpretable model of important predictors that effectively
explains the predictions for the synthetic data around the query point. The simple model can
be a linear model (default) or a decision tree model.
Use the fitted simple model to explain a prediction of the machine learning model locally,
at the specified query point. Use the plot function to
visualize the LIME results. Based on the local explanations, you can decide whether or not to
trust the machine learning model.
Fit a new simple model for another query point by using the fit
function.
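A typical workflow chains these steps together. The following is a minimal sketch, assuming mdl is a hypothetical trained full classification or regression model that stores its own predictor data:
q = mdl.X(1,:);                                                  % query point: first training observation (hypothetical model mdl)
results = lime(mdl,'QueryPoint',q,'NumImportantPredictors',3);   % generate synthetic data and fit a linear simple model
plot(results)                                                    % visualize the local explanation
results = fit(results,mdl.X(2,:),3);                             % refit the simple model at another query point
plot(results)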
Creation
Syntax
Description
results = lime(blackbox) creates the lime object results using the machine
learning model object blackbox, which contains predictor data. The
lime function generates samples of a synthetic predictor data set
and computes the predictions for the samples. To fit a simple model, use the fit function
with results.
results = lime(blackbox,'CustomSyntheticData',customSyntheticData) creates a lime object using the pregenerated, custom synthetic
predictor data set customSyntheticData. The
lime function computes the predictions for the samples in
customSyntheticData.
results = lime(___,'QueryPoint',queryPoint,'NumImportantPredictors',numImportantPredictors) also finds the specified number of important predictors and fits a linear simple model
for the query point queryPoint. You can specify
queryPoint and numImportantPredictors in
addition to any of the input argument combinations in the previous syntaxes.
results = lime(___,Name,Value) specifies additional options using one or more name-value arguments. For example,
'SimpleModelType','tree' specifies the type of simple model as a
decision tree model.
Input Arguments
Machine learning model to be interpreted, specified as a full or compact regression or classification model object or a function handle.
Full or compact model object — You can specify a full or compact regression or classification model object that has a predict object function. The software uses the predict function to compute the predictions for the query point and the synthetic predictor data set.
- If you specify a model object that does not contain predictor data (for example, a compact model), then you must provide predictor data using X or customSyntheticData.
- lime does not support a model object trained with a sparse matrix. When you train a model, use a full numeric matrix or table for the predictor data, where rows correspond to individual observations.
- lime does not support a model object trained with more than one response variable.
Regression Model Object
| Supported Model | Full or Compact Regression Model Object |
|---|---|
| Ensemble of regression models | RegressionEnsemble, RegressionBaggedEnsemble, CompactRegressionEnsemble |
| Gaussian kernel regression model using random feature expansion | RegressionKernel |
| Gaussian process regression | RegressionGP, CompactRegressionGP |
| Generalized additive model | RegressionGAM, CompactRegressionGAM |
| Linear regression for high-dimensional data | RegressionLinear |
| Neural network regression model | RegressionNeuralNetwork, CompactRegressionNeuralNetwork |
| Regression tree | RegressionTree, CompactRegressionTree |
| Support vector machine regression | RegressionSVM, CompactRegressionSVM |
Classification Model Object
| Supported Model | Full or Compact Classification Model Object |
|---|---|
| Binary decision tree for multiclass classification | ClassificationTree, CompactClassificationTree |
| Discriminant analysis classifier | ClassificationDiscriminant, CompactClassificationDiscriminant |
| Ensemble of learners for classification | ClassificationEnsemble, CompactClassificationEnsemble, ClassificationBaggedEnsemble |
| Gaussian kernel classification model using random feature expansion | ClassificationKernel |
| Generalized additive model | ClassificationGAM, CompactClassificationGAM |
| k-nearest neighbor model | ClassificationKNN |
| Linear classification model | ClassificationLinear |
| Multiclass model for support vector machines or other classifiers | ClassificationECOC, CompactClassificationECOC |
| Naive Bayes model | ClassificationNaiveBayes, CompactClassificationNaiveBayes |
| Neural network classifier | ClassificationNeuralNetwork, CompactClassificationNeuralNetwork |
| Support vector machine for binary classification | ClassificationSVM, CompactClassificationSVM |
Function handle — You can specify a function handle that accepts predictor data and returns a column vector containing a prediction for each observation in the predictor data. The prediction is a predicted response for regression or a classified label for classification. You must provide the predictor data using
X or customSyntheticData and specify the 'Type' name-value argument.
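For example, this sketch (with hypothetical names Mdl and X) wraps a model's predict function in a function handle and supplies the predictor data and 'Type' value that the function-handle form requires:
f = @(Xnew) predict(Mdl,Xnew);           % returns one prediction per row of Xnew (Mdl is hypothetical)
results = lime(f,X,'Type','regression'); % function-handle form requires predictor data and 'Type'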
Predictor data, specified as a numeric matrix or table. Each row of
X corresponds to one observation, and each column corresponds
to one variable.
X must be consistent with the predictor data that trained
blackbox,
stored in blackbox.X. The specified value must not contain a
response variable.
- X must have the same data types as the predictor variables (for example, trainX) that trained blackbox. The variables that make up the columns of X must have the same number and order as in trainX.
- If you train blackbox using a numeric matrix, then X must be a numeric matrix.
- If you train blackbox using a table, then X must be a table. All predictor variables in X must have the same variable names and data types as in trainX.
- lime does not support a sparse matrix.
If blackbox is a model object that does not contain predictor data, or is a function
handle, then you must provide X or customSyntheticData. If blackbox is a full machine
learning model object and you specify this argument, then lime
does not use the predictor data in blackbox. It uses the
specified predictor data only.
Data Types: single | double | table
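For example, this sketch compacts a hypothetical full model mdl (which drops the stored training data) and therefore passes the predictor data X explicitly:
cmdl = compact(mdl);     % compact model objects do not store predictor data (mdl, X are hypothetical)
results = lime(cmdl,X);  % supply the predictor data X yourself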
Pregenerated, custom synthetic predictor data set, specified as a numeric matrix or table.
If you provide a pregenerated data set, then lime uses the
provided data set instead of generating a new synthetic predictor data set.
customSyntheticData must be consistent with the predictor
data that trained blackbox,
stored in blackbox.X. The specified value must not contain a
response variable.
- customSyntheticData must have the same data types as the predictor variables (for example, trainX) that trained blackbox. The variables that make up the columns of customSyntheticData must have the same number and order as in trainX.
- If you train blackbox using a numeric matrix, then customSyntheticData must be a numeric matrix.
- If you train blackbox using a table, then customSyntheticData must be a table. All predictor variables in customSyntheticData must have the same variable names and data types as in trainX.
- lime does not support a sparse matrix.
If blackbox is a model object that does not contain predictor data, or is a function
handle, then you must provide X or
customSyntheticData. If blackbox is a full
machine learning model object and you specify this argument, then
lime does not use the predictor data in
blackbox; it uses the specified predictor data only.
Data Types: single | double | table
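For example, this sketch (with hypothetical mdl, q, and a pregenerated table syntheticX that matches the training predictors) reuses a custom synthetic data set instead of letting lime sample one:
results = lime(mdl,'CustomSyntheticData',syntheticX);  % use the pregenerated synthetic samples
results = fit(results,q,3);                            % fit a simple model at the query point q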
Query point at which lime explains a prediction, specified as
a row vector of numeric values or a single-row table. queryPoint
must have the same data type and number of columns as X,
customSyntheticData, or the predictor data in blackbox.
If you specify numImportantPredictors and queryPoint, then the
lime function fits a simple model when creating a
lime object.
queryPoint must not contain missing values.
Example: blackbox.X(1,:) specifies the query point as the first
observation of the predictor data in the full machine learning model
blackbox.
Data Types: single | double | table
Number of important predictors to use in the simple model, specified as a positive integer scalar value.
- If 'SimpleModelType' is 'linear' (default), then the software selects the specified number of important predictors and fits a linear model of the selected predictors. Note that the software does not use unimportant predictors when fitting the linear model.
- If 'SimpleModelType' is 'tree', then the software specifies the maximum number of decision splits (or branch nodes) as the number of important predictors so that the fitted decision tree uses at most the specified number of predictors.
If you specify numImportantPredictors and queryPoint,
then the lime function fits a simple model when creating a
lime object.
Data Types: single | double
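For example, this sketch (with a hypothetical trained model mdl and predictor table X) supplies both values at creation, so the simple model is fit immediately:
q = X(10,:);                                                      % hypothetical query point
results = lime(mdl,X,'QueryPoint',q,'NumImportantPredictors',5);  % simple model is fit during creation
results.ImportantPredictors                                       % indices of the predictors used in the simple model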
Name-Value Arguments
Specify optional pairs of arguments as
Name1=Value1,...,NameN=ValueN, where Name is
the argument name and Value is the corresponding value.
Name-value arguments must appear after other arguments, but the order of the
pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose
Name in quotes.
Example:
lime(blackbox,'QueryPoint',q,'NumImportantPredictors',n,'SimpleModelType','tree') specifies the query point as q, the number of important predictors to
use for the simple model as n, and the type of simple model as a
decision tree model. lime generates samples of a synthetic predictor
data set, computes the predictions for the samples, and fits a decision tree model for the
query point using at most the specified number of predictors.
Options for Synthetic Predictor Data
Locality of the synthetic data for data generation, specified as the
comma-separated pair consisting of 'DataLocality' and
'global' or 'local'.
- 'global' — The software estimates distribution parameters using the whole predictor data set (X or the predictor data in blackbox). The software generates a synthetic predictor data set with the estimated parameters and uses the data set for simple model fitting of any query point.
- 'local' — The software estimates the distribution parameters using the k-nearest neighbors of a query point, where k is the 'NumNeighbors' value. The software generates a new synthetic predictor data set each time it fits a simple model for the specified query point.
For more details, see LIME.
Example: 'DataLocality','local'
Data Types: char | string
Number of neighbors of the query point, specified as the comma-separated pair
consisting of 'NumNeighbors' and a positive integer scalar
value. This argument is valid only when 'DataLocality' is 'local'.
If you specify a value larger than the number of observations in the predictor
data set (X or the
predictor data in blackbox), then lime uses all observations.
Example: 'NumNeighbors',2000
Data Types: single | double
Number of samples to generate for the synthetic data set, specified as the
comma-separated pair consisting of 'NumSyntheticData' and a
positive integer scalar value.
Example: 'NumSyntheticData',2500
Data Types: single | double
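For example, this sketch (with hypothetical mdl and q) estimates the sampling distribution from the 1500 nearest neighbors of each query point and draws 3000 synthetic samples; because 'DataLocality' is 'local', the synthetic data set is generated when fit runs:
results = lime(mdl,'DataLocality','local', ...
    'NumNeighbors',1500,'NumSyntheticData',3000);
results = fit(results,q,4);   % synthetic data is generated locally around q here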
Options for Simple Model
Since R2023b
Relative tolerance on the linear coefficients and the bias term (intercept) for the linear simple model, specified as a nonnegative scalar.
This argument is valid only when the SimpleModelType value is
"linear".
Let $B_t = [\beta_t^{\mathsf T} \; b_t]$, that is, the vector of the coefficients and the bias term at fitting step t. If $\frac{\lVert B_t - B_{t-1} \rVert_2}{\lVert B_t \rVert_2} < \text{BetaTolerance}$, then the fitting process for the linear simple model terminates.
Example: "BetaTolerance",1e-8
Data Types: single | double
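For example, this sketch (R2023b or later, with hypothetical mdl and q) tightens the tolerance for the linear simple model fit:
results = lime(mdl,'QueryPoint',q,'NumImportantPredictors',4, ...
    'BetaTolerance',1e-6);   % tighter than the default 1e-4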
Kernel width of the squared exponential (or Gaussian) kernel function, specified as the comma-separated pair consisting of 'KernelWidth' and a numeric scalar value.
The lime function computes distances between the query point and
the samples in the synthetic predictor data set, and then converts the distances to weights
by using the squared exponential kernel function. If you lower the
'KernelWidth' value, then lime uses
weights that are more focused on the samples near the query point. For details, see LIME.
Example: 'KernelWidth',0.5
Data Types: single | double
Type of the simple model, specified as the comma-separated pair consisting of 'SimpleModelType' and 'linear' or 'tree'.
'linear'— The software fits a linear model by usingfitrlinearfor regression orfitclinearfor classification.'tree'— The software fits a decision tree model by usingfitrtreefor regression orfitctreefor classification.
Example: 'SimpleModelType','tree'
Data Types: char | string
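For example, a sketch combining these simple-model options (mdl and q are hypothetical) narrows the locality weighting and switches to a decision tree simple model:
results = lime(mdl,'QueryPoint',q,'NumImportantPredictors',4, ...
    'KernelWidth',0.3,'SimpleModelType','tree');   % smaller kernel width focuses weights near q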
Options for Machine Learning Model
Categorical predictors list, specified as the comma-separated pair consisting of
'CategoricalPredictors' and one of the values in this
table.
| Value | Description |
|---|---|
| Vector of positive integers | Each entry in the vector is an index value indicating that the corresponding predictor is categorical. The index values are between 1 and p, where p is the number of predictors. |
| Logical vector | A true entry means that the corresponding predictor is categorical. The length of the vector is p. |
| Character matrix | Each row of the matrix is the name of a predictor variable. The names must match the variable names of the predictor data in the form of a table. Pad the names with extra blanks so each row of the character matrix has the same length. |
| String array or cell array of character vectors | Each element in the array is the name of a predictor variable. The names must match the variable names of the predictor data in the form of a table. |
| "all" | All predictors are categorical. |
- If you specify blackbox as a function handle, then lime identifies categorical predictors from the predictor data X or customSyntheticData. If the predictor data is in a table, lime assumes that a variable is categorical if it is a logical vector, unordered categorical vector, character array, string array, or cell array of character vectors. If the predictor data is a matrix, lime assumes that all predictors are continuous.
- If you specify blackbox as a regression or classification model object, then lime identifies categorical predictors by using the CategoricalPredictors property of the model object.
lime does not support an ordered categorical
predictor.
Example: 'CategoricalPredictors','all'
Data Types: single | double | logical | char | string | cell
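For example, this sketch (with hypothetical names Mdl and X, where X is a table containing a double-coded variable Region) passes a function handle and flags the categorical predictor by name:
f = @(Xnew) predict(Mdl,Xnew);                 % Mdl is any classifier with a predict function
results = lime(f,X,'Type','classification', ...
    'CategoricalPredictors',"Region");         % treat the Region variable in table X as categorical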
Type of the machine learning model, specified as the comma-separated pair
consisting of 'Type' and 'regression' or
'classification'.
You must specify this argument when you specify blackbox
as a function handle. If you specify blackbox as a regression
or classification model object, then lime determines the
'Type' value depending on the model type.
Example: 'Type','classification'
Data Types: char | string
Options for Computing Distances
Distance metric, specified as the comma-separated pair consisting of 'Distance' and a character vector, string scalar, or function handle.
If the predictor data includes only continuous variables, then lime supports these distance metrics.
| Value | Description |
|---|---|
| 'euclidean' | Euclidean distance. |
| 'seuclidean' | Standardized Euclidean distance. Each coordinate difference between observations is scaled by dividing by the corresponding element of the standard deviation, S = std(PD,'omitnan'), where PD is the predictor data or synthetic predictor data. To specify different scaling, use the 'Scale' name-value argument. |
| 'mahalanobis' | Mahalanobis distance using the sample covariance of PD, C = cov(PD,'omitrows'). To change the value of the covariance matrix, use the 'Cov' name-value argument. |
| 'cityblock' | City block distance. |
| 'minkowski' | Minkowski distance. The default exponent is 2. To specify a different exponent, use the 'P' name-value argument. |
| 'chebychev' | Chebychev distance (maximum coordinate difference). |
| 'cosine' | One minus the cosine of the included angle between points (treated as vectors). |
| 'correlation' | One minus the sample correlation between points (treated as sequences of values). |
| 'spearman' | One minus the sample Spearman's rank correlation between observations (treated as sequences of values). |
| @distfun | Custom distance function handle. A distance function has the form function D2 = distfun(ZI,ZJ), where ZI is a 1-by-t vector containing a single observation, ZJ is an s-by-t matrix containing multiple observations, and D2 is an s-by-1 vector of distances whose kth element D2(k) is the distance between observations ZI and ZJ(k,:). distfun must accept a matrix ZJ with an arbitrary number of observations. |
If your data is not sparse, you can generally compute distance more quickly by using a built-in distance metric instead of a function handle.
If the predictor data includes both continuous and categorical variables, then lime supports these distance metrics.
| Value | Description |
|---|---|
| 'goodall3' | Modified Goodall distance |
| 'ofd' | Occurrence frequency distance |
For definitions, see Distance Metrics.
The default value is 'euclidean' if the predictor data
includes only continuous variables, or 'goodall3' if the
predictor data includes both continuous and categorical variables.
Example: 'Distance','ofd'
Data Types: char | string | function_handle
Covariance matrix for the Mahalanobis distance metric, specified as the
comma-separated pair consisting of 'Cov' and a
K-by-K positive definite matrix, where
K is the number of predictors.
This argument is valid only if 'Distance' is 'mahalanobis'.
The default 'Cov' value is
cov(PD,'omitrows'), where PD is the
predictor data or synthetic predictor data. If you do not specify the
'Cov' value, then the software uses different covariance
matrices when computing the distances for both the predictor data and the synthetic
predictor data.
Example: 'Cov',eye(3)
Data Types: single | double
Exponent for the Minkowski distance metric, specified as the comma-separated
pair consisting of 'P' and a positive scalar.
This argument is valid only if 'Distance' is 'minkowski'.
Example: 'P',3
Data Types: single | double
Scale parameter value for the standardized Euclidean distance metric, specified
as the comma-separated pair consisting of 'Scale' and a
nonnegative numeric vector of length K, where
K is the number of predictors.
This argument is valid only if 'Distance' is 'seuclidean'.
The default 'Scale' value is
std(PD,'omitnan'), where PD is the predictor
data or synthetic predictor data. If you do not specify the
'Scale' value, then the software uses different scale
parameters when computing the distances for both the predictor data and the
synthetic predictor data.
Example: 'Scale',quantile(X,0.75) -
quantile(X,0.25)
Data Types: single | double
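For example, the following sketch (using a hypothetical numeric predictor matrix X and trained model mdl) replaces the default standard-deviation scaling of the standardized Euclidean distance with an interquartile-range scale:
s = quantile(X,0.75) - quantile(X,0.25);                  % robust per-predictor scale estimate
results = lime(mdl,X,'Distance','seuclidean','Scale',s);  % use the custom scaling for distance computation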
Properties
Specified Properties
You can specify the following properties when creating a lime
object.
This property is read-only.
Machine learning model to be interpreted, specified as a regression or classification model object or a function handle.
The blackbox
argument sets this property.
This property is read-only.
Categorical predictor
indices, specified as a vector of positive integers. CategoricalPredictors
contains index values indicating that the corresponding predictors are categorical. The index
values are between 1 and p, where p is the number of
predictors used to train the model. If none of the predictors are categorical, then this
property is empty ([]).
- If you specify blackbox using a function handle, then lime identifies categorical predictors from the predictor data X or customSyntheticData. If you specify the 'CategoricalPredictors' name-value argument, then the argument sets this property.
- If you specify blackbox as a regression or classification model object, then lime determines this property by using the CategoricalPredictors property of the model object.
lime does not support an ordered categorical
predictor.
If 'SimpleModelType' is 'linear' (default), then
lime creates dummy variables for each identified
categorical predictor. lime treats the category of the
specified query point as a reference group and creates one less dummy variable than
the number of categories. For more details, see Dummy Variables with Reference Group.
Data Types: single | double
This property is read-only.
Locality of the synthetic data for data generation, specified as
'global' or 'local'.
The 'DataLocality' name-value argument sets this property.
This property is read-only.
Number of important predictors to use in the simple model (SimpleModel), specified as a positive integer scalar value. This value
might be greater than the number of predictors actually used to train the simple
model. For more information, see LIME.
The numImportantPredictors argument of lime or the
numImportantPredictors argument of fit sets this
property.
Data Types: single | double
This property is read-only.
Number of samples in the synthetic data set, specified as a positive integer scalar value.
- If you specify customSyntheticData, then the number of samples in the custom synthetic data set sets this property.
- Otherwise, the 'NumSyntheticData' name-value argument of lime or the 'NumSyntheticData' name-value argument of fit sets this property.
Data Types: single | double
This property is read-only.
Query point at which lime explains a prediction using the
simple model (SimpleModel), specified as a row vector of numeric values or single-row
table.
The queryPoint
argument of lime or the queryPoint
argument of fit sets this property.
Data Types: single | double | table
This property is read-only.
Type of the machine learning model (BlackboxModel), specified as 'regression' or
'classification'.
This property is read-only.
Predictor data, specified as a numeric matrix or table.
Each row of X corresponds to one observation, and each column
corresponds to one variable.
- If you specify the X argument, then the argument sets this property.
- If you specify the customSyntheticData argument, then this property is empty.
- If you specify blackbox as a full machine learning model object and do not specify X or customSyntheticData, then this property value is the predictor data used to train blackbox.
lime does not use rows that contain missing values and does not
store the rows in X.
Data Types: single | double | table
Computed Properties
The software computes the following properties.
This property is read-only.
Prediction for the query point computed by the machine learning model (BlackboxModel), specified as a scalar. The prediction is a predicted
response for regression or a classified label for classification.
Data Types: single | double | categorical | logical | char | string | cell
This property is read-only.
Predictions for synthetic predictor data computed by the machine learning model
(BlackboxModel), specified as a vector.
Data Types: single | double | categorical | logical | char | string | cell
This property is read-only.
Important predictor indices, specified as a vector of positive integers.
ImportantPredictors contains the index values corresponding to
the columns of the predictors used in the simple model (SimpleModel).
Data Types: single | double
This property is read-only.
Simple model, specified as a RegressionLinear, RegressionTree, ClassificationLinear, or ClassificationTree model object. lime determines the
type of simple model object depending on the type of the machine learning model
(Type) and
the type of the simple model ('SimpleModelType').
This property is read-only.
Prediction for the query point computed by the simple model (SimpleModel), specified as a scalar.
If SimpleModel
is ClassificationLinear, then the
SimpleModelFitted value is 1 or –1.
- The SimpleModelFitted value is 1 if the prediction from the simple model is the same as BlackboxFitted (the prediction from the machine learning model).
- The SimpleModelFitted value is –1 if the prediction from the simple model is different from BlackboxFitted. If the BlackboxFitted value is A, then the plot function displays the SimpleModelFitted value as Not A.
Data Types: single | double | categorical | logical | char | string | cell
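For a regression problem, where both fitted values are numeric scalars, a quick check of local agreement might look like this sketch (results is a hypothetical lime object with a fitted simple model):
localGap = abs(results.BlackboxFitted - results.SimpleModelFitted);  % a small gap suggests the simple model tracks the blackbox locally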
This property is read-only.
Synthetic predictor data, specified as a numeric matrix or a table.
- If you specify the customSyntheticData input argument, then the argument sets this property.
- Otherwise, lime estimates distribution parameters from the predictor data X and generates a synthetic predictor data set.
Data Types: single | double | table
Object Functions
- fit — Fit simple model of local interpretable model-agnostic explanations (LIME)
- plot — Plot results of local interpretable model-agnostic explanations (LIME)
Examples
Train a classification model and create a lime object that uses a decision tree simple model. When you create a lime object, specify a query point and the number of important predictors so that the software generates samples of a synthetic data set and fits a simple model for the query point with important predictors. Then display the estimated predictor importance in the simple model by using the object function plot.
Load the CreditRating_Historical data set. The data set contains customer IDs and their financial ratios, industry labels, and credit ratings.
tbl = readtable('CreditRating_Historical.dat');
Display the first three rows of the table.
head(tbl,3)
ID WC_TA RE_TA EBIT_TA MVE_BVTD S_TA Industry Rating
_____ _____ _____ _______ ________ _____ ________ ______
62394 0.013 0.104 0.036 0.447 0.142 3 {'BB'}
48608 0.232 0.335 0.062 1.969 0.281 8 {'A' }
42444 0.311 0.367 0.074 1.935 0.366 1 {'A' }
Create a table of predictor variables by removing the columns of customer IDs and ratings from tbl.
tblX = removevars(tbl,["ID","Rating"]);
Train a blackbox model of credit ratings by using the fitcecoc function.
blackbox = fitcecoc(tblX,tbl.Rating,'CategoricalPredictors','Industry');
Create a lime object that explains the prediction for the last observation using a decision tree simple model. Specify 'NumImportantPredictors' as six to find at most 6 important predictors. If you specify the 'QueryPoint' and 'NumImportantPredictors' values when you create a lime object, then the software generates samples of a synthetic data set and fits a simple interpretable model to the synthetic data set.
queryPoint = tblX(end,:)
queryPoint=1×6 table
WC_TA RE_TA EBIT_TA MVE_BVTD S_TA Industry
_____ _____ _______ ________ ____ ________
0.239 0.463 0.065 2.924 0.34 2
rng('default') % For reproducibility
results = lime(blackbox,'QueryPoint',queryPoint,'NumImportantPredictors',6, ...
    'SimpleModelType','tree')
results =
lime with properties:
BlackboxModel: [1×1 ClassificationECOC]
DataLocality: 'global'
CategoricalPredictors: 6
Type: 'classification'
X: [3932×6 table]
QueryPoint: [1×6 table]
NumImportantPredictors: 6
NumSyntheticData: 5000
SyntheticData: [5000×6 table]
Fitted: {5000×1 cell}
SimpleModel: [1×1 ClassificationTree]
ImportantPredictors: [2×1 double]
BlackboxFitted: {'AA'}
SimpleModelFitted: {'AA'}
Plot the lime object results by using the object function plot.
f = plot(results);
The plot displays two predictions for the query point, which correspond to the BlackboxFitted property and the SimpleModelFitted property of results.
The horizontal bar graph shows the sorted predictor importance values. lime finds the financial ratio variables MVE_BVTD and RE_TA as important predictors for the query point.
You can read the bar lengths by using data tips or Bar Properties. For example, you can find Bar objects by using the findobj function and add labels to the ends of the bars by using the text function.
b = findobj(f,'Type','bar');
text(b.YEndPoints+0.001,b.XEndPoints,string(b.YData))
Alternatively, you can display the coefficient values in a table with the predictor variable names.
imp = b.YData;
flipud(array2table(imp', ...
    'RowNames',f.CurrentAxes.YTickLabel,'VariableNames',{'Predictor Importance'}))
ans=2×1 table
Predictor Importance
____________________
MVE_BVTD 0.088412
RE_TA 0.0018061
Train a regression model and create a lime object that uses a linear simple model. When you create a lime object, if you do not specify a query point and the number of important predictors, then the software generates samples of a synthetic data set but does not fit a simple model. Use the object function fit to fit a simple model for a query point. Then display the coefficients of the fitted linear simple model by using the object function plot.
Load the carbig data set, which contains measurements of cars made in the 1970s and early 1980s.
load carbig
Create a table containing the predictor variables Acceleration, Cylinders, and so on, as well as the response variable MPG.
tbl = table(Acceleration,Cylinders,Displacement,Horsepower,Model_Year,Weight,MPG);
Removing missing values in a training set can help reduce memory consumption and speed up training for the fitrkernel function. Remove missing values in tbl.
tbl = rmmissing(tbl);
Create a table of predictor variables by removing the response variable from tbl.
tblX = removevars(tbl,'MPG');
Train a blackbox model of MPG by using the fitrkernel function.
rng('default') % For reproducibility
mdl = fitrkernel(tblX,tbl.MPG,'CategoricalPredictors',[2 5]);
Create a lime object. Specify a predictor data set because mdl does not contain predictor data.
results = lime(mdl,tblX)
results =
lime with properties:
BlackboxModel: [1×1 RegressionKernel]
DataLocality: 'global'
CategoricalPredictors: [2 5]
Type: 'regression'
X: [392×6 table]
QueryPoint: []
NumImportantPredictors: []
NumSyntheticData: 5000
SyntheticData: [5000×6 table]
Fitted: [5000×1 double]
SimpleModel: []
ImportantPredictors: []
BlackboxFitted: []
SimpleModelFitted: []
results contains the generated synthetic data set. The SimpleModel property is empty ([]).
Fit a linear simple model for the first observation in tblX. Specify the number of important predictors to find as 3.
queryPoint = tblX(1,:)
queryPoint=1×6 table
Acceleration Cylinders Displacement Horsepower Model_Year Weight
____________ _________ ____________ __________ __________ ______
12 8 307 130 70 3504
results = fit(results,queryPoint,3);
Plot the lime object results by using the object function plot.
plot(results)
The plot displays two predictions for the query point, which correspond to the BlackboxFitted property and the SimpleModelFitted property of results.
The horizontal bar graph shows the coefficient values of the simple model, sorted by their absolute values. LIME finds Horsepower, Model_Year, and Cylinders as important predictors for the query point.
Model_Year and Cylinders are categorical predictors that have multiple categories. For a linear simple model, the software creates one less dummy variable than the number of categories for each categorical predictor. The bar graph displays only the most important dummy variable. You can check the coefficients of the other dummy variables using the SimpleModel property of results. Display the sorted coefficient values, including all categorical dummy variables.
[~,I] = sort(abs(results.SimpleModel.Beta),'descend');
table(results.SimpleModel.ExpandedPredictorNames(I)',results.SimpleModel.Beta(I), ...
    'VariableNames',{'Expanded Predictor Name','Coefficient'})
ans=17×2 table
Expanded Predictor Name Coefficient
__________________________ ___________
{'Horsepower' } -3.5035e-05
{'Model_Year (74 vs. 70)'} -6.1591e-07
{'Model_Year (80 vs. 70)'} -3.9803e-07
{'Model_Year (81 vs. 70)'} 3.4186e-07
{'Model_Year (82 vs. 70)'} -2.2331e-07
{'Cylinders (6 vs. 8)' } -1.9807e-07
{'Model_Year (76 vs. 70)'} 1.816e-07
{'Cylinders (5 vs. 8)' } 1.7318e-07
{'Model_Year (71 vs. 70)'} 1.5694e-07
{'Model_Year (75 vs. 70)'} 1.5486e-07
{'Model_Year (77 vs. 70)'} 1.5151e-07
{'Model_Year (78 vs. 70)'} 1.3864e-07
{'Model_Year (72 vs. 70)'} 6.8949e-08
{'Cylinders (4 vs. 8)' } 6.3098e-08
{'Model_Year (73 vs. 70)'} 4.9696e-08
{'Model_Year (79 vs. 70)'} -2.4822e-08
⋮
Train a regression model and create a lime object using a function handle to the predict function of the model. Use the object function fit to fit a simple model for the specified query point. Then display the coefficients of the fitted linear simple model by using the object function plot.
Load the carbig data set, which contains measurements of cars made in the 1970s and early 1980s.
load carbig
Create a table containing the predictor variables Acceleration, Cylinders, and so on.
tbl = table(Acceleration,Cylinders,Displacement,Horsepower,Model_Year,Weight);
Train a blackbox model of MPG by using the TreeBagger function.
rng('default') % For reproducibility
Mdl = TreeBagger(100,tbl,MPG,'Method','regression','CategoricalPredictors',[2 5]);
lime does not support a TreeBagger object directly, so you cannot specify the first input argument (blackbox model) of lime as a TreeBagger object. Instead, you can use a function handle to the predict function. You can also specify options of the predict function using name-value arguments of the function.
Create the function handle to the predict function of the TreeBagger object Mdl. Specify the array of tree indices to use as 1:50.
myPredict = @(tbl) predict(Mdl,tbl,'Trees',1:50);
Create a lime object using the function handle myPredict. When you specify a blackbox model as a function handle, you must provide the predictor data and specify the 'Type' name-value argument. tbl includes categorical predictors (Cylinders and Model_Year) with the double data type. By default, lime does not treat variables with the double data type as categorical predictors. Specify the second (Cylinders) and fifth (Model_Year) variables as categorical predictors.
results = lime(myPredict,tbl,'Type','regression','CategoricalPredictors',[2 5]);
Fit a linear simple model for the first observation in tbl.
results = fit(results,tbl(1,:),4);
plot(results)
lime finds Horsepower, Displacement, Cylinders, and Model_Year as important predictors.
More About
A distance metric is a function that defines a distance between two
observations. lime supports various distance metrics for
continuous variables and a mix of continuous and categorical variables.
Distance metrics for continuous variables
Given an mx-by-n data matrix X, which is treated as mx (1-by-n) row vectors x1, x2, ..., xmx, and an my-by-n data matrix Y, which is treated as my (1-by-n) row vectors y1, y2, ..., ymy, the various distances between the vectors xs and yt are defined as follows:
Euclidean distance
$d_{st}^2 = (x_s - y_t)(x_s - y_t)'$
The Euclidean distance is a special case of the Minkowski distance, where p = 2.
Standardized Euclidean distance
$d_{st}^2 = (x_s - y_t)V^{-1}(x_s - y_t)'$
where V is the n-by-n diagonal matrix whose jth diagonal element is $(S(j))^2$, where S is a vector of scaling factors for each dimension.
Mahalanobis distance
$d_{st}^2 = (x_s - y_t)C^{-1}(x_s - y_t)'$
where C is the covariance matrix.
City block distance
$d_{st} = \sum_{j=1}^{n} |x_{sj} - y_{tj}|$
The city block distance is a special case of the Minkowski distance, where p = 1.
Minkowski distance
$d_{st} = \left( \sum_{j=1}^{n} |x_{sj} - y_{tj}|^{p} \right)^{1/p}$
For the special case of p = 1, the Minkowski distance gives the city block distance. For the special case of p = 2, the Minkowski distance gives the Euclidean distance. For the special case of p = ∞, the Minkowski distance gives the Chebychev distance.
Chebychev distance
$d_{st} = \max_{j} \{ |x_{sj} - y_{tj}| \}$
The Chebychev distance is a special case of the Minkowski distance, where p = ∞.
Cosine distance
$d_{st} = 1 - \frac{x_s y_t'}{\sqrt{(x_s x_s')(y_t y_t')}}$
Correlation distance
$d_{st} = 1 - \frac{(x_s - \bar{x}_s)(y_t - \bar{y}_t)'}{\sqrt{(x_s - \bar{x}_s)(x_s - \bar{x}_s)'}\,\sqrt{(y_t - \bar{y}_t)(y_t - \bar{y}_t)'}}$
where
$\bar{x}_s = \frac{1}{n}\sum_{j} x_{sj}$
and
$\bar{y}_t = \frac{1}{n}\sum_{j} y_{tj}$
Spearman distance is one minus the sample Spearman's rank correlation between observations (treated as sequences of values):
$d_{st} = 1 - \frac{(r_s - \bar{r}_s)(r_t - \bar{r}_t)'}{\sqrt{(r_s - \bar{r}_s)(r_s - \bar{r}_s)'}\,\sqrt{(r_t - \bar{r}_t)(r_t - \bar{r}_t)'}}$
where
- $r_{sj}$ is the rank of $x_{sj}$ taken over $x_{1j}, x_{2j}, ..., x_{mx,j}$.
- $r_{tj}$ is the rank of $y_{tj}$ taken over $y_{1j}, y_{2j}, ..., y_{my,j}$.
- $r_s$ and $r_t$ are the coordinate-wise rank vectors of $x_s$ and $y_t$, that is, $r_s = (r_{s1}, r_{s2}, ..., r_{sn})$ and $r_t = (r_{t1}, r_{t2}, ..., r_{tn})$.
- $\bar{r}_s = \frac{1}{n}\sum_{j} r_{sj} = \frac{n+1}{2}$ and $\bar{r}_t = \frac{1}{n}\sum_{j} r_{tj} = \frac{n+1}{2}$.
Distance metrics for a mix of continuous and categorical variables
Modified Goodall distance
This distance is a variant of the Goodall distance, which assigns a small distance if the matching values are infrequent regardless of the frequencies of the other values. For mismatches, the distance contribution of the predictor is 1/(number of variables).
Occurrence frequency distance
For a match, the occurrence frequency distance assigns zero distance. For a mismatch, the occurrence frequency distance assigns a higher distance on a less frequent value and a lower distance on a more frequent value.
Algorithms
To explain a prediction of a machine learning model using LIME [1], the software generates a synthetic data
set and fits a simple interpretable model to the synthetic data set by using
lime and fit, as described in steps
1–5.
- If you specify the queryPoint and numImportantPredictors values of lime, then the lime function performs all steps.
- If you do not specify queryPoint and numImportantPredictors and specify 'DataLocality' as 'global' (default), then the lime function generates a synthetic data set (steps 1–2), and the fit function fits a simple model (steps 3–5).
- If you do not specify queryPoint and numImportantPredictors and specify 'DataLocality' as 'local', then the fit function performs all steps.
The lime and fit functions perform these steps:
1. Generate a synthetic predictor data set Xs using a multivariate normal distribution for continuous variables and a multinomial distribution for each categorical variable. You can specify the number of samples to generate by using the 'NumSyntheticData' name-value argument.
   - If 'DataLocality' is 'global' (default), then the software estimates the distribution parameters from the whole predictor data set (X or predictor data in blackbox).
   - If 'DataLocality' is 'local', then the software estimates the distribution parameters using the k-nearest neighbors of the query point, where k is the 'NumNeighbors' value. You can specify a distance metric to find the nearest neighbors by using the 'Distance' name-value argument.
   The software ignores missing values in the predictor data set when estimating the distribution parameters. Alternatively, you can provide a pregenerated, custom synthetic predictor data set by using the customSyntheticData input argument of lime.
2. Compute the predictions Ys for the synthetic data set Xs. The predictions are predicted responses for regression or classified labels for classification. The software uses the predict function of the blackbox model to compute the predictions. If you specify blackbox as a function handle, then the software computes the predictions by using the function handle.
3. Compute the distances d between the query point and the samples in the synthetic predictor data set using the distance metric specified by 'Distance'.
4. Compute the weight values wq of the samples in the synthetic predictor data set with respect to the query point q using the squared exponential (or Gaussian) kernel function of the distance, where:
   - xs is a sample in the synthetic predictor data set Xs.
   - d(xs,q) is the distance between the sample xs and the query point q.
   - p is the number of predictors in Xs.
   - σ is the kernel width, which you can specify by using the 'KernelWidth' name-value argument. The default 'KernelWidth' value is 0.75.
   The weight value at the query point is 1, and it converges to zero as the distance value increases. The 'KernelWidth' value controls how fast the weight value converges to zero: the lower the 'KernelWidth' value, the faster the weight value converges to zero. Therefore, the algorithm gives more weight to samples near the query point. Because this algorithm uses such weight values, the selected important predictors and fitted simple model effectively explain the predictions for the synthetic data locally, around the query point.
5. Fit a simple model.
   - If 'SimpleModelType' is 'linear' (default), then the software selects important predictors and fits a linear model of the selected important predictors.
     - Select n important predictors by using the group orthogonal matching pursuit (OMP) algorithm [2][3], where n is the numImportantPredictors value. This algorithm uses the synthetic predictor data set (Xs), predictions (Ys), and weight values (wq). Note that the software does not select unimportant predictors (with estimated 0 coefficients), so the number of selected important predictors might be less than n.
     - Fit a linear model of the selected important predictors to the predictions (Ys) using the weight values (wq). The software uses fitrlinear for regression or fitclinear for classification. For a multiclass model, the software uses the one-versus-all scheme to construct a binary classification problem. The positive class is the predicted class for the query point from the blackbox model, and the negative class refers to the other classes.
   - If 'SimpleModelType' is 'tree', then the software fits a decision tree model by using fitrtree for regression or fitctree for classification. The software specifies the maximum number of decision splits (or branch nodes) as the number of important predictors so that the fitted decision tree uses at most the specified number of predictors.
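The following sketch illustrates steps 2–5 for a regression problem in simplified form. It assumes a numeric synthetic matrix Xs, a numeric query point q, and a regression model object blackbox; the exact kernel scaling and the group OMP predictor-selection step that lime uses are not reproduced, so the weight formula below is only an assumed, simplified form.
Ys = predict(blackbox,Xs);                   % step 2: blackbox predictions for the synthetic samples
d = pdist2(Xs,q);                            % step 3: distances from each synthetic sample to the query point
sigma = 0.75;                                % kernel width (default value named on this page)
wq = exp(-(d.^2)./sigma^2);                  % step 4: squared exponential weights (assumed, simplified form)
simpleMdl = fitrlinear(Xs,Ys,'Weights',wq);  % step 5: weighted linear simple model, without predictor selection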
References
[1] Ribeiro, Marco Tulio, S. Singh, and C. Guestrin. "'Why Should I Trust You?': Explaining the Predictions of Any Classifier." In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1135–44. San Francisco, California: ACM, 2016.
[2] Świrszcz, Grzegorz, Naoki Abe, and Aurélie C. Lozano. "Grouped Orthogonal Matching Pursuit for Variable Selection and Prediction." Advances in Neural Information Processing Systems (2009): 1150–58.
[3] Lozano, Aurélie C., Grzegorz Świrszcz, and Naoki Abe. "Group Orthogonal Matching Pursuit for Logistic Regression." Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (2011): 452–60.
Version History
Introduced in R2020b
If you compute LIME values by fitting a linear simple model (see
SimpleModelType), you can specify the relative tolerance on the
linear coefficients and the bias term of the simple model by using the
BetaTolerance name-value argument. The default
BetaTolerance value is 1e-4.
In R2023a, the relative tolerance on the linear coefficients of the linear simple model
was 1e-8. In previous releases, the value was
1e-4.
See Also
plotPartialDependence | shapley
Topics
- Interpret Deep Network Predictions on Tabular Data Using LIME (Deep Learning Toolbox)
- Interpret Machine Learning Models