# incrementalRobustRandomCutForest

## Description

The `incrementalRobustRandomCutForest` function creates an `incrementalRobustRandomCutForest` model object, which represents a robust random cut forest (RRCF) model for incremental anomaly detection.

Unlike other Statistics and Machine Learning Toolbox™ model objects, `incrementalRobustRandomCutForest` can be called directly. Also, you can specify learning options, such as the number of robust random cut trees, the contamination fraction in the training data, and whether to standardize the predictor data before fitting the model to data. After you create an `incrementalRobustRandomCutForest` object, it is prepared for incremental learning (see Incremental Learning for Anomaly Detection).

`incrementalRobustRandomCutForest` is best suited for incremental learning. For a traditional approach to anomaly detection when all the data is provided in advance, see `rrcforest`.

## Creation

You can create an `incrementalRobustRandomCutForest` model object in several ways:

- **Call the function directly**: Configure incremental learning options, or specify learner-specific options, by calling `incrementalRobustRandomCutForest` directly. This approach is best when you do not have data yet or you want to start incremental learning immediately.
- **Convert a traditionally trained model**: To initialize an RRCF model for incremental learning using the model parameters and hyperparameters of a trained model object, you can convert the traditionally trained model to an `incrementalRobustRandomCutForest` model object by passing it to the `incrementalLearner` function.
- **Call an incremental learning function**: `fit` accepts a configured `incrementalRobustRandomCutForest` model object and data as input, and returns an `incrementalRobustRandomCutForest` model object updated with information learned from the input model and data.
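The three creation paths above can be sketched side by side. This is a minimal illustration, assuming Statistics and Machine Learning Toolbox (R2023b or later) is installed; the synthetic data `X` and the variable names `forestA`, `forestB`, `forestC`, and `trained` are placeholders chosen for this sketch, not part of the original page.

```matlab
rng(0,"twister")                 % for reproducibility of the synthetic data
X = randn(500,3);                % hypothetical predictor data: 500 observations, 3 predictors

% 1. Call the function directly (no data needed yet).
forestA = incrementalRobustRandomCutForest(NumLearners=50, ...
    ContaminationFraction=0.01);

% 2. Train a traditional RRCF model on all the data, then convert it.
trained = rrcforest(X,ContaminationFraction=0.01);
forestB = incrementalLearner(trained);

% 3. Pass a configured model and a chunk of data to fit, which returns
%    an updated incremental model.
forestC = fit(forestA,X(1:100,:));
```

The direct call suits streams where no data exists up front, while conversion carries over what a traditionally trained model has already learned.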

### Syntax

### Description

`forest = incrementalRobustRandomCutForest` returns an incremental RRCF model object `forest` for anomaly detection with default parameters. Properties of a default model contain placeholders for unknown model parameters. You must train a default model before you can use it to detect anomalies.

`forest = incrementalRobustRandomCutForest(Name,Value)` sets properties and additional options using one or more name-value arguments. For example, `incrementalRobustRandomCutForest(ContaminationFraction=0.1,ScoreWarmupPeriod=1000)` sets the anomaly contamination fraction to `0.1` and the score warm-up period to `1000`.

### Input Arguments

**Name-Value Arguments**

Specify optional pairs of arguments as `Name1=Value1,...,NameN=ValueN`, where `Name` is the argument name and `Value` is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

**Example:** `incrementalRobustRandomCutForest(StandardizeData=true)` specifies to standardize the predictor data.

`StandardizeData`

— Flag to standardize predictor data

`false` or `0` (default) | `true` or `1`

Flag to standardize the predictor data, specified as a numeric or logical `1` (`true`) or `0` (`false`).

If you set `StandardizeData=true`, the `incrementalRobustRandomCutForest` function centers and scales each predictor variable (`X` or `Tbl`) by the corresponding column mean and standard deviation. The function does not standardize the data contained in the dummy variable columns generated for categorical predictors.

**Example:** `StandardizeData=true`

**Data Types:** `logical`

`Options`

— Options for computing in parallel and setting random streams

structure

Options for computing in parallel and setting random streams, specified as a structure. Create the `Options` structure using `statset`. This table lists the option fields and their values.

Field Name | Value | Default |
---|---|---|
`UseParallel` | Set this value to `true` to run computations in parallel. | `false` |
`UseSubstreams` | Set this value to `true` to run computations in a reproducible manner. To compute reproducibly, set `Streams` to a type that allows substreams, such as `"mlfg6331_64"`. | `false` |
`Streams` | Specify this value as a `RandStream` object or cell array of such objects. Use a single object except when the `UseParallel` value is `true` and the `UseSubstreams` value is `false`. In that case, use a cell array that has the same size as the parallel pool. | If you do not specify `Streams`, then `incrementalRobustRandomCutForest` uses the default stream or streams. |

**Note**

You need Parallel Computing Toolbox™ to run computations in parallel.

**Example:** `Options=statset(UseParallel=true,UseSubstreams=true,Streams=RandStream("mlfg6331_64"))`

**Data Types:** `struct`

`ScoreWarmupPeriod`

— Warm-up period before score computation and anomaly detection

`0` (default) | nonnegative integer

Warm-up period before score computation and anomaly detection, specified as a nonnegative integer. This option specifies the number of observations used by the incremental `fit` function to train the model and estimate the score threshold.

**Note**

When processing observations during the score warm-up period, the software ignores observations that contain missing values for all predictors.

**Example:** `ScoreWarmupPeriod=200`

**Data Types:** `single` | `double`

`ScoreWindowSize`

— Running window size used to estimate score threshold

`1000` (default) | positive integer

Running window size used to estimate the score threshold (`ScoreThreshold`), specified as a positive integer. The default `ScoreWindowSize` value is `1000`.

If `ScoreWindowSize` is greater than the number of observations in the training data, the software determines `ScoreThreshold` by subsampling from the training data. Otherwise, `ScoreThreshold` is set to `forest.ScoreThreshold`.

**Example:** `ScoreWindowSize=100`

**Data Types:** `single` | `double`

## Properties

You can set most properties by using name-value argument syntax when you call `incrementalRobustRandomCutForest` directly. You can set some properties when you call `incrementalLearner` to convert a traditionally trained model object. You cannot set the properties `Mu`, `NumTrainingObservations`, `ScoreThreshold`, `Sigma`, and `IsWarm`.

`CategoricalPredictors`

— List of categorical predictors

vector of positive integers | logical vector | character matrix | string array | cell array of character vectors | `"all"` | `[]`

This property is read-only.

List of categorical predictors, specified as one of the values in this table.

Value | Description |
---|---|
Vector of positive integers | Each entry in the vector is an index value indicating that the corresponding predictor is categorical. The index values are between 1 and `p`, where `p` is the number of predictors used to train the model. |
Logical vector | A `true` entry means that the corresponding predictor is categorical. The length of the vector is `p`. |
Character matrix | Each row of the matrix is the name of a predictor variable. The names must match the entries in `PredictorNames`. Pad the names with extra blanks so each row of the character matrix has the same length. |
String array or cell array of character vectors | Each element in the array is the name of a predictor variable. The names must match the entries in `PredictorNames`. |
`"all"` | All predictors are categorical. |

**Data Types:** `single` | `double` | `logical` | `char` | `string` | `cell`

`CollusiveDisplacement`

— Collusive displacement calculation method

`"maximal"` (default) | `"average"`

This property is read-only.

Collusive displacement calculation method, specified as `"maximal"` or `"average"`.

The `incrementalRobustRandomCutForest` function finds the maximum change (`"maximal"`) or the average change (`"average"`) in model complexity for each tree, and computes the collusive displacement (anomaly score) for each observation.

**Data Types:** `char` | `string`

`ContaminationFraction`

— Fraction of anomalies in training data

numeric scalar in the range `[0,1]`

This property is read-only.

Fraction of anomalies in the training data, specified as a numeric scalar in the range `[0,1]`.

- If the `ContaminationFraction` value is `0`, then `incrementalRobustRandomCutForest` treats all training observations as normal observations, and sets the `ScoreThreshold` value to the maximum anomaly score value of the training data.
- If the `ContaminationFraction` value is in the range (`0`,`1`], then `incrementalRobustRandomCutForest` determines the `ScoreThreshold` value so that the function detects the specified fraction of training observations as anomalies.

The default `ContaminationFraction` value depends on how you create the model:

- If you convert a traditionally trained model to create `forest`, then `ContaminationFraction` is specified by the corresponding property of the traditionally trained model.
- If you create `forest` by calling `incrementalRobustRandomCutForest` directly, then you can specify `ContaminationFraction` by using name-value argument syntax. If you do not specify the value, then the default value is `0`.

**Data Types:** `single` | `double`
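To make the relationship between the contamination fraction and the score threshold concrete, here is a minimal sketch in base MATLAB. It is not the toolbox's internal algorithm: the `scores` vector is made up, and picking the threshold as an order statistic of the sorted scores is an assumption used only to illustrate the two cases described above.

```matlab
scores = [3 7 2 9 15 4 6 8 5 40];   % hypothetical anomaly scores for 10 observations
c = 0.1;                            % contamination fraction (assumed for this sketch)

n = numel(scores);
sorted = sort(scores,"descend");    % largest score first
k = floor(c*n);                     % number of observations to flag as anomalies

% With k = 0 the threshold is the maximum score, so nothing lies above it.
% Otherwise the threshold is the (k+1)-th largest score, so k scores exceed it.
threshold = sorted(min(k+1,n));
isAnom = scores > threshold;        % observations strictly above the threshold
```

With these values, `threshold` is 15 and only the observation scoring 40 is flagged. Setting `c = 0` moves the threshold to the maximum score (40), so no observation is flagged.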

`EstimationPeriod`

— Number of observations processed to estimate hyperparameters

nonnegative integer

This property is read-only.

Number of observations processed by the incremental learner to estimate hyperparameters before training, specified as a nonnegative integer.

When processing observations during the estimation period, the software ignores observations that have missing values for all predictors.

- If you specify a positive `EstimationPeriod` and `StandardizeData` is `false`, `incrementalRobustRandomCutForest` forces `EstimationPeriod` to `0`.
- If `forest` is prepared for incremental learning (all hyperparameters required for training are specified), `incrementalRobustRandomCutForest` forces `EstimationPeriod` to `0`.
- If `forest` is not prepared for incremental learning and `StandardizeData` is `true`, `incrementalRobustRandomCutForest` sets `EstimationPeriod` to `1000` and estimates the unknown hyperparameters.

For more details, see Estimation Period.

**Data Types:** `single` | `double`

`IsWarm`

— Flag indicating whether `fit`

returns scores and detects anomalies

`false` or `0` | `true` or `1`

This property is read-only.

Flag indicating whether the incremental fitting function `fit` returns scores and detects anomalies after training the model, specified as a numeric or logical `0` (`false`) or `1` (`true`).

The incremental model `forest` is *warm* (`IsWarm` becomes `true`) after the `fit` function fits the incremental model to `ScoreWarmupPeriod` observations.

You cannot specify `IsWarm` directly.

**Data Types:** `logical`
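One way to observe the warm-up behavior is to record `IsWarm` after each call to `fit`. The sketch below assumes Statistics and Machine Learning Toolbox is available; the synthetic data `X`, the chunk size of 100, and the variable `warm` are placeholders chosen for illustration.

```matlab
rng(0,"twister")                      % reproducible synthetic data
X = randn(300,2);                     % hypothetical stream of 300 observations

forest = incrementalRobustRandomCutForest(ScoreWarmupPeriod=200);

warm = false(3,1);                    % IsWarm recorded after each chunk
for j = 1:3
    idx = (j-1)*100 + (1:100);        % rows of the j-th chunk of 100 observations
    forest = fit(forest,X(idx,:));
    warm(j) = forest.IsWarm;
end
disp(warm')                           % inspect when the model became warm
```

Per the property description, the flag is expected to switch to `true` once `fit` has processed `ScoreWarmupPeriod` observations, which here means the model should be warm by the end of the loop.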

`Mu`

— Predictor means

numeric vector | `[]`

This property is read-only.

Predictor means of the training data, specified as a numeric vector.

- If you specify `StandardizeData=true`, the length of `Mu` is equal to the number of predictors.
- If you set `StandardizeData=false`, then `Mu` is an empty vector (`[]`).

You cannot specify `Mu` directly.

**Data Types:** `single` | `double`

`NumLearners`

— Number of robust random cut trees

100 (default) | positive integer scalar

This property is read-only.

Number of robust random cut trees (trees in the RRCF model), specified as a positive integer scalar.

**Data Types:** `single` | `double`

`NumObservationsPerLearner`

— Number of observations for each robust random cut tree

`min(N,256)` where `N` is the number of training observations (default) | positive integer scalar greater than or equal to 3

This property is read-only.

Number of observations to draw from the training data without replacement for each robust random cut tree (tree in the RRCF model), specified as a positive integer scalar greater than or equal to 3.

**Data Types:** `single` | `double`

`NumObservationsToKeep`

— Size of historical data

value of `NumObservationsPerLearner` (default) | positive integer scalar

This property is read-only.

Size of historical data that pertains to the RRCF model's knowledge, specified as a positive integer scalar.

**Data Types:** `single` | `double`

`NumPredictors`

— Number of predictor variables

nonnegative numeric scalar

This property is read-only.

Number of predictor variables, specified as a nonnegative numeric scalar.

The default `NumPredictors` value depends on how you create the model:

- If you convert a traditionally trained model to create `forest`, `NumPredictors` is specified by the corresponding property of the traditionally trained model.
- If you create `forest` by calling `incrementalRobustRandomCutForest` directly, you can specify `NumPredictors` by using name-value argument syntax. If you do not specify the value, then the default value is `0`, and incremental fitting functions infer `NumPredictors` from the predictor data during training.

**Data Types:** `double`

`NumTrainingObservations`

— Number of observations fit to incremental model

`0` (default) | nonnegative numeric scalar

This property is read-only.

Number of observations fit to the incremental model `forest`, specified as a nonnegative numeric scalar. `NumTrainingObservations` increases when you pass `forest` and training data to `fit` outside of the estimation period.

- When fitting the model, the software ignores observations that have missing values for all predictors.
- If you convert a traditionally trained model to create `forest`, `incrementalRobustRandomCutForest` does not add the number of observations fit to the traditionally trained model to `NumTrainingObservations`.

You cannot specify `NumTrainingObservations` directly.

**Data Types:** `double`

`ObservationRemoval`

— Observation removal method

`"oldest"` (default) | `"timedecaying"` | `"random"`

Observation removal method, specified as `"oldest"`, `"timedecaying"`, or `"random"`. When the robust random cut trees reach their capacity, the software removes old observations to accommodate the most recent data.

Value | Description |
---|---|
`"oldest"` | Oldest observations are removed first. |
`"timedecaying"` | Observations are removed randomly in a weighted fashion. Older observations have a higher probability of being removed first. |
`"random"` | Observations are removed in random order. |

**Data Types:** `string` | `char`
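The difference between the three removal methods can be illustrated with a toy buffer in base MATLAB. This is only a conceptual sketch, not the toolbox implementation: the buffer contents, the linear age weights used for the time-decaying case, and all variable names are assumptions made for illustration.

```matlab
rng(0,"twister")                       % reproducible random choices
buffer = 1:8;                          % hypothetical stored observations; index 1 is the oldest
n = numel(buffer);

% "oldest": remove the observation that has been stored the longest.
keptOldest = buffer(2:end);

% "random": remove one observation chosen uniformly at random.
iRand = randi(n);
keptRandom = buffer([1:iRand-1, iRand+1:n]);

% "timedecaying": remove one observation at random, with older observations
% more likely to be chosen. The linear weights below are an assumption.
w = n:-1:1;                            % oldest observation gets the largest weight
edges = cumsum(w/sum(w));
edges(end) = 1;                        % guard against floating-point round-off
iDecay = find(rand <= edges,1);        % inverse-CDF sampling of the index to remove
keptDecay = buffer([1:iDecay-1, iDecay+1:n]);
```

In each case the buffer shrinks by one element, which frees room for the next incoming observation. Only the choice of which element to drop differs.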

`PredictorNames`

— Predictor variable names

string array of unique names | cell array of unique character vectors

This property is read-only.

Predictor variable names, specified as a string array of unique names or cell array of unique character vectors. The functionality of `PredictorNames` depends on how you supply the predictor data.

- If you supply `Tbl`, then you can use `PredictorNames` to specify which predictor variables to use. That is, `incrementalRobustRandomCutForest` uses only the predictor variables in `PredictorNames`.
  - `PredictorNames` must be a subset of `Tbl.Properties.VariableNames`.
  - By default, `PredictorNames` contains the names of all predictor variables in `Tbl`.
- If you supply `X`, then you can use `PredictorNames` to assign names to the predictor variables in `X`.
  - The order of the names in `PredictorNames` must correspond to the column order of `X`. That is, `PredictorNames{1}` is the name of `X(:,1)`, `PredictorNames{2}` is the name of `X(:,2)`, and so on. Also, `size(X,2)` and `numel(PredictorNames)` must be equal.
  - By default, `PredictorNames` is `{"x1","x2",...}`.

**Data Types:** `string` | `cell`

`ScoreThreshold`

— Threshold for anomaly score

nonnegative scalar

This property is read-only.

Threshold for the anomaly score used to detect anomalies, specified as a nonnegative scalar. `incrementalRobustRandomCutForest` detects observations with scores above the threshold as anomalies.

The default `ScoreThreshold` value depends on how you create the model:

- If you convert a traditionally trained model object to create `forest`, then `ScoreThreshold` is specified by the corresponding property value of the object.
- Otherwise, the default value is `0`.

`ScoreThreshold` has the value 0 until the number of observations reaches the `ScoreWarmupPeriod` value. After that, the software updates the `ScoreThreshold` with every new observation.

You cannot specify `ScoreThreshold` directly.

**Data Types:** `single` | `double`

`ScoreWarmupPeriod`

— Warm-up period before score computation and anomaly detection

nonnegative integer

This property is read-only.

Warm-up period before score computation and anomaly detection, specified as a nonnegative integer. This value is the number of observations used by the incremental `fit` function to train the model and estimate the score threshold.

- When processing observations during the score warm-up period, the software ignores observations that have missing values for all predictors.
- You can return scores and detect anomalies during the warm-up period by calling `isanomaly` directly.

The default `ScoreWarmupPeriod` value depends on how you create the model:

- If you convert a traditionally trained model to create `forest`, the `ScoreWarmupPeriod` name-value argument of the `incrementalLearner` function sets this property.
- Otherwise, the default value is `0`.

**Data Types:** `single` | `double`

`ScoreWindowSize`

— Running window size for `ScoreThreshold`

estimation

nonnegative integer

This property is read-only.

Running window size for `ScoreThreshold` estimation, specified as a nonnegative integer. The software estimates the `ScoreThreshold` value over a running window with a window size of `ScoreWindowSize`.

The default `ScoreWindowSize` value depends on how you create the model:

- If you convert a traditionally trained model to create `forest`, the `ScoreWindowSize` name-value argument of the `incrementalLearner` function sets this property.
- Otherwise, the default value is `1000`.

**Data Types:** `double`

`Sigma`

— Predictor standard deviations

numeric vector | `[]`

This property is read-only.

Predictor standard deviations of the training data, specified as a numeric vector.

- If you specify `StandardizeData=true` when you train an incremental RRCF model using `fit`:
  - The `fit` function does not standardize columns that contain categorical variables. The elements in `Sigma` for categorical variables contain `NaN` values.
  - The `isanomaly` function standardizes the input data by using the predictor means in `Mu` and standard deviations in `Sigma`.
  - The length of `Sigma` is equal to the number of predictors.
- If you set `StandardizeData=false`, then `Sigma` is an empty vector (`[]`).

You cannot specify `Sigma` directly.

## Object Functions

## Examples

### Create Incremental Anomaly Detector Without Any Prior Information

Create a default robust random cut forest model for incremental anomaly detection.

```
forest = incrementalRobustRandomCutForest;
details(forest)
```

```
  incrementalRobustRandomCutForest with properties:

        CollusiveDisplacement: 'maximal'
                  NumLearners: 100
    NumObservationsPerLearner: 256
           ObservationRemoval: 'oldest'
        NumObservationsToKeep: 256
                           Mu: []
                        Sigma: []
        CategoricalPredictors: []
             EstimationPeriod: 0
                       IsWarm: 0
        ContaminationFraction: 0
      NumTrainingObservations: 0
                NumPredictors: 0
               ScoreThreshold: 0
            ScoreWarmupPeriod: 0
               PredictorNames: {}
              ScoreWindowSize: 1000
```

`forest` is an `incrementalRobustRandomCutForest` model object. All its properties are read-only. By default, the software sets the anomaly contamination fraction to 0 and the score warm-up period to 0. `forest` must be fit to data before you can use it to perform any other operations.

**Load Data**

Load the human activity data set and keep only the first 3000 observations. For details on the data set, enter `Description` at the command line.

```
load humanactivity.mat
feat = feat(1:3000,:);
```

**Fit Incremental Model and Detect Anomalies**

Fit the incremental model `forest` to the data by using the `fit` function. Because `ScoreWarmupPeriod` = `0`, `fit` returns scores and detects anomalies immediately after fitting the model for the first time. To simulate a data stream, fit the model in chunks of 100 observations at a time. At each iteration:

- Process 100 observations.
- Overwrite the previous incremental model with a new one fitted to the incoming observations.
- Store `medianscore`, the median score value of the data chunk, to see how it evolves during incremental learning.
- Store `allscores`, the score values for the fitted observations.
- Store `threshold`, the score threshold value for anomalies, to see how it evolves during incremental learning.
- Store `numAnom`, the number of detected anomalies in the data chunk.

```
n = numel(feat(:,1));
numObsPerChunk = 100;
nchunk = floor(n/numObsPerChunk);
medianscore = zeros(nchunk,1);
threshold = zeros(nchunk,1);
numAnom = zeros(nchunk,1);
allscores = [];

% Incremental fitting
rng(0,"twister"); % For reproducibility
for j = 1:nchunk
    ibegin = min(n,numObsPerChunk*(j-1) + 1);
    iend = min(n,numObsPerChunk*j);
    idx = ibegin:iend;
    forest = fit(forest,feat(idx,:));
    [isanom,scores] = isanomaly(forest,feat(idx,:));
    medianscore(j) = median(scores);
    allscores = [allscores scores'];
    numAnom(j) = sum(isanom);
    threshold(j) = forest.ScoreThreshold;
end
```

`forest` is an `incrementalRobustRandomCutForest` model object trained on all the data in the stream. The `fit` function fits the model to the data chunk, and the `isanomaly` function returns the observation scores and the indices of observations in the data chunk with scores above the score threshold value.

**Analyze Incremental Model During Training**

Plot the anomaly score for every observation.

```
plot(allscores,".-")
xlabel("Observation")
ylabel("Score")
```

At each iteration, the software calculates a score value for each observation in the data chunk. A low score value indicates a normal observation, and a high score value indicates an anomaly.

To see how the score threshold and median score per data chunk evolve during training, plot them on separate tiles.

```
figure
tiledlayout(2,1);
nexttile
plot(medianscore,".-")
ylabel("Median Score")
xlabel("Iteration")
xlim([0 nchunk])
nexttile
plot(threshold,".-")
ylabel("Score Threshold")
xlabel("Iteration")
xlim([0 nchunk])
```

```
finalScoreThreshold=forest.ScoreThreshold
```

```
finalScoreThreshold = 93.7052
```

The median score fluctuates between 4 and 20. The anomaly score threshold has a value of 20 after the first iteration and steadily approaches a value of 94 by the 22nd iteration. Because `ContaminationFraction` = 0, `incrementalRobustRandomCutForest` treats all training observations as normal observations, and at each iteration sets the score threshold to the maximum score value in the data chunk.

```
totalAnomalies = sum(numAnom)
```

```
totalAnomalies = 0
```

No anomalies are detected at any iteration, because `ContaminationFraction` = 0.

### Configure Incremental Learning Options and Analyze Model During Training

Prepare an incremental robust random cut forest model by specifying an anomaly contamination fraction of 0.001, and standardize the data using an initial estimation period of 500 observations. Specify a score warm-up period of 1000 observations, during which the `fit` function updates the score threshold and trains the model but does not return scores or identify anomalies.

```
forest = incrementalRobustRandomCutForest(ContaminationFraction=0.001, ...
StandardizeData=true,ScoreWarmupPeriod=1000,EstimationPeriod=500);
```

`forest` is an `incrementalRobustRandomCutForest` model object. All its properties are read-only. `forest` must be fit to data before you can use it to perform any other operations.

**Load Data**

Load the credit rating data stored in `CreditRating_Historical.dat`. Remove the ID column and the categorical variables.

```
creditrating = readtable("CreditRating_Historical.dat");
creditrating = removevars(creditrating,["ID","Industry","Rating"]);
```

The `fit` function of `incrementalRobustRandomCutForest` does not use observations with missing values. Remove missing values in the data sets to reduce memory consumption and speed up training.

```
creditrating = rmmissing(creditrating);
```

**Fit Incremental Model and Detect Anomalies**

Fit the incremental model `forest` to the data by using the `fit` function. To simulate a data stream, fit the model in chunks of 100 observations at a time. Because `EstimationPeriod` = `500` and `ScoreWarmupPeriod` = `1000`, `fit` only returns scores and detects anomalies after 15 iterations. At each iteration:

- Process 100 observations.
- Overwrite the previous incremental model with a new one fitted to the incoming observations.
- Store `meanscore`, the mean score value of the data chunk, to see how it evolves during incremental learning.
- Store `threshold`, the score threshold value for anomalies, to see how it evolves during incremental learning.
- Store `numAnom`, the number of detected anomalies in the chunk, to see how it evolves during incremental learning.

```
n = numel(creditrating(:,1));
numObsPerChunk = 100;
nchunk = floor(n/numObsPerChunk);
meanscore = zeros(nchunk,1);
threshold = zeros(nchunk,1);
numAnom = zeros(nchunk,1);

% Incremental fitting
rng(0,"twister"); % For reproducibility
for j = 1:nchunk
    ibegin = min(n,numObsPerChunk*(j-1) + 1);
    iend = min(n,numObsPerChunk*j);
    idx = ibegin:iend;
    [forest,tf,scores] = fit(forest,creditrating(idx,:));
    meanscore(j) = mean(scores);
    numAnom(j) = sum(tf);
    threshold(j) = forest.ScoreThreshold;
end
```

`forest` is an `incrementalRobustRandomCutForest` model object trained on all the data in the stream.

**Analyze Incremental Model During Training**

To see how the mean score, score threshold and number of detected anomalies per chunk evolve during training, plot them on separate tiles.

```
tiledlayout(3,1);
nexttile
plot(meanscore)
ylabel("Mean Score")
xlabel("Iteration")
xlim([0 nchunk])
xline(forest.EstimationPeriod/numObsPerChunk,"r-.")
xline((forest.EstimationPeriod+forest.ScoreWarmupPeriod)/numObsPerChunk,"r")
nexttile
plot(threshold)
ylabel("Score Threshold")
xlabel("Iteration")
xlim([0 nchunk])
xline(forest.EstimationPeriod/numObsPerChunk,"r-.")
xline((forest.EstimationPeriod+forest.ScoreWarmupPeriod)/numObsPerChunk,"r")
nexttile
plot(numAnom,"+")
ylabel("Anomalies")
xlabel("Iteration")
xlim([0 nchunk])
ylim([0 max(numAnom)+0.2])
xline(forest.EstimationPeriod/numObsPerChunk,"r-.")
xline((forest.EstimationPeriod+forest.ScoreWarmupPeriod)/numObsPerChunk,"r")
```

During the estimation period, `fit` estimates means and standard deviations using the observations, and does not fit the model or update the score threshold. During the warm-up period, `fit` fits the model and updates the score threshold, but returns all scores as `NaN` and all anomaly values as `false`. After the warm-up period, `fit` returns the observation scores and the indices of observations with scores above the score threshold value. A small score value indicates a normal observation, and a large score value indicates an anomaly.

```
totalAnomalies=sum(numAnom)
```

```
totalAnomalies = 3
```

```
anomfrac= totalAnomalies/(n-forest.EstimationPeriod-forest.ScoreWarmupPeriod)
```

```
anomfrac = 0.0012
```

The software detects 3 anomalies after the warm-up and estimation periods. The contamination fraction after the estimation and warm-up periods is approximately 0.001.

## More About

### Incremental Learning for Anomaly Detection

*Incremental learning*, or *online learning*, is a branch of machine learning concerned with processing incoming data from a data stream, possibly given little to no knowledge of the distribution of the predictor variables, aspects of the prediction or objective function (including tuning parameter values), or whether the observations contain anomalies. Incremental learning differs from traditional machine learning, where enough data is available to fit to a model, perform cross-validation to tune hyperparameters, and infer the predictor distribution.

Anomaly detection is used to identify unexpected events and departures from normal
behavior. In situations where the full data set is not immediately available, or new data is
arriving, you can use *incremental learning for anomaly detection* to
incrementally train a model so it adjusts to the characteristics of the incoming
data.

Given incoming observations, an incremental learning model for anomaly detection does the following:

- Computes anomaly scores
- Updates the anomaly score threshold
- Detects data points above the score threshold as anomalies
- Fits the model to the incoming observations
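The steps above can be sketched as a compact streaming loop. This is a minimal illustration, assuming Statistics and Machine Learning Toolbox is available; the synthetic data `X`, the shifted rows 901 to 905, the chunk size, and the counter `numFlagged` are all placeholders invented for this sketch.

```matlab
rng(0,"twister")                          % reproducible synthetic stream
X = randn(1000,2);                        % hypothetical normal observations
X(901:905,:) = X(901:905,:) + 8;          % a few shifted points late in the stream

forest = incrementalRobustRandomCutForest(ContaminationFraction=0.01, ...
    ScoreWarmupPeriod=200);

chunk = 50;                               % observations per incoming chunk
nchunk = floor(size(X,1)/chunk);
numFlagged = 0;                           % running count of detected anomalies
for j = 1:nchunk
    idx = (j-1)*chunk + (1:chunk);        % rows of the j-th chunk
    % fit scores the chunk, updates the threshold, flags anomalies,
    % and fits the model to the chunk in a single call.
    [forest,tf,scores] = fit(forest,X(idx,:));
    numFlagged = numFlagged + sum(tf);
end
```

Because the model is overwritten at every iteration, only the current model and the current chunk need to be held in memory.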

For more information, see Incremental Anomaly Detection with MATLAB.

## Algorithms

### Estimation Period

During the estimation period, the incremental fitting function `fit` does not fit the model. The function uses the first incoming `EstimationPeriod` observations to estimate the predictor means (`Mu`) and standard deviations (`Sigma`). At the end of the estimation period, the function updates the properties that store the hyperparameters.

Estimation occurs only when:

- `EstimationPeriod` is positive.
- `forest.Mu` and `forest.Sigma` are empty arrays `[]`.
- Incremental fitting functions are configured to standardize predictor data (see Standardize Data).

**Note**

If you specify a positive `EstimationPeriod` and `StandardizeData` is `false`, then `EstimationPeriod` is reset to 0.

### Standardize Data

If incremental learning functions are configured to standardize predictor variables, they do so using the means and standard deviations stored in the `Mu` and `Sigma` properties of the incremental learning model `forest`.

- When you set `StandardizeData=true` and a positive estimation period (see `EstimationPeriod`), and `forest.Mu` and `forest.Sigma` are empty, the incremental fit function estimates means and standard deviations using the estimation period observations.
- When the incremental fitting function estimates predictor means and standard deviations, the function computes weighted means and weighted standard deviations using the estimation period observations. Specifically, the function standardizes predictor $j$ ($x_j$) using

$$x_j^{\ast}=\frac{x_j-\mu_j^{\ast}}{\sigma_j^{\ast}},$$

where $x_j$ is predictor $j$, $x_{jk}$ is observation $k$ of predictor $j$ in the estimation period, and

$$\mu_j^{\ast}=\frac{1}{\sum_{k}w_k}\sum_{k}w_k x_{jk},$$

$$\left(\sigma_j^{\ast}\right)^2=\frac{1}{\sum_{k}w_k}\sum_{k}w_k\left(x_{jk}-\mu_j^{\ast}\right)^2.$$

Here $w_k$ is the weight of observation $k$. The observation weights $w_k$ are all equal to one and cannot be specified.
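The weighted formulas above can be checked numerically in base MATLAB. The sketch below evaluates them on made-up data with all weights equal to one. The data `X` and the variable names are assumptions for illustration, and the code only mirrors the equations; it does not claim to reproduce the values the toolbox stores in `Mu` and `Sigma`.

```matlab
rng(0,"twister")                           % reproducible synthetic data
X = 5 + 2*randn(500,3);                    % hypothetical estimation period data: 500 observations, 3 predictors
w = ones(size(X,1),1);                     % observation weights, all equal to one

mu = (w' * X) / sum(w);                    % weighted mean of each predictor (1-by-3)
sigma = sqrt((w' * (X - mu).^2) / sum(w)); % weighted standard deviation of each predictor

Xstd = (X - mu) ./ sigma;                  % standardized predictors
```

With unit weights, `mu` reduces to the ordinary column mean and `sigma` to the standard deviation normalized by the number of observations, which is what `std(X,1,1)` computes.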

## References

[1] Guha, Sudipto, N. Mishra, G. Roy, and O. Schrijvers. "Robust Random Cut Forest Based Anomaly Detection on Streams," *Proceedings of The 33rd International Conference on Machine Learning* 48 (June 2016): 2712–21.

[2] Bartos, Matthew D., A. Mullapudi, and S. C. Troutman. "rrcf: Implementation of the Robust Random Cut Forest Algorithm for Anomaly Detection on Streams." *Journal of Open Source Software* 4, no. 35 (2019): 1336.

## Extended Capabilities

### Automatic Parallel Support

Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.

To run in parallel, specify the `Options` name-value argument in the call to this function and set the `UseParallel` field of the options structure to `true` using `statset`:

`Options=statset(UseParallel=true)`

For more information about parallel computing, see Run MATLAB Functions with Automatic Parallel Support (Parallel Computing Toolbox).

## Version History

**Introduced in R2023b**
