# fit

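## Syntax

`IncrementalMdl = fit(IncrementalMdl,X)`

`IncrementalMdl = fit(IncrementalMdl,X,Weights=weights)`

`[IncrementalMdl,Xtransformed] = fit(IncrementalMdl,X)`
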
## Description

The incremental `fit` function fits an incremental principal component analysis (PCA) object (`incrementalPCA`) to streaming data.

`IncrementalMdl = fit(IncrementalMdl,X)` returns an incremental PCA model `IncrementalMdl`, which represents the input incremental PCA model `IncrementalMdl` fit using the predictor data `X`. Specifically, the incremental `fit` function fits the model to the incoming data and stores the updated PCA properties in the output model `IncrementalMdl`.

`IncrementalMdl = fit(IncrementalMdl,X,Weights=weights)` also sets the observation weights `weights`.

`[IncrementalMdl,Xtransformed] = fit(IncrementalMdl,X)` additionally returns the principal component scores `Xtransformed`.

## Examples

### Perform Incremental Principal Component Analysis Using Initial Model

Perform principal component analysis (PCA) on an initial data chunk, and then create an incremental PCA model that incorporates the results of the analysis. Fit the incremental model to streaming data and analyze how the model evolves during training.

**Load and Preprocess Data**

Load the human activity data set.

`load humanactivity`

For details on the human activity data set, enter `Description` at the command line.

The data set includes observations containing 60 variables. To simulate streaming data, split the data set into an initial chunk of 1000 observations and a second chunk of 10,000 observations.

```
Xinitial = feat(1:1000,:);
Xstream = feat(1001:11000,:);
```

**Perform Initial PCA**

Perform PCA on the initial data chunk by using the `pca` function. Specify to center the data and keep 10 principal components. Return the principal component coefficients (`coeff`), principal component variances (`latent`), and estimated means of the variables (`mu`).

```
[coeff,~,latent,~,~,mu] = pca(Xinitial,Centered=true,NumComponents=10);
```

**Create Incremental PCA Model**

Create a model for incremental PCA that incorporates the PCA results from the initial data chunk.

```
IncrementalMdl = incrementalPCA(Coefficients=coeff,Latent=latent, ...
Means=mu,NumObservations=1000);
details(IncrementalMdl)
```

```
  incrementalPCA with properties:

                     IsWarm: 1
    NumTrainingObservations: 0
               WarmupPeriod: 0
                         Mu: [0.7764 0.4931 -0.3407 0.1108 0.0707 0.0485 0.3931 -1.1100 0.0646 0.1703 -1.1020 0.0283 0.0836 -1.0797 0.0139 0.9328 1.2892 1.6731 2.0729 2.5181 2.9511 0.0128 0.0062 0.0039 0.0027 0.0020 0.0016 0.9322 ... ] (1x60 double)
                      Sigma: []
          ExplainedVariance: [10x1 double]
           EstimationPeriod: 0
                     Latent: [10x1 double]
               Coefficients: [60x10 double]
            VariableWeights: [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
              NumComponents: 10
              NumPredictors: 60
```

`IncrementalMdl` is an `incrementalPCA` model object. All its properties are read-only. Because `Coefficients` and `Latent` are specified, the model is warm, meaning that the `fit` function returns transformed observations.

**Fit Incremental Model**

Fit the incremental model `IncrementalMdl` to the data by using the `fit` function. To simulate a data stream, fit the model in chunks of 100 observations at a time. At each iteration:

- Process 100 observations.
- Overwrite the previous incremental model with a new one fitted to the incoming observations.
- Store `topEV`, the explained variance value of the component with the highest variance, to see how it evolves during incremental fitting.

```
n = numel(Xstream(:,1));
numObsPerChunk = 100;
nchunk = floor(n/numObsPerChunk);
topEV = zeros(nchunk,1);

% Incremental fitting
for j = 1:nchunk
    ibegin = min(n,numObsPerChunk*(j-1) + 1);
    iend = min(n,numObsPerChunk*j);
    IncrementalMdl = fit(IncrementalMdl,Xstream(ibegin:iend,:));
    topEV(j) = IncrementalMdl.ExplainedVariance(1);
end
```

`IncrementalMdl` is an `incrementalPCA` model object fitted to all the data in the stream. The `fit` function fits the model to the data chunk and updates the model properties.
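The chunk indexing used in the fitting loop can be checked on its own, without any model. The following is a minimal sketch that assumes a hypothetical stream length of 1050 observations (not a value from this example) and records the first and last row index of each chunk. Because the number of chunks is computed with `floor`, any trailing observations that do not fill a complete chunk are never passed to `fit`.

```matlab
% Hypothetical stream length chosen for illustration only
n = 1050;
numObsPerChunk = 100;
nchunk = floor(n/numObsPerChunk);      % number of complete chunks

% Record the first and last row index of each chunk
bounds = zeros(nchunk,2);
for j = 1:nchunk
    bounds(j,1) = min(n,numObsPerChunk*(j-1) + 1);
    bounds(j,2) = min(n,numObsPerChunk*j);
end

% Observations left over after the last complete chunk
numUnused = n - bounds(end,2);

disp(bounds([1 end],:))   % first chunk is rows 1-100, last chunk is rows 901-1000
disp(numUnused)           % 50
```

In the example above, the stream contains 10,000 observations, which is an exact multiple of the chunk size, so no observations are left over.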

**Analyze Incremental Model During Training**

Plot the explained variance value of the component with the highest variance to see how it evolves during training.

```
figure
plot(topEV,".-")
ylabel("topEV")
xlabel("Iteration")
xlim([0 nchunk])
```

The highest explained variance value is 33% after the first iteration, and rapidly rises to 80% after five iterations. The value then gradually approaches 97%.

### Perform Incremental Principal Component Analysis Without Prior Information

Create a model for incremental principal component analysis (PCA) and specify to standardize the data.

```
IncrementalMdl = incrementalPCA(StandardizeData=true);
details(IncrementalMdl)
```

```
  incrementalPCA with properties:

                     IsWarm: 0
    NumTrainingObservations: 0
               WarmupPeriod: 1000
                         Mu: []
                      Sigma: []
          ExplainedVariance: [0x1 double]
           EstimationPeriod: 1000
                     Latent: [0x1 double]
               Coefficients: []
            VariableWeights: [1x0 double]
              NumComponents: 0
              NumPredictors: 0
```

`IncrementalMdl` is an `incrementalPCA` model object. All its properties are read-only. By default, the software sets the hyperparameter estimation period and the warm-up period to 1000 observations. The model must be warm before the incremental `fit` function outputs transformed data.

**Load and Preprocess Data**

Load the NYCHousing2015 sample data set.

`load NYCHousing2015`

The data set includes 10 variables with information on the sales of properties in New York City in 2015.

Preprocess the data set. Remove the categorical variables `BOROUGH`, `NEIGHBORHOOD`, and `BUILDINGCLASSCATEGORY`. Convert the `datetime` array (`SALEDATE`) to month numbers and change zeros in `LANDSQUAREFEET`, `GROSSSQUAREFEET`, `SALEPRICE`, and `YEARBUILT` to `NaN`s.

```
NYCHousing2015 = removevars(NYCHousing2015,["BOROUGH", ...
    "NEIGHBORHOOD","BUILDINGCLASSCATEGORY"]);
NYCHousing2015.SALEDATE = month(NYCHousing2015.SALEDATE);
NYCHousing2015.LANDSQUAREFEET(NYCHousing2015.LANDSQUAREFEET == 0) = NaN;
NYCHousing2015.GROSSSQUAREFEET(NYCHousing2015.GROSSSQUAREFEET == 0) = NaN;
NYCHousing2015.SALEPRICE(NYCHousing2015.SALEPRICE == 0) = NaN;
NYCHousing2015.YEARBUILT(NYCHousing2015.YEARBUILT == 0) = NaN;
```

The `fit` function of `incrementalPCA` does not use observations that contain a missing value. Remove these observations from the data set.

```
NYCHousing2015 = rmmissing(NYCHousing2015);
```
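To see what removing incomplete observations does, it can help to try `rmmissing` on a small array. The following sketch uses a hypothetical 4-by-3 matrix (not part of the housing data) in which two rows contain a `NaN`. By default, `rmmissing` removes every row that contains at least one missing value.

```matlab
% Hypothetical data: rows 2 and 4 each contain one missing value
A = [ 1   2   3
     NaN  5   6
      7   8   9
     10  NaN 12];

% Remove every row that contains at least one missing value
B = rmmissing(A);

disp(B)   % the remaining rows are [1 2 3] and [7 8 9]
```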

The `incrementalPCA` functions do not accept data in table format. Convert the data set to array format and keep only the first 5000 observations.

```
streamingData = table2array(NYCHousing2015(1:end,:));
streamingData = streamingData(1:5000,:);
```

**Fit Incremental Models**

Fit the incremental model `IncrementalMdl` to the data using the `fit` function. To simulate a data stream, fit the model in chunks of 100 observations at a time. At each iteration:

- Process 100 observations.
- Overwrite the previous incremental model with a new one fitted to the incoming observations.
- Store `isWarm`, the `IsWarm` property of `IncrementalMdl`, to see how it evolves during incremental fitting.
- Store `topEV`, the explained variance value of the component with the highest variance, to see how it evolves during incremental fitting.
- Store `meanXtr`, the mean of the transformed data output by the `fit` function, to see how it evolves during incremental fitting.

```
n = numel(streamingData(:,1));
numObsPerChunk = 100;
nchunk = floor(n/numObsPerChunk);
meanXtr = zeros(nchunk,1);
isWarm = zeros(nchunk,1);
topEV = zeros(nchunk,1);   % preallocate so no stale values remain from earlier runs

% Incremental fitting
for j = 1:nchunk
    ibegin = min(n,numObsPerChunk*(j-1) + 1);
    iend = min(n,numObsPerChunk*j);
    [IncrementalMdl,Xtr] = fit(IncrementalMdl,streamingData(ibegin:iend,:));
    isWarm(j) = IncrementalMdl.IsWarm;
    topEV(j) = IncrementalMdl.ExplainedVariance(1);
    meanXtr(j) = mean(Xtr(:));
end
```

`IncrementalMdl` is an `incrementalPCA` model object fitted to all the data in the stream. `fit` fits the model to the data chunk and outputs the transformed input data.

**Analyze Incremental Model During Training**

To see how the `IsWarm` indicator, the explained variance value of the component with the highest variance, and the mean of the transformed input data per chunk evolve during training, plot them on separate tiles.

```
figure
tiledlayout(3,1);
nexttile
plot(isWarm,".-")
ylabel("IsWarm")
xlabel("Iteration")
xlim([0 nchunk])
nexttile
plot(topEV,".-")
ylabel("Top EV")
xlabel("Iteration")
xlim([0 nchunk])
nexttile
plot(meanXtr,".-")
ylabel("Mean of Transformed Data")
xlabel("Iteration")
xlim([0 nchunk])
```

Because `EstimationPeriod` = 1000, `fit` processes 1000 observations to determine hyperparameters before updating the PCA properties of `IncrementalMdl`. After the estimation period, the top explained variance value initially fluctuates between 58% and 85%, and then gradually approaches 50%. Because `WarmupPeriod` = 1000, `fit` processes an additional 1000 observations after the estimation period before `IncrementalMdl` becomes warm and outputs transformed data. The mean of the transformed data fluctuates between –0.3 and 0.08.

## Input Arguments

`IncrementalMdl`: Incremental PCA model

`incrementalPCA` model object

Incremental PCA model, specified as an `incrementalPCA` model object. You can create `IncrementalMdl` by calling `incrementalPCA` directly.

`X`: Chunk of predictor data

floating-point matrix

Chunk of predictor data, specified as a floating-point matrix of *n* observations and `IncrementalMdl.NumPredictors` variables. The rows of `X` correspond to observations, and the columns correspond to variables. The software ignores observations that contain at least one missing value.

**Note**

- If `IncrementalMdl.NumPredictors` = 0, `fit` infers the number of predictors from `X`, and sets the corresponding property of the output model. Otherwise, if the number of predictor variables in the streaming data changes from `IncrementalMdl.NumPredictors`, `fit` issues an error.
- `fit` supports only numeric input predictor data. If your input data includes categorical data, you must prepare an encoded version of the categorical data. Use `dummyvar` to convert each categorical variable to a numeric matrix of dummy variables. Then, concatenate all dummy variable matrices and any other numeric predictors. For more details, see Dummy Variables.
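The encoding step described in the note can be sketched as follows. The variable names `color` and `area` and their values are hypothetical and serve only as illustration. The sketch converts one categorical variable to dummy variables with `dummyvar` and concatenates the result with a numeric predictor to form a purely numeric matrix.

```matlab
% Hypothetical predictors: one categorical variable and one numeric variable
color = categorical(["red";"blue";"red";"green"]);
area  = [10; 20; 15; 30];

% One dummy column per category, in the order given by categories(color)
D = dummyvar(color);

% Concatenate the numeric predictor with the dummy variables
X = [area D];

disp(categories(color)')   % blue, green, red
disp(X)
```

The resulting matrix `X` has one row per observation and contains only numeric values, so it has the form that `fit` expects for predictor data.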

**Data Types:** `single` | `double`

`weights`: Chunk of observation weights

floating-point vector of positive values

Chunk of observation weights, specified as a floating-point vector of positive values. `fit` weighs the observations in `X` with the corresponding values in `weights`. The size of `weights` must equal *n*, the number of observations in `X`.

By default, `weights` is `ones(n,1)`.

**Data Types:** `single` | `double`
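As an illustration of the `Weights` argument, the following sketch fits a default model to one synthetic chunk and supplies one positive weight per observation. The data, the weight values, and the choice to weight the first half of the chunk twice as heavily are all assumptions made for this illustration.

```matlab
rng(1)                                  % reproducible synthetic data
X = randn(100,4);                       % hypothetical chunk: 100 observations, 4 variables
weights = [2*ones(50,1); ones(50,1)];   % hypothetical weights, one per row of X

% Create a default model and fit it to the weighted chunk
IncrementalMdl = incrementalPCA;
IncrementalMdl = fit(IncrementalMdl,X,Weights=weights);

% fit infers the number of predictors from the first chunk
disp(IncrementalMdl.NumPredictors)
```

The length of `weights` must match the number of rows in each chunk, so in a streaming loop the weights are typically sliced with the same row indices as the data.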

## Output Arguments

`IncrementalMdl`: Updated incremental PCA model

`incrementalPCA` model object

Updated incremental PCA model, returned as an `incrementalPCA` model object.

`Xtransformed`: Principal component scores

floating-point matrix

Principal component scores, returned as a floating-point matrix. The rows of `Xtransformed` correspond to observations, and the columns correspond to components. If `IncrementalMdl` is not warm (`IsWarm=false`), all values of `Xtransformed` are returned as `NaN`. The data type of `Xtransformed` is the same as `X`.
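Because the scores are `NaN` until the model is warm, code that consumes `Xtransformed` can check the `IsWarm` property first. The following sketch uses a synthetic chunk of 100 observations (an assumption for illustration) and a model created with `StandardizeData=true`, for which the estimation and warm-up periods shown earlier on this page are 1000 observations each, so a single small chunk is not expected to warm the model.

```matlab
rng(1)                                            % reproducible synthetic data
X = randn(100,3);                                 % hypothetical chunk of predictor data

IncrementalMdl = incrementalPCA(StandardizeData=true);
[IncrementalMdl,Xtransformed] = fit(IncrementalMdl,X);

if IncrementalMdl.IsWarm
    disp("Model is warm; scores are available.")
else
    % Scores returned before the model is warm are NaN placeholders
    allNaN = all(isnan(Xtransformed(:)));
    fprintf("Model is not warm; all scores are NaN: %d\n",allNaN)
end
```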

## Version History

**Introduced in R2024a**

## See Also

`incrementalPCA` | `pca` | `reset` | `transform`
