Main Content

# nnmf

Nonnegative matrix factorization

## Syntax

``[W,H] = nnmf(A,k)``
``[W,H] = nnmf(A,k,Name,Value)``
``[W,H,D] = nnmf(___)``

## Description

example

````[W,H] = nnmf(A,k)` factors the n-by-m matrix `A` into nonnegative factors `W` (n-by-`k`) and `H` (`k`-by-m). The factorization is not exact; `W*H` is a lower-rank approximation to `A`. The factors `W` and `H` minimize the root mean square residual `D` between `A` and `W*H`. D = norm(A - W*H,'fro')/sqrt(n*m) The factorization uses an iterative algorithm starting with random initial values for `W` and `H`. Because the root mean square residual `D` might have local minima, repeated factorizations might yield different `W` and `H`. Sometimes the algorithm converges to a solution of lower rank than k, which can indicate that the result is not optimal.```

example

````[W,H] = nnmf(A,k,Name,Value)` modifies the factorization using one or more name-value pair arguments. For example, you can request repeated factorizations by setting `'Replicates'` to an integer value greater than 1.```
````[W,H,D] = nnmf(___)` also returns the root mean square residual `D` using any of the input argument combinations in the previous syntaxes.```

## Examples

collapse all

Load the sample data.

`load fisheriris`

Compute a nonnegative rank-two approximation of the measurements of the four variables in Fisher's iris data.

```rng(1) % For reproducibility [W,H] = nnmf(meas,2); H```
```H = 2×4 0.6945 0.2856 0.6220 0.2218 0.8020 0.5683 0.1834 0.0149 ```

The first and third variables in `meas` (sepal length and petal length, with coefficients 0.6945 and 0.6220, respectively) provide relatively strong weights to the first column of `W` . The first and second variables in `meas` (sepal length and sepal width, with coefficients 0.8020 and 0.5683, respectively) provide relatively strong weights to the second column of `W` .

Create a `biplot` of the data and the variables in `meas` in the column space of `W` .

```biplot(H','Scores',W,'VarLabels',{'sl','sw','pl','pw'}); axis([0 1.1 0 1.1]) xlabel('Column 1') ylabel('Column 2')``` Starting from a random array `X` with rank 20, try a few iterations at several replicates using the multiplicative algorithm.

```rng default % For reproducibility X = rand(100,20)*rand(20,50); opt = statset('MaxIter',5,'Display','final'); [W0,H0] = nnmf(X,5,'Replicates',10,... 'Options',opt,... 'Algorithm','mult');```
``` rep iteration rms resid |delta x| 1 5 0.560887 0.0245182 2 5 0.66418 0.0364471 3 5 0.609125 0.0358355 4 5 0.608894 0.0415491 5 5 0.619291 0.0455135 6 5 0.621549 0.0299965 7 5 0.640549 0.0438758 8 5 0.673015 0.0366856 9 5 0.606835 0.0318931 10 5 0.633526 0.0319591 Final root mean square residual = 0.560887 ```

Continue with more iterations from the best of these results using alternating least squares.

```opt = statset('Maxiter',1000,'Display','final'); [W,H] = nnmf(X,5,'W0',W0,'H0',H0,... 'Options',opt,... 'Algorithm','als');```
``` rep iteration rms resid |delta x| 1 24 0.257336 0.00271859 Final root mean square residual = 0.257336 ```

## Input Arguments

collapse all

Matrix to factorize, specified as a real matrix.

Example: `rand(20,30)`

Data Types: `single` | `double`

Rank of factors, specified as a positive integer. The resulting factors `W` and `H` have `k` columns and rows, respectively.

Example: 3

Data Types: `single` | `double`

### Name-Value Pair Arguments

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside quotes. You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

Example: `[W,H] = nnmf(A,k,'Algorithm','mult','Replicates',10)` chooses the multiplicative update algorithm and ten replicates to improve the result

Factorization algorithm, specified as the comma-separated pair consisting of `'Algorithm'` and `'als'` (alternating least squares) or `'mult'` (a multiplicative update algorithm).

The `'als'` algorithm typically is more stable and converges in fewer iterations. Each iteration takes longer. Therefore, the default maximum is 50, which usually gives satisfactory results in internal testing.

The `'mult'` algorithm typically has faster iterations and requires more of them. The default maximum is 100. This algorithm tends to be more sensitive to starting values and, therefore, seems to benefit more from running multiple replications.

Example: `'Algorithm','mult'`

Data Types: `char` | `string`

Initial value of `W`, specified as the comma-separated pair consisting of `'W0'` and an n-by-`k` matrix, where n is the number of rows of `A`, and `k` is the second input argument of `nnmf`.

Data Types: `single` | `double`

Initial value of `H`, specified as the comma-separated pair consisting of `'H0'` and a `k`-by-m matrix, where `k` is the second input argument of `nnmf`, and m is the number of columns of `A`.

Data Types: `single` | `double`

Algorithm options, specified as the comma-separated pair consisting of `'Options'` and a structure returned by the `statset` function. `nnmf` uses the following fields of the options structure.

FieldDescriptionValues
`Display`Level of iterative display
• `'off'` (default) — No display

• `'final'` — Display of final result

• `'iter'` — Iterative display of intermediate results

`MaxIter`Maximum number of iterationsPositive integer. The default is `50` for the `'als'` algorithm and `100` for the `'mult'` algorithm. Unlike in optimization settings, reaching `MaxIter` iterations is treated as convergence.
`TolFun`Termination tolerance on the change in size of the residualNonnegative value. The default is `1e-4`.
`TolX`Termination tolerance on the relative change in the elements of `W` and `H`Nonnegative value. The default is `1e-4`.
`UseParallel`Indication to compute in parallelLogical value. The default `false` indicates not to compute in parallel, and `true` indicates to compute in parallel. Computing in parallel requires a Parallel Computing Toolbox™ license.
`UseSubstreams`Type of reproducibility when computing in parallel
• `false` (default) — Do not compute reproducibly

• `'mlfg6331_64'`

• `'mrg32k3a'`

For details, see Reproducibility in Parallel Statistical Computations.

`Streams`A `RandStream` object or cell array of such objects
• If you do not specify `Streams`, `nnmf` uses the default stream or streams.

• If `UseParallel` is `true` and `UseSubstreams` is `false`, specify a cell array of `RandStream` objects the same size as the Parallel pool. Otherwise, specify a single `RandStream` object.

Example: `'Options',statset('Display','iter','MaxIter',50)`

Data Types: `struct`

Number of times to repeat the factorization, specified as the comma-separated pair consisting of `'Replicates'` and a positive integer. The algorithm chooses new random starting values for `W` and `H` at each replication, except at the first replication if you specify `'W0'` and `'H0'`. If you specify a value greater than `1`, you can obtain better results by setting `Algorithm` to `'mult'`. See Change Algorithm.

Example: 10

Data Types: `single` | `double`

## Output Arguments

collapse all

Nonnegative left factor of `A`, returned as an n-by-`k` matrix. n is the number of rows of `A`, and `k` is the second input argument of `nnmf`.

`W` and `H` are normalized so that the rows of `H` have unit length. The columns of `W` are ordered by decreasing length.

Nonnegative right factor of `A`, returned as a `k`-by-m matrix. `k` is the second input argument of `nnmf`, and m is the number of columns of `A`.

`W` and `H` are normalized so that the rows of `H` have unit length. The columns of `W` are ordered by decreasing length.

Root mean square residual, returned as a nonnegative scalar.

`D = norm(A - W*H,'fro')/sqrt(n*m)`

 Berry, Michael W., Murray Browne, Amy N. Langville, V. Paul Pauca, and Robert J. Plemmons. “Algorithms and Applications for Approximate Nonnegative Matrix Factorization.” Computational Statistics & Data Analysis 52, no. 1 (September 2007): 155–73. https://doi.org/10.1016/j.csda.2006.11.006.

Download ebook