# cmdscale

Classical multidimensional scaling

## Syntax

``Y = cmdscale(D)``
``Y = cmdscale(D,p)``
``[Y,e] = cmdscale(___)``

## Description

````Y = cmdscale(D)` performs classical multidimensional scaling on the `n`-by-`n` distance or dissimilarity matrix `D`, and returns an `n`-by-`p` configuration matrix. The rows of `Y` correspond to the coordinates of `n` points in a `p`-dimensional space, where `p` < `n`. When `D` is a Euclidean distance matrix, its elements are the pairwise distances between the `n` points, and `p` is the dimension of the smallest space in which these points can be embedded.When `D` is a non-Euclidean distance matrix or a dissimilarity matrix, `p` is the number of positive eigenvalues of `Y*Y'`. In this case, the reduction to `p` or fewer dimensions provides a reasonable approximation to `D` only if the negative eigenvalues of `Y*Y'` are small in magnitude.```
````Y = cmdscale(D,p)` returns a configuration matrix with the embedding dimensionality `p`, where `p` is a positive integer less than or equal to `n`. If a `p`-dimensional embedding is possible, then `Y` has size `n`-by-`p`. If only a `q`-dimensional embedding with `q` < `p` is possible, then `Y` has size `n`-by-`q`.```
````[Y,e] = cmdscale(___)` additionally returns the eigenvalues of `Y*Y'` as a numeric column vector of length `n`, using any of the input argument combinations in the previous syntaxes. If you specify `p`, then `e` has length `p`.```

example

## Examples

collapse all

Construct a map of 10 US cities based on the geographical distances between them by using the `cmdscale` function.

Create a distance matrix `D` that contains the intercity distances in miles. Because `D` is a full distance matrix, it is square and symmetric, with zeros on the diagonal and positive values off the diagonal.

```cities = ... {"Atl","Chi","Den","Hou","LA","Mia","NYC","SF","Sea","WDC"}; D = [ 0 587 1212 701 1936 604 748 2139 2182 543; 587 0 920 940 1745 1188 713 1858 1737 597; 1212 920 0 879 831 1726 1631 949 1021 1494; 701 940 879 0 1374 968 1420 1645 1891 1220; 1936 1745 831 1374 0 2339 2451 347 959 2300; 604 1188 1726 968 2339 0 1092 2594 2734 923; 748 713 1631 1420 2451 1092 0 2571 2408 205; 2139 1858 949 1645 347 2594 2571 0 678 2442; 2182 1737 1021 1891 959 2734 2408 678 0 2329; 543 597 1494 1220 2300 923 205 2442 2329 0];```

Pass the distance matrix to the `cmdscale` function, and return the configuration matrix and eigenvalues.

`[Y,e] = cmdscale(D)`
```Y = 10×6 103 × -0.7188 0.1430 0.0351 -0.0012 -0.0074 0.0015 -0.3821 -0.3408 0.0296 -0.0082 -0.0120 -0.0023 0.4816 -0.0253 0.0534 0.0013 0.0157 -0.0010 -0.1615 0.5728 0.0015 -0.0018 -0.0007 0.0027 1.2037 0.3901 -0.0186 0.0150 -0.0032 -0.0017 -1.1335 0.5819 -0.0323 -0.0024 0.0030 -0.0020 -1.0722 -0.5190 -0.0343 -0.0143 0.0064 0.0003 1.4206 0.1126 -0.0078 -0.0181 -0.0008 0.0009 1.3417 -0.5797 -0.0237 0.0060 -0.0014 0.0006 -0.9796 -0.3355 -0.0029 0.0237 0.0004 0.0010 ```
```e = 10×1 106 × 9.5821 1.6868 0.0082 0.0014 0.0005 0.0000 0.0000 -0.0009 -0.0055 -0.0355 ```

The configuration matrix `Y` is 10-by-6 because there are only six positive eigenvalues. Some eigenvalues are negative, indicating that the original distances are non-Euclidean. This result occurs because the Earth's surface is curved.

The two largest positive eigenvalues are much larger in magnitude than the others. Therefore, the first two coordinates of `Y` are sufficient for a reasonable approximation to `D`, despite the negative eigenvalues.

Calculate the maximum relative difference between `D` and an approximation that uses the largest two eigenvalues.

```Dapprox = squareform(pdist(Y(:,1:2))); MaxRelDiff = max(abs(D-Dapprox),[],"all")/max(D,[],"all")```
```MaxRelDiff = 0.0075 ```

The reconstructed distance matrix provides a good approximation of the original distance matrix.

Plot the reconstructed city locations as a map. Because D only contains information about the intercity distances, the geographical orientation of the reconstruction is arbitrary.

```scatter(Y(:,1),Y(:,2),".") text(Y(:,1)+25,Y(:,2),cities) xlabel("Miles") ylabel("Miles")```

Determine how the quality of the distance matrix approximation varies when you reduce points to distances using different metrics.

Randomly generate 10 points in 4-D space that are close to 3-D space. Apply a linear transformation to the points so that their transformed values are close to a 3-D subspace that does not align with the coordinate axes.

```rng(0,"twister"); % For reproducibility A = [normrnd(0,1,10,3),normrnd(0,0.1,10,1)]; B = randn(4,4); X = A*B;```

Create a distance matrix for the points in `X` using the default Euclidean metric.

`D = pdist(X);`

Create a configuration matrix `Y` from the interpoint distances using the `cmdscale` function.

`Y = cmdscale(D);`

Compare the quality of the reconstructions when using 2, 3, or 4 dimensions.

`maxerr2 = max(abs(pdist(X)-pdist(Y(:,1:2)))) `
```maxerr2 = 0.1631 ```
`maxerr3 = max(abs(pdist(X)-pdist(Y(:,1:3)))) `
```maxerr3 = 0.0187 ```
`maxerr4 = max(abs(pdist(X)-pdist(Y)))`
```maxerr4 = 1.1768e-14 ```

The small `maxerr3` value indicates that the first three dimensions provide a good reconstruction of the distance matrix.

Create a distance matrix for the points in `X` using the `cityblock` metric.

`D = pdist(X,"cityblock");`

Create a configuration matrix `Y` from the interpoint distances and return the eigenvalues.

`[Y,e] = cmdscale(D)`
```Y = 10×6 -6.0490 0.7237 0.0722 -0.5919 0.3802 0.1243 11.3882 -0.0401 2.0088 0.0420 -0.0079 -0.1248 -9.6056 -0.8110 -0.0974 0.7589 0.2705 0.0341 -2.3302 1.9110 0.1458 0.3004 -0.2000 -0.0681 -0.5196 -0.2462 0.0621 -0.0654 0.6354 -0.0340 -9.7676 0.0801 -0.0075 -0.4611 -0.3454 0.0470 -6.1074 -0.2389 0.0042 0.1992 -0.4226 -0.1820 1.8210 -1.7431 -0.0503 -0.2991 -0.2408 0.0089 10.9626 0.1941 -1.5058 -0.0541 0.0600 -0.4369 10.2077 0.1703 -0.6322 0.1712 -0.1295 0.6315 ```
```e = 10×1 624.6469 8.0643 6.7448 1.3965 1.0377 0.6631 -0.0000 -0.0685 -0.6633 -5.6586 ```

Evaluate the quality of the reconstruction by calculating the maximum relative difference.

`maxerr = max(abs(D-pdist(Y,"cityblock")))/max(D)`
```maxerr = 0.2019 ```

One of the eigenvalues is highly negative, which might account for the relatively poor quality of the reconstruction.

## Input Arguments

collapse all

Distance or dissimilarity matrix for `n` points, specified as an `n`-by-`n` numeric matrix. You can also specify `D` as a numeric row vector of length `k` that contains the `n*(n-1)/2` upper triangle elements of the distance or dissimilarity matrix. In this case, the software converts `D` into a square matrix using the `squareform` function. `D` can have one of the following matrix types.

Matrix TypeSymmetricDescription
Euclidean distanceYesPairwise Euclidean distances between `n` points
Non-Euclidean distanceYesNon-Euclidean pairwise distances between `n` points (see `pdist`)
Full dissimilarityYes

Zeros along the diagonal, and positive dissimilarity values off the diagonal

Full dissimilarity (upper triangle form)NoPositive dissimilarity values above the diagonal, and zeros elsewhere
Full similarityYes
• Ones along the diagonal, and values less than one off the diagonal.

• `cmdscale` transforms a similarity matrix to a dissimilarity matrix in such a way that the distances between the points in `Y` equal or approximate ```sqrt(1 – D)```. To use a different transformation, transform the similarities prior to calling `cmdscale`.

Data Types: `single` | `double`

Dimensionality of the embedding, specified as an integer between 1 and `n`, where `n` is the number of rows and columns in `D`. Specify `p` to reduce the computational burden when `n` is very large.

When you specify `p`:

• If a `p`-dimensional embedding is possible, then `Y` has size `n`-by-`p`.

• If only a `q`-dimensional embedding with `q` < `p` is possible, then `Y` has size `n`-by-`q`.

Data Types: `single` | `double`

## Output Arguments

collapse all

Configuration matrix for the `n`-by-`n` matrix `D`, returned as an `n`-by-`p` numeric matrix. The rows of `Y` correspond to the coordinates of `n` points in a `p`-dimensional space, where `p` < `n`.

When you do not specify `p`:

• If `D` is a Euclidean distance matrix, its elements are the pairwise distances between the `n` points, and `p` is the dimension of the smallest space in which these points can be embedded.

• If `D` is a non-Euclidean distance matrix or a dissimilarity matrix, `p` is the number of positive eigenvalues of `Y*Y'`. In this case, the reduction to `p` or fewer dimensions provides a reasonable approximation to `D` only if the negative eigenvalues of `Y*Y'` are small in magnitude.

When you specify `p`:

• If a `p`-dimensional embedding is possible, then `Y` has size `n`-by-`p`.

• If only a `q`-dimensional embedding with `q` < `p` is possible, then `Y` has size `n`-by-`q`.

Eigenvalues of `Y*Y'`, returned as a numeric column vector. If you specify `p`, then `e` has length `p`. Otherwise, `e` has length `n`. When `D` is Euclidean, the first `p` elements of `e` are positive, and the rest are zero. If the first `k` elements of `e` are much larger than the remaining `(n – k)` elements, you can use the first `k` columns of `Y` as `k`-dimensional points whose interpoint distances approximate `D`. This approach can provide a useful dimension reduction for visualization, which is analogous to Principal Component Analysis (PCA).

## References

[1] Cox, Trevor F., and Michael A. A. Cox. Multidimensional Scaling. 2nd ed. Monographs on Statistics and Applied Probability 88. Boca Raton: Chapman & Hall/CRC, 2001.

[2] Davison, Mark L. Multidimensional Scaling. Wiley Series in Probability and Mathematical Statistics. New York: Wiley, 1983.

[3] Seber, G. A. F. Multivariate Observations. 1st ed. Wiley Series in Probability and Statistics. Wiley, 1984.

## Version History

Introduced before R2006a