Multidimensional Scaling

Introduction to Multidimensional Scaling

One of the most important goals in visualizing data is to get a sense of how near or far points are from each other. Often, you can do this with a scatter plot. However, for some analyses, the data that you have might not be in the form of points at all, but rather in the form of pairwise similarities or dissimilarities between cases, observations, or subjects. There are no points to plot.

Even if your data are in the form of points rather than pairwise distances, a scatter plot of those data might not be useful. For some kinds of data, the relevant way to measure how near two points are might not be their Euclidean distance. While scatter plots of the raw data make it easy to compare Euclidean distances, they are not always useful when comparing other kinds of inter-point distances, city block distance for example, or even more general dissimilarities. Also, with a large number of variables, it is very difficult to visualize distances unless the data can be represented in a small number of dimensions. Some sort of dimension reduction is usually necessary.

Multidimensional scaling (MDS) is a set of methods that address all these problems. MDS allows you to visualize how near points are to each other for many kinds of distance or dissimilarity metrics and can produce a representation of your data in a small number of dimensions. MDS does not require raw data, but only a matrix of pairwise distances or dissimilarities.

Classical Multidimensional Scaling

Introduction to Classical Multidimensional Scaling

This example shows how to use cmdscale to perform classical (metric) multidimensional scaling, also known as principal coordinates analysis.

cmdscale takes as an input a matrix of inter-point distances and creates a configuration of points. Ideally, those points are in two or three dimensions, and the Euclidean distances between them reproduce the original distance matrix. Thus, a scatter plot of the points created by cmdscale provides a visual representation of the original distances.

As a very simple example, you can reconstruct a set of points from only their inter-point distances. First, create some four dimensional points with a small component in their fourth coordinate, and reduce them to distances.

rng default;  % For reproducibility
X = [normrnd(0,1,10,3),normrnd(0,.1,10,1)];
D = pdist(X,'euclidean');

Next, use cmdscale to find a configuration with those inter-point distances. cmdscale accepts distances as either a square matrix, or, as in this example, in the vector upper-triangular form produced by pdist.

[Y,eigvals] = cmdscale(D);

cmdscale produces two outputs. The first output, Y, is a matrix containing the reconstructed points. The second output, eigvals, is a vector containing the sorted eigenvalues of what is often referred to as the "scalar product matrix," which, in the simplest case, is equal to Y*Y'. The relative magnitudes of those eigenvalues indicate the relative contribution of the corresponding columns of Y in reproducing the original distance matrix D with the reconstructed points.

format short g
[eigvals eigvals/max(abs(eigvals))]
ans =

        35.41            1
       11.158      0.31511
       1.6894      0.04771
       0.1436    0.0040553
   7.9529e-15    2.246e-16
    4.564e-15   1.2889e-16
   2.6538e-15   7.4944e-17
  -2.2475e-17  -6.3471e-19
  -3.6359e-16  -1.0268e-17
  -3.3335e-15  -9.4139e-17

If eigvals contains only positive and zero (within round-off error) eigenvalues, the columns of Y corresponding to the positive eigenvalues provide an exact reconstruction of D, in the sense that their inter-point Euclidean distances, computed using pdist, for example, are identical (within round-off) to the values in D.

maxerr4 = max(abs(D - pdist(Y)))   % Exact reconstruction
maxerr4 =


If two or three of the eigenvalues in eigvals are much larger than the rest, then the distance matrix based on the corresponding columns of Y nearly reproduces the original distance matrix D. In this sense, those columns form a lower-dimensional representation that adequately describes the data. However it is not always possible to find a good low-dimensional reconstruction.

maxerr3 = max(abs(D - pdist(Y(:,1:3))))  % Good reconstruction in 3D
maxerr2 = max(abs(D - pdist(Y(:,1:2))))  % Poor reconstruction in 2D
maxerr3 =


maxerr2 =


The reconstruction in three dimensions reproduces D very well, but the reconstruction in two dimensions has errors that are of the same order of magnitude as the largest values in D.

ans =


Often, eigvals contains some negative eigenvalues, indicating that the distances in D cannot be reproduced exactly. That is, there might not be any configuration of points whose inter-point Euclidean distances are given by D. If the largest negative eigenvalue is small in magnitude relative to the largest positive eigenvalues, then the configuration returned by cmdscale might still reproduce D well.

Example: Multidimensional Scaling

This example shows how to construct a map of 10 US cities based on the distances between those cities, using cmdscale.

First, create the distance matrix and pass it to cmdscale. In this example, D is a full distance matrix: it is square and symmetric, has positive entries off the diagonal, and has zeros on the diagonal.

cities = ...
D = [    0  587 1212  701 1936  604  748 2139 2182   543;
       587    0  920  940 1745 1188  713 1858 1737   597;
      1212  920    0  879  831 1726 1631  949 1021  1494;
       701  940  879    0 1374  968 1420 1645 1891  1220;
      1936 1745  831 1374    0 2339 2451  347  959  2300;
       604 1188 1726  968 2339    0 1092 2594 2734   923;
       748  713 1631 1420 2451 1092    0 2571 2408   205;
      2139 1858  949 1645  347 2594 2571    0  678  2442;
      2182 1737 1021 1891  959 2734 2408  678    0  2329;
       543  597 1494 1220 2300  923  205 2442 2329     0];
[Y,eigvals] = cmdscale(D);

Next, look at the eigenvalues returned by cmdscale. Some of these are negative, indicating that the original distances are not Euclidean. This is because of the curvature of the earth.

format short g
[eigvals eigvals/max(abs(eigvals))]
ans =

   9.5821e+06            1
   1.6868e+06      0.17604
       8157.3    0.0008513
       1432.9   0.00014954
       508.67   5.3085e-05
       25.143    2.624e-06
   3.3906e-10   3.5385e-17
       -897.7  -9.3685e-05
      -5467.6   -0.0005706
       -35479   -0.0037026

However, in this case, the two largest positive eigenvalues are much larger in magnitude than the remaining eigenvalues. So, despite the negative eigenvalues, the first two coordinates of Y are sufficient for a reasonable reproduction of D.

Dtriu = D(find(tril(ones(10),-1)))';
maxrelerr = max(abs(Dtriu-pdist(Y(:,1:2))))./max(Dtriu)
maxrelerr =


Here is a plot of the reconstructed city locations as a map. The orientation of the reconstruction is arbitrary.


Nonclassical Multidimensional Scaling

The function mdscale performs nonclassical multidimensional scaling. As with cmdcale, you use mdscale either to visualize dissimilarity data for which no "locations" exist, or to visualize high-dimensional data by reducing its dimensionality. Both functions take a matrix of dissimilarities as an input and produce a configuration of points. However, mdscale offers a choice of different criteria to construct the configuration, and allows missing data and weights.

For example, the cereal data include measurements on 10 variables describing breakfast cereals. You can use mdscale to visualize these data in two dimensions. First, load the data. For clarity, this example code selects a subset of 22 of the observations.

load cereal.mat
X = [Calories Protein Fat Sodium Fiber ... 
    Carbo Sugars Shelf Potass Vitamins];
% Take a subset from a single manufacturer
mfg1 = strcmp('G',cellstr(Mfg));
X = X(mfg1,:);
ans =
    22 10

Then use pdist to transform the 10-dimensional data into dissimilarities. The output from pdist is a symmetric dissimilarity matrix, stored as a vector containing only the (23*22/2) elements in its upper triangle.

dissimilarities = pdist(zscore(X),'cityblock');
ans =
     1   231

This example code first standardizes the cereal data, and then uses city block distance as a dissimilarity. The choice of transformation to dissimilarities is application-dependent, and the choice here is only for simplicity. In some applications, the original data are already in the form of dissimilarities.

Next, use mdscale to perform metric MDS. Unlike cmdscale, you must specify the desired number of dimensions, and the method to use to construct the output configuration. For this example, use two dimensions. The metric STRESS criterion is a common method for computing the output; for other choices, see the mdscale reference page in the online documentation. The second output from mdscale is the value of that criterion evaluated for the output configuration. It measures the how well the inter-point distances of the output configuration approximate the original input dissimilarities:

[Y,stress] =... 
stress =

A scatterplot of the output from mdscale represents the original 10-dimensional data in two dimensions, and you can use the gname function to label selected points:


Nonmetric Multidimensional Scaling

Metric multidimensional scaling creates a configuration of points whose inter-point distances approximate the given dissimilarities. This is sometimes too strict a requirement, and non-metric scaling is designed to relax it a bit. Instead of trying to approximate the dissimilarities themselves, non-metric scaling approximates a nonlinear, but monotonic, transformation of them. Because of the monotonicity, larger or smaller distances on a plot of the output will correspond to larger or smaller dissimilarities, respectively. However, the nonlinearity implies that mdscale only attempts to preserve the ordering of dissimilarities. Thus, there may be contractions or expansions of distances at different scales.

You use mdscale to perform nonmetric MDS in much the same way as for metric scaling. The nonmetric STRESS criterion is a common method for computing the output; for more choices, see the mdscale reference page in the online documentation. As with metric scaling, the second output from mdscale is the value of that criterion evaluated for the output configuration. For nonmetric scaling, however, it measures the how well the inter-point distances of the output configuration approximate the disparities. The disparities are returned in the third output. They are the transformed values of the original dissimilarities:

[Y,stress,disparities] = ... 
stress =

To check the fit of the output configuration to the dissimilarities, and to understand the disparities, it helps to make a Shepard plot:

distances = pdist(Y);
[dum,ord] = sortrows([disparities(:) dissimilarities(:)]);
plot(dissimilarities,distances,'bo', ...
     dissimilarities(ord),disparities(ord),'r.-', ...
     [0 25],[0 25],'k-')
legend({'Distances' 'Disparities' '1:1 Line'},...

This plot shows that mdscale has found a configuration of points in two dimensions whose inter-point distances approximates the disparities, which in turn are a nonlinear transformation of the original dissimilarities. The concave shape of the disparities as a function of the dissimilarities indicates that fit tends to contract small distances relative to the corresponding dissimilarities. This may be perfectly acceptable in practice.

mdscale uses an iterative algorithm to find the output configuration, and the results can often depend on the starting point. By default, mdscale uses cmdscale to construct an initial configuration, and this choice often leads to a globally best solution. However, it is possible for mdscale to stop at a configuration that is a local minimum of the criterion. Such cases can be diagnosed and often overcome by running mdscale multiple times with different starting points. You can do this using the 'start' and 'replicates' parameters. The following code runs five replicates of MDS, each starting at a different randomly-chosen initial configuration. The criterion value is printed out for each replication; mdscale returns the configuration with the best fit.

opts = statset('Display','final');
[Y,stress] =... 

35 iterations, Final stress criterion = 0.156209
31 iterations, Final stress criterion = 0.156209
48 iterations, Final stress criterion = 0.171209
33 iterations, Final stress criterion = 0.175341
32 iterations, Final stress criterion = 0.185881

Notice that mdscale finds several different local solutions, some of which do not have as low a stress value as the solution found with the cmdscale starting point.

Was this topic helpful?