mahal

Mahalanobis distance to reference samples

Description

d2 = mahal(Y,X) returns the squared Mahalanobis distance of each observation in Y to the reference samples in X.

Examples

Generate a correlated bivariate sample data set.

rng('default') % For reproducibility
X = mvnrnd([0;0],[1 .9;.9 1],1000);

Specify four observations that are equidistant from the mean of X in Euclidean distance.

Y = [1 1;1 -1;-1 1;-1 -1];

Compute the Mahalanobis distance of each observation in Y to the reference samples in X.

d2_mahal = mahal(Y,X)
d2_mahal = 4×1

    1.1095
   20.3632
   19.5939
    1.0137

Compute the squared Euclidean distance of each observation in Y from the mean of X.

d2_Euclidean = sum((Y-mean(X)).^2,2)
d2_Euclidean = 4×1

    2.0931
    2.0399
    1.9625
    1.9094

Plot X and Y by using scatter and use marker color to visualize the Mahalanobis distance of Y to the reference samples in X.

scatter(X(:,1),X(:,2),10,'.') % Scatter plot with points of size 10
hold on
scatter(Y(:,1),Y(:,2),100,d2_mahal,'o','filled')
hb = colorbar;
ylabel(hb,'Mahalanobis Distance')
legend('X','Y','Location','best')

[Figure: scatter plot of the reference samples X and the four observations Y, with the Y markers colored by Mahalanobis distance.]

All observations in Y ([1,1], [-1,-1], [1,-1], and [-1,1]) are equidistant from the mean of X in Euclidean distance. However, [1,1] and [-1,-1] are much closer to X than [1,-1] and [-1,1] in Mahalanobis distance. Because Mahalanobis distance accounts for the covariance of the data and the scales of the different variables, it is useful for detecting outliers.
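As a cross-check on the example, the squared Mahalanobis distance can also be computed directly from the standard definition: the quadratic form of each centered observation with the inverse of the sample covariance of X. The following is a minimal sketch under that assumption, not a description of how mahal is implemented internally. The variable names mu, C, D, and d2_manual are illustrative only.

```matlab
rng('default') % For reproducibility
X = mvnrnd([0;0],[1 .9;.9 1],1000);
Y = [1 1;1 -1;-1 1;-1 -1];

mu = mean(X);                 % Sample mean of the reference samples (1-by-m)
C  = cov(X);                  % Sample covariance of the reference samples (m-by-m)
D  = Y - mu;                  % Center each observation (implicit expansion)

% Row-wise quadratic form D(i,:) * inv(C) * D(i,:)'.
% D/C solves the linear system instead of forming inv(C) explicitly.
d2_manual = sum((D/C).*D,2);

% Largest deviation from the value returned by mahal
max(abs(d2_manual - mahal(Y,X)))
```

The two results should agree up to floating-point rounding.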

Input Arguments

Y: Data, specified as an n-by-m numeric matrix, where n is the number of observations and m is the number of variables in each observation.

X and Y must have the same number of columns, but can have different numbers of rows.

Data Types: single | double

X: Reference samples, specified as a p-by-m numeric matrix, where p is the number of samples and m is the number of variables in each sample.

X and Y must have the same number of columns, but can have different numbers of rows. X must have more rows than columns.

Data Types: single | double
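The size requirements on the inputs can be checked before calling mahal. The following sketch uses randomly generated data purely for illustration; the sizes and the error message text are assumptions, not part of the mahal interface.

```matlab
rng('default') % For reproducibility
X = randn(50,3);   % 50 reference samples, 3 variables
Y = randn(5,3);    % 5 observations, 3 variables

[p,m] = size(X);

% X and Y must have the same number of columns.
assert(size(Y,2) == m,'X and Y must have the same number of columns.')

% X must have more rows than columns.
assert(p > m,'X must have more rows than columns.')

d2 = mahal(Y,X);   % One squared distance per row of Y
```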

Output Arguments

d2: Squared Mahalanobis distance of each observation in Y to the reference samples in X, returned as an n-by-1 numeric vector, where n is the number of observations in Y.

More About

Tips

  • Each time you call the mahal function, it computes the covariance matrix of the reference samples. In cases where you want to compute Mahalanobis distances between multiple sets of data and the same reference samples X, you can save computing time by calculating the covariance matrix of X only once, and supplying it to the pdist2 function. For an example, see Compute Mahalanobis Distance.
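The tip above can be sketched as follows. The covariance matrix and mean of X are computed once and reused for two sets of observations. Because pdist2 with the 'mahalanobis' distance returns the unsquared distance, the result is squared here for comparison with mahal. The names C, mu, Y1, and Y2 are illustrative assumptions.

```matlab
rng('default') % For reproducibility
X = mvnrnd([0;0],[1 .9;.9 1],1000);

C  = cov(X);    % Computed once and reused below
mu = mean(X);   % Mean of the reference samples

Y1 = [1 1;1 -1];    % First set of observations
Y2 = [-1 1;-1 -1];  % Second set of observations

% Distance from each observation to the mean of X, using the precomputed C
d2_1 = pdist2(Y1,mu,'mahalanobis',C).^2;
d2_2 = pdist2(Y2,mu,'mahalanobis',C).^2;
```

With this setup, [d2_1; d2_2] should match mahal([Y1;Y2],X) up to floating-point rounding.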

Version History

Introduced before R2006a