mahal

Mahalanobis distance to reference samples

Description

d2 = mahal(Y,X) returns the squared Mahalanobis distance of each observation in Y to the reference samples in X.

Examples

Generate a correlated bivariate sample data set.

rng('default') % For reproducibility
X = mvnrnd([0;0],[1 .9;.9 1],1000);

Specify four observations that are equidistant, in Euclidean distance, from the origin, which is the mean of the distribution that X is drawn from.

Y = [1 1;1 -1;-1 1;-1 -1];

Compute the squared Mahalanobis distance of each observation in Y to the reference samples in X.

d2_mahal = mahal(Y,X)
d2_mahal = 4×1

    1.1095
   20.3632
   19.5939
    1.0137

Compute the squared Euclidean distance of each observation in Y from the mean of X.

d2_Euclidean = sum((Y-mean(X)).^2,2)
d2_Euclidean = 4×1

    2.0931
    2.0399
    1.9625
    1.9094

Plot X and Y by using scatter, and use marker color to visualize the squared Mahalanobis distance of each observation in Y to the reference samples in X.

scatter(X(:,1),X(:,2),10,'.') % Scatter plot with points of size 10
hold on
scatter(Y(:,1),Y(:,2),100,d2_mahal,'o','filled')
hb = colorbar;
ylabel(hb,'Squared Mahalanobis Distance')
legend('X','Y','Location','best')

All observations in Y ([1,1], [-1,-1], [1,-1], and [-1,1]) are approximately equidistant from the mean of X in Euclidean distance. However, [1,1] and [-1,-1] are much closer to X than [1,-1] and [-1,1] in Mahalanobis distance. Because Mahalanobis distance accounts for the covariance of the data and the scales of the different variables, it is useful for detecting outliers.
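As a minimal sketch of the outlier-detection use described above, one common heuristic compares the squared distances with a chi-square quantile. This is an assumption for illustration, not something mahal does itself. It assumes the reference data are approximately multivariate normal, in which case the squared Mahalanobis distance is approximately chi-square distributed with m degrees of freedom. The 0.975 level and the variable names cutoff and flagged are illustrative choices.

```matlab
rng('default') % For reproducibility
X = mvnrnd([0;0],[1 .9;.9 1],1000);
Y = [1 1;1 -1;-1 1;-1 -1];
d2 = mahal(Y,X);

m = size(X,2);              % Number of variables
cutoff = chi2inv(0.975,m);  % Illustrative threshold; assumes near-normal data
flagged = d2 > cutoff       % Logical vector, true for flagged rows of Y
```

With the distances reported above, only [1,-1] and [-1,1] exceed the cutoff, which is about 7.38 for m = 2.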

Input Arguments

Y: Data, specified as an n-by-m numeric matrix, where n is the number of observations and m is the number of variables in each observation.

X and Y must have the same number of columns, but can have different numbers of rows.

Data Types: single | double

X: Reference samples, specified as a p-by-m numeric matrix, where p is the number of samples and m is the number of variables in each sample.

X and Y must have the same number of columns, but can have different numbers of rows. X must have more rows than columns.

Data Types: single | double

Output Arguments

d2: Squared Mahalanobis distance of each observation in Y to the reference samples in X, returned as an n-by-1 numeric vector, where n is the number of observations in Y.
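The size requirements on the inputs and the shape of the output can be sketched as follows. The random data and the two assert pre-checks are hypothetical illustrations of the documented constraints, not part of mahal.

```matlab
X = randn(50,3);   % p = 50 reference samples, m = 3 variables
Y = randn(5,3);    % n = 5 observations, same m as X

% Hypothetical pre-checks mirroring the documented requirements
assert(size(Y,2) == size(X,2),'X and Y must have the same number of columns.')
assert(size(X,1) > size(X,2),'X must have more rows than columns.')

d2 = mahal(Y,X);
size(d2)           % n-by-1: one squared distance per row of Y
```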

More About

Mahalanobis Distance

The Mahalanobis distance is a measure of the distance between a sample point and a distribution.

The Mahalanobis distance from a vector y to a distribution with mean μ and covariance Σ is

$$d = \sqrt{(y-\mu)\,\Sigma^{-1}\,(y-\mu)'}.$$

This distance represents how far y is from the mean in number of standard deviations.
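The standard-deviation interpretation is easiest to see in one dimension. As a short worked case, with illustrative numbers that are not from the reference page, take a single variable with variance $\sigma^2$, so that $\Sigma = \sigma^2$:

```latex
d = \sqrt{(y-\mu)\,\frac{1}{\sigma^{2}}\,(y-\mu)} = \frac{|y-\mu|}{\sigma},
\qquad
\text{e.g. } \mu = 10,\ \sigma = 2,\ y = 14
\;\Rightarrow\; d = \frac{|14-10|}{2} = 2,\quad d^{2} = 4 .
```

In more than one dimension, $\Sigma^{-1}$ plays the same role as $1/\sigma^2$, rescaling each direction by the spread of the data along it.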

mahal returns the squared Mahalanobis distance d^2 from an observation in Y to the reference samples in X. In the mahal function, μ and Σ are the sample mean and covariance of the reference samples, respectively.
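The definition above can be checked numerically with a minimal sketch that computes the squared distance directly from the sample mean and covariance of X. This is not necessarily how mahal computes the result internally. The variable names mu, Sigma, Yc, and d2_manual are illustrative.

```matlab
rng('default')                      % For reproducibility
X = mvnrnd([0;0],[1 .9;.9 1],1000); % Reference samples
Y = [1 1;1 -1;-1 1;-1 -1];          % Observations

mu = mean(X);                       % Sample mean, 1-by-m
Sigma = cov(X);                     % Sample covariance, m-by-m
Yc = Y - mu;                        % Center each observation
d2_manual = sum((Yc/Sigma).*Yc,2);  % (y-mu)*inv(Sigma)*(y-mu)' for each row

d2_mahal = mahal(Y,X);
max(abs(d2_manual - d2_mahal))      % Difference should be near zero
```

Using Yc/Sigma solves the linear system instead of forming inv(Sigma) explicitly, which is generally the more numerically stable choice.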

Version History

Introduced before R2006a