mahal
Mahalanobis distance to reference samples
Syntax
d2 = mahal(Y,X)
Description
d2 = mahal(Y,X) returns the squared Mahalanobis distance of each observation in Y to the reference samples in X.
Examples
Compare Mahalanobis and Squared Euclidean Distances
Generate a correlated bivariate sample data set.
rng('default') % For reproducibility
X = mvnrnd([0;0],[1 .9;.9 1],1000);
Specify four observations that are equidistant from the mean of X in Euclidean distance.
Y = [1 1;1 -1;-1 1;-1 -1];
Compute the squared Mahalanobis distance of each observation in Y to the reference samples in X.
d2_mahal = mahal(Y,X)
d2_mahal = 4×1
1.1095
20.3632
19.5939
1.0137
Compute the squared Euclidean distance of each observation in Y from the mean of X.
d2_Euclidean = sum((Y-mean(X)).^2,2)
d2_Euclidean = 4×1
2.0931
2.0399
1.9625
1.9094
Plot X and Y by using scatter, and use marker color to visualize the Mahalanobis distance of Y to the reference samples in X.
scatter(X(:,1),X(:,2),10,'.') % Scatter plot with points of size 10
hold on
scatter(Y(:,1),Y(:,2),100,d2_mahal,'o','filled')
hb = colorbar;
ylabel(hb,'Mahalanobis Distance')
legend('X','Y','Location','best')
All observations in Y ([1,1], [-1,-1], [1,-1], and [-1,1]) are approximately equidistant from the mean of X in Euclidean distance. However, [1,1] and [-1,-1] are much closer to X than [1,-1] and [-1,1] in Mahalanobis distance. Because Mahalanobis distance accounts for the covariance of the data and the scales of the different variables, it is useful for detecting outliers.
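As a rough illustration of the outlier-detection use mentioned above, the following sketch flags observations whose squared Mahalanobis distance exceeds a chi-square quantile. The 97.5% cutoff and the variable names threshold and isOutlier are assumptions made for this illustration, not part of mahal. The cutoff is only meaningful if the reference samples are approximately multivariate normal.

```matlab
rng('default')                          % Same seed as the example above
X = mvnrnd([0;0],[1 .9;.9 1],1000);     % Correlated reference samples
Y = [1 1;1 -1;-1 1;-1 -1];              % Observations to test

d2 = mahal(Y,X);                        % Squared Mahalanobis distances

% Assumed cutoff: 97.5% quantile of a chi-square distribution with
% m degrees of freedom, where m is the number of variables.
threshold = chi2inv(0.975,size(X,2));
isOutlier = d2 > threshold
```

With the distances reported in the example (about 1.1, 20.4, 19.6, and 1.0) and a cutoff of about 7.38 for two variables, this rule flags [1,-1] and [-1,1] and leaves [1,1] and [-1,-1] unflagged.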
Input Arguments
Y — Data
n-by-m numeric matrix
Data, specified as an n-by-m numeric matrix, where n is the number of observations and m is the number of variables in each observation.
X and Y must have the same number of columns, but can have different numbers of rows.
Data Types: single | double
X — Reference samples
p-by-m numeric matrix
Reference samples, specified as a p-by-m numeric matrix, where p is the number of samples and m is the number of variables in each sample.
X and Y must have the same number of columns, but can have different numbers of rows.
X must have more rows than columns.
Data Types: single | double
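The size requirements on X and Y can be checked before calling mahal. The following is a minimal sketch that uses hypothetical data and hypothetical variable names (sameVars, enoughRows, inputsOK). It does not reproduce the function's own validation.

```matlab
X = rand(10,3);   % Hypothetical reference samples: p = 10 rows, m = 3 columns
Y = rand(4,3);    % Hypothetical observations: n = 4 rows, m = 3 columns

sameVars   = size(Y,2) == size(X,2);   % X and Y need the same number of columns
enoughRows = size(X,1) >  size(X,2);   % X needs more rows than columns
inputsOK   = sameVars && enoughRows;
```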
Output Arguments
d2 — Squared Mahalanobis distance
n-by-1 numeric vector
Squared Mahalanobis distance of each observation in Y to the reference samples in X, returned as an n-by-1 numeric vector, where n is the number of observations in Y.
More About
Mahalanobis Distance
The Mahalanobis distance is a measure of the distance between a sample point and a distribution.
The Mahalanobis distance from a vector y to a distribution with mean μ and covariance Σ is
$$d = \sqrt{(y-\mu)\,\Sigma^{-1}\,(y-\mu)'}$$
This distance represents how far y is from the mean in number of standard deviations.
mahal returns the squared Mahalanobis distance $d^2$ from an observation in Y to the reference samples in X. In the mahal function, μ and Σ are the sample mean and covariance of the reference samples, respectively.
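The squared distance $(y-\mu)\,\Sigma^{-1}\,(y-\mu)'$ can also be evaluated directly from the sample mean and covariance. The following sketch uses only core MATLAB functions and small hypothetical matrices. It illustrates the definition and is not a description of how mahal is implemented internally.

```matlab
% Hypothetical reference samples (p-by-m) and observations (n-by-m)
X = [2 1; 3 4; 5 3; 6 7; 8 6];
Y = [4 4; 9 1];

mu    = mean(X,1);     % 1-by-m sample mean of the reference samples
Sigma = cov(X);        % m-by-m sample covariance of the reference samples
D     = Y - mu;        % Center each observation on the sample mean

% Row-wise quadratic form D*inv(Sigma)*D', computed without forming inv(Sigma)
d2_manual = sum((D/Sigma).*D,2);
```

For well-conditioned reference samples, d2_manual should agree with mahal(Y,X) up to round-off error.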
Version History
Introduced before R2006a