canoncorr
Canonical correlation
Syntax
Description
Examples
Compute Sample Canonical Correlation
Perform canonical correlation analysis for a sample data set.
The data set carbig contains measurements for 406 cars from the years 1970 to 1982.
Load the sample data.
load carbig;
data = [Displacement Horsepower Weight Acceleration MPG];
Define X as the matrix of displacement, horsepower, and weight observations, and Y as the matrix of acceleration and MPG observations. Omit rows that contain missing (NaN) values.
nans = sum(isnan(data),2) > 0;   % flag rows with at least one NaN
X = data(~nans,1:3);
Y = data(~nans,4:5);
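To see what the row mask does, here is a minimal sketch on a small hypothetical matrix (the values are made up for illustration). It also shows that any(isnan(data),2) builds the same mask.

```matlab
% Hypothetical 4-by-3 data matrix; rows 2 and 4 contain missing values
data = [1   2   3;
        4   NaN 6;
        7   8   9;
        NaN 11  12];

% sum(isnan(data),2) counts the NaNs in each row; "> 0" flags rows with any NaN
nans = sum(isnan(data),2) > 0;

% any(isnan(data),2) produces the same logical mask
sameMask = isequal(nans, any(isnan(data),2))   % true

% Keep only the complete rows, as in the canoncorr example
dataClean = data(~nans,:)
```

Here dataClean keeps rows 1 and 3 only.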
Compute the sample canonical correlation.
[A,B,r,U,V] = canoncorr(X,Y);
View the output of A to determine the linear combinations of displacement, horsepower, and weight that make up the canonical variables of X.
A
A = 3×2
0.0025 0.0048
0.0202 0.0409
-0.0000 -0.0027
A(3,1) is displayed as -0.0000 because it is very small. Display A(3,1) separately.
A(3,1)
ans = -2.4737e-05
The first canonical variable of X is u1 = 0.0025*Disp + 0.0202*HP - 0.000025*Wgt.
The second canonical variable of X is u2 = 0.0048*Disp + 0.0409*HP - 0.0027*Wgt.
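canoncorr applies these coefficients to mean-centered columns, so the scores follow u = (x - mean(X))*A rather than x*A. The sketch below uses the rounded coefficients displayed above and three hypothetical cars with made-up displacement, horsepower, and weight values. It only illustrates the arithmetic and does not reproduce U from the example.

```matlab
% Rounded canonical coefficients for X, as displayed in the example
A = [0.0025     0.0048;
     0.0202     0.0409;
    -0.000025  -0.0027];

% Hypothetical rows: [Displacement Horsepower Weight]
Xs = [300 130 3500;
      350 165 3700;
      320 150 3400];

% Center each column (implicit expansion, R2016b or later),
% then form u1 and u2 as linear combinations of the centered columns
Xc = Xs - mean(Xs);
Us = Xc * A;        % column j holds the jth canonical variable

% Scores built from centered data have zero mean (up to rounding error)
colMeans = mean(Us)
```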
View the output of B to determine the linear combinations of acceleration and MPG that make up the canonical variables of Y.
B
B = 2×2
-0.1666 -0.3637
-0.0916 0.1078
The first canonical variable of Y is v1 = -0.1666*Accel - 0.0916*MPG.
The second canonical variable of Y is v2 = -0.3637*Accel + 0.1078*MPG.
Plot the scores of the canonical variables of X and Y against each other.
t = tiledlayout(2,2);
title(t,'Canonical Scores of X vs Canonical Scores of Y')
xlabel(t,'Canonical Variables of X')
ylabel(t,'Canonical Variables of Y')
t.TileSpacing = 'compact';
nexttile
plot(U(:,1),V(:,1),'.')
xlabel('u1')
ylabel('v1')
nexttile
plot(U(:,2),V(:,1),'.')
xlabel('u2')
ylabel('v1')
nexttile
plot(U(:,1),V(:,2),'.')
xlabel('u1')
ylabel('v2')
nexttile
plot(U(:,2),V(:,2),'.')
xlabel('u2')
ylabel('v2')
The pairs of canonical variables are ordered from strongest to weakest correlation, and all other pairs are uncorrelated.
Return the correlation coefficient of the variables u1 and v1.
r(1)
ans = 0.8782
Input Arguments
X
— Input matrix
matrix
Input matrix, specified as an n-by-d1 matrix. The rows of X correspond to observations, and the columns correspond to variables.
Data Types: single | double
Y
— Input matrix
matrix
Input matrix, specified as an n-by-d2 matrix, where X is an n-by-d1 matrix. X and Y must have the same number of rows n. The rows of Y correspond to observations, and the columns correspond to variables.
Data Types: single | double
Output Arguments
A
— Sample canonical coefficients for X variables
matrix
Sample canonical coefficients for the variables in X, returned as a d1-by-d matrix, where d = min(rank(X),rank(Y)).
The jth column of A contains the linear combination of variables that makes up the jth canonical variable for X.
If X is less than full rank, canoncorr gives a warning and returns zeros in the rows of A corresponding to dependent columns of X.
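The number of canonical variables d depends on the ranks of X and Y, not only on their column counts. A minimal sketch with hypothetical random data, in which the third column of X is an exact linear combination of the first two:

```matlab
% Reproducible hypothetical data
rng(0)
n = 50;
X = randn(n,2);
X = [X, X(:,1) + 2*X(:,2)];   % third column depends on the first two
Y = randn(n,3);

d1 = size(X,2)                % 3 columns
rX = rank(X)                  % 2, because one column is dependent
d  = min(rank(X), rank(Y))    % 2 pairs of canonical variables
```

For an X like this one, canoncorr would be expected to warn about rank deficiency and return A as a 3-by-2 (d1-by-d) matrix.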
B
— Sample canonical coefficients for Y variables
matrix
Sample canonical coefficients for the variables in Y, returned as a d2-by-d matrix, where d = min(rank(X),rank(Y)).
The jth column of B contains the linear combination of variables that makes up the jth canonical variable for Y.
If Y is less than full rank, canoncorr gives a warning and returns zeros in the rows of B corresponding to dependent columns of Y.
U
— Canonical scores for the X variables
matrix
Canonical scores for the variables in X, returned as an n-by-d matrix, where X is an n-by-d1 matrix and d = min(rank(X),rank(Y)).
V
— Canonical scores for the Y variables
matrix
Canonical scores for the variables in Y, returned as an n-by-d matrix, where Y is an n-by-d2 matrix and d = min(rank(X),rank(Y)).
stats
— Hypothesis test information
structure
Hypothesis test information, returned as a structure. This information relates to the sequence of d null hypotheses that the (k+1)st through dth correlations are all zero for k = 0,…,d-1, where d = min(rank(X),rank(Y)).
The fields of stats are 1-by-d vectors with elements corresponding to the values of k.
Field | Description |
---|---|
Wilks | Wilks' lambda (likelihood ratio) statistic |
df1 | Degrees of freedom for the chi-squared statistic, and the numerator degrees of freedom for the F statistic |
df2 | Denominator degrees of freedom for the F statistic |
F | Rao's approximate F statistic for the null hypothesis |
pF | Right-tail significance level for F |
chisq | Bartlett's approximate chi-squared statistic for the null hypothesis, with Lawley's modification |
pChisq | Right-tail significance level for chisq |
stats has two other fields (dfe and p), which are equal to df1 and pChisq, respectively, and exist for historical reasons.
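For intuition about the Wilks, df1, chisq, and pChisq fields, the sketch below computes them from a vector of canonical correlations using the usual textbook formulas: Wilks' lambda is the product of (1 - r_i^2) over i = k+1,…,d, df1 = (d1-k)(d2-k), and Bartlett's statistic with Lawley's correction is -(n - 1 - k - (d1+d2+1)/2 + sum of 1/r_i^2 over i = 1,…,k) times log(Wilks). The correlations, n, d1, and d2 are hypothetical, and this is an illustration rather than a re-implementation of canoncorr, whose handling of edge cases (for example, correlations equal to 1) may differ. Compare against stats from canoncorr before relying on it.

```matlab
% Hypothetical inputs: canonical correlations (descending), sample size,
% and the number of columns in X (d1) and Y (d2)
r  = [0.88 0.43];
n  = 392;
d1 = 3;
d2 = 2;
d  = numel(r);
k  = 0:d-1;    % element j tests that correlations k+1 through d are zero

% Wilks' lambda: product of (1 - r_i^2) over i = k+1..d (reverse cumulative product)
Wilks = fliplr(cumprod(fliplr(1 - r.^2)));

% Degrees of freedom for the chi-squared statistic
df1 = (d1 - k) .* (d2 - k);

% Lawley's correction: sum of 1/r_i^2 over i = 1..k (zero when k = 0)
lawley = cumsum([0, 1 ./ r(1:d-1).^2]);

% Bartlett's approximate chi-squared statistic
chisq = -(n - 1 - k - (d1 + d2 + 1)/2 + lawley) .* log(Wilks);

% Right-tail p-value of a chi-squared distribution, using base MATLAB gammainc
pChisq = gammainc(chisq/2, df1/2, 'upper');
```

Using gammainc with the 'upper' option keeps the sketch within base MATLAB; with the Statistics and Machine Learning Toolbox, chi2cdf(chisq,df1,'upper') is an alternative.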
Data Types: struct
More About
Canonical Correlation Analysis
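The canonical scores of the data matrices X and Y are defined as

$$U_i = (X - \bar{X})\,a_i, \qquad V_i = (Y - \bar{Y})\,b_i,$$

where X̄ and Ȳ denote the column means of X and Y, and ai and bi maximize the Pearson correlation coefficient ρ(Ui,Vi) subject to being uncorrelated to all previous canonical scores. Centering gives Ui and Vi zero mean, and ai and bi are scaled so that Ui and Vi have unit variance.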
The canonical coefficients of X and Y are the matrices A and B with columns ai and bi, respectively.
The canonical variables of X and Y are the linear combinations of the columns of X and Y given by the canonical coefficients in A and B, respectively.
The canonical correlations are the values ρ(Ui,Vi) measuring the correlation of each pair of canonical variables of X and Y.
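Restated as an optimization problem, the ith canonical correlation is the following. This is a sketch in standard notation, where Σ_XX, Σ_YY, and Σ_XY are introduced here to denote the sample covariance matrix of the centered X, that of the centered Y, and the cross-covariance between them.

$$
r_i = \rho(U_i, V_i) = \max_{a_i,\,b_i}\; a_i^\top \Sigma_{XY}\, b_i
\quad \text{subject to} \quad
a_i^\top \Sigma_{XX}\, a_i = b_i^\top \Sigma_{YY}\, b_i = 1,
\qquad
a_i^\top \Sigma_{XX}\, a_j = b_i^\top \Sigma_{YY}\, b_j = 0 \ \text{ for } j < i.
$$

The unit-variance constraints turn the correlation into a covariance, which is why the objective has no denominator.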
Algorithms
canoncorr computes A, B, and r using qr and svd. canoncorr computes U and V as U = (X-mean(X))*A and V = (Y-mean(Y))*B.
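The following is a minimal sketch of the standard QR-then-SVD approach to canonical correlation, written in base MATLAB on hypothetical random data. It assumes X and Y have full column rank (no pivoting or rank handling) and scales by sqrt(n-1) so that the scores have unit sample variance. canoncorr's own implementation handles rank deficiency and may differ in details such as the signs of the columns of A and B.

```matlab
% Hypothetical data: Y(:,1) is built to correlate with X(:,1)
rng(1)
n = 200;
X = randn(n,3);
Y = [X(:,1) + 0.5*randn(n,1), randn(n,1)];

% Center each column (implicit expansion, R2016b or later)
Xc = X - mean(X);
Yc = Y - mean(Y);

% Economy-size QR factorizations: Xc = Q1*T1 and Yc = Q2*T2
[Q1,T1] = qr(Xc,0);
[Q2,T2] = qr(Yc,0);

% The singular values of Q1'*Q2 are the canonical correlations,
% returned by svd in decreasing order
d = min(size(X,2), size(Y,2));      % assumes full column rank
[L,D,M] = svd(Q1'*Q2, 'econ');
r = diag(D(1:d,1:d))';

% Map the singular vectors back to coefficients; sqrt(n-1) gives
% canonical scores with unit sample variance
A = (T1 \ L(:,1:d)) * sqrt(n-1);
B = (T2 \ M(:,1:d)) * sqrt(n-1);

% Canonical scores, as in U = (X-mean(X))*A and V = (Y-mean(Y))*B
U = Xc*A;
V = Yc*B;

% The cross-correlation block of [U V] should equal diag(r):
% matched pairs correlate at r(i), all other pairs are uncorrelated
R = corrcoef([U V]);
crossBlock = R(1:d, d+1:2*d)
```

If the Statistics and Machine Learning Toolbox is available, running canoncorr(X,Y) on the same data should give the same r, with columns of A and B matching up to sign.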
Version History
Introduced before R2006a