# Algorithm to extract linearly dependent columns in a matrix

HN on 3 Aug 2020
Edited: Bruno Luong on 5 Aug 2020
Is there any general or standard approach to extract columns that are linearly dependent from a given matrix?
Thanks, and any help is appreciated!

John D'Errico on 3 Aug 2020
It is difficult to answer this. Consider a simple matrix:
format short g
A = rand(4,5)
A =
0.73796 0.36725 0.042904 0.0056783 0.45642
0.13478 0.54974 0.60439 0.39645 0.36555
0.4903 0.25921 0.63407 0.77345 0.95769
0.43373 0.27658 0.6462 0.25778 0.23421
The matrix (since it is random) will be of full rank, thus 4 in this case.
EVERY column is linearly dependent. That is, we can write every column as a linear combination of the other 4 columns. I can argue that problems exist with other matrices too.
A = magic(6)
A =
35 1 6 26 19 24
3 32 7 21 23 25
31 9 2 22 27 20
8 28 33 17 10 15
30 5 34 12 14 16
4 36 29 13 18 11
rank(A)
ans =
5
So A here has rank 5. But again, we can write any column of A as a linear combination of the other 5. So which column is linearly dependent? They all are!
Perhaps you might get something out of the null space vector(s).
Anull = null(A);
Anull = Anull/Anull(1)
Anull =
1
1
-0.5
-1
-1
0.5
This gives us the linear combination of importance as:
A(:,1) + A(:,2) - 0.5*A(:,3) - A(:,4) - A(:,5) + 0.5*A(:,6) = 0
We can now solve for ANY of those columns, in terms of the others.
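As a numerical check, the null-space combination above can be reproduced outside MATLAB. Below is an illustrative NumPy sketch (not part of the original answer); the matrix is hardcoded from the output above, since NumPy has no magic function:

```python
import numpy as np

# The magic(6) matrix from the answer, hardcoded (NumPy has no magic()).
A = np.array([
    [35,  1,  6, 26, 19, 24],
    [ 3, 32,  7, 21, 23, 25],
    [31,  9,  2, 22, 27, 20],
    [ 8, 28, 33, 17, 10, 15],
    [30,  5, 34, 12, 14, 16],
    [ 4, 36, 29, 13, 18, 11],
], dtype=float)

# Null-space vector via SVD: the right singular vector belonging to the
# (numerically) zero singular value.
U, s, Vt = np.linalg.svd(A)
v = Vt[-1]
v = v / v[0]           # normalize so the first entry is 1, as in the answer
print(np.round(v, 6))  # ~ [1, 1, -0.5, -1, -1, 0.5]

# The corresponding linear combination of the columns is numerically zero:
print(np.linalg.norm(A @ v))
```

The normalization by the first entry is only for readability; any nonzero scaling of a null-space vector describes the same dependency.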
How it helps you, I don't really know, because I have no idea what you really want to do.
If I had to guess, what you really need is to learn enough about linear algebra, and perhaps what a pivoted QR decomposition might provide. Because once you have that pivoted QR, you also have enough to do almost anything you want to do.
[Q,R,E] = qr(A,0)
Q =
-0.017647 0.64381 -0.20699 0.22818 0.49018 0.5
-0.56472 -0.11589 -0.45002 0.59421 -0.33476 -3.7638e-16
-0.15883 0.52674 -0.446 -0.47355 -0.15542 -0.5
-0.49413 -0.0017098 0.37629 0.14508 0.58583 -0.5
-0.088237 0.52963 0.62872 0.19923 -0.52605 -4.2484e-16
-0.63531 -0.11878 0.13728 -0.55665 -0.059779 0.5
R =
-56.666 -16.377 -42.107 -33.53 -33.53 -35.224
0 53.915 18.611 30.231 30.676 29.048
0 0 32.491 -7.9245 -8.9182 -11.289
0 0 0 10.101 5.6138 -0.56312
0 0 0 0 5.1649 -5.1649
0 0 0 0 0 -2.0136e-15
E =
2 1 3 6 4 5
We can use the QR to tell us much about the problem, as well as efficiently and stably provide all you need.
First, notice that R(6,6) is essentially zero compared to the other diagonal elements of R.
As well, the QR tells us that it decided column 5 (that is, the last element of E) is the one it wants to drop out.
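For readers working in Python, the pivoted QR above can be reproduced with SciPy's `scipy.linalg.qr` (with `pivoting=True` the pivot indices are 0-based, so they are shifted by one relative to MATLAB's E). A sketch, assuming SciPy is available:

```python
import numpy as np
from scipy.linalg import qr  # column-pivoted QR via LAPACK

# Same magic(6) matrix as above, hardcoded.
A = np.array([
    [35,  1,  6, 26, 19, 24],
    [ 3, 32,  7, 21, 23, 25],
    [31,  9,  2, 22, 27, 20],
    [ 8, 28, 33, 17, 10, 15],
    [30,  5, 34, 12, 14, 16],
    [ 4, 36, 29, 13, 18, 11],
], dtype=float)

Q, R, piv = qr(A, pivoting=True)  # piv is 0-based; MATLAB's E is 1-based
print(np.abs(np.diag(R)))         # last entry is essentially zero
print(piv + 1)                    # compare with E = [2 1 3 6 4 5]
```

The tiny trailing diagonal entry of R is the numerical signal that the matrix has rank 5, exactly as in the MATLAB output.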

John D'Errico on 5 Aug 2020
LAPACK was originally written in Fortran as I recall, but it was quickly translated into C.
You can find it on netlib.
Personally, my favorite source for these basic algorithms was the LINPACK User's Guide. This was a well written and highly readable resource when I was just learning numerical linear algebra.
HN on 5 Aug 2020
Thank you John! I will read it!
Bruno Luong on 5 Aug 2020
"May I know the reason why R is used both in the input and output of the function MultipleQR()?"
This is an implementation detail I chose mainly for two reasons:
• It saves memory by not duplicating the matrix
• The QR factorization can be rewritten as Q'*A*P = R
The latter identity means I successively search for an orthonormal basis Q and a column permutation A*P of A such that the projection of the permuted matrix onto that basis is upper triangular (R). The implementation simply initializes R to A before the iterations (so it is not yet upper triangular) and updates R during the iterations until it reaches triangular form, applying a Householder transformation at each iteration to zero out the elements below the diagonal.
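Bruno's description of the update scheme can be illustrated with a toy implementation. The function below is my own NumPy sketch of a column-pivoted Householder QR that overwrites the input array in place until it becomes R; it is not Bruno's MultipleQR code, just an illustration of the same idea:

```python
import numpy as np

def pivoted_qr_inplace(R):
    """Toy column-pivoted Householder QR, sketched after the description above.

    R starts out as a copy of A and is overwritten in place until it is
    upper triangular; Q is accumulated separately. Returns (Q, R, p) such
    that A[:, p] = Q @ R up to rounding.
    """
    m, n = R.shape
    Q = np.eye(m)
    p = np.arange(n)
    for i in range(min(m, n)):
        # Pivot: bring the remaining column with the largest norm (below
        # row i) to position i, recording the swap in p.
        norms = np.linalg.norm(R[i:, i:], axis=0)
        j = i + int(np.argmax(norms))
        R[:, [i, j]] = R[:, [j, i]]
        p[[i, j]] = p[[j, i]]
        # Householder reflector zeroing R[i+1:, i].
        x = R[i:, i].copy()
        alpha = -np.sign(x[0]) * np.linalg.norm(x) if x[0] != 0 else -np.linalg.norm(x)
        v = x.copy()
        v[0] -= alpha
        nv = np.linalg.norm(v)
        if nv > 0:
            v /= nv
            # Apply H = I - 2*v*v' from the left to the trailing rows of R,
            # and accumulate it into Q from the right.
            R[i:, :] -= 2.0 * np.outer(v, v @ R[i:, :])
            Q[:, i:] -= 2.0 * np.outer(Q[:, i:] @ v, v)
    return Q, R, p

# Quick check on a random matrix.
rng = np.random.default_rng(1)
A = rng.random((6, 6))
Q, R, p = pivoted_qr_inplace(A.copy())
print(np.allclose(A[:, p], Q @ R))  # True
```

The in-place aspect is the point: the same array holds A at the start and R at the end, which is the memory saving Bruno mentions.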

Bruno Luong on 3 Aug 2020
Edited: Bruno Luong on 3 Aug 2020
Test matrix (10 x 6) with rank 4
M = rand(10,4)*rand(4,6)
Automatic selection of independent columns of M
[Q,R,p] = qr(M,'vector');
dr = abs(diag(R));
if dr(1)
tol = 1e-10;
r = find(dr>=tol*dr(1),1,'last');
ci = p(1:r) % here is the index of independent columns
else
r = 0;
ci = [];
end
% Submatrix with r columns (and full column rank).
Mind=M(:,ci)
% These three rank estimates should be equal.
% If they differ, it is because MATLAB's rank uses a different tolerance
% than the one chosen above, and a more robust SVD-based algorithm.
rank(Mind)
rank(M)
r
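For comparison, here is an illustrative NumPy/SciPy translation of the snippet above (variable names mirror the MATLAB code; the random seed and matrix are my own test setup, not Bruno's):

```python
import numpy as np
from scipy.linalg import qr  # SciPy provides column-pivoted QR

rng = np.random.default_rng(0)
M = rng.random((10, 4)) @ rng.random((4, 6))  # 10 x 6 test matrix, rank 4

Q, R, p = qr(M, pivoting=True)
dr = np.abs(np.diag(R))
tol = 1e-10
if dr[0] > 0:
    # Last diagonal entry of R that is not negligible relative to dr[0].
    r = int(np.nonzero(dr >= tol * dr[0])[0][-1]) + 1
    ci = p[:r]            # indices of a set of independent columns (0-based)
else:
    r, ci = 0, np.array([], dtype=int)

Mind = M[:, ci]           # submatrix with r columns and full column rank
print(r, np.linalg.matrix_rank(Mind), np.linalg.matrix_rank(M))
```

As in the MATLAB version, the three printed rank estimates agree unless the fixed tolerance above disagrees with the SVD-based default.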

Bruno Luong on 5 Aug 2020
Matt: "but it seems to rely on the assumption that the rows of the matrix abs(R) from the QR decomposition are maximized at the diagonal."
Yes, this statement is true:
abs(R(i,i)) >= abs(R(i,j)) for all j >= i
But it is not merely an assumption; it is a consequence of the pivoting selection of the QR algorithm with column permutation.
Proof
At iteration #i, the algorithm selects, among the remaining (n-i+1) columns of A, the one that maximizes the norm of its projection onto the orthogonal complement of the subspace V_{i-1} (notation used in my explanation below John's answer); that norm becomes abs(R(i,i)).
For brevity, denote by W_{k} the orthogonal complement of the subspace V_{k}.
Later, at iteration step j > i, abs(R(i,j)) is the norm of the projection of A(:,p(j)) onto span(Q(:,i)).
Since span(Q(:,i)) is contained in W_{i-1} by construction,
norm(projection of x on span(Q(:,i))) <= norm(projection of x on W_{i-1}) for any vector x in R^n.
Applying this to x = A(:,p(j)), and noting that column p(j) was still available at step i (so the pivoting rule preferred p(i) over it), we get
abs(R(i,j)) <= norm(projection of A(:,p(j)) on W_{i-1}) <= norm(projection of A(:,p(i)) on W_{i-1}) = abs(R(i,i))
This proves why diag(R) dominates the off-diagonal terms row-wise, so the statement need not be treated as an "assumption".
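The row-wise dominance is easy to spot-check numerically; a sketch assuming SciPy's pivoted QR:

```python
import numpy as np
from scipy.linalg import qr  # column-pivoted QR

rng = np.random.default_rng(0)
A = rng.random((8, 8))
_, R, _ = qr(A, pivoting=True)

# Row-wise dominance: abs(R(i,i)) >= abs(R(i,j)) for all j >= i
# (small slack for floating-point ties).
Rabs = np.abs(R)
for i in range(8):
    assert np.all(Rabs[i, i] + 1e-12 >= Rabs[i, i:])
print("row-wise dominance holds")
```

A single random matrix is of course not a proof; the proof is the argument above. The check just confirms the property on LAPACK's actual pivoting.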
Bruno Luong on 5 Aug 2020
By a similar proof we can show that QR factorization with pivoting satisfies
abs(R(j,j)) <= abs(R(i,i)) for j >= i
In other words, abs(diag(R)) is sorted in descending order.
It is also easy to see that
the absolute value of the determinant of A(:,p(1)) projected on span{Q(:,1)} is abs(R(1,1)),
the absolute value of the determinant of A(:,p(1:2)) projected on span{Q(:,1:2)} is abs(R(1,1)*R(2,2)),
...
the absolute value of the determinant of A(:,p(1:n)) projected on span{Q(:,1:n)} is abs(R(1,1)*R(2,2)*...*R(n,n)).
We also know that when a determinant is ~= 0, the column vectors are independent.
All these observations lead to this fact:
A(:,p(1:r)) are independent, where r is the last index such that abs(R(r,r)) > 0.
This r is the rank of A, since every product beyond index r is 0.
QED
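These observations can also be verified numerically. An illustrative sketch, again assuming SciPy's pivoted QR, using a rank-4 test matrix like the one in the answer above (seed and sizes are my own choices):

```python
import numpy as np
from scipy.linalg import qr  # column-pivoted QR

rng = np.random.default_rng(0)
A = rng.random((10, 4)) @ rng.random((4, 6))  # rank-4 test matrix

Q, R, p = qr(A, pivoting=True)
d = np.abs(np.diag(R))

# abs(diag(R)) is sorted in descending order.
assert np.all(d[:-1] + 1e-12 >= d[1:])

# |det| of A(:,p(1:k)) projected on span{Q(:,1:k)} is prod(abs(R(i,i))):
# the projected matrix is Q(:,1:k)' * A(:,p(1:k)) = R(1:k,1:k),
# an upper triangular matrix whose determinant is the product of its diagonal.
for k in range(1, 5):
    proj = Q[:, :k].T @ A[:, p[:k]]
    assert np.isclose(abs(np.linalg.det(proj)), np.prod(d[:k]))

# The last index r with abs(R(r,r)) > 0 (numerically) recovers the rank.
r = int(np.nonzero(d > 1e-10 * d[0])[0][-1]) + 1
print(r)  # 4
```

In floating point, "abs(R(r,r)) > 0" has to be read relative to a tolerance, which is exactly what the earlier snippet with tol = 1e-10 does.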
Matt J on 5 Aug 2020
Fantastic!!!!

Matt J on 4 Aug 2020


HN on 4 Aug 2020
Thank you Matt J
I had just found it a bit earlier, before you posted it here, and it clears everything up for me!
Thank you