# Most Efficient Way to Construct the Matrices to Extract the Lower and Upper Triangle from a Vectorized Matrix

14 views (last 30 days)

Show older comments

Given a matrix X and its vector form I am after the most efficient way to build the matrices L and U which extracts the lower and upper triangle from X.

So in MATLAB code it would be something like that:

clear();

numRows = 3;

numCols = numRows;

mX = randn(numRows, numCols);

vX = mX(:);

% Lower Triangle are indices 2, 3, 6

mL = [ 0, 1, 0, 0, 0, 0, 0, 0, 0 ; ...

0, 0, 1, 0, 0, 0, 0, 0, 0 ; ...

0, 0, 0, 0, 0, 1, 0, 0, 0 ];

% Upper Triangle are indices 4, 7, 8

mU = [ 0, 0, 0, 1, 0, 0, 0, 0, 0 ; ...

0, 0, 0, 0, 0, 0, 1, 0, 0 ; ...

0, 0, 0, 0, 0, 0, 0, 1, 0 ];

assert(isequal(mL * vX, mX(logical(tril(mX, -1)))));

assert(isequal(mU * vX, mX(logical(triu(mX, 1)))));

I am after sparse represenation of mU and mL in the most efficient way.

My current implementation is given by:

function [ mLU ] = GenerateTriangleExtractorMatrix( numRows, triangleFlag, diagFlag )

EXTRACT_LOWER_TRIANGLE = 1;

EXTRACT_UPPER_TRIANGLE = 2;

INCLUDE_DIAGONAL = 1;

EXCLUDE_DIAGONAL = 2;

switch(diagFlag)

case(INCLUDE_DIAGONAL)

numElements = 0.5 * numRows * (numRows + 1);

diagIdx = 0;

case(EXCLUDE_DIAGONAL)

numElements = 0.5 * (numRows - 1) * numRows;

diagIdx = 1;

end

vJ = zeros(numElements, 1);

if(triangleFlag == EXTRACT_LOWER_TRIANGLE)

elmntIdx = 0;

for jj = 1:numRows

for ii = (jj + diagIdx):numRows

elmntIdx = elmntIdx + 1;

vJ(elmntIdx) = ((jj - 1) * numRows) + ii;

end

end

elseif(triangleFlag == EXTRACT_UPPER_TRIANGLE)

elmntIdx = numElements + 1;

for jj = numRows:-1:1

for ii = (jj - diagIdx):-1:1

elmntIdx = elmntIdx - 1;

vJ(elmntIdx) = ((jj - 1) * numRows) + ii;

end

end

end

mLU = sparse(1:numElements, vJ, 1, numElements, numRows * numRows, numElements);

end

Is there a more efficient way to generate vJ without extensive allocation of memory (In order to allow generating really large matrices)?

Thank You.

##### 24 Comments

Matt J
on 23 Apr 2020

### Accepted Answer

James Tursa
on 22 Apr 2020

Edited: James Tursa
on 22 Apr 2020

Here is a mex routine that generates the sparse double matrices mL and mU directly, so no wasted memory in creating them. Seems to run about 3x-5x faster than m-code for somewhat large sizes.

/* S = GenerateTriangleExtractorMatrixMex(numRows,triangleFlag,diagFlag)

*

* S = double sparse matrix

* numRows = integer > 0

* triangleFlag = 1 , extract lower triangle

* 2 , extract upper triangle

* diagFlag = 1 , include diagonal

* 2 , exclude diagonal

* where

*

* M = an numRows X numRows matrix of non-zero terms

* assert(isequal(S * M(:), mX(logical(tril(M, -1))))); % for lower

* assert(isequal(S * M(:), mX(logical(triu(M, 1))))); % for upper

*

* Programmer: James Tursa

* Date: 2020-April-22

*/

#include "mex.h"

void mexFunction( int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])

{

mwSize numRows, triangleFlag, diagFlag, numElements;

mwIndex *Ir, *Jc;

mwIndex i, j, k, m;

double *pr;

if( nrhs != 3 || !mxIsNumeric(prhs[0]) || !mxIsNumeric(prhs[1]) || !mxIsNumeric(prhs[2]) ||

mxGetNumberOfElements(prhs[0]) != 1 || mxGetNumberOfElements(prhs[1]) != 1 ||

mxGetNumberOfElements(prhs[2]) != 1 ) {

mexErrMsgTxt("Need three numeric scalar inputs");

}

if( nlhs > 1 ) {

mexErrMsgTxt("Too many outputs");

}

numRows = mxGetScalar(prhs[0]);

triangleFlag = mxGetScalar(prhs[1]);

diagFlag = mxGetScalar(prhs[2]);

if( numRows < 1 ) {

mexErrMsgTxt("Invalid numRows, should be > 0");

}

if( triangleFlag != 1 && triangleFlag != 2 ) {

mexErrMsgTxt("Invalid triangleFlag, should be 1 or 2");

}

if( diagFlag != 1 && diagFlag != 2 ) {

mexErrMsgTxt("Invalid diagFlag, should be 1 or 2");

}

if( diagFlag == 1 ) {

numElements = numRows * (numRows + 1) / 2; /* include diagonal */

} else {

numElements = (numRows - 1) * numRows / 2; /* exclude diagonal */

}

plhs[0] = mxCreateSparse(numElements, numRows*numRows, numElements, mxREAL);

pr = (double *) mxGetData(plhs[0]);

Ir = mxGetIr(plhs[0]);

Jc = mxGetJc(plhs[0]);

Jc[0] = 0;

diagFlag--;

k = 0;

m = 1;

if( triangleFlag == 1 ) { /* Lower */

for( j=0; j<numRows; j++ ) {

for( i=0; i<numRows; i++ ) {

if( i >= j+diagFlag ) {

*pr++ = 1.0;

*Ir++ = k++;

Jc[m] = Jc[m-1] + 1;

} else {

Jc[m] = Jc[m-1];

}

m++;

}

}

} else { /* Upper */

for( j=0; j<numRows; j++ ) {

for( i=0; i<numRows; i++ ) {

if( i+diagFlag <= j ) {

*pr++ = 1.0;

*Ir++ = k++;

Jc[m] = Jc[m-1] + 1;

} else {

Jc[m] = Jc[m-1];

}

m++;

}

}

}

}

You mex the routine as follows (you need a supported C compiler installed):

mex GenerateTriangleExtractorMatrixMex.c

And some test code:

% GenerateTriangleExtractorMatrix_test.m

n = 300;

disp('m-code timing')

tic

GenerateTriangleExtractorMatrix(10000,1,1);

toc

disp('mex code timing')

tic

GenerateTriangleExtractorMatrixMex(10000,1,1);

toc

for k=1:n

numRows = ceil(rand*5000+100);

numCols = numRows;

triangleFlag = (rand<0.5) + 1;

diagFlag = (rand<0.5) + 1;

Mm = GenerateTriangleExtractorMatrix(numRows,triangleFlag,diagFlag);

Mx = GenerateTriangleExtractorMatrixMex(numRows,triangleFlag,diagFlag);

if( ~isequal(Mm,Mx) )

error('Not equal');

end

end

disp('Random tests passed')

With a sample run:

>> GenerateTriangleExtractorMatrix_test

m-code timing

Elapsed time is 9.964882 seconds.

mex code timing

Elapsed time is 1.901741 seconds.

Random tests passed

##### 4 Comments

### More Answers (2)

Matt J
on 23 Apr 2020

Edited: Matt J
on 23 Apr 2020

Another approach to consider is to use my MatrixObj class

to construct an object that has the same effect as the operations mL*X and mL.'*Y, but doesn't require you to actually build the matrix,

N=5000;

tic;

mL0=GenerateTriangleExtractorMatrix( N, 1, 2);

toc

%Elapsed time is 0.678702 seconds.

tic;

B=tril(true(N),-1);

Bd=double(B(:));

mL=MatrixObj;

mL.Params.B=B;

mL.Params.Bd=Bd;

mL.Ops.mtimes=@(obj,z) z(obj.Params.B);

mL.Trans.mtimes=@mtimesT;

toc;

%Elapsed time is 0.086228 seconds.

function out=mtimesT(obj,z)

out=obj.Params.Bd;

out(obj.Params.B)=z;

end

In addition to requiring less time to construct, you can verify that it gives the same results as multiplications with mL and mL.',

>> X=rand(N^2,1); isequal(mL0.'*(mL0*X),mL.'*(mL*X))

ans =

logical

1

but with considerably less memory consumption:

>> whos mL mL0

Name Size Kilobytes Class Attributes

mL 1x1 219739 MatrixObj

mL0 12497500x25000000 390586 double sparse

Royi Avital
on 21 Apr 2020

##### 2 Comments

Tommy
on 21 Apr 2020

The two methods are fairly similar - I also like that yours minimizes memory allocation. I ran a few simple fun tests:

I didn't dare try higher than 20,000 for numRows. It seems that your code may possibly perform better at higher values of numRows. In the second case (calculating both the upper and lower triangles) I had your code running both sets of for loops, one after the other (shown in red). In green is the result from your code if only the first set of loops runs, and you recognize that vJ for one triangle is easy to determine if you have vJ for the other triangle (N^2+1-flip(vJ)). So the only thing I'll conclude from this is, if you will eventually calculate both the lower and upper triangle matrices for a given size, it might be better to calculate them together and only find vJ once. I suppose it depends on how expensive N^2+1-flip(vJ) is.

### See Also

### Categories

### Products

### Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!