Need to vectorize: efficiently calculating only certain values of the matrix product A * B, using a logical array L the size of A * B
I have matrices A (m by v) and B (v by n). I also have a logical matrix L (m by n).
I am interested in, as efficiently as possible, calculating only the values in A * B that correspond to logical values in L (values of 1s). Essentially I am interested in the quantity ( A * B ) .* L .
For my problem, a typical L matrix has fewer than 0.1% of its values equal to 1; the vast majority are 0s. It therefore makes no sense for me to compute ( A * B ) .* L directly; it would actually be faster to loop over only the elements of A * B that I want to compute, but even that is inefficient.
-------------------------------------------------------------------------------------------------------------------------------------------------------
Possible solution (need help vectorizing this code if possible)
My particular problem may have a nice solution given that the logical matrix L has a nice structure.
This L matrix is nice in that it can be represented as something like a permuted block matrix. This example L is composed of 9 "blocks" of 1s, where each block of 1s has its own set of row and column indices. For instance, the highlighted area can be seen as a particular submatrix of 1s in L.
My solution was as follows. I can get the row and column indices for each block's submatrix in L, organized in two cell arrays, "rowidxs_list" and "colidxs_list", each with one cell per block. For instance, for subblock 1 in the example I gave, I could calculate those particular values of A * B by simply doing A( rowidxs_list{1} , : ) * B( : , colidxs_list{1} ) .
That means that if I precomputed rowidxs_list and colidxs_list (ignore the costs of calculating these lists; they are negligible for my application), then my problem of calculating C = ( A * B ) .* L could effectively be done by:
C = sparse( m, n );
for i = 1:length( rowidxs_list )
    C( rowidxs_list{i} , colidxs_list{i} ) = A( rowidxs_list{i} , : ) * B( : , colidxs_list{i} );
end
This seems like it would be the most efficient way to solve the problem, if only I knew how to vectorize the for loop. Does anyone see a way to do so?
There may be ways to vectorize if certain things hold, e.g. if rowidxs_list and colidxs_list were numeric matrices instead of cell arrays (where each row of the matrix is an index list, thus replacing rowidxs_list{i} with rowidxs_list(i,:) ). I'd prefer to use cell arrays here if possible, since different lists can have different numbers of elements.
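As an aside, the per-element computation itself (one row-column dot product per nonzero of L) can be vectorized without any block structure, by gathering the relevant rows of A and columns of B and reducing in one step. Here is a sketch in Python/NumPy for illustration; the sizes and seed are arbitrary, and the same gather-and-sum idea should carry over to MATLAB (e.g. something like sum( A(I,:) .* B(:,J).' , 2 ) with [I,J] = find(L)):

```python
import numpy as np
from scipy.sparse import coo_matrix, random as sprandom

rng = np.random.default_rng(0)
m, v, n = 200, 50, 300
A = rng.standard_normal((m, v))
B = rng.standard_normal((v, n))
# Sparse logical mask L with roughly 0.1% density, as in the question
L = sprandom(m, n, density=0.001, random_state=0).astype(bool)

# Gather the row of A and column of B for every nonzero of L,
# then take all the per-pair dot products in one vectorized step.
I, J = L.nonzero()
vals = np.einsum('ij,ij->i', A[I, :], B[:, J].T)
C = coo_matrix((vals, (I, J)), shape=(m, n))

# Check against the dense masked product (A*B) .* L
assert np.allclose(C.toarray(), (A @ B) * L.toarray())
```

The gather step materializes a k-by-v and a k-by-v array for k = nnz(L), so memory grows with the number of requested elements, but no full m-by-n product is ever formed.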
-------------------------------------------------------------------------------------------------------------------------------------------------------
Other suggested solution (creating a MEX file?)
I first posted this question on the /r/matlab subreddit; see here for the reddit thread. The user "qtac" recommended a C-MEX function (a MATLAB extension written in the C programming language):
My gut feeling is the only way to really optimize this is with a C-MEX solution; otherwise, you are going to get obliterated by overhead from subsref in these loops. With C you could loop over L until you find a nonzero element, and then do only the row-column dot product needed to populate that specific element. You will miss out on a lot of the BLAS optimizations but the computational savings may make up for it.
Honestly I bet an LLM could write 90%+ of that MEX function for you; it's a well-formulated problem.
I think this could be a good solution to pursue, but I'd like other opinions as well.
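Before committing to C, the loop-over-nonzeros strategy qtac describes can be prototyped in a higher-level language. A minimal sketch in Python/SciPy (illustrative only; the function name and test sizes are made up, and a real MEX file would loop over L's compressed-sparse storage directly in C):

```python
import numpy as np
from scipy.sparse import lil_matrix, random as sprandom

def masked_matmul_loop(A, B, L):
    """Compute (A @ B) masked by L one nonzero at a time: for each
    stored element (i, j) of the mask, take the dot product of row i
    of A with column j of B. This mirrors the per-element strategy a
    C-MEX implementation would follow."""
    C = lil_matrix(L.shape)
    I, J = L.nonzero()
    for i, j in zip(I, J):
        C[i, j] = A[i, :] @ B[:, j]
    return C.tocsr()

# Example: agrees with the dense masked product on random data
rng = np.random.default_rng(2)
A = rng.standard_normal((40, 10))
B = rng.standard_normal((10, 50))
L = sprandom(40, 50, density=0.01, random_state=3).astype(bool)
C = masked_matmul_loop(A, B, L)
assert np.allclose(C.toarray(), (A @ B) * L.toarray())
```

As qtac notes, the per-element loop trades BLAS-optimized block multiplies for O(nnz(L) * v) scalar work, so it only wins when L is extremely sparse.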
8 Comments
James Tursa
on 17 Feb 2025
We would need to know typical sizes involved to offer any meaningful advice. Is there any pattern to the 1's in the L matrix?
Catalytic
on 18 Feb 2025
Thus, it makes no sense for me to directly perform ( A * B ) .* L , it would actually be faster to loop over each element of A * B
Have you actually tested to see if it is faster? MATLAB extensively parallelizes operations like (A*B).*L, so it is not always clear that reducing the number of operations will translate into improved speed.
Cal
on 18 Feb 2025
For all cases I tried, I always found that ( A * B ) .* L is unfortunately very slightly slower than ( A * B ).
Catalytic
That's not what I asked. I asked if you had compared (A*B).*L to the M-coded loop that you claimed would be faster. Below is an example where the loop does worse than the direct operation -
m = 5000; n = m; v = 3000;
A = rand(m,v); B = rand(v,n); L = sprand(m,n,0.1/100) > 0;
timeit(@()direct(A,B,L))
timeit(@()looping(A,B,L))

function direct(A,B,L)
    C = (A*B).*L;
end

function looping(A,B,L)
    [I,J] = find(L);
    K = numel(I);
    C = double(L);
    for k = 1:K
        C(I(k),J(k)) = A(I(k),:)*B(:,J(k));
    end
end
Cal
on 20 Feb 2025
Accepted Answer
More Answers (1)
Is v much smaller than m or n? If so, one approach might be with an outer product decomposition:
L = sparse(L);  % if L was not previously in sparse form
C = 0;
for i = 1:v
    C = C + A(:,i).*L.*B(i,:);
end
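A quick numerical check of this outer-product identity, sketched in Python/NumPy (sizes and seed are arbitrary; here the mask is kept dense for clarity, whereas the MATLAB version above exploits sparse L so that each rank-1 term only touches the nnz(L) stored positions):

```python
import numpy as np
from scipy.sparse import random as sprandom

rng = np.random.default_rng(1)
m, v, n = 100, 5, 120              # v much smaller than m and n
A = rng.standard_normal((m, v))
B = rng.standard_normal((v, n))
L = sprandom(m, n, density=0.001, random_state=1).toarray() > 0

# Sum v masked rank-1 (outer-product) terms:
# (A @ B) .* L == sum_i ( A(:,i) * B(i,:) ) .* L
C = np.zeros((m, n))
for i in range(v):
    C += np.outer(A[:, i], B[i, :]) * L

assert np.allclose(C, (A @ B) * L)
```

The loop runs v times, so this approach pays off only when v is small relative to m and n, as the answer's opening question suggests.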