"symmetrical" rows of matrix

I have an integer matrix A (nA x c) with an even number of columns (i.e. mod(c,2) == 0) and unique rows.
How can I efficiently (with a function "symmetricRows" optimized for speed and memory) find the "symmetric" rows of A, returned as index vectors iA1 and iA2, where rows iA1 and iA2 are "symmetric" if:
all(A(iA1,1:end/2) == A(iA2,end/2+1:end) & A(iA1,end/2+1:end) == A(iA2,1:end/2),2) = true
Example:
A = [1 1 1 1;
2 2 2 2;
1 2 3 4;
4 3 2 1;
2 2 3 3;
3 4 1 2;
3 3 2 2]
[iA1, iA2] = symmetricRows(A)
iA1 =
1
2
3
5
iA2 =
1
2
6
7
Typical size of matrix A: nA ~ 1e4-1e6, c ~ 60-120.
The problem is motivated by pre-processing of a large dataset, where "symmetric" rows are irrelevant from the point of view of a user-defined distance metric.

 Accepted Answer

Michal on 11 Feb 2020
Edited: Michal on 11 Feb 2020
I present the best solution so far:
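% d(i,j) is true when the first half of row i equals the second half of row j
d = ~pdist2(A(:,1:end/2), A(:,end/2+1:end));
% keep pairs where the match holds in both directions; triu reports each pair once
[iA1, iA2] = find(triu(d & d.'));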
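As a cross-check that avoids the nA x nA distance matrix, here is a minimal sketch of a different technique (not the one posted above): swap the two halves of every row and look the swapped rows up in A with `ismember(...,'rows')`. It relies on the rows of A being unique, as stated in the question, so each row has at most one partner. The variable names `h`, `B`, `tf`, `loc`, `idx` and `keep` are illustrative only.

```matlab
% Example matrix from the question
A = [1 1 1 1;
     2 2 2 2;
     1 2 3 4;
     4 3 2 1;
     2 2 3 3;
     3 4 1 2;
     3 3 2 2];

h = size(A, 2) / 2;                 % number of columns in each half
B = [A(:, h+1:end), A(:, 1:h)];     % every row with its two halves swapped

% Row i is "symmetric" to row j exactly when B(i,:) equals A(j,:).
% Unique rows in A mean loc(i) is the only possible partner of row i.
[tf, loc] = ismember(B, A, 'rows');

idx  = (1:size(A, 1)).';
keep = tf & (idx <= loc);           % report each pair once, like triu does
iA1  = idx(keep);
iA2  = loc(keep);
```

On the example this gives iA1 = [1;2;3;5] and iA2 = [1;2;6;7], matching the expected output in the question. The pairs come out sorted by iA1, so on other inputs the order may differ from the `find(triu(...))` version, which walks the matrix column by column.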

4 Comments

I tried an approach similar to the one you posted, but did some testing and realized that this approach will not work for arrays of size about A ~ [4e4,120]. You hit the (default) maximum array size limit. Since you said you need a solution for an array 50 times larger than that, I knew it was a problem.
Also, my preliminary estimate of how long it would take to run this theoretically was about 9 hours. I'm not sure about that, and was thinking about it some more.
Maybe the reason you are not getting "relevant" answers is simply because you have posted a very challenging problem.
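To put rough numbers on the array size limit, here is a back-of-the-envelope sketch of the memory needed for the full nA x nA distance matrix alone. It assumes the matrix is stored in the same class as the input (8 bytes per element for double, 4 for single) and ignores the extra logical temporaries, so treat the figures as a lower bound.

```matlab
nA = [4e4, 1e5, 1e6];             % row counts mentioned in this thread

gbDouble = nA.^2 * 8 / 1e9;       % 8 bytes per double element
gbSingle = nA.^2 * 4 / 1e9;       % 4 bytes per single element

for k = 1:numel(nA)
    fprintf('nA = %g: %.1f GB (double), %.1f GB (single)\n', ...
            nA(k), gbDouble(k), gbSingle(k));
end
% nA = 40000: 12.8 GB (double), 6.4 GB (single)
% nA = 100000: 80.0 GB (double), 40.0 GB (single)
% nA = 1e+06: 8000.0 GB (double), 4000.0 GB (single)
```

So nA = 1e6 is out of reach for a full pairwise matrix on ordinary hardware, whatever the class.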
On my PC (with 64 GB RAM), when matrix A is of class "single", I am able to process an A ~ [1e5,1e2] matrix in reasonable time (about 30 seconds).
I think the only useful solution will be based on some kind of processing in chunks, but I have no idea what type of in-loop processing would be best in my case.
But yes, you are right, the problem is very challenging...
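One way the chunked idea could look, as a sketch only: run the same `pdist2` test on a block of rows at a time, so that only two blockSize x nA matrices exist at once. `blockSize` is a hypothetical tuning parameter, the other variable names are illustrative, and `pdist2` still needs the Statistics and Machine Learning Toolbox. The total work remains quadratic in nA, so this reduces memory, not run time.

```matlab
% Example matrix from the question; in practice A is the large matrix.
A = [1 1 1 1; 2 2 2 2; 1 2 3 4; 4 3 2 1; 2 2 3 3; 3 4 1 2; 3 3 2 2];

h  = size(A, 2) / 2;
L  = double(A(:, 1:h));          % left halves (cast in case A is an integer class)
R  = double(A(:, h+1:end));      % right halves
nA = size(A, 1);
blockSize = 3;                   % hypothetical; choose so blockSize*nA fits in RAM

iA1 = zeros(0, 1);
iA2 = zeros(0, 1);
for first = 1:blockSize:nA
    rows = first:min(first + blockSize - 1, nA);
    % Zero distance means identical half-rows, as in the accepted answer.
    dLR = ~pdist2(L(rows, :), R);    % left half of block row == right half of row j
    dRL = ~pdist2(R(rows, :), L);    % right half of block row == left half of row j
    [r, c] = find(dLR & dRL);
    r = r(:) + first - 1;            % block-local row index -> global row index
    c = c(:);
    keep = r <= c;                   % same role as triu: each pair reported once
    iA1 = [iA1; r(keep)];            %#ok<AGROW> at most nA pairs, rows are unique
    iA2 = [iA2; c(keep)];            %#ok<AGROW>
end

pairs = sortrows([iA1, iA2]);        % deterministic order, independent of blockSize
iA1 = pairs(:, 1);
iA2 = pairs(:, 2);
```

On the example matrix this reproduces iA1 = [1;2;3;5] and iA2 = [1;2;6;7].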
Yeah, I should have mentioned that I did my testing on MATLAB Online, so it's probably not the most powerful platform. :-)
Yes, definitely, MATLAB Online is not the proper way to run any memory- or CPU-intensive task at all ... :)

Release: R2019b

Asked: 10 Feb 2020

Commented: 11 Feb 2020
