How to remove duplicate groups of rows

Leon on 30 Nov 2017
Edited: Leon on 30 Nov 2017
I have a 20000 x 30 matrix, and I'm trying to identify groups of rows that are similar (but not identical) to each other. To do this, I go through each row and compare it with all 20000 rows of data. Now I have multiple groups of rows that meet my criteria. The problem is that many of them are duplicates. For example, if rows 1, 2, and 5 form a group, I end up with three groups: 1,2,5; 2,1,5; and 5,1,2.
Here is my question: how do I remove the duplicate groups so that in the end I have only one group, 1,2,5, in the above example?
Many thanks!
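One possible way to collapse such permuted groups is to sort the indices inside each group and then keep only the first occurrence of each sorted set. The sketch below assumes the groups are stored in a cell array of row-index vectors; the variable names `groups`, `keys`, and `uniqueGroups` are made up for illustration and do not come from the original data.

```matlab
% Hypothetical input: one group of row indices per starting row.
groups = {[1 2 5], [2 1 5], [3 7], [5 1 2], [7 3]};

% Canonical form: sort each group's indices and join them into a text key,
% so that [2 1 5] and [5 1 2] both become '1,2,5,'.
keys = cellfun(@(g) sprintf('%d,', sort(g)), groups, 'UniformOutput', false);

% 'stable' keeps the first occurrence of every distinct key, in original order.
[~, firstIdx] = unique(keys, 'stable');
uniqueGroups = groups(firstIdx);
```

Using a text key sidesteps the fact that groups can have different lengths, which would prevent stacking them into one numeric matrix for `unique(...,'rows')`.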
  2 Comments
Andrei Bobrov on 30 Nov 2017
Please post a sample of your data here.
Leon on 30 Nov 2017
My workplace does not allow me to post any part of my data here, but it consists of columns such as Year, Month, Day, Longitude, Latitude, and other variables. I need to group together rows that were collected close to each other, i.e. rows whose absolute difference in longitude is below 0.5, whose absolute difference in latitude is below 0.5, etc.
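The closeness test described here can be sketched as a pairwise comparison. The example below then swaps in a different grouping step from the row-by-row search in the question: it treats close pairs as edges of a graph and takes connected components, so each row receives exactly one group label and permuted duplicates cannot arise. Note that this chains neighbours together (if row 1 is close to row 2 and row 2 to row 3, all three share a group), which may or may not match the intended criteria. The sample matrix and the names `isNear` and `groupId` are assumptions for illustration.

```matlab
% Hypothetical sample: columns are [Year Month Day Longitude Latitude].
A = [2017 1 1 10.0 20.0;
     2017 2 1 10.2 20.1;
     2017 3 1 50.0 60.0;
     2017 4 1 10.4 19.9;
     2017 5 1 80.0 -5.0];

lon = A(:,4);
lat = A(:,5);

% isNear(i,j) is true when rows i and j differ by less than 0.5 in both
% longitude and latitude (implicit expansion needs R2016b or later).
isNear = abs(lon - lon.') < 0.5 & abs(lat - lat.') < 0.5;

% Treat close pairs as graph edges; each connected component is one group.
G = graph(isNear, 'omitselfloops');
groupId = conncomp(G);   % groupId(k) is the group label of row k
```

The pairwise matrices are n-by-n, so for 20000 rows the double-precision temporaries take a few gigabytes each; processing the rows in blocks would be one way to reduce that.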

Answers (1)

Andrei Bobrov on 30 Nov 2017
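Let A be your data as an array of size 2e4 x 30, with columns [Year Month Day Longitude Latitude ...].
% Keep one representative row per cluster of (Longitude, Latitude) values within 0.5 of each other
[~,ii] = uniquetol(A(:,4:5),0.5,'ByRow',true,'DataScale',[1 1]);
out = A(ii,:);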
  1 Comment
Leon on 30 Nov 2017
Edited: Leon on 30 Nov 2017
Many thanks for this amazing function!
However, my goal is not to find the unique values in this matrix. Instead, I want to find the rows that are close to each other and treat each set as an individual group (to study how they change over time), ignoring the rest of the matrix.
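For what it's worth, `uniquetol` can also report group membership rather than just one representative per group, via its `'OutputAllIndices'` option. The sketch below, on made-up sample data, keeps only the groups with at least two rows and orders one group by date for a time-series view. The names `members` and `series` are hypothetical, and how `uniquetol` assigns borderline rows to a representative may not match a strict pairwise criterion.

```matlab
% Hypothetical sample: columns are [Year Month Day Longitude Latitude].
A = [2017 3 1 10.4 19.9;
     2017 1 1 10.0 20.0;
     2016 6 1 50.0 60.0;
     2017 2 1 10.2 20.1;
     2015 9 1 80.0 -5.0];

% 'OutputAllIndices' makes members a cell array: members{k} holds the row
% numbers of A that fall within tolerance of the k-th representative.
[~, members] = uniquetol(A(:,4:5), 0.5, 'ByRow', true, ...
    'DataScale', [1 1], 'OutputAllIndices', true);

% Keep only groups with at least two rows; isolated rows are ignored.
members = members(cellfun(@numel, members) > 1);

% Rows of the first remaining group, ordered by Year, Month, Day.
series = sortrows(A(members{1}, :), [1 2 3]);
```

Each entry of `members` can be processed the same way to obtain one date-ordered block of rows per group.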
