I am currently working with large data sets, in the range of 500k-1m rows of data in any given matrix (nx3). I want to know how to sift through the rows of the matrix to see if any of the rows have the same values in them. For example, given

[1 2 3; 2 3 4; 3 4 5; 4 5 6; 1 2 3; 5 6 7]

I want to remove the second [1 2 3] row, giving

[1 2 3; 2 3 4; 3 4 5; 4 5 6; 5 6 7]

Can anyone help me with this?

The unique function can help here:

A = [1 2 3; 2 3 4; 3 4 5; 4 5 6; 1 2 3; 5 6 7];
[Au,ia,ic] = unique(A, 'rows', 'stable');
RowIdxFreq = accumarray(ic, 1)

RowIdxFreq =
     2
     1
     1
     1
     1

The ‘RowIdxFreq’ variable has the frequencies of the occurrences of the unique rows, in the order they appear in ‘Au’. Here, the first unique row ([1 2 3]) occurs twice.
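As a small sketch of what the three outputs mean (using the same example matrix, with variable names `sameForward` and `sameBack` introduced here only for illustration): `Au` holds each distinct row once, `ia` indexes the rows of `A` that were kept, and `ic` maps every row of `A` back to its row in `Au`.

```matlab
A = [1 2 3; 2 3 4; 3 4 5; 4 5 6; 1 2 3; 5 6 7];
[Au, ia, ic] = unique(A, 'rows', 'stable');

% ia picks the first occurrence of each distinct row, so A(ia,:) rebuilds Au
sameForward = isequal(A(ia, :), Au);

% ic says which row of Au each row of A corresponds to, so Au(ic,:) rebuilds A
sameBack = isequal(Au(ic, :), A);
```

Because of the 'stable' option, the rows of `Au` stay in order of first appearance instead of being sorted.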

Find similar values in a matrix

lsutiger1 on 2 Feb 2016

Star Strider,

I did see that command in the documentation, and it is helpful. What I don't get from it is 1) where the second instance occurs and 2) a way to delete that row from the matrix without leaving 0's in its place.

lsutiger1 on 2 Feb 2016

Would just setting a value,

X = unique(A, 'rows', 'stable')

return the matrix without those rows?

Stephen23 on 2 Feb 2016

Yes. Try it and see.
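A minimal sketch of trying it on the example matrix from the question. The second output of unique also answers where the later instances sit: any row index of `A` that is missing from `ia` was dropped as a duplicate. The name `dupRows` is introduced here only for illustration.

```matlab
A = [1 2 3; 2 3 4; 3 4 5; 4 5 6; 1 2 3; 5 6 7];

% X is A with later duplicates dropped; no zero rows are left behind
[X, ia] = unique(A, 'rows', 'stable');

% Row numbers of A that unique discarded, i.e. the repeated instances
dupRows = setdiff((1:size(A, 1)).', ia);
```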

Star Strider on 2 Feb 2016

To find which rows are repeated and delete them, this works:

A = [1 2 3; 2 3 4; 3 4 5; 4 5 6; 1 2 3; 5 6 7; 3 4 5];
[Au,ia,ic] = unique(A, 'rows', 'stable');
RowIdxFreq = accumarray(ic, 1)                                  % Occurrence Count Of Each Unique Row
Repeats = find(RowIdxFreq > 1);                                 % Unique Rows That Occur More Than Once
RowsToDelete = [];
for k1 = 1:length(Repeats)
    RepeatedRows{k1} = find(ic == Repeats(k1));                 % All Row Numbers In ‘A’ For This Repeated Row
    RowsToDelete = [RowsToDelete; RepeatedRows{k1}(2:end)];     % Keep The First Occurrence, Mark The Rest
end
A(RowsToDelete,:) = [];                                         % ‘A’ With Repeated Rows Deleted

The ‘Repeats’ assignment finds the unique rows (as indices into ‘Au’) that occur more than once in the matrix, and ‘RepeatedRows’ is a cell array that contains, for each of them, the row numbers in ‘A’ where that row occurs. The ‘RowsToDelete’ vector collects every occurrence after the first, then the ‘A’ assignment after the loop uses it to delete all of them at once.

It is not necessary to keep the ‘RepeatedRows’ data in an array. I did here because I wanted to be certain it was doing what I wanted it to.
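If the per-row bookkeeping is not needed, the same deletion can be sketched without the loop by using the second output of unique. This is an alternative formulation, not the code from the answer above; the name `isDuplicate` is introduced here only for illustration.

```matlab
A = [1 2 3; 2 3 4; 3 4 5; 4 5 6; 1 2 3; 5 6 7; 3 4 5];
[Au, ia] = unique(A, 'rows', 'stable');

% Start by marking every row as a duplicate, then clear the first occurrences
isDuplicate = true(size(A, 1), 1);
isDuplicate(ia) = false;

RowsToDelete = find(isDuplicate);   % row numbers of the later occurrences
A(isDuplicate, :) = [];             % delete them all at once
```

After the deletion, `A` matches `Au`, which may matter for matrices in the 500k-1m row range where growing `RowsToDelete` inside a loop could become slow.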

lsutiger1 on 2 Feb 2016

Edited: lsutiger1 on 2 Feb 2016

Using the unique function is now not working. I have a cell array in which each row holds a string of letters followed by coordinates, e.g. [C 1 1 1], which I created by

X = [atom_names num2cell(atomPosition_flat)];

This gives me a cell array (nx4).

I try to use unique to find where the repeated rows are,

atomPositions = unique(X,'rows','stable');

But get this error: Input A must be a cell array of strings.

Using num2str on the atomPosition_flat matrix (nx3) turns it into an nx33 char.

Star Strider on 2 Feb 2016

Without having your matrix to experiment with, I can only guess.

See if adding a cell reference (the ‘{}’ brackets) works:

atomPositions = unique(X{:},'rows','stable');

If you have a relatively ‘uncomplicated’ cell array, that should work. If unique still has problems, you might have to use sprintf to convert the numbers to strings before you do the operations in my code. (I assume ‘atom_names’ are already strings.)
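One caveat: X{:} expands into a comma-separated list, so unique would receive every cell as a separate argument, which is unlikely to work for a cell array of this size. The sprintf route can be sketched as follows. The data here are hypothetical stand-ins for the poster's ‘atom_names’ and ‘atomPosition_flat’, and the '%.10g' format is an assumption about how many significant digits should count when comparing coordinates.

```matlab
% Hypothetical stand-ins for the poster's data (not from the thread)
atom_names = {'C'; 'H'; 'C'; 'O'; 'H'};
atomPosition_flat = [1 1 1; 2 2 2; 1 1 1; 3 3 3; 2 2 2];
X = [atom_names num2cell(atomPosition_flat)];

% Build one string key per row, e.g. 'C 1 1 1'
n = size(X, 1);
keys = cell(n, 1);
for k = 1:n
    keys{k} = sprintf('%s %.10g %.10g %.10g', X{k, :});
end

% unique accepts a cell array of strings; 'rows' is not needed for a column of keys
[~, ia] = unique(keys, 'stable');
atomPositions = X(ia, :);           % mixed cell array without repeated rows
```

Rows whose name and coordinates print to the same key are treated as duplicates, so the chosen number format decides how close two coordinates must be to count as equal.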

lsutiger1 on 2 Feb 2016

Edited: lsutiger1 on 2 Feb 2016

The matrix is a 2520x3 matrix, and yes, atom_names is a 2520x1 vector of strings. I tried converting it to strings using num2str, which did not work, because then I got a "dimension mismatch" error when num2str converted my matrix into a 2520x33 char. Will try to use sprintf.
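An alternative sketch that avoids string conversion altogether: replace each name with an integer label, so that the name and the coordinates form one numeric matrix that unique can handle with 'rows'. This is a different technique from the sprintf suggestion, shown on hypothetical stand-in data, and it assumes ‘atom_names’ is a column cell array of strings; `nameIdx` is a name introduced here for illustration.

```matlab
% Hypothetical stand-ins for the poster's data (not from the thread)
atom_names = {'C'; 'H'; 'C'; 'O'; 'N'};
atomPosition_flat = [1 1 1; 2 2 2; 1 1 1; 3 3 3; 2 2 2];

% Third output of unique gives an integer label for each name
[~, ~, nameIdx] = unique(atom_names);

% Label plus coordinates is an all-numeric n-by-4 matrix, so 'rows' works
[~, ia] = unique([nameIdx atomPosition_flat], 'rows', 'stable');

% Rebuild the mixed cell array from the rows that were kept
atomPositions = [atom_names(ia) num2cell(atomPosition_flat(ia, :))];
```

In this example the fifth row shares its coordinates with the second but has a different name, so it is kept; only the exact repeat in row 3 is dropped.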
