# Finding matches and unique data in two large data sets and outputting the unique and matching data

Mark Bodner on 10 Apr 2020
Answered: Guru Mohanty on 13 Apr 2020
Is there a function or other way (without using a bunch of loops) to find all of the matching entries in two sets of data and, separately, all the unique entries in each data set, and return three files with: 1. all the matching data, 2. all the unique data from data set 1, 3. all the unique data from data set 2?
For example, suppose I have the following (data set 1 in the left column, data set 2 in the right column) and want to search, match, and sort by the first entry of each row (e.g. 48344):
Data set 1       Data set 2
48344, 45, 75    48344, 45, 75
49726, 28, 55    49726, 28, 55
48731, 34, 34    49754, 28, 76
50071, 55, 35    50071, 55, 35
50320, 35, 65    50544, 55, 24
.
.
.
so that I get three output files:
Output file 1 (all matches)
48344, 45, 75
49726, 28, 55
50071, 55, 35
Output file 2 (unique data from set 1)
48731, 34, 34
50320, 35, 65
Output file 3 (unique data from set 2)
49754, 28, 76
50544, 55, 24
darova on 10 Apr 2020
Maybe ismember

Guru Mohanty on 13 Apr 2020
I understand you are trying to find the matched and unique data between two different datasets. The function ismember is appropriate for this. These steps should help:
1. Define the two datasets.
2. Find the indices of the matched and unique data by using the ismember function.
3. Extract the corresponding rows from each dataset.
Here is sample code using the example values from the question as input.
DataSetA = [
48344, 45, 75;...
49726, 28, 55;...
48731, 34, 34;...
50071, 55, 35;...
50320, 35, 65; ];
DataSetB = [
48344, 45, 75;...
49726, 28, 55;...
49754, 28, 76;...
50071, 55, 35;...
50544, 55, 24; ];
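% ismember with 'rows' compares whole rows and returns one logical value
% per row, so no loop is needed and the datasets may differ in row count.
isInB=ismember(DataSetA,DataSetB,'rows'); % rows of A that also occur in B
isInA=ismember(DataSetB,DataSetA,'rows'); % rows of B that also occur in A
idxMA=find(isInB);  % indices of matched rows in A
idxMB=find(isInA);  % indices of matched rows in B
idxNA=find(~isInB); % indices of rows found only in A
idxNB=find(~isInA); % indices of rows found only in B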
% Matched Data
MatchedData=DataSetA(idxMA,:)
% Unique Data A
UniqueDataA=DataSetA(idxNA,:)
% Unique Data B
UniqueDataB=DataSetB(idxNB,:)
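
The question also asks for matching on the first entry of each row and for three output files. A minimal sketch of that variant is below, assuming the first column is the key and that MATLAB R2019a or later is available for writematrix. The file names are hypothetical placeholders, and the matched rows are taken from data set 1, so if the remaining columns differ between the two sets for the same key, only the data set 1 version is written.

```matlab
% Example data from the question (column 1 is the key)
DataSetA = [48344 45 75; 49726 28 55; 48731 34 34; 50071 55 35; 50320 35 65];
DataSetB = [48344 45 75; 49726 28 55; 49754 28 76; 50071 55 35; 50544 55 24];

% Compare keys only; each mask has one logical entry per row
keyInB = ismember(DataSetA(:,1), DataSetB(:,1)); % keys of A present in B
keyInA = ismember(DataSetB(:,1), DataSetA(:,1)); % keys of B present in A

% Split into the three groups and sort each group by its key column
MatchedData = sortrows(DataSetA(keyInB,:), 1);
UniqueDataA = sortrows(DataSetA(~keyInB,:), 1);
UniqueDataB = sortrows(DataSetB(~keyInA,:), 1);

% Write one comma-separated file per group (hypothetical file names)
writematrix(MatchedData, 'matched.csv');
writematrix(UniqueDataA, 'unique_set1.csv');
writematrix(UniqueDataB, 'unique_set2.csv');
```

On releases older than R2019a, dlmwrite could be used in place of writematrix.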