compare groups of items regarding overlaps

Short background: I have a number of texts that are grouped by their value on each of several variables (about 5 different values per variable), meaning that each text appears in one value group of each variable (group A might be text1, text7, text23, text38, etc.).
Goal: I want to compare these groups with respect to any overlap in the items they contain, using one group as a basis; i.e. I take group A and check which texts of this group also appear in any group of another variable (of course, I am not comparing groups that belong to the same variable, since there would obviously be no overlap). In the end, I'd like to be able to say that e.g. texts 1, 7, 23 and 38 all appear in groups A, F, J, K and so forth.
That means I do not want to compare the means or any other values of the groups; I want to know which groups share which items.
Since I am not that experienced yet, I can't seem to find the right code to start with. Any ideas about how to tackle this task?
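One way to picture the goal above is an item-by-group membership matrix, from which "which groups contain this text" can be read off directly. The following is only a minimal sketch: the group names, the group contents, and the variable names `names`, `groups`, `allItems` and `member` are made up for illustration and are not the original data.

```matlab
% Hypothetical groups (illustrative only); each group is a cell array of char vectors.
names  = {'A', 'F', 'J'};
groups = {{'text1', 'text7', 'text23', 'text38'}, ...
          {'text7', 'text12', 'text38'}, ...
          {'text1', 'text38', 'text40'}};

% Every distinct text across all groups (unique sorts its output).
allItems = unique([groups{:}]);

% member(k, g) is true when allItems{k} occurs in group g.
member = false(numel(allItems), numel(groups));
for g = 1:numel(groups)
    member(:, g) = ismember(allItems, groups{g});
end

% Example query: names of all groups that contain 'text38'.
row = strcmp(allItems, 'text38');
groupsWithText38 = names(member(row, :));
```

Reading a row of `member` gives the groups an item belongs to; reading a column gives the items of one group.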
Image Analyst on 23 Jun 2021
What do you mean by overlapping texts? What kind of data do you have? String arrays? Character arrays? Images? Tables? Cell arrays? Structure arrays? Can you attach your data (group(s)) in a .mat file using the paper clip icon? For example:
save('answers.mat', 'group1', 'group2', 'group3');
Use your actual variable names of course.
In the meantime, see functions like setdiff(), intersect(), contains(), ismember(), strcmpi(), etc.
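As a rough illustration of how two of these functions behave on text data, here is a minimal sketch. It assumes the groups are stored as cell arrays of char vectors; the names `groupA` and `groupF` and their contents are hypothetical.

```matlab
% Hypothetical groups; names and contents are made up for illustration.
groupA = {'text1', 'text7', 'text23', 'text38'};
groupF = {'text7', 'text12', 'text38', 'text40'};

% Logical mask: which items of groupA also occur in groupF?
inF = ismember(groupA, groupF);

% The shared items themselves, in the order they appear in groupA.
shared = groupA(inF);

% Items of groupA that are missing from groupF (setdiff sorts its output).
onlyA = setdiff(groupA, groupF);
```

Here `ismember` keeps the order of `groupA`, which is convenient when one group serves as the basis for the comparison.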
Ulrike Lohner on 24 Jun 2021
Unfortunately, I am not allowed to post any original data due to data security issues (and the code I have so far is importing the data, so that wouldn't be any help). I can try to be more specific regarding my data, though:
Basically, I have a large number of groups of strings organized in a table (each column is one group, each string sits in its own cell). There are about 150 different strings in total, and each string appears in several groups; however, no two groups are composed of the same combination of strings, and the groups do not all have the same size.
I will probably need a loop that takes each column (i.e. each group) as the starting point once and checks which strings of this group are also contained in the other groups, giving me as output a new set of string clusters that contain only those strings included in the first group.
Anyway: thank you for the suggestions so far; I will dig deeper into the functions you mentioned already and will check if one of them serves my purpose.
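The loop described above could be sketched roughly as follows. This assumes each table column has already been pulled out into its own cell array of char vectors with any empty padding cells removed; the variable names (`names`, `groups`, `overlap`) and the data are hypothetical.

```matlab
% Hypothetical stand-in for the table columns (illustrative data only).
names  = {'A', 'F', 'J'};
groups = {{'text1', 'text7', 'text23', 'text38'}, ...
          {'text7', 'text12', 'text38'}, ...
          {'text1', 'text38', 'text40', 'text41', 'text45'}};

n = numel(groups);

% overlap{i, j} holds the items of group i that also occur in group j.
% Diagonal entries stay empty because a group is not compared with itself.
overlap = cell(n, n);
for i = 1:n
    for j = 1:n
        if i ~= j
            overlap{i, j} = intersect(groups{i}, groups{j});
        end
    end
end

% Example: items of group A that also appear in group J.
sharedAJ = overlap{1, 3};
```

Row `i` of `overlap` then lists, for basis group `i`, the cluster it shares with every other group. Pairs of groups belonging to the same variable would have to be skipped separately, for example with an extra check inside the loop.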


Accepted Answer

SALAH ALRABEEI on 23 Jun 2021
Use
[val,ndxA,ndxB] = intersect(A,B)
It returns the overlapping values in val, and their indices within A and B in ndxA and ndxB respectively.
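A small worked example of the three-output call, assuming `A` and `B` are cell arrays of char vectors (the contents below are invented for illustration):

```matlab
% Hypothetical groups (illustrative data only).
A = {'text1', 'text7', 'text23', 'text38'};
B = {'text40', 'text38', 'text7'};

% val: items common to A and B, in sorted order.
% ndxA, ndxB: positions such that A(ndxA) and B(ndxB) both equal val.
[val, ndxA, ndxB] = intersect(A, B);

% Sanity check: indexing back into A and B recovers the same items.
sameItems = isequal(A(ndxA), B(ndxB));
```

By default the result is sorted; in MATLAB, passing 'stable' as a third argument keeps the items in the order they occur in A instead.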
Ulrike Lohner on 24 Jun 2021
Thank you for this suggestion! I will have a closer look at that function and check whether it serves the right purpose.


Release

R2020a
