MATLAB Answers

Get substring from text

Hi,
I am new to MATLAB and I have a problem. I have a conversation between a group of people, and I have a list of their names. I need a list of the names in the order they were mentioned in the conversation. The conversation and the names are both cell arrays.
For example, the conversation is "Hi B, Hi A, Hey guys D can't come, Too Bad C, I Feel bad for D", stored as a 1x5 cell array. The list of names is "A, B, C, D", stored as a 1x4 cell array. I need the list "B, A, D, C, D" as a 1x5 cell array.
TIA


Accepted Answer

Walter Roberson on 6 Apr 2020
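S = {'Hi B', 'Hi A', 'Hey guys D can''t come', 'Too Bad C', 'I Feel bad for D'};
names = {'A', 'B', 'C', 'D'};
% Build an alternation that matches each name only as a whole word: \<A\>|\<B\>|\<C\>|\<D\>
namepattern = ['\<', strjoin(names,'\\>|\\<'), '\>'];
% One entry per sentence, each holding the names found in that sentence
matched_names = regexp(S, namepattern, 'match');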
The result is a cell array the same size as S, in which each entry is itself a cell array of character vectors. That inner cell array is empty if no name matched; it is a scalar cell array holding one character vector if exactly one name matched (the case you are expecting); and it holds several character vectors if several names matched.
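To make the three cases concrete, here is a small sketch. The sentences in S2 are made up for illustration and are not from the question; counts is just a helper name.

```matlab
names = {'A', 'B', 'C', 'D'};
namepattern = ['\<', strjoin(names,'\\>|\\<'), '\>'];
disp(namepattern)                      % \<A\>|\<B\>|\<C\>|\<D\>

% Hypothetical sentences: no name, one name, two names
S2 = {'no names here', 'Hi B', 'A and C agree'};
m = regexp(S2, namepattern, 'match');

% Number of names found in each sentence
counts = cellfun(@numel, m);           % [0 1 2]
```

Here m{1} is an empty cell array, m{2} is {'B'}, and m{3} is {'A','C'}.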
If you are sure there is at most one match per entry, or you only want the first match when there are several, you can use
matched_names = regexp(S, namepattern, 'match', 'once');
In that case, matched_names is no longer a cell array containing cell arrays; it is a cell array that directly contains character vectors.
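A quick side-by-side of the two calls on the data from the question may help. The variable names nested and direct are only illustrative.

```matlab
S = {'Hi B', 'Hi A', 'Hey guys D can''t come', 'Too Bad C', 'I Feel bad for D'};
names = {'A', 'B', 'C', 'D'};
namepattern = ['\<', strjoin(names,'\\>|\\<'), '\>'];

nested = regexp(S, namepattern, 'match');          % each entry is a cell array
direct = regexp(S, namepattern, 'match', 'once');  % each entry is a character vector

disp(class(nested{1}))   % cell
disp(class(direct{1}))   % char
% direct is {'B', 'A', 'D', 'C', 'D'}, the list asked for in the question
```

With 'once', a sentence that contains no name yields an empty character vector, so it may be worth checking for empty entries before using the result.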
This code does not require the names to be single letters or to be upper case. It will, however, only recognize a name when it stands alone as a word. For example, in 'Too Bad C' it detects that the 'B' in 'Bad' is followed by a word-like character, so it does not treat it as a match for the name 'B'. Digits and underscores count as word-like, so 'B_' and 'B3' would not match 'B', but 'B.' and 'B?' would.
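The word-boundary behaviour can be checked directly. This is a small sketch using the strings mentioned above; tests and hits are illustrative names.

```matlab
namepattern = '\<B\>';                      % the name 'B' as a whole word
tests = {'B_', 'B3', 'Bad', 'B.', 'B?'};

% With 'once', a non-match comes back as an empty character vector
found = regexp(tests, namepattern, 'match', 'once');
hits = ~cellfun(@isempty, found);           % [false false false true true]
```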

  1 Comment

Ido Gross on 6 Apr 2020
That is great!
Can you help me with a follow-up?
I'm trying to make a graph from the result. The graph should show who was mentioned after whom.
s=matched_names(1:length(matched_names)-1);
t=matched_names(2:length(matched_names));
WUDG = graph(s ,t);
But I got an error:
Error using matlab.internal.graph.constructFromEdgeList (line 107)
Node IDs must both be numeric or both be character vectors.
Error in graph (line 298)
matlab.internal.graph.constructFromEdgeList(...
Error in Untitled2 (line 21)
WUDG = graph(s ,t)
Do you know why?
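A possible explanation, offered as a guess from the code shown rather than a confirmed diagnosis: if matched_names came from the regexp call without 'once', each entry is itself a cell array, so s and t are cell arrays of cells rather than cell arrays of character vectors, and graph does not accept those as node names. Below is a minimal sketch, assuming the S and names from the answer. flat_names and G are illustrative names, and digraph is swapped in for graph on the assumption that "who was mentioned after whom" is meant to be directional.

```matlab
S = {'Hi B', 'Hi A', 'Hey guys D can''t come', 'Too Bad C', 'I Feel bad for D'};
names = {'A', 'B', 'C', 'D'};
namepattern = ['\<', strjoin(names,'\\>|\\<'), '\>'];
matched_names = regexp(S, namepattern, 'match');   % cell array of cell arrays

% Flatten the nested cells into one row of character vectors,
% keeping the order in which the names were mentioned
flat_names = [matched_names{:}];                   % {'B','A','D','C','D'}

s = flat_names(1:end-1);   % each mention ...
t = flat_names(2:end);     % ... and the mention that follows it
G = digraph(s, t);         % edges: B->A, A->D, D->C, C->D
```

Flattening this way also copes with sentences that contain no name or several names. If the direction does not matter, graph(s, t) should accept the same flattened inputs.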
