Regular Expression to match strings after a certain number of words that do not contain a keyword

Question

Matthew on 29 Dec 2017

Direct link to this question

https://uk.mathworks.com/matlabcentral/answers/374953-regular-expression-to-match-strings-after-a-certain-number-of-words-that-do-not-contain-a-keyword

Answered: Matthew on 2 Jan 2018

I'm trying to use regular expressions to retrieve the middle of a string. In the default case I need to match everything after the first two words; in the non-default case I need to match everything after the first two words that do not contain a keyword.

An example input character array is

defaultCase = '1.2.3.4 Hello\ - my name is Bob'

This is fairly easy to handle: the regex below looks behind for two whitespace-delimited words that each contain at least one alphanumeric character, and then matches everything from the next alphanumeric character onwards.

%Returns 'my name is Bob'
matchedString = regexpi(defaultCase,'(?<=(\S*\w\S*\s[\s\W]*){2})\w.*','match','once')

Harder Case

nonDefault1 = '1.2.3.4 Hello Matlabbers - my name is Bob'
nonDefault2 = '1.2.3.4 Matlabbers - Hello - my name is Bob'

In this case I would like to explicitly not count the word Matlabbers in my lookbehind match, and I'd still like the output to be 'my name is Bob'.

The best I've come up with is something like the following:

%Returns 'my name is Bob'
regexpi(nonDefault2,'(?<=[\d\.]+\s+(Matlabbers)?\W*(?!Matlabbers)\S*\w\s\W+)\w[^\(]*\w','match','once')

This works for the nonDefault2 case, but it doesn't work in general. Does anyone know of a robust way to do this?


Answer 1

Matthew on 2 Jan 2018

Direct link to this answer

https://uk.mathworks.com/matlabcentral/answers/374953-regular-expression-to-match-strings-after-a-certain-number-of-words-that-do-not-contain-a-keyword#answer_298454

I ended up building this in a very piecewise way.

%Demonstration Cases
defaultCase = '1.2.3.4 Hello\ - my name is Bob';
nonDefault1 = '1.2.3.4 Hello Matlabbers - my name is Bob';
nonDefault2 = '1.2.3.4 Matlabbers - Hello - my name is Bob';
%Word to skip during counting
skipBasic = 'Matlabbers';
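%Set up the regular expression from smaller building blocks
%A word: a non-whitespace run containing at least one alphanumeric character
word = '(\S*[a-zA-Z0-9]+\S*)';
%A separator: one whitespace character, then any further whitespace, underscores or non-word characters
space = '(\s[\W\s_]*)';
%A word containing the keyword, optionally followed by a separator
skipWord = ['(\S*' skipBasic '\S*)'];
skipWordSpace = ['(',skipWord space '?)'];
%Any word, optionally followed by a separator
wordSpace = ['(',word space '?)'];
%A word that does not contain the keyword
nonSkipWord = ['(\<(?!' skipWord ')' word '\>)'];
%Zero or more keyword words followed by one counted word
pairedWord = ['(' skipWordSpace '*' nonSkipWord ')'];
%The first two counted words, plus any keyword words that follow them
firstTwoPairedWords = ['^(' pairedWord space '){2}'];
unwantedFirstPart = ['(' firstTwoPairedWords,skipWordSpace,'*)'];
%Look behind for the unwanted part, then match the remaining words
wantedPart = ['(?<=' unwantedFirstPart ')' nonSkipWord space wordSpace '*'];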
%Create the parser 
endString = @(inputString) regexpi(inputString,wantedPart,'match','once');
%Apply the parser to the examples
disp(endString(defaultCase))
disp(endString(nonDefault1))
disp(endString(nonDefault2))
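
If a single pattern is not a hard requirement, the same counting can also be sketched without a lookbehind by splitting the string into whitespace-delimited tokens and counting them in code. This is only a rough alternative, not the approach used above. It assumes a token counts as a word when it contains at least one alphanumeric character, that the keyword is matched case-insensitively, and that the result should start at the first alphanumeric character of the next counted word. The names nWords, tokens, starts, firstAlnum, isWord, isSkip and counted are made up for illustration.

```matlab
%Demonstration cases (same as above)
defaultCase = '1.2.3.4 Hello\ - my name is Bob';
nonDefault1 = '1.2.3.4 Hello Matlabbers - my name is Bob';
nonDefault2 = '1.2.3.4 Matlabbers - Hello - my name is Bob';
examples = {defaultCase, nonDefault1, nonDefault2};
%Word to skip during counting, escaped so it is treated literally
skipBasic = 'Matlabbers';
skipPattern = regexptranslate('escape', skipBasic);
%Number of words to skip before the wanted part (assumed to be 2, as in the question)
nWords = 2;
for i = 1:numel(examples)
    str = examples{i};
    %Split into whitespace-delimited tokens and remember where each one starts
    [tokens, starts] = regexp(str, '\S+', 'match', 'start');
    %Position of the first alphanumeric character in each token (empty if none)
    firstAlnum = regexp(tokens, '[a-zA-Z0-9]', 'start', 'once');
    isWord = ~cellfun(@isempty, firstAlnum);
    %Tokens that contain the keyword are not counted
    isSkip = ~cellfun(@isempty, regexpi(tokens, skipPattern, 'start', 'once'));
    counted = find(isWord & ~isSkip);
    if numel(counted) > nWords
        %Start at the first alphanumeric character of the next counted word
        k = counted(nWords + 1);
        result = str((starts(k) + firstAlnum{k} - 1):end);
    else
        %Not enough counted words in the input
        result = '';
    end
    disp(result)
end
```

Unlike the regular expression above, this keeps everything to the end of the string, including any trailing punctuation, so the two approaches can differ on inputs other than the three examples.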
