Regular Expression to match strings after a certain number of words that do not contain a keyword

I'm trying to use regular expressions to retrieve the middle of a string. In the default case I need to match everything after the first two words; in the non-default case I need to match everything after the first two words that do not contain a keyword.
An example input character array is:
defaultCase = '1.2.3.4 Hello\ - my name is Bob'
This is fairly easy to handle: the regex below looks behind for two whitespace-delimited expressions that each contain an alphanumeric character, and then matches everything from the next alphanumeric character onward.
%Returns 'my name is Bob'
matchedString = regexpi(defaultCase,'(?<=(\S*\w\S*\s[\s\W]*){2})\w.*','match','once')
Harder Case
nonDefault1 = '1.2.3.4 Hello Matlabbers - my name is Bob'
nonDefault2 = '1.2.3.4 Matlabbers - Hello - my name is Bob'
In this case I would like to explicitly exclude the word Matlabbers from the count in my lookbehind, and I'd still like the output to be my name is Bob.
The best I've come up with is something like the following:
%Returns 'my name is Bob'
regexpi(nonDefault2,'(?<=[\d\.]+\s+(Matlabbers)?\W*(?!Matlabbers)\S*\w\s\W+)\w[^\(]*\w','match','once')
This works for the nonDefault2 case but not in general. Does anyone know of a robust way to do this?

Accepted Answer

Matthew on 2 Jan 2018
I ended up building the expression piecewise, from small named sub-patterns.
%Demonstration Cases
defaultCase = '1.2.3.4 Hello\ - my name is Bob';
nonDefault1 = '1.2.3.4 Hello Matlabbers - my name is Bob';
nonDefault2 = '1.2.3.4 Matlabbers - Hello - my name is Bob';
%Word to skip during counting
skipBasic = 'Matlabbers';
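%Set up the regular expression
%A word: a run of non-whitespace containing at least one letter or digit
word = '(\S*[a-zA-Z0-9]+\S*)';
%A separator: one whitespace character, then any further whitespace, punctuation, or underscores
space = '(\s[\W\s_]*)';
%A chunk containing the skip word, optionally followed by a separator
skipWord = ['(\S*' skipBasic '\S*)'];
skipWordSpace = ['(',skipWord space '?)'];
%Any word, optionally followed by a separator
wordSpace = ['(',word space '?)'];
%A counted word: a word between word boundaries that does not begin a skipWord chunk
nonSkipWord = ['(\<(?!' skipWord ')' word '\>)'];
%Zero or more skipped chunks followed by one counted word
pairedWord = ['(' skipWordSpace '*' nonSkipWord ')'];
%Two counted words from the start of the string, plus any skipped chunks that follow them
firstTwoPairedWords = ['^(' pairedWord space '){2}'];
unwantedFirstPart = ['(' firstTwoPairedWords,skipWordSpace,'*)'];
%Look behind for that prefix, then match the next counted word and the words after it
wantedPart = ['(?<=' unwantedFirstPart ')' nonSkipWord space wordSpace '*'];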
%Create the parser
endString = @(inputString) regexpi(inputString,wantedPart,'match','once');
%Apply the parser to the examples
disp(endString(defaultCase))
disp(endString(nonDefault1))
disp(endString(nonDefault2))
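
For comparison, the same counting rule can be sketched without a lookbehind by splitting the string into chunks and filtering them. This is an alternative illustration, not the method used above. The names examples, nSkip, counted, and results are assumptions introduced here, and "contains the keyword" is taken to mean a case-insensitive substring test.

```matlab
%Hypothetical alternative: count words by tokenising instead of by lookbehind
examples = {
    '1.2.3.4 Hello\ - my name is Bob'
    '1.2.3.4 Hello Matlabbers - my name is Bob'
    '1.2.3.4 Matlabbers - Hello - my name is Bob'};
skipBasic = 'Matlabbers';   %Word to skip during counting
nSkip = 2;                  %Number of counted words to step over (assumed fixed)

results = cell(size(examples));
for k = 1:numel(examples)
    str = examples{k};
    %Whitespace-delimited chunks that contain at least one letter or digit
    [chunks, starts] = regexp(str, '\S*[a-zA-Z0-9]\S*', 'match', 'start');
    %Keep only the chunks that do not contain the skip word (case-insensitive)
    counted = cellfun(@isempty, strfind(lower(chunks), lower(skipBasic)));
    starts = starts(counted);
    if numel(starts) > nSkip
        %Return everything from the first letter or digit of the next counted word
        results{k} = regexp(str(starts(nSkip+1):end), '[a-zA-Z0-9].*', 'match', 'once');
    else
        %Fewer than nSkip+1 counted words: nothing to return
        results{k} = '';
    end
    disp(results{k})
end
```

Unlike wantedPart, this sketch returns the rest of the string unchanged, including any skip words that appear after the second counted word, so it would need adjusting if those should also be excluded from the output.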
