problems with a regex

Thomas on 9 Jul 2013
Hi.
I'm trying to create a regular expression to match and extract some information. Here are two examples of the source string:
example one: 10/0/leaf.nr.0 is a Projection error - touches edge - 3D points.csv
example two: 10/2/leaf.nr.2 is a Projection error - 3D points.csv
I want to extract the string between "is a " and either " - touches edge" or " - 3D". In both example strings this would be "Projection error", but it can be something else.
Currently I have the pattern:
'.*is\sa\s(?<type>.*)(?:\s\-\stouches\sedge)?(?:\s\-\s3D).*.csv'
For example one this returns (not expected):
'Projection error - touches edge'
but for example two it returns (expected):
'Projection error'
If I change the pattern to:
'.*is\sa\s(?<type>.*)(?:\s\-\stouches\sedge)(?:\s\-\s3D).*.csv'
so that (?:\s\-\stouches\sedge) is required to match, it returns (correctly):
'Projection error'
for example one, but now example two (which doesn't have the "touches edge" part) will not match (of course).
I don't understand why the result for example one also contains " - touches edge" with the first pattern, when I ask it to match that group 0 or 1 times.
Any help will be highly appreciated.
Best regards, Thomas
  1 Comment
Thomas on 9 Jul 2013
My current solution is to use this pattern instead:
'.*is\sa\s(?<type>[\w\s]*)(?:\s\-\s)?.*'
It yields the needed information, except that an extra trailing space character is included. So the result for both example one and example two is now:
"Projection error "

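One possible way to avoid the trailing space, offered as a sketch rather than a tested solution: make the character class lazy and require the " - " separator directly after the named token. This assumes every source string contains " - " after the type, as both examples do; the variable names ex1, ex2, pat, n1 and n2 are only illustrative.

```matlab
% The two example strings from the question.
ex1 = '10/0/leaf.nr.0 is a Projection error - touches edge - 3D points.csv';
ex2 = '10/2/leaf.nr.2 is a Projection error - 3D points.csv';

% The lazy [\w\s]*? stops at the first ' - ', so the space before the
% hyphen is consumed by the separator instead of the named token.
pat = 'is\sa\s(?<type>[\w\s]*?)\s-\s';

n1 = regexp(ex1, pat, 'names');
n2 = regexp(ex2, pat, 'names');
disp(n1.type)   % Projection error
disp(n2.type)   % Projection error
```

Calling strtrim on the result of the original pattern would be another option.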

Answers (2)

Muthu Annamalai on 9 Jul 2013
A simple way to parse the string with the rule
"is a " and ( " - touches edge" OR " - 3D" )
is to use sequential calls to regexp().
That way you know the "is a " part of your source has been split off, and you can then search for which of the two alternatives is present in your case.
Also see the negated ('NOT') character class operators in regexp, and the 'split' mode of regexp.
http://www.mathworks.com/help/matlab/ref/regexp.html
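The sequential approach might look roughly like the following sketch. It assumes the type is always followed by " - " and that "is a " first occurs right before the type; the variable names (examples, types, tail, pieces) are illustrative and not part of the original answer.

```matlab
% The two example strings from the question.
ex1 = '10/0/leaf.nr.0 is a Projection error - touches edge - 3D points.csv';
ex2 = '10/2/leaf.nr.2 is a Projection error - 3D points.csv';
examples = {ex1, ex2};
types = cell(size(examples));

for k = 1:numel(examples)
    % Step 1: keep everything after the first 'is a '.
    tail = regexp(examples{k}, 'is a (.*)', 'tokens', 'once');
    % Step 2: split the remainder on ' - ' and keep the first piece.
    pieces = regexp(tail{1}, ' - ', 'split');
    types{k} = pieces{1};
    disp(types{k})   % Projection error
end
```

Splitting in two steps keeps each expression trivial, at the cost of an extra call per string.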
  1 Comment
Thomas on 9 Jul 2013
Thanks for your response.
My task is not to match either of the two cases; it's simply to extract the string between "is a " and the first " - ". (This is a new, shorter formulation of my problem that I just realized.)
Splitting would be one way to go, but I would like to know if it's possible to create a single regex for it.

per isakson on 9 Jul 2013
Edited: per isakson on 9 Jul 2013
"Extract the string between "is a " and the first " - "": this formulation is close to pseudo-code for the expression we are looking for.
ex1 = '10/0/leaf.nr.0 is a Projection error - touches edge - 3D points.csv';
ex2 = '10/2/leaf.nr.2 is a Projection error - 3D points.csv';
regexp( ex1, '(?<=is a )[^\-]+(?= \- )', 'match' )
regexp( ex2, '(?<=is a )[^\-]+(?= \- )', 'match' )
returns
ans =
'Projection error'
ans =
'Projection error'
Search the documentation for "Lookaround Assertions" or just "Lookaround"; see also "Lookahead Assertions in Regular Expressions".
PS. '\-' or just '-': one backslash (escape) too many seldom hurts, and I have trouble remembering when it is needed.
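As a small sketch of when the escape matters (the test strings and the variable names a, b, c and d are made up for illustration): outside a character class the hyphen is an ordinary character, while inside a class an unescaped hyphen between two characters denotes a range.

```matlab
% Outside a character class a hyphen is an ordinary character,
% so '-' and '\-' match the same text.
a = regexp('error - 3D', ' - ',  'match', 'once');   % ' - '
b = regexp('error - 3D', ' \- ', 'match', 'once');   % ' - '

% Inside a character class an unescaped hyphen between two characters
% denotes a range; escaping it makes it a literal hyphen.
c = regexp('m', '[a-z]',  'match', 'once');   % 'm' (range a..z)
d = regexp('m', '[a\-z]', 'match', 'once');   % empty (only 'a', '-' or 'z')
```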
Or, according to the OP's original requirement:
regexp( ex1, '(?<=is a ).+?(?= ((\- touches edge)|(\- 3D)))', 'match' )
regexp( ex2, '(?<=is a ).+?(?= ((\- touches edge)|(\- 3D)))', 'match' )
The extra parentheses, (), make the expression more readable, in my opinion.
The "?" in ".+?" makes the quantifier lazy: it matches as few characters as necessary.
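The same greedy-versus-lazy distinction appears to explain the result the OP saw with the first pattern: a greedy ".*" in the named token takes as much as it can, and because the "touches edge" group is optional, the engine is satisfied once only " - 3D" remains to be matched. The sketch below uses a slightly simplified version of the OP's pattern (leading ".*" dropped, unneeded escapes removed, the dot before "csv" escaped); greedyPat, lazyPat, g, l1 and l2 are illustrative names.

```matlab
% The two example strings from the question.
ex1 = '10/0/leaf.nr.0 is a Projection error - touches edge - 3D points.csv';
ex2 = '10/2/leaf.nr.2 is a Projection error - 3D points.csv';

% Same structure as the OP's first pattern; only the quantifier differs.
greedyPat = 'is\sa\s(?<type>.*)(?:\s-\stouches\sedge)?\s-\s3D.*\.csv';
lazyPat   = 'is\sa\s(?<type>.*?)(?:\s-\stouches\sedge)?\s-\s3D.*\.csv';

g  = regexp(ex1, greedyPat, 'names');
l1 = regexp(ex1, lazyPat,   'names');
l2 = regexp(ex2, lazyPat,   'names');

disp(g.type)    % Projection error - touches edge
disp(l1.type)   % Projection error
disp(l2.type)   % Projection error
```

With the lazy token the optional group gets the first chance to consume " - touches edge", so both examples yield the same type.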
