regexprep incorrect multiple replacement

I am trying to replace numbers in a char vector with different numbers using regexprep.
Let's say we have the following char vector as input:
str = 'abc(1,2,3)';
I would like to replace '1','2' and '3' with different numbers.
Let's say I want to replace the numbers with the following numbers:
rep = {'5';'8';'3'};
My desired output is:
str = 'abc(5,8,3)';
The format for using regexprep is:
regexprep(str,expression,replace)
I have tried to solve the problem in two ways:
  • One expression.
expression = '\d';
replace = {'5';'8';'3'};
regexprep(str,expression,replace)
ans = 'abc(3,3,3)'
The output is incorrect, despite the documentation stating:
If replace is a cell array of N character vectors and expression is a single character vector, then regexprep attempts N matches and replacements.
  • Multiple expressions.
expression = {'\d';'\d';'\d'};
replace = {'5';'8';'3'};
regexprep(str,expression,replace)
ans = 'abc(3,3,3)'
The output for the second case is incorrect, despite the documentation stating:
If both replace and expression are cell arrays of character vectors, then they must contain the same number of elements. regexprep pairs each replace element with its corresponding element in expression.
In both cases regexprep is replacing all three matches using only the last value from the replace cell array, rather than all three.
What am I missing?
  2 Comments
Stephen23 on 5 Jun 2018
Edited: Stephen23 on 5 Jun 2018
"The output is incorrect, despite the documentation stating:..."
"What am I missing?"
The output is correct in both cases. The documentation states that it "...attempts N matches and replacements": it matches all the digits and replaces them with the contents of cell 1, then it starts afresh, matches all the digits and replaces them with the contents of cell 2, then it starts afresh again and does the same with cell 3. That is exactly the output you are getting.
Each time, regexprep starts parsing the char vector from the beginning again, whereas you assumed that it continues from where it finished the previous replacement. To get the behavior you want you will have to add a dynamic expression of some kind.
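A minimal sketch of those passes, written as one call per replacement cell and using the values 5, 8 and 3 from the question. The intermediate names s1, s2 and s3 are only for illustration:

```matlab
str = 'abc(1,2,3)';
s1 = regexprep(str, '\d', '5');   % every digit becomes 5      -> 'abc(5,5,5)'
s2 = regexprep(s1,  '\d', '8');   % the 5s are digits as well  -> 'abc(8,8,8)'
s3 = regexprep(s2,  '\d', '3');   % last pass overwrites again -> 'abc(3,3,3)'
```

Only the last replacement value survives because each pass matches whatever the previous pass wrote.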
Paolo on 5 Jun 2018
Thank you for your comment, that kind of makes more sense. I did not realize that it parses the string multiple times!
Could you share an example as to how I could achieve what I am trying to do?


Accepted Answer

Walter Roberson on 5 Jun 2018
regexprep(S, {A, B}, {P, Q})
is the same as
regexprep(regexprep(S, A, P), B, Q)
That is, the first pair is applied to the entire string, and the second pair is applied to the string that results.
It appears to you that only the third replacement was done because each replacement text is itself a digit, so it matches the second and third patterns and gets replaced again.
The 'once' option will not solve the problem.
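To see why 'once' does not help, here is a small sketch that writes the three passes out as separate calls, assuming the pairwise equivalence above also holds when 'once' is given. The names t1, t2 and t3 are hypothetical:

```matlab
str = 'abc(1,2,3)';
t1 = regexprep(str, '\d', '5', 'once');   % first digit only        -> 'abc(5,2,3)'
t2 = regexprep(t1,  '\d', '8', 'once');   % first digit is now the 5 -> 'abc(8,2,3)'
t3 = regexprep(t2,  '\d', '3', 'once');   % first digit again        -> 'abc(3,2,3)'
```

Every pass still starts from the beginning of the char vector, so 'once' keeps rewriting the first digit and never reaches the second or third.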
  3 Comments
Walter Roberson on 5 Jun 2018
str = 'abc(1,2,3)';
regexprep(str, '\d+(\D+)\d+(\D+)\d+', '5$18$23')
The $1 in the replacement pattern refers to the text captured by the first () group, and the $2 refers to the text captured by the second () group. So we match one or more digits, then remember the sequence of non-digits that follows, then match another series of digits, then remember the sequence of non-digits that follows that, then match a final series of digits. We then replace the whole match with fixed text followed by the first remembered series of non-digits, then fixed text followed by the second remembered series of non-digits, then more fixed text.
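If the replacement values are held in a cell array rather than typed into the replacement pattern, one possible alternative (not the approach above, just a sketch) is to split the text at the numbers and join the pieces back together with the new values. It assumes there is exactly one replacement per number; the names rep, parts and out are illustrative:

```matlab
str = 'abc(1,2,3)';
rep = {'5', '8', '3'};                    % one replacement per number, in order
parts = regexp(str, '\d+', 'split');      % text around the numbers: {'abc(', ',', ',', ')'}
assert(numel(parts) == numel(rep) + 1);   % guard: the counts must line up
out = strjoin(parts, rep);                % rep{k} goes between parts{k} and parts{k+1}
% out is 'abc(5,8,3)'
```

strjoin accepts a cell array of delimiters with one fewer element than the pieces being joined, which is what places each replacement in its own slot. Note that strjoin interprets escape sequences such as \n in the delimiters, so this suits plain replacement text like the digits here.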
Paolo on 5 Jun 2018
Amazing, thank you Walter! Very helpful explanation.
I actually did try something like that; however, my expression was incorrect/incomplete:
regexprep(str, '(\d+)(\d+)(\d+)', '5$18$23')
