regexprep does not do exactly what I want

Dear all,
I have the following cell array
Charge = {'OH-1'} {'KOH+0'} {'K+1'} {'I-1'} {'HI+0'} {'H3O+1'} {'H2O+0'}
I want to remove all information before the + and - signs. Therefore I tried the following:
regexprep(Charge,'[^-+].','');
which produces
{'-1'} {'0'} {'1'} {'1'} {'+0'} {'1'} {'0'}
This works well except when only one character precedes the minus sign (i.e. in the case of I-1). In that case, the - sign is deleted as well. The - signs must be kept; the + signs need not be.
Any suggestions?
Thanks, Tim

 Accepted Answer

Daniel M
Daniel M on 16 Oct 2019
Edited: Daniel M on 16 Oct 2019
There's definitely a way to do it using regexprep, but I found this solution first, so hopefully it is sufficient.
Charge = {'OH-1','KOH+0','K+1','I-1','HI+0','H3O+1','H2O+0'};
c = regexp(Charge,'[-+]\w*','match');
cc = cat(2,c{:}); % flatten the nested 1x1 cells into a single cell array
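For anyone wondering why the cat step is needed: with the 'match' option alone, regexp returns one nested cell per input string, so the result is a cell array of 1x1 cells. A minimal sketch of that, assuming each string contains exactly one sign (as in the test data):

```matlab
Charge = {'OH-1','KOH+0','K+1','I-1','HI+0','H3O+1','H2O+0'};

% Without 'once', every element of c is itself a cell holding the matches
c = regexp(Charge, '[-+]\w*', 'match');
isNested = iscell(c{1});        % c{1} is {'-1'}, not '-1'

% Flatten the nested cells into one 1x7 cell array of char vectors
cc = cat(2, c{:});

% [c{:}] is an equivalent shorthand for the horizontal concatenation
cc2 = [c{:}];
```

If a string could contain more than one match, the flattened result would have more elements than the input, so this only lines up one-to-one under the assumption above.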

6 Comments

Great stuff, many thanks for this!
regexprep(Charge,'^\w*','')
I'm not convinced that '[-+]\w*' is the right regexp. This constrains the symbols that follow the + or - to letters, digits or _. This restriction may or may not be appropriate.
The regexp that matches exactly your specification would be
regexp(Charge, '[+-].*', 'match', 'once')
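To illustrate the difference, here is a small sketch using a made-up label, 'Cu+2(aq)'. That string is not part of the original data; it is only assumed here as an example of a charge followed by non-word characters:

```matlab
s = 'Cu+2(aq)';   % hypothetical input, not from the question

% \w* stops at the first character that is not a letter, digit or _
a = regexp(s, '[-+]\w*', 'match', 'once');   % '+2'

% .* keeps everything from the sign to the end of the string
b = regexp(s, '[+-].*', 'match', 'once');    % '+2(aq)'
```

For the seven strings in the question both patterns return the same result, since only digits follow the sign there.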
Your original regexprep did not work at all. It basically said: remove pairs of characters where the first character is anything but - or + and the second one is anything. So looking at 'H3O+1', the first pair is 'H3'; it doesn't start with a + or -, so it is removed. The second pair is 'O+'. Again, it doesn't start with a + or -, so it is removed. Now with 'HI+0', the first pair is 'HI', removed; the second one is '+0', which starts with +, so it is not removed. If you had something like 'H3O+01', it would have removed everything, since the scan would remove 'H3', then 'O+', then '01'.
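One way to check this is to ask regexp for the matches of the original pattern, i.e. the chunks that regexprep would delete. A quick sketch:

```matlab
Charge = {'OH-1','KOH+0','K+1','I-1','HI+0','H3O+1','H2O+0'};

% Each match is a two-character chunk that regexprep would remove
removed = regexp(Charge, '[^-+].', 'match');

removed{4}   % 'I-1'   -> {'I-'}, so the minus sign is consumed
removed{5}   % 'HI+0'  -> {'HI'}, so '+0' survives
removed{6}   % 'H3O+1' -> {'H3','O+'}, leaving only '1'
```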
A regexprep that would have worked would be:
regexprep(Charge, '[^+]+\+|[^+]+(?=\-)', '')
Thanks Guillaume, I agree that '[-+]\w*' is not robust enough (which is why I voted for Stephen's solution), but it does satisfy his test case.
As for the comment on the regexprep, I'm not sure if you're referring to me or not. I wrote
regexprep(Charge,'^\w*','')
ans =
{'-1'} {'+0'} {'+1'} {'-1'} {'+0'} {'+1'} {'+0'}
which works. Your example however doesn't:
regexprep(Charge, '[^+]+\+|[^+]+(?=\-)', '')
ans =
{'-1'} {'0'} {'1'} {'-1'} {'0'} {'1'} {'0'}
As you can see it drops the sign of the charge.
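Whether the missing + matters depends on what happens next. The question says only the - signs are crucial, so here is a sketch that assumes the charges are eventually converted to numbers with str2double (that step is an assumption; the thread does not say so):

```matlab
Charge = {'OH-1','KOH+0','K+1','I-1','HI+0','H3O+1','H2O+0'};

withPlus    = regexprep(Charge, '^\w*', '');                  % keeps + and -
withoutPlus = regexprep(Charge, '[^+]+\+|[^+]+(?=\-)', '');   % keeps only -

% str2double accepts an optional leading sign, so both convert identically
q1 = str2double(withPlus);
q2 = str2double(withoutPlus);
same = isequal(q1, q2);
```

If the strings are kept as text instead, the two results do differ, which is the point made above.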
The comment about the regexprep referred to the original question, not your answer.
I wrote most of my comment shortly after you posted your answer but had to dash off to a meeting before posting it. When I finally posted it, it was a bit out of date. Sorry about that.
Many thanks to all of you for helping out here!


More Answers (1)

Stephen23
Stephen23 on 16 Oct 2019
Edited: Stephen23 on 16 Oct 2019
>> regexprep(Charge,'^[^-+]*','')
ans =
'-1' '+0' '+1' '-1' '+0' '+1' '+0'
>> regexp(Charge,'[-+].+$','once','match')
ans =
'-1' '+0' '+1' '-1' '+0' '+1' '+0'

1 Comment

This will handle edge cases better than my solution above. More robust.
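A sketch of one such edge case, using a made-up name 'Fe(CN)6-4' (not in the original data) where non-word characters appear before the sign:

```matlab
s = 'Fe(CN)6-4';   % hypothetical input with parentheses before the sign

% ^\w* only strips the leading word characters and stops at '('
a = regexprep(s, '^\w*', '');       % '(CN)6-4'

% ^[^-+]* strips everything up to the first + or -
b = regexprep(s, '^[^-+]*', '');    % '-4'
```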


Asked: 16 Oct 2019
Commented: 17 Oct 2019
