Get unknown strings (text) out of other strings?

Hey guys, can you give me an example that shows how to get information out of a long string, if the word before the information you want is known? Example:
name: Benny age:23 (I want the info "age")
Out of this code
<HTML><FONT color="0000FF">Used Amplification(Hidden)</FONT></HTML>
I want the information: Used Amplification(Hidden)
I guess the key is regexp again... but I think I am wrong. Sorry for these stupid questions.

 Accepted Answer

If you don't want to learn the regex syntax, you can use strfind:
before = '<HTML><FONT color="0000FF">';
after = '</FONT></HTML>';
startIdx = strfind(str, before) + length(before); % or just length(before)+1 if str always starts with before
endIdx = strfind(str, after) - 1; % 'end' is a reserved keyword in MATLAB, so it cannot be used as a variable name
result = str(startIdx:endIdx); % assumes there's only ever one match
It's of course a lot more flexible and shorter with regexes:
result = regexp(str, '<HTML><FONT color="[0-9A-F]+">(.*?)</FONT></HTML>', 'tokens', 'once'); % added bonus: it works with any color, not just 0000FF
% with 'tokens' and 'once', result is a cell array; the extracted text is result{1}
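For reference, here is a minimal, self-contained sketch of both approaches applied to the two sample strings from the question. The variable names (`info1`, `info2`, `tok`, `person`, `ageTok`) and the pattern used for the age example are illustrative assumptions, not part of the original answer.

```matlab
% Sample input taken from the question
str = '<HTML><FONT color="0000FF">Used Amplification(Hidden)</FONT></HTML>';

% strfind approach: locate the known text before and after the payload
before = '<HTML><FONT color="0000FF">';
after = '</FONT></HTML>';
startIdx = strfind(str, before) + length(before);  % first character of the payload
endIdx = strfind(str, after) - 1;                  % last character of the payload
info1 = str(startIdx:endIdx);

% regexp approach: 'tokens' with 'once' returns a cell array of captured groups
tok = regexp(str, '<HTML><FONT color="[0-9A-F]+">(.*?)</FONT></HTML>', 'tokens', 'once');
info2 = tok{1};

% Same idea for the other sample: capture the digits that follow 'age:'
% (assumed pattern; it allows optional whitespace after the colon)
person = 'name: Benny age:23';
ageTok = regexp(person, 'age:\s*(\d+)', 'tokens', 'once');
age = str2double(ageTok{1});

disp(info1);
disp(info2);
disp(age);
```

If there is no match, `regexp` returns an empty result, so indexing with `{1}` would error. Real code would probably want an `isempty` check first.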

5 Comments

I try to learn more about regexp...but I cant which of singes i have to use...../w means every letter of the alphabet ... where is the rest
Not sure what you're saying because of the typo.
\w (not /w) is a character class that matches not only every letter but also digits and the underscore. I find MATLAB's character classes ill-defined and rarely use them. To match any letter of the alphabet I would use [a-zA-Z]
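A small sketch of the difference, using a made-up sample string (the names `sample`, `wordRuns` and `letterRuns` are only for illustration):

```matlab
% Hypothetical sample containing letters, digits, an underscore and a space
sample = 'ab_12 Cd';

% \w matches letters, digits and the underscore
wordRuns = regexp(sample, '\w+', 'match');           % {'ab_12', 'Cd'}

% [a-zA-Z] matches ASCII letters only
letterRuns = regexp(sample, '[a-zA-Z]+', 'match');   % {'ab', 'Cd'}

disp(wordRuns);
disp(letterRuns);
```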
There are many ways I could have built the regex in my answer. Let's parse it:
'<HTML><FONT color="'
There are no special characters in that bit, so it just matches it exactly
'[0-9A-F]'
Means match any character in the ranges 0-9 or A-F (basically any uppercase hex digit)
'+'
Means match the expression just before (a hex character) one or more times. Hence it will match any series of hex characters and stop as soon as a character differs from 0-9A-F
'">'
No special characters. Matches exactly
'('
Begins the definition of a token. Anything between '(' and ')' is a token
'.'
Matches any character
'*?'
Matches the previous expression ('.') zero or more times. Hence it will match any character zero or more times. I use the non-greedy version of '*' here, which means it will match only as many characters as needed for the whole regex to succeed.
')'
Marks the end of the token. I extract the token with the 'tokens' option of regexp.
'</FONT></HTML>'
No special characters. Matches exactly.
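To see why the non-greedy '*?' matters, here is a short sketch comparing it with the greedy '*'. The input string `s` is a hypothetical line containing two FONT elements; it is not from the original question.

```matlab
% Hypothetical input with two FONT elements on one line
s = '<FONT color="FF0000">first</FONT><FONT color="00FF00">second</FONT>';

% Lazy .*? stops at the first closing tag
lazyTok = regexp(s, '<FONT color="[0-9A-F]+">(.*?)</FONT>', 'tokens', 'once');

% Greedy .* runs on to the last closing tag
greedyTok = regexp(s, '<FONT color="[0-9A-F]+">(.*)</FONT>', 'tokens', 'once');

disp(lazyTok{1});    % first
disp(greedyTok{1});  % first</FONT><FONT color="00FF00">second
```

With only one element in the string, as in the question, both versions give the same token.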
---
There are plenty of resources on the web for learning regular expressions. The MATLAB help is probably not the best reference.
Since MATLAB uses C++, can I use the C++ regular expression help?
MATLAB's regular expression engine is slightly different from C++ std::regex and other POSIX-compliant regexes. It's not as good at some things (e.g. captures), but the basics are the same, so yes, you can use a tutorial for C++ or any other language.
This one seems to cover the basics and is language agnostic.
@Guillaume — That has to be one of the best explanations of regexp I’ve read!
+1
