How to extract variable names from a character array

Is it possible to extract variable names from only a particular range of a character array?
For example, my char array looks like this:
en:
variable_en1 = expression; variable_en2 += expression;
variable_en3 := expression;
variable_en4++;
variable_en5--;
du:
variable_du1= expression;
variable_du2 := expression
ex:
variable_ex1=0;variable_ex2=1;
variable_ex3 = 2;
I would like to extract only variable_en1 to variable_en5 into one array and variable_ex1 to variable_ex3 into another array.
I am attaching the character array as a .mat file.
Could you please help me?

4 Comments

Stephen23 on 19 Jun 2015
Edited: Stephen23 on 19 Jun 2015
Please do not attach screenshots of text (code, inputs, outputs, etc), instead it is much easier for us if you actually give the actual text itself (formatted correctly of course, using the paperclip button).
N/A on 19 Jun 2015
Edited: N/A on 19 Jun 2015
Sorry, Stephen. I thought it would give a clear picture. I will not do it next time, and I have now updated the question. Thank you.
Stephen23 on 19 Jun 2015
Edited: Stephen23 on 19 Jun 2015
What are "variables"? The question is not very clear.
You uploaded a .mat file containing one string. There are easy ways to extract parts of a string (particularly indexing or regular expressions), but you have not explained what part of the strings you are interested in. Please give exact examples of the desired output, and an explanation of how these should be identified (e.g. preceding or trailing characters, newline locations, character patterns, etc).
N/A on 19 Jun 2015
Edited: N/A on 19 Jun 2015
That one string comes from a Stateflow state. I want to find the variables that follow en:. When you load the .mat file, it gives one string. In it I need to find the variables on the left-hand side.


 Accepted Answer

Stephen23 on 19 Jun 2015
Edited: Stephen23 on 19 Jun 2015
Thank you for editing your question and making it clearer.
You can use regexp to locate these substrings. Here are several different versions using regexpi, which I tested on your sample .mat file:
>> regexpi(transDestiLabel,'^[a-z]+(?=\s\S?=)','match','lineanchors')
ans =
'a' 'b' 'c' 'd' 'e' 'f' 'g' 'h'
>> regexpi(transDestiLabel,'^[a-z]+(?=+|-)','match','lineanchors')
ans =
'i' 'j'
>> regexpi(transDestiLabel,'^[a-z]+(?=\s\S?=|+|-)','match','lineanchors')
ans =
'a' 'b' 'c' 'd' 'e' 'f' 'g' 'h' 'i' 'j'
The sample file:
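For readers without the attachment, here is a minimal self-contained sketch of the same lookahead idea. The sample text and the names txt, assigned, stepped and both are made up for illustration. It uses \w instead of [a-z] and escapes the plus sign as \+, which are my choices rather than something taken from the lines above.

```matlab
% Hypothetical sample text standing in for the string in the .mat file.
txt = sprintf('%s\n', 'a = 1;', 'b += 2;', 'c := 3;', 'i++;', 'j--;');

% Names at the start of a line, followed by whitespace, at most one
% operator character, and an equals sign (=, += or :=).
assigned = regexp(txt, '^\w+(?=\s\S?=)', 'match', 'lineanchors');

% Names at the start of a line, followed directly by + or - (++ and --).
stepped = regexp(txt, '^\w+(?=\+|-)', 'match', 'lineanchors');

% Both cases combined with alternation inside a single lookahead.
both = regexp(txt, '^\w+(?=\s\S?=|\+|-)', 'match', 'lineanchors');

disp(assigned), disp(stepped), disp(both)
```

The lookahead (?=...) only checks what follows the name, so the operator itself never becomes part of the match.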

12 Comments

Hello Stephen,
Thank you very much. Actually, my variables are not just [a-z], so I have changed your code line to
regexpi(transDestiLabel,'^[\w]+(?=\s\S?=|+|-)','match','lineanchors')
but I want to have all entry variables in one array and all exit variables in another array. I am attaching a new .mat file. Please have a look at it.
Please let me know how I can get this.
Entry variables are those after en: up to du:, and exit variables are those after ex:. In my new .mat file, variable_en1 to variable_en11 are entry variables and variable_ex1 to variable_ex3 are exit variables.
This creates a structure of the variable names, grouped by those headings:
load('labelStrings.mat')
[C,S] = regexp(transDestiLabel,'(?<=\s|;)\w+(?=(\s\S?)?=|+|-)','match','start');
[D,T] = regexp(transDestiLabel,'^\w+(?=\:$)','match','start','lineanchors');
D(2,:) = arrayfun(@(b,e){C(b<S&S<e)},T,[T(2:end),Inf],'UniformOutput',false);
X = struct(D{:});
Let's look at the output in the command window:
>> X
X =
en: {1x11 cell}
du: {'variable_du1' 'variable_du2'}
ex: {'variable_ex1' 'variable_ex2' 'variable_ex3'}
>> X.en'
ans =
'variable_en1'
'variable_en2'
'variable_en3'
'variable_en4'
'variable_en5'
'variable_en6'
'variable_en7'
'variable_en8'
'variable_en9'
'variable_en10'
'variable_en11'
This was tested on your recently uploaded .mat file:
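Since the attachment is not reproduced here, the grouping step can also be sketched on an inline string. This is a loop-based variant with the same intent as the arrayfun/struct lines above, not the exact code that was tested. The sample text and the names txt, names, pos, heads, hpos, stops and groups are assumptions made for illustration.

```matlab
% Hypothetical sample text: headings on their own lines, then assignments.
txt = sprintf('%s\n', 'en:', 'v1 = 1; v2 += 2;', 'v3++;', ...
                      'du:', 'v4= 1;', 'ex:', 'v5=0;v6=1;');

% Variable names and the character index at which each one starts.
[names, pos] = regexp(txt, '(?<=\s|;)\w+(?=(\s\S?)?=|\+|-)', 'match', 'start');

% Headings (a word followed by a colon at the end of a line) and their starts.
[heads, hpos] = regexp(txt, '^\w+(?=:$)', 'match', 'start', 'lineanchors');

% Each heading owns the names located between it and the next heading.
stops = [hpos(2:end), Inf];
groups = struct();
for k = 1:numel(heads)
    inBlock = pos > hpos(k) & pos < stops(k);
    groups.(heads{k}) = names(inBlock);
end
disp(groups)
```

Dynamic field names (groups.(heads{k})) only work when every heading is a valid MATLAB identifier, which holds for en, du and ex.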
Great, thanks a lot.
I am facing the same problem again with another char array. This time I have many heading types (like en:, du:, ex:, entry:, exit:, during:, en,du:, en,du,ex:).
Could you please help me? I have attached a new .mat file.
Thank you.
Such sloppy file formatting makes it hard to parse. If the file was neater then this would be a much simpler task. Some improvements that would make parsing simpler:
  1. ensuring that the group headings are on separate lines to the variables
  2. one variable per line
  3. consistent whitespace: some group headers have whitespace, some have none, some have whitespace in front of the colon...
  4. no leading spaces
The more untidy the file format is the harder it is to parse. And this one is a mess:
green
en: green_led=1;
variable_en1 = 1;
variable_en2 += 2;
du:
variable_du1= 1;
varibale_du2 := 2;
ex:
variable_ex1=1;variable_ex2=1;
variable_ex3 = 2;
green_led=0;
entry:
variable_entry1 = 1;
variable_entry2 += 2;
during:
variable_during1= 1;
varibale_during2 := 2;
exit:
variable_exit1=1;variable_exit2=1;
variable_exit3 = 2;
en, du:variable_endu1=1;
variable_endu2=2;
en,ex :
variable_enex1=1;
variable_enex2=2;
du,ex:
variable_duex1=1;
variable_duex2=2;
en,du, ex:
variable_enduex1=1;
variable_enduex2=2;
entry, during:variable_entryduring1=1;
variable_entryduring2=2;
entry, exit :
variable_entryexit1=1;
variable_entryexit2=2;
during, exit:
variable_duringexit1=1;
variable_duringexit2=2;
entry, during, exit:
variable_entryduringexit1=1;
variable_entryduringexit2=2;
In any case, have a play with this, it might do what you want:
load('stateFlowCode.mat')
[C,S] = regexp(allTypesOfActions,'(?<=\s|;|\:)\w+(?=(\s\S?)?(+|-|\:)?=)','match','start');
[D,T] = regexp(allTypesOfActions,'(?<=\n\s*)[\w_, ]+(?=\s?\:[^=])','match','start');
D = regexprep(strtrim(D),',\s?','_');
D(2,:) = arrayfun(@(b,e){C(b<S&S<e)},T,[T(2:end),Inf],'UniformOutput',false);
X = struct(D{:});
Where X is
en: {'green_led' 'variable_en1' 'variable_en2'}
du: {'variable_du1' 'varibale_du2'}
ex: {'variable_ex1' 'variable_ex2' 'variable_ex3' 'green_led'}
entry: {'variable_entry1' 'variable_entry2'}
during: {'variable_during1' 'varibale_during2'}
exit: {'variable_exit1' 'variable_exit2' 'variable_exit3'}
en_du: {'variable_endu1' 'variable_endu2'}
en_ex: {'variable_enex1' 'variable_enex2'}
du_ex: {'variable_duex1' 'variable_duex2'}
en_du_ex: {'variable_enduex1' 'variable_enduex2'}
entry_during: {'variable_entryduring1' 'variable_entryduring2'}
entry_exit: {'variable_entryexit1' 'variable_entryexit2'}
during_exit: {'variable_duringexit1' 'variable_duringexit2'}
entry_during_exit: {'variable_entryduringexit1' 'variable_entryduringexit2'}
Hello Stephen,
Is it possible to store en, entry, en_du, en_ex, en_du_ex, entry_during, entry_exit and entry_during_exit in one struct field, and
ex, exit, en_ex, du_ex, en_du_ex, entry_exit, during_exit and entry_during_exit in another struct field?
Of course, you can rearrange the data in any way that you would like to. Note that merging data based on semantic meaning of the data (and not the format of the data in the file) means that it is conceptually a different operation and hence should be performed after reading the data from the file.
You can read about how to access data in structures, and try something like this (untested):
Z.entry = [X.en, X.entry, X.en_du, X.en_ex, X.en_du_ex, X.entry_during, X.entry_exit, X.entry_during_exit];
Z.exit = [X.ex, X.exit, X.en_ex, X.du_ex, X.en_du_ex, X.entry_exit, X.during_exit, X.entry_during_exit];
Of course if you have some way to automatically identify and categorize those fields then this could be done automatically. However, as mentioned above, this is a completely different problem to the one of reading the data file.
N/A on 6 Jul 2015
Edited: N/A on 7 Jul 2015
Thank you, Stephen. What if one of the fields does not exist in the structure when combining them? I am not sure whether I will have all the fields every time. Sometimes I may have only en and en_ex, or entry and entry_exit.
How do I check this and combine only the similar fields that exist? Could you please let me know how I can combine the entry-related fields and the exit-related fields?
It really depends on how those groups are specified, which you have not given any information on. Once you know how to identify the groups, then select the data using some basic MATLAB indexing:
fld = fieldnames(X);
idx = ~cellfun('isempty',strfind(fld,'ex'));
ide = ~cellfun('isempty',strfind(fld,'en'));
Y = struct2cell(X);
Z.entry = Y(ide);
Z.exit = Y(idx);
I simply assumed that all fields containing 'ex' belong in the exit group, and all containing 'en' in the entry group.
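Note that Y(ide) is a cell array with one nested cell per matching field. If a single flat list of names per group is wanted instead, something along these lines might work. It is a sketch under the same 'en'/'ex' assumption, using a made-up structure X in which several headings are deliberately missing.

```matlab
% Hypothetical stand-in for X; only some of the possible headings exist.
X = struct('en', {{'a','b'}}, 'en_ex', {{'c'}}, 'du', {{'d'}}, 'exit', {{'e'}});

fld = fieldnames(X);
val = struct2cell(X);

% Same assumption as above: 'en' marks entry groups, 'ex' marks exit groups.
isEntry = ~cellfun('isempty', strfind(fld, 'en'));
isExit  = ~cellfun('isempty', strfind(fld, 'ex'));

% Concatenate the matching fields into one flat cell array per group.
% Only fields that actually exist are touched, so missing headings are harmless.
% (If nothing matches at all, the result is the empty array [].)
Z.entry = [val{isEntry}];
Z.exit  = [val{isExit}];
disp(Z)
```

Because the index comes from fieldnames(X), there is no need to test each possible field with isfield first.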
Yes, this is what I expected. Thank you.
On another note, I am trying to understand the regexp you used to filter the text, but I could not fully follow it:
[C,S] = regexp(label,'(?<=\s|;|\:)\w+(?=(\s\S?)?(+|-|\:)?=)','match','start')
What I understood: (?<=\s|;|\:)\w matches a string that follows : or ; and identifies a word, and
(?=(\s\S?)?(+|-|\:)?=) matches whitespace and non-whitespace followed by + or - or :
Is my understanding correct, or am I missing something?
I tried to filter the variables from one more char array, which is attached to this post, but I am missing some variables. Could you please check and let me know what is wrong?
If you want to play around with regular expressions, try using my FEX submission, which lets you interactively build regular expressions and check them on a piece of text:
and keep reading this and trying examples until it all makes sense:
Let's break down the regular expression:
(?<=\s|;|\:)\w+(?=(\s\S?)?(+|-|\:)?=)
(?<=\s|;|\:) % preceded by whitespace, ; or :
\w+ % one or more word characters (letters, digits, underscore)
(?= % followed by...
(\s\S?)? % maybe whitespace + non-whitespace
(+|-|\:)? % maybe +, - or :
=) % equals sign
Hmmm... it seems like the \S? is not really required.
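A quick way to check that hunch is to run both versions on some sample text and compare the results. The sketch below does that on a made-up string. The names txt, full and short are hypothetical, and the plus sign is written as \+ here, which is my choice.

```matlab
% Hypothetical sample text with the operator styles seen in the file.
txt = sprintf('%s\n', 'en: led=1;', 'a = 1;', 'b += 2;', 'c := 3;', 'd-=4;e=5;');

% Pattern as broken down above, and a shorter variant without the \S?.
full  = '(?<=\s|;|:)\w+(?=(\s\S?)?(\+|-|:)?=)';
short = '(?<=\s|;|:)\w+(?=\s?(\+|-|:)?=)';

namesFull  = regexp(txt, full,  'match');
namesShort = regexp(txt, short, 'match');

disp(namesFull)
disp(isequal(namesFull, namesShort))
```

On this sample both patterns return the same names. The shorter one is slightly stricter in general, because \S? accepts any single non-whitespace character before the equals sign, so the longer pattern would also treat a name followed by ' <=' as an assignment.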
As I noted in an earlier comment, the reason this regular expression is so complicated is that the file format is a complete mess. If you can tidy up the file format, then identifying the variables becomes much easier.
Good luck!


More Answers (0)

Asked:

N/A
on 19 Jun 2015

Commented:

on 8 Jul 2015
