Alternative ways to generate a structure from strings without using eval?

I have a large set of strings (1000+) that imply a data structure in dot notation. I want to build an internal structure in a data/string-driven way. I have some code below that does what I want using eval, but I have "Eval is Evil" ringing in my ears. Is it a legitimate use of eval or am I missing something basic? It feels as if I should be able to do in a more elegant way without eval, but I just can't see it. Thank you in advance.
clear
mystruct = struct();
%%Strings that describe my struct
s{1} = 'mystruct.a(1).b(1).value';
s{2} = 'mystruct.a(1).b(2).value';
s{3} = 'mystruct.a(2).b(1).value';
s{4} = 'mystruct.a(2).b(2).value';
%%Values that I want to associate with my struct fields
v{1} = [1,2,3,4];
v{2} = [4,5,6,7];
v{3} = [8,9,10,11];
v{4} = [12,13,14,15];
%%Use eval to assign as if I was typing manually
for i = 1:length(s)
    value = v{i};
    eval([s{i},'=','value'])
end
%% Read the values back using indexing
for x = 1:2
    for y = 1:2
        mystruct.a(x).b(y)
    end
end

1 Comment

"Is it a legitimate use of eval or am I missing something basic"
The design confuses data with code.
"It feels as if I should be able to do in a more elegant way without eval, but I just can't see it."
Let's look at a simpler example: arbitrary indices stored as text (e.g. '2,3,4') are awkward to parse and use, because indices are numeric, not text. Once we store them as numeric values, it suddenly becomes much easier to use them as indices, e.g. C = {2,3,4}; A(C{:}). No evil EVAL or fiddly string parsing is required.
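A minimal sketch of that idea. The array A and its contents are made up for illustration; only the C = {2,3,4}; A(C{:}) pattern comes from the comment above.

```matlab
% Indices stored as numbers in a cell array expand into a
% comma-separated list, so no text parsing is needed.
A = reshape(1:24, 2, 3, 4);   % made-up 2x3x4 example array
C = {2, 3, 4};                % indices kept in their native numeric form
x = A(C{:});                  % equivalent to A(2,3,4)

% The same list also works on the left-hand side of an assignment.
A(C{:}) = -1;
```

The cell array can have any length, so the same line works for arrays of any dimensionality.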
The same applies to this structure: storing the fieldnames and indices in their native form is the "more elegant way", e.g.:
C = {... this is just one way to store this, there are many other ways!
{'a',{1},'b',{1},'value'},...
{'a',{1},'b',{2},'value'},...
{'a',{2},'b',{1},'value'},...
{'a',{2},'b',{2},'value'}};
S = struct();
for k = 1:numel(C)
    V = [k,sqrt(k)]; % arbitrary example data
    S = setfield(S,{1},C{k}{:},V);
end
Checking:
S
S = struct with fields:
a: [1×2 struct]
S.a(2).b(2)
ans = struct with fields:
value: [4 2]
As Bruno Luong states, you can parse the text to get the fieldnames and numeric indices (to pass to SUBSASGN, SETFIELD, etc), but you can skip that step by storing them as fieldnames and indices in the first place!
Note that parsing the text is not as trivial as it seems once you have to convert the indices to numeric and account for colons etc.
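Reading the values back can use the same stored fieldnames and indices. The following is a sketch under the same storage scheme as above; the result variable R is introduced here purely for illustration.

```matlab
% Fieldnames as text, indices as numeric cells (same scheme as above).
C = {...
    {'a',{1},'b',{1},'value'},...
    {'a',{1},'b',{2},'value'},...
    {'a',{2},'b',{1},'value'},...
    {'a',{2},'b',{2},'value'}};

% Write: SETFIELD takes the path as a comma-separated list.
S = struct();
for k = 1:numel(C)
    S = setfield(S,{1},C{k}{:},[k,sqrt(k)]);   % arbitrary example data
end

% Read: GETFIELD accepts the same list, so no EVAL is needed either way.
R = cell(size(C));   % R is a hypothetical container for the results
for k = 1:numel(C)
    R{k} = getfield(S,{1},C{k}{:});
end
```

The point of the design is that one description of the path serves both the write and the read.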


Answers (2)

"I want to build an internal structure in a data/string driven way." - This is already the design error. Why did you decide on such an inefficient representation of your data?
Smarter:
s{1} = [1, 1]; % 'mystruct.a(1).b(1).value';
s{2} = [1, 2]; % 'mystruct.a(1).b(2).value';
s{3} = [2, 1]; % 'mystruct.a(2).b(1).value';
s{4} = [2, 2]; % 'mystruct.a(2).b(2).value';
v{1} = [1,2,3,4];
v{2} = [4,5,6,7];
v{3} = [8,9,10,11];
v{4} = [12,13,14,15];
mystruct = struct();
for k = 1:4
    index = s{k};
    % MATLAB is case-sensitive: assign to mystruct, not myStruct
    mystruct.a(index(1)).b(index(2)).value = v{k};
end

3 Comments

Thanks Jan. I appreciate the answer. My problem is less about internal data representation and more about data import and access. Imagine that the data in my example sits in a csv file that looks like the sample below: the tags (1000+ columns) are in an arbitrary order, describe a tree of arbitrary structure and are of variable number, and the rows are a time sequence (100000+ samples). This is the format I have, so it is a (pre-MATLAB) constraint and not a design choice. Rightly or wrongly, this is why I initially wanted to build my internal representation dynamically as a structure.
I am aware of the pros and cons of internal data representations, and if I need to do any heavy lifting I will reshape the data into a more optimal representation, but my application suits the struct representation because it is about the user being able to access the data in a friendly way.
s.a(1).b(1).value, s.a(1).b(2).value, s.c.d.value, s.e.f(1).value
1 , 4 , 8 , 12
2 , 5 , 9 , 13
3 , 6 , 10 , 14
4 , 7 , 11 , 15
With this data design I don't really see an alternative. But that doesn't mean this is a legitimate use of eval, just that this data design might require it.
You could use regexp to extract the fieldnames and indices and use dynamic fields assignments, but I don't think that is that much better. It would allow you to validate the inputs instead of executing random user-provided code.
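A rough sketch of what that could look like for the sample header above. The variable names (header, data, paths, parts, args) are illustrative, and the validation is an assumption: it accepts only plain fieldnames with an optional single positive integer index, so colons, ranges and multi-dimensional indices are rejected rather than handled.

```matlab
% Header line and numeric block taken from the sample CSV above.
header = 's.a(1).b(1).value, s.a(1).b(2).value, s.c.d.value, s.e.f(1).value';
data   = [1 4 8 12; 2 5 9 13; 3 6 10 14; 4 7 11 15];

paths = strtrim(strsplit(header, ','));
S = struct();
for k = 1:numel(paths)
    parts = strsplit(paths{k}, '.');   % e.g. {'s','a(1)','b(1)','value'}
    args  = {};                        % becomes {'a',{1},'b',{1},'value'}
    for p = 2:numel(parts)             % parts{1} is the root name, skipped
        name = regexp(parts{p}, '^[A-Za-z]\w*', 'match', 'once');
        assert(~isempty(name), 'Invalid field name in "%s".', paths{k});
        rest = parts{p}(numel(name)+1:end);   % '' or '(n)'
        args{end+1} = name;
        if ~isempty(rest)
            % Only a single positive integer index is accepted here.
            assert(~isempty(regexp(rest, '^\([1-9]\d*\)$', 'once')), ...
                'Unsupported index "%s" in "%s".', rest, paths{k});
            args{end+1} = {sscanf(rest, '(%d)')};
        end
    end
    % Dynamic assignment without EVAL: column k becomes the field value.
    S = setfield(S, {1}, args{:}, data(:, k));
end
```

Because every piece of the path is checked before it is used, a malformed or malicious header raises an error instead of being executed.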
@Adrian Smith: Yes, I know such "brown field projects", in which the predecessors have left some strange and inefficient data decisions, which make it hard or even impossible to use them as a base for clean code.
Nevertheless, I hesitate to suggest dirty code, for general reasons. It is the nature of software projects that from time to time there is an urgent need to rebuild them from scratch, the so-called "refactoring".
But, well, you need a dirty method. Then I suggest choosing a kind of dirtiness that allows debugging and ensures repeatability. Convert the input files to code statically instead of dynamically:
% Contents of the file, e.g. obtained by fileread('DataFile1.dat'):
C = ['s.a(1).b(1).value, s.a(1).b(2).value, s.c.d.value, s.e.f(1).value', newline, ...
'1 , 4 , 8 , 12', newline, ...
'2 , 5 , 9 , 13', newline, ...
'3 , 6 , 10 , 14', newline, ...
'4 , 7 , 11 , 15'];
if C(end) == newline   % Drop a trailing line break
    C = C(1:end-1);
end
breaks = strfind(C, newline);
width = sum(C(1:breaks(1)) == ',') + 1;    % Number of columns, from the header line
C = strrep(C, newline, ',');
D = reshape(strsplit(C, ','), width, []);  % Row k: name and values of column k
D = strtrim(D);
Output = unique(strtok(D(:, 1), '.'));     % Root variable names, here {'s'}
[fid, msg] = fopen('DataFile1.m', 'w'); % Use an absolute path
assert(fid > 0, msg);
% The \n keeps the first assignment off the function line
fprintf(fid, 'function [%s] = %s()\n', strjoin(Output, ', '), 'DataFile1');
for k = 1:width
    fprintf(fid, '%s = [%s];\n', D{k, 1}, strjoin(D(k, 2:end), ', '));
end
fprintf(fid, 'end\n');
fclose(fid);
Instead of an evil eval you have a dirty converter, which compiles the data file to MATLAB code.
This will also wreck your machine if one of the variables is called 'system("format C:"); s' or 'quit;q', but you at least have a chance to find it in the source code, because the commands are written to a file instead of being created dynamically.
My conclusion:
  • Read my suggestion for educational reasons and learn how it works.
  • Then use Bruno's suggestion: Parse the names of the variables and use subsasgn. Then compare the runtime and code complexity with your eval approach and stick with the simpler code.
  • Count the days until a refactoring of the input data is the only choice to limit the exploding complexity of the project. It is painful to inherit a brown field project using such ugly input data from somebody else. But it is worse to leave this junk for your successors.
Welcome to the coding hell :-)


If you want to avoid EVAL, you can parse the char array and transform it into the S and B arguments of the subsasgn function.
Do the same with subsref when you want to use the data downstream.
The code will be much longer than the single statement using EVAL.
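A minimal sketch of this approach for one path. The names pathText, value, args, idx and out are illustrative, and the parser is an assumption: it handles only plain fieldnames with an optional single positive integer index.

```matlab
% Hypothetical input: one path from the question and its value.
pathText = 'mystruct.a(2).b(1).value';
value    = [8, 9, 10, 11];

% Turn the text into type/subs pairs for SUBSTRUCT.
parts = strsplit(pathText, '.');
args  = {};
for p = 2:numel(parts)                      % parts{1} is the variable name
    [name, rest] = strtok(parts{p}, '(');   % 'a(2)' -> 'a' and '(2)'
    assert(isvarname(name), 'Invalid field name "%s".', name);
    args = [args, {'.', name}];
    if ~isempty(rest)
        n = sscanf(rest, '(%d)');
        % Accept only text of the exact form '(n)' with n >= 1.
        assert(isscalar(n) && n >= 1 && strcmp(rest, sprintf('(%d)', n)), ...
            'Unsupported index "%s".', rest);
        args = [args, {'()', {n}}];
    end
end
idx = substruct(args{:});   % the S argument of SUBSASGN and SUBSREF

% Assign (B is the value) and read back, both without EVAL.
mystruct = struct();
mystruct = subsasgn(mystruct, idx, value);
out      = subsref(mystruct, idx);
```

The index structure idx is built once and can be cached per path, which matters when the same 1000+ paths are used for both writing and reading.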

Release

R2022b

Asked:

on 21 Oct 2022

Edited:

on 25 Oct 2022
