Not just another dynamic variable naming question! Generating a new string and using it in a command.

I am working on a large data project that has potentially thousands of similarly named files in CSV format that I need to analyze in MATLAB. These files are named nnnwavex.csv (where nnn is a number and x is one of I, II, III, or V).
Example directory contents:
  • 352waveI.csv
  • 352waveII.csv
  • 352waveIII.csv
  • 352waveV.csv
My previous code to import these files was the following:
wave352i = importdata('352waveI.csv');
tele352i = wave352i.data;
text352i = wave352i.textdata;
wave352ii = importdata('352waveII.csv');
tele352ii = wave352ii.data;
text352ii = wave352ii.textdata;
wave352iii = importdata('352waveIII.csv');
tele352iii = wave352iii.data;
text352iii = wave352iii.textdata;
wave352v = importdata('352waveV.csv');
tele352v = wave352v.data;
text352v = wave352v.textdata;
Now I have written a bit of dynamic variable naming code that seems to work well for the first part (importing each file as a struct with data and textdata fields). I have spent hours reading the material about why variable naming is not ideal. This is simply to import large numbers of files quickly.
x=352;
y='wave';
z='I'
a='.csv'
b='tele'
c='.data'
cat(2,num2str(x),y,z,a)
eval(['temp = importdata(ans)']);
cat(2,y,int2str(x),z);
v = genvarname(ans);
eval([v '= temp']);
z='II'
cat(2,num2str(x),y,z,a)
eval(['temp = importdata(ans)']);
cat(2,y,int2str(x),z);
v = genvarname(ans);
eval([v '= temp']);
z='III'
cat(2,num2str(x),y,z,a)
eval(['temp = importdata(ans)']);
cat(2,y,int2str(x),z);
v = genvarname(ans);
eval([v '= temp']);
z='V'
cat(2,num2str(x),y,z,a)
eval(['temp = importdata(ans)']);
cat(2,y,int2str(x),z);
v = genvarname(ans);
eval([v '= temp']);
clear x y z v temp c b ans a
This leaves me with four structs:
  • wave352I
  • wave352II
  • wave352III
  • wave352V
How can I make a script that then builds a variable name to accomplish the following, giving me a double array and a cell array from the original struct?
If I create a string a = 'wave352V.data', I cannot use it in my code to access that field of the struct and assign it to a new variable:
tele352V = wave352V.data;
text352V = wave352V.textdata;
I could use Excel and Word and build a script that uses mail merge to create multiple iterations of the initial code I demonstrated above, but that seems like a very amateurish way to approach this.
Any ideas would be appreciated!
Thomas Fogarty III on 6 May 2017
Thanks for the response!
Each ### is a patient and each of the Roman numerals is a separate ECG channel. Each of the leads has hundreds of rows of data that need to have several pre-processing and filtering steps applied to them. There are a few steps that do calculations on all signals at once (i.e., the same row on all four channels).
The data I'm referencing in this example comes from CSV files structured like so:
Datetime Stamp | Channel | 480 Samples
(first two columns => .textdata; 480 samples => .data)
I do have to access the date/time stamps held in the .textdata section initially to determine a time, but most of the work is done through the .data component of the struct.
I find working in cells very painful and have avoided that route. I'm open to suggestions if there is a good alternative out there.
Stephen23 on 6 May 2017
Edited: Stephen23 on 6 May 2017
@Thomas Fogarty III: "I find working in cells very painful and have avoided that route". Okay... today might be a good day to practice using cell arrays then, because the more you practice using them, the easier they are to use! Take the plunge with cell arrays and you will learn more useful ways of using MATLAB, and you will learn how to solve your question yourself using simple code.
Do you have a particular reason why you need to have lots of separately named variables in your workspace? I read your question and comments several times carefully, but could not find one.
"I have spent hours reading the material about why variable naming is not ideal". Hopefully you understood that there are multiple reasons why doing this is a bad idea, not least of which is: what on earth will you do with a million separate variables in your workspace? Let me illustrate: I want to calculate the sine of several values, which might seem easy with just two variables:
a1 = 0;
a2 = pi/2;
sin(a1)
sin(a2)
What happens if I now have one million values? Would I write, one million times:
sin(a1)
sin(a2)
...
sin(a1000000)
Or would I use the much better method of putting my data into one array? In fact, the second half of your title deserves comment: "Generating a new string and using it in a command." Once you have generated one million variable names using eval, the only way to call any operator with those variables is to use eval again. And so it goes on... you paint yourself into a corner, with no way out.
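A minimal sketch of the "one array" alternative to the numbered variables in the sine example; the two values are the same ones used above, and the variable names are only illustrative:

```matlab
% Store all values in one array instead of a1, a2, ..., a1000000.
a = [0, pi/2];     % this could just as well hold one million values
s = sin(a);        % sin works element-wise, so one call covers every value

% Indexing replaces the numbered names: a(2) plays the role of a2.
secondResult = s(2);
disp(secondResult)
```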
Why do you want/need to do this? Note that you have not yet given a single reason why you cannot use the better programming methods suggested in the FAQ, the tutorials, the MATLAB documentation, or the answers below.
Processing multiple data files usually involves looping over or comparing their data, and the method is simple: put the data into an array (ND numeric, cell, struct, table, etc.) and use indices. It is so simple that it works! (and is also faster, more reliable, simpler, etc., etc.). So far you have not given one single reason why you cannot do this.
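As a sketch of "put the data into an array and use indices" for this naming convention: the patient numbers below are hypothetical, and the import call is left as a comment so the snippet runs without the actual files.

```matlab
patients = [352 353];                 % hypothetical patient numbers
channels = {'I', 'II', 'III', 'V'};   % channel suffixes from the file names

% One cell per (patient, channel) pair; the indices replace the variable names.
fileNames = cell(numel(patients), numel(channels));
for p = 1:numel(patients)
    for c = 1:numel(channels)
        fileNames{p, c} = sprintf('%dwave%s.csv', patients(p), channels{c});
        % With the real files present, import into a parallel cell array:
        % waves{p, c} = importdata(fileNames{p, c});
    end
end

disp(fileNames{1, 4})   % file for patient 352, channel V
```

The same `(p, c)` pair then addresses the data for any patient and channel, so adding a patient means extending `patients`, not writing new code.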
Please do not ignore dpb's comment "Once we have a clear definition of the task, THEN we can approach the sequence of operations to accomplish it." You do not tell us what your task is. So far we do not have a clear explanation of what you are trying to do with this data. Do you want to merge that data into new files, process some particular fields or values over all measurements, or process each measurement independently?


Accepted Answer

dpb on 6 May 2017
Edited: dpb on 7 May 2017
"Each ### is a patient and each of the roman numerals is a separate ECG channel. Each of the leads has hundreds of rows of data that need to have several pre-processing and filtering steps applied to them. There are a few steps that do calculation on all signals at once (ie the same row on all four channels)."
Well, now we're finally beginning to get somewhere...still much to be clarified, but this at least gives a place from which to start.
First of all, as Stephen has pointed out, it is trivial to find the files for any given patient number; simply
pn=input('Which patient, please? ','s');
srchstr=sprintf('%s*',pn); % build root search pattern
d=dir([srchstr '.csv']); % and search for those .csv
will return the list of all .csv files for the patient which you can then process sequentially by iterating over the directory structure, d.
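A sketch of iterating over the structure returned by dir, storing each file in one element of a struct array. It first writes a few small sample files to a temporary folder so it runs on its own (the contents are made up and only 3 samples wide), and it uses readtable in place of importdata; both choices are assumptions, not part of the answer above.

```matlab
% Write four small sample files (made-up data) so the sketch is self-contained.
folder = tempname;
mkdir(folder);
channels = {'I', 'II', 'III', 'V'};
for c = 1:numel(channels)
    fid = fopen(fullfile(folder, sprintf('352wave%s.csv', channels{c})), 'w');
    fprintf(fid, '06-May-2017 09:25:24,%s,%g,%g,%g\n', channels{c}, randn(1, 3));
    fprintf(fid, '06-May-2017 09:25:25,%s,%g,%g,%g\n', channels{c}, randn(1, 3));
    fclose(fid);
end

% List every file for patient 352, then import each into one struct element.
d = dir(fullfile(folder, '352wave*.csv'));
waves = struct('name', {}, 'data', {}, 'stamps', {});
for k = 1:numel(d)
    T = readtable(fullfile(folder, d(k).name), ...
        'ReadVariableNames', false, 'Delimiter', ',');
    waves(k).name = d(k).name;                  % keep the source file name as data
    waves(k).data = table2array(T(:, 3:end));   % numeric samples
    waves(k).stamps = T(:, 1:2);                % time stamp and channel columns
end

rmdir(folder, 's');   % remove the sample files
```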
If you need a given channel, add that to the wildcard pattern; you're fortunate in having created (or someone did for you) a pretty usable naming convention here.
chnls={'I','II','III','V'}; % the channel suffixes used in the file names
[ich,ok]=listdlg('PromptString','Select Channel', ...
'SelectionMode','single', ...
'ListString',chnls);
if ok
srchstr=[pn 'wave' chnls{ich}]; % exact name; '352*I' would also match 352waveII and 352waveIII
end
d=dir([srchstr '.csv']);
also illustrating another user interactive tool that may make usage simpler for you.
Given your description of the file structure, one very likely way you might approach it would be, as Steven suggested, the table. As a trivial example, I built a sample record...
ts=cellstr(datestr(now)); % a timestamp
chnl={'C001'}; % channel code, however it is actually encoded
ecg=randn(1,480); % and the 480x data vector associated @t
t=table(datetime(ts), ...
categorical(chnl), ...
ecg, ...
'VariableNames',{'Time','Chnl','ECG'})
t =
Time Chnl ECG
____________________ ____ ______________
06-May-2017 09:25:24 C001 [1x480 double]
and VOILA! you've got a database. Patient ID can be incorporated, too, of course, as well as any other corollary info required. Then grouping variables can be used to select desired characteristics from the table by patient, time, whatever...
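Extending that record to several rows with a patient column, here is a sketch of selecting by patient or channel through logical indexing rather than by variable name; the patients, channels, times, and signals are made-up values:

```matlab
% Hypothetical records: one row per (patient, channel); each ECG entry is a 1x480 row.
Patient = [352; 352; 353];
Chnl    = categorical({'I'; 'II'; 'I'});
Time    = datetime(2017, 5, 6, 9, 25, [24; 24; 30]);
ECG     = randn(3, 480);    % row k of this matrix becomes the ECG of table row k
t = table(Patient, Chnl, Time, ECG);

% All rows for patient 352, selected with a logical index.
p352 = t(t.Patient == 352, :);

% Channel I for every patient, again without any per-patient variable names.
chI = t(t.Chnl == 'I', :);
```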
What we still don't know now is how much of these data are to be processed concurrently, but there surely are very powerful ways to organize the data to do whatever it is that is required without needing named variables(*) to do so.
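For the steps that work on the same row of all four channels, one option (an assumption about how the data might be organized, not something settled in this thread) is to stack the channels along a third dimension. The sizes and random data below are placeholders:

```matlab
% Placeholder data: four channels, each nRows-by-nSamples, as .data might return them.
nRows = 100;
nSamples = 480;
channelData = {randn(nRows, nSamples), randn(nRows, nSamples), ...
               randn(nRows, nSamples), randn(nRows, nSamples)};

% Stack into one numeric array: rows x samples x channel.
ecg = cat(3, channelData{:});

% Per-channel processing: index one page (channel 'III' is page 3 here).
lead3 = ecg(:, :, 3);

% Cross-channel processing on the same row of all four channels: work along dim 3.
meanAcrossChannels = mean(ecg, 3);
```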
The above is just one of many alternatives; which would work best will depend on the remaining details of what actually is to be done with the data as far as the agglomeration of what is needed in a single data structure at any one time.
(*) NB: A "named variable" is NOT in any way, shape or form the same as using a variably-named structure fieldname or as the variable name in a table. The former is essentially impossible to deal with in a generic fashion with any code of any complexity whatever; the latter are, essentially, trivial to use in Matlab.
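To make the footnote concrete, and to address the original 'wave352V.data' question: a string cannot be used directly as code, but it can be used as a dynamic field name. A sketch, with two made-up structs standing in for importdata results:

```matlab
% Made-up stand-ins for two importdata results (fields data and textdata).
imported = struct('data', {randn(2, 3), randn(2, 3)}, ...
                  'textdata', {{'t1', 'I'}, {'t2', 'V'}});
channels = {'I', 'V'};

% One struct holds everything; the generated names become field names, not variables.
waves = struct();
for c = 1:numel(channels)
    name = sprintf('wave%d%s', 352, channels{c});   % 'wave352I', 'wave352V'
    waves.(name) = imported(c);                     % dynamic field name, no eval
end

% A string chosen at run time selects the field.
name = 'wave352V';
tele = waves.(name).data;
text = waves.(name).textdata;
```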
dpb on 7 May 2017
"The above is just one of many alternatives; which would work best will depend on the remaining details of what actually is to be done with the data as far as the agglomeration of what is needed in a single data structure at any one time."
In fact, it might be useful to build a database of all the data files' controlling information, without the actual data, that could be used to build complex filename lists to be passed to the processing function that Steven (and others) have so admirably described.
This could include things like patient ID, perhaps the corresponding actual patient info if it is known and important (if this is not just a blind results study), the channel(s) available if not all are present for every case, the time information, the number of samples available for a given patient, etc., etc., etc. ... Searching the table for conditions would then be much quicker than repeated calls to dir and searches therein, plus it becomes a valuable tool for producing summary statistics over the dataset in toto.
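A sketch of such a catalog built from the file names alone. The regular expression assumes the nnnwaveX.csv convention from the question, and the list of names is hard-coded so the snippet runs; in practice it could come from `d = dir('*.csv'); names = {d.name}';`.

```matlab
% Hypothetical file list; in practice: d = dir('*.csv'); names = {d.name}';
names = {'352waveI.csv'; '352waveII.csv'; '352waveV.csv'; '353waveI.csv'; 'notes.txt'};

% Parse patient number and channel out of each name; non-matching names give {}.
tokens = regexp(names, '^(\d+)wave(I|II|III|V)\.csv$', 'tokens', 'once');
isData = ~cellfun(@isempty, tokens);
parts  = vertcat(tokens{isData});          % N-by-2 cell: {patientText, channel}

catalog = table(str2double(parts(:, 1)), categorical(parts(:, 2)), names(isData), ...
    'VariableNames', {'Patient', 'Channel', 'File'});

% Query the catalog instead of calling dir again, e.g. all files for patient 352.
filesFor352 = catalog.File(catalog.Patient == 352);
```

Further columns (time information, sample counts, patient details) could be appended to `catalog` as they become known.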


More Answers (3)

Steven Lord on 6 May 2017
Don't write a script to do this. Write a function that accepts the name of the file to be imported and returns the specific pieces you want from that file. The names of the variables you use inside the function don't matter to code outside the function under most circumstances, so you can use general names in that function.
function [dataVector, scalarStatus] = readMyFiles(filename)
data = load(filename);
% process data to generate the variables dataVector and scalarStatus
Once the data returns from your file, if you must store it in some way that allows you to access it via name, consider a struct array or a table (if your file name is also a valid MATLAB identifier as per the isvarname function) or perhaps a containers.Map object.
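A sketch of the containers.Map option, keyed by file name. The loop body uses random numbers as a stand-in for the import function described above (readMyFiles is a placeholder name, so it only appears in a comment):

```matlab
% Map from file name (char key) to whatever the import function returns.
results = containers.Map('KeyType', 'char', 'ValueType', 'any');
names = {'352waveI.csv', '352waveV.csv'};   % hypothetical file names

for k = 1:numel(names)
    % Real use: [dataVector, scalarStatus] = readMyFiles(names{k});
    dataVector = randn(2, 480);             % stand-in for the imported samples
    results(names{k}) = dataVector;         % generic variable, name-based key
end

% Retrieve by name without creating a variable per file.
v = results('352waveV.csv');
```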
Thomas Fogarty III on 6 May 2017
Thanks for your reply - the problem with this approach is that I have upwards of 300-400 files at present that need importing and they follow the same pattern and naming convention, something that should be easy to code/automate. I initially was trying to make a function (like the one below) that accepts the patient ### and then builds new variables that read the files that pertain to that patient ###. But at the end of the day, it would require me putting in output names for each variable and really be only a touch quicker than running a script or doing it manually. I suppose I could prep the data files differently through Bash, but part of me feels like I'm missing something big functionally in MATLAB.
function [outputs] = myfunction(###)
t###I = importdata('###I.csv')
wave###I = t###I.data
text###I = t###I.textdata
t###II = importdata('###II.csv')
wave###II = t###II.data
text###II = t###II.textdata
... etc
Example
[t101I, wave101I, text101I, t101II, wave101II, text101II, t101III, wave101III, text101III, t101V, wave101V, text101V] = myfunction(101);
% would read from 101I.csv 101II.csv 101III.csv and 101V.csv
Further, it wouldn't let me code the name of the data source dynamically using a simple ### as the input (i.e., t###I.textdata).
I appreciate your help and input. I know this concept of dynamic names make most of you guys wince!
Stephen23 on 6 May 2017
Edited: Stephen23 on 6 May 2017
"But at the end of the day, it would require me putting in output names for each variable and really be only a touch quicker than running a script or doing it manually"
Quicker how? Quicker in terms of code running time? No, your script with eval will be slower. Quicker in terms of writing time? No, you have already wasted hours on this and had to ask for advice on an internet forum. Using a loop and a cell array would have taken 30 seconds to write. Quicker in terms of debugging time? Without all of the help that MATLAB gives when writing code properly using a loop, that would be a joke!
"I have upwards of 300-400 files at present that need importing and they follow the same pattern and naming convention, something that should be easy to code/automate." It is easy. People write code like this all the time. They import tens/hundreds/thousands/millions of files easily, without any problems, by using loops (and not just in MATLAB, but most any language). The MATLAB documentation tells us how, and also many threads on this forum show how:
"I know this concept of dynamic names make most of you guys wince!" Forget about us, how about reading the MATLAB documentation?:



Les Beckham on 7 May 2017
Edited: Les Beckham on 7 May 2017
A lot of what I'm going to say has already been said by far smarter people on this thread. I'm going to rephrase from a less specific viewpoint, in hopes that it will help you grasp what the others have been telling you.
As I see it, you are facing this situation:
  1. You have a lot of input data to process and, fortunately, this data is stored in files with a well-defined file naming convention.
  2. You need to process each of these files (or well-defined groups of them) with some algorithm. You yourself have used the word 'each' several times and, to me, that screams 'loop'.
So, you need to define:
  1. How do I group these files logically (using the naming convention)?
  2. Do I need to load and process one file at a time or groups of them?
  3. What algorithm do I need to apply to each file or group?
Once you've answered those questions, the FAQs on how to process multiple files come into play along with these points:
  1. When you are looping through a bunch of data to apply the same algorithm to each piece of data you should implement that algorithm in a function where you pass the data to the function and get back the result.
  2. This kind of data (and the corresponding results of the algorithm applied to the data) are often best stored in an array.
  3. This 'array' can be a basic numeric array or a cell array or an array of structs or even a table.
  4. Remember that the fact that the source data came from files with unique names does not mean that the processing of each of those files (or appropriate groups of files) needs to retain those names. For processing (applying your algorithm), you simply have 'input' data and 'output' data. Where the data came from and where you eventually store that data should not be embedded into the processing of the data.
You could read your input data into cell arrays or struct arrays, apply your processing and either append the output data to these or create separate output cell or struct arrays. It is totally up to you. Just don't embed the names of the source of the data into the processing of that data.
Note that it would be perfectly acceptable (and, often, even recommended) to retain the input source (e.g., input filename) as a field (if you use structs) or a cell (if you use cells) in the output data.
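A sketch of that separation: the processing step (here detrend, chosen only as an example of a per-row filter) sees numbers only, while the source file name travels alongside it as a field. The input data are random placeholders.

```matlab
% Placeholder inputs: source file names plus samples (stand-ins for imported .data).
inputs = struct('file', {'352waveI.csv', '352waveII.csv'}, ...
                'data', {randn(5, 480), randn(5, 480)});

% The algorithm knows nothing about file names: remove a linear trend from each row.
preprocess = @(x) detrend(x.').';   % detrend works on columns, so transpose in and out

outputs = struct('file', {}, 'result', {});
for k = 1:numel(inputs)
    outputs(k).file   = inputs(k).file;               % source kept as data
    outputs(k).result = preprocess(inputs(k).data);   % same algorithm for every file
end
```

Swapping in the real filtering steps only changes `preprocess`; the loop and the bookkeeping stay the same.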
I hope this helps.
dpb on 7 May 2017
Edited: dpb on 7 May 2017
Nitpick: "Data are, datum is"... VBG, GD&R, etc., etc., etc., ... :)
BTW, I agree with Stephen's assessment of the overview....



Image Analyst on 6 May 2017
Steven Lord on 6 May 2017
Generate the list of filenames using the commands in the FAQ. Pass those filenames in turn to the function I suggested you write. When the data comes out of the function to a standard variable (NOT one with a dynamic name) copy those results into a struct, cell, table, or containers.Map. Use the changeable information as the field name in the struct, the variable (column) name in the table, or the key in the containers.Map.
dpb on 6 May 2017
Be interesting on this one to see if we ever manage to break through the preconceived mindset, Steven... :)

