Help with readtable opts

18 views (last 30 days)
John Doe
John Doe on 31 Oct 2018
Commented: dpb on 6 Nov 2018
I have some data (example attached).
I am using a readtable for loop to combine multiple CSV files in to one dataset which I can process. This works great.
d = uigetdir; %Select folder containing logger data
filePattern = fullfile(d, '*.csv'); %naming convention of file.
filedir = dir(filePattern); %directory of all files with .csv extension
Q=numel(filedir);
x = cell(1, Q);
%numel(filedir) returns number of items in filedr (how many files will be processed
for k = 1 : Q
baseFileName = filedir(k).name; %calls file name from position k in firdir
fullFileName = fullfile(d, baseFileName); %concatenate file directory and baseFileName to give full absolute path
x{k} = readtable(fullFileName); %imports data from csv
fprintf('read file %s\n', fullFileName); %show which file is being processed in command window
end
bigtable = vertcat(x{:}); %Add individual data sets together to make one large test data set
I have tried to use the following code to specify the units row and where the data starts.
%before for loop
opts.VariableNamesLine = 1;
opts.VariableUnitsLine = 2;
opts.DataLine = 4;
%inside for loop
x{k} = readtable(fullFileName,opts)
The DataLine seems to work and it ignores the first 3 lines. However I longer have variables and units.
The other problem I have is that readtable makes everything a string rather than a double. (This is my biggest problem as I can't process the data, please help!)
Thanks

Accepted Answer

dpb
dpb on 31 Oct 2018
Edited: dpb on 31 Oct 2018
Don't try to create the options object from scratch--that rarely, if ever, works because there's a whole lot of other stuff besides just the one or two items that is contained within it and trying to get all the defaulted elements fully consistent with the few you specify is a Herculean task (and best not left to mere mortals :) ).
Use something like:
% query first file for import options
opts=detectImportOptions(fullfile(d, filedir(1).name));
Then use this import options object for each file inside the loop.
I tried it on your sample and it correctly identified the variable type on its own...
>> opts=detectImportOptions('data2.csv')
opts =
DelimitedTextImportOptions with properties:
Format Properties:
Delimiter: {','}
Whitespace: '\b\t '
LineEnding: {'\n' '\r' '\r\n'}
CommentStyle: {}
ConsecutiveDelimitersRule: 'split'
LeadingDelimitersRule: 'keep'
EmptyLineRule: 'skip'
Encoding: 'windows-1252'
Replacement Properties:
MissingRule: 'fill'
ImportErrorRule: 'fill'
ExtraColumnsRule: 'addvars'
Variable Import Properties: Set types by name using setvartype
VariableNames: {'time', 'Var1', 'Var2' ... and 8 more}
VariableTypes: {'datetime', 'double', 'double' ... and 8 more}
SelectedVariableNames: {'time', 'Var1', 'Var2' ... and 8 more}
VariableOptions: Show all 11 VariableOptions
Access VariableOptions sub-properties using setvaropts/getvaropts
Location Properties:
DataLine: 4
VariableNamesLine: 1
RowNamesColumn: 0
VariableUnitsLine: 0
VariableDescriptionsLine: 0
>> t=readtable('data2.csv',opts);
Warning: The DATETIME data was created using format 'MM/dd/uuuu HH:mm' but also matched 'dd/MM/uuuu HH:mm'.
To avoid ambiguity, supply a datetime format using SETVAROPTS, e.g.
opts = setvaropts(opts,varname,'InputFormat','MM/dd/uuuu HH:mm');
> In matlab.io.internal.text.TableParser/readData (line 75)
In matlab.io.text.TextImportOptions/readtable (line 223)
>> t(1:5,:)
ans =
5×11 table
time Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9 Var10
____________________ __________ _______ ____ ____ ____ ____ ____ ____ ____ _____
05-Jun-2018 12:00:00 1561 -200.25 2.5 2.4 6.5 7 0.1 5.6 6.8 6.1
05-Jun-2018 12:00:00 1.5613e+05 -200.25 2.5 2.4 6.5 7 0.1 5.6 6.8 6.1
05-Jun-2018 12:00:00 15661 -200.25 2.5 2.4 6.5 7 0.1 5.6 6.7 6.1
05-Jun-2018 12:00:00 1656 -200.25 2.5 2.4 6.5 7 0.1 5.6 6.8 6.1
05-Jun-2018 12:00:00 1661 -200.25 2.5 2.4 6.5 7 0.1 5.6 6.8 6.1
>>
NB: the warning on the datetime conversion; this will probably go away on its own when you use real data that will contain unambiguous day/month values, but you can set the actual format explicitly and be sure.
You must do this with the setvartype function; see documentation for examples, syntax.
It appears from the data that perhaps it would be a better choice to make Var1 and perhaps Var6 categorical instead of double; you'll know their meaning better than we.
  6 Comments
John Doe
John Doe on 6 Nov 2018
Now that I have updated to R2018a the issue has resolved its self.
opts=detectImportOptions(fullfile(d, filedir(1).name));
opts.VariableUnitsLine = 2;
opts.VariableNamesLine = 1;
opts.DataLine = 3;
Gives the output below when called from cmd window. Rather than reverting back to 0 it stays as 2.
Location Properties:
DataLines: 3
VariableNamesLine: 1
RowNamesColumn: 0
VariableUnitsLine: 2
VariableDescriptionsLine: 0
To display a preview of the table, use preview
dpb
dpb on 6 Nov 2018
Are the Units shown via preview or if write
t(1:4,:)
to show first few rows now/yet?

Sign in to comment.

More Answers (0)

Tags

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!