Matlab text file opts varying for similar files

Question

Stephen Devlin on 18 Jun 2018

Direct link to this question

https://uk.mathworks.com/matlabcentral/answers/406105-matlab-text-file-opts-varying-for-similar-files

Edited: dpb on 19 Jun 2018

Hi, I have 2 text files with the same number of columns/headers. When a measurement is not completed, the field is filled in with an "UND" marker, which can be "UND. -60001" or "UND. -62011". I have a script which usually has no problems, but when it does fail it has been very difficult to pin down the cause. By reading the opts I have noticed that it is treating the two files differently (m-file and 2 data files attached). I don't see why the files should be treated any differently. Any ideas?

The file that reads in ok has this in its 'opts'.

opts = 
    DelimitedTextImportOptions with properties:
     Format Properties:
                      Delimiter: {'\t'}
                     Whitespace: '\b '
                     LineEnding: {'\n'  '\r'  '\r\n'}
                   CommentStyle: {}
      ConsecutiveDelimitersRule: 'split'
          LeadingDelimitersRule: 'keep'
                  EmptyLineRule: 'skip'
                       Encoding: 'ISO-8859-1'
  Replacement Properties:
                  MissingRule: 'fill'
              ImportErrorRule: 'fill'
             ExtraColumnsRule: 'addvars'
   Variable Import Properties: Set types by name using setvartype
                VariableNames: {'Nozzle_number', 'Frequency_khz', 'Velocity_ms' ... and 4 more}
                VariableTypes: {'char', 'double', 'double' ... and 4 more}
        SelectedVariableNames: {'Nozzle_number', 'Frequency_khz', 'Velocity_ms' ... and 4 more}
              VariableOptions: Show all 7 VariableOptions

The file which does not load properly, on the other hand, has this in its opts:

opts = 
    DelimitedTextImportOptions with properties:
     Format Properties:
                      Delimiter: {'\t'  ' '}
                     Whitespace: '\b'
                     LineEnding: {'\n'  '\r'  '\r\n'}
                   CommentStyle: {}
      ConsecutiveDelimitersRule: 'join'
          LeadingDelimitersRule: 'ignore'
                  EmptyLineRule: 'skip'
                       Encoding: 'ISO-8859-1'
   Replacement Properties:
                  MissingRule: 'fill'
              ImportErrorRule: 'fill'
             ExtraColumnsRule: 'addvars'
   Variable Import Properties: Set types by name using setvartype
                VariableNames: {'Var1', 'Var2', 'Var3' ... and 6 more}
                VariableTypes: {'char', 'double', 'char' ... and 6 more}
        SelectedVariableNames: {'Var1', 'Var2', 'Var3' ... and 6 more}
              VariableOptions: Show all 9 VariableOptions

Answer 1

dpb on 18 Jun 2018

Direct link to this answer

https://uk.mathworks.com/matlabcentral/answers/406105-matlab-text-file-opts-varying-for-similar-files#answer_325118

Edited: dpb on 18 Jun 2018

The difference is that the second file has the UND indicator in the first data line, whereas the first file starts with a completed record. That first record is what the detection routine uses to work out how to parse the file, so for the second file there appear to be nine variables in the data record but only seven column names. That mismatch creates the confusion.

In this case I would suggest not calling detectImportOptions(files(jj).name) on each file, but instead using a specific hand-built options object for these files, or dispensing with it entirely and passing everything needed as name-value pairs in the readtable call.
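As a rough illustration of the first suggestion, the sketch below detects the options once with the delimiter forced to tab, rather than letting detection guess from a first data row that contains "UND. -60001". The sample file and its values are made up so the snippet runs on its own; 'Delimiter' is a documented name-value pair of detectImportOptions, but whether forcing it is enough for your real files is an assumption to check.

```matlab
% Write a tiny tab-delimited sample (made-up values) whose FIRST data row has UND.
sample = fullfile(tempdir, 'sweep_sample_und_first.txt');
fid = fopen(sample, 'w');
fprintf(fid, 'Nozzle_number\tFrequency_khz\tVelocity_ms\n');
fprintf(fid, '-\t4\tUND. -60001\n');
fprintf(fid, '-\t5\t7.25\n');
fclose(fid);

% Automatic detection versus detection with the delimiter pinned to tab.
optsAuto = detectImportOptions(sample);
optsTab  = detectImportOptions(sample, 'Delimiter', '\t');
fprintf('auto: %d variables, forced tab: %d variables\n', ...
        numel(optsAuto.VariableNames), numel(optsTab.VariableNames));

% Reuse the same options object for every file instead of re-detecting per file.
t = readtable(sample, optsTab);
```

The point of the design is that the options no longer depend on what happens to be in the first data row of each individual file.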

ADDENDUM

After looking at your files, I think I'd go at this somewhat differently: I'd just let readtable bring in the file with the data fields as text (cellstr), do the substitution on the bad data, and convert. Is it likely there's ever a file that doesn't have at least one UND in the numeric data fields?

I don't know just what your other code does after reading a file, but I'd do that portion more nearly as:

d = dir('/Users/imagexpertinc/Desktop/odds/freq_sweeps/*.txt');
for i = 1:numel(d)
  % opts is the import options object built from RECORD.txt (described below)
  t = readtable(fullfile(d(i).folder, d(i).name), opts);   % table with cellstr variables
  % replace every "UND. ..." field with 'NaN', then convert the whole cell array to doubles
  v = str2double(regexprep(table2cell(t(:,3:end)), 'UND.*', 'NaN'));
  for j = 1:size(v,2)                   % put the numeric columns back into the existing table
    t.(j+2) = v(:,j);
  end
  % ...
  % Now do what needs done with this table here before going on to next...
end

The opts object was created from an artificial RECORD.txt file that contains just a header line and a single record:

Nozzle_number  Frequency_khz  Velocity_ms  Volume_pl  Trajectory_deg  X_coordinate_mm  Y_coordinate_mm
-  4  UND. -60001  UND. -62011  UND. -60001  UND. -2011  UND. -2011

so all the variables are recognized and imported as text. That makes the conversion the same on every column for every file. Otherwise, if some file happened to have a specific variable that was OK for every observation, that variable would by default be imported as numeric, and logic would have to be written to handle it.

Unless, of course, the substituted missing-value code itself has significance for some reason; then you would need to convert it too, but your solution doesn't seem to discern that difference either.
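If the number after UND does matter, one possible way to keep it is to pull it out with a regular expression before the field is turned into NaN. This is only a sketch on made-up values; the variable names (raw, vals, codes) are hypothetical and the pattern assumes the marker always looks like "UND. -60001".

```matlab
% Made-up column of text fields as they might come out of the file.
raw = {'12.5'; 'UND. -60001'; 'UND. -62011'};

% Measurement values: anything that is not a plain number becomes NaN.
vals = str2double(raw);

% Codes: capture the signed integer that follows "UND." where it is present.
tok     = regexp(raw, '^UND\.\s*(-?\d+)', 'tokens', 'once');
hasCode = ~cellfun(@isempty, tok);
codes   = nan(size(raw));
codes(hasCode) = cellfun(@(c) str2double(c{1}), tok(hasCode));
```

Here vals holds the usable measurements and codes holds the marker number for the rows that had one, so the two can be kept as separate table variables.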

9 Comments

Stephen Devlin on 19 Jun 2018

I've added the row so that it is the first row detectImportOptions sees.

dpb on 19 Jun 2018

Edited: dpb on 19 Jun 2018

Hmmm... you could achieve the same effect more easily by creating and using a fixed import options object, just with the alternate variable types.

It turns out (somewhat to my surprise) that this actually appears to work cleanly. While creating the opts object on the fly like that is excessively complex, using an opts object with all variables defined to be numeric except for the first does seem to replace the non-convertible fields with NaN. If this proves true on the various previous problem files, it's by far the better implementation.

Some testing shows that it does, however, take a full-blown 'opts' object to set the myriad of options; trying to use just a minimal number of named parameters fails miserably. Possibly one could eventually figure out how to set enough parameters to make that work, but I'm not absolutely positive one has sufficient control that way, and it is surely far more effort than just munging a little on the self-derived object.
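A minimal sketch of that "munge the self-derived object" idea, assuming the first variable is text and the rest are meant to be numeric. The sample file and its values are made up so the snippet is runnable; setvartype is the documented way to change types on an import options object, and the NaN substitution relies on ImportErrorRule being 'fill', as it is in the opts shown in the question.

```matlab
% Tiny made-up sample: tab-delimited, complete record first, UND record second.
sample = fullfile(tempdir, 'sweep_sample_fixed_types.txt');
fid = fopen(sample, 'w');
fprintf(fid, 'Nozzle_number\tFrequency_khz\tVelocity_ms\n');
fprintf(fid, '-\t4\t7.25\n');
fprintf(fid, '-\t5\tUND. -60001\n');
fclose(fid);

% Let detection fill in the many options, then override only the variable types.
opts = detectImportOptions(sample, 'Delimiter', '\t');
opts = setvartype(opts, opts.VariableNames(1), 'char');        % first column stays text
opts = setvartype(opts, opts.VariableNames(2:end), 'double');  % everything else numeric

% Fields that cannot be converted to double are filled (NaN by default).
t = readtable(sample, opts);
disp(t)
```

The same opts object can then be passed to readtable for every file in the loop, so the result no longer depends on which record happens to come first.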
