How do I use MATLAB to loop through many folders and extract columns of data from .txt files in those folders?

I have a directory that includes 30 folders that all contain a different number of .txt files. Each folder is a different wind tunnel run and each .txt is a point in that run, and every point has several hundred rows and a certain number of columns for pressure, dB, kts, etc.
I need to build a loop that will go through each folder and each .txt to extract the data from specific columns within the .txt so I can perform calculations with them.
I also want to produce a table that compares run numbers and pressures, but once I extract the data I should be able to figure that out.
Thank you!

Answers (2)

Guillaume on 26 Jun 2018
Rather than parsing the folders, subfolders, and text files yourself, you could let MATLAB handle all of that for you by using a datastore:
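% parentdir is the folder that contains all of the run folders
% 'FileExtensions' restricts the datastore to .txt files
ds = datastore(parentdir, 'Type', 'tabulartext', 'IncludeSubfolders', true, 'FileExtensions', '.txt');
% replace these placeholders with the actual column headers in your files
ds.SelectedVariableNames = {'something', 'somethingelse', 'anotherone'};
alldata = readall(ds); % assuming it all fits in memory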
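A minimal, self-contained sketch of the datastore approach, assuming each .txt file is comma-separated with a header row of column names. The folder layout (run01, run02, ...), the file names, the values, and the column names pressure, dB and kts are hypothetical stand-ins written to a temporary folder first. Option availability for datastore has changed between releases, so check the documentation for your version.

```matlab
% Hypothetical layout for the demo: <root>/run01/point1.txt ... <root>/run02/point3.txt
root = tempname;
mkdir(root);
for r = 1:2
    runDir = fullfile(root, sprintf('run%02d', r));
    mkdir(runDir);
    for p = 1:3
        fid = fopen(fullfile(runDir, sprintf('point%d.txt', p)), 'w');
        fprintf(fid, 'pressure,dB,kts\n');   % header row supplies the variable names
        fprintf(fid, '%g,%g,%g\n', [100*r + p, 90, 50; 100*r + p, 91, 51]');  % two data rows
        fclose(fid);
    end
end

% One datastore over the whole tree; only files ending in .txt are included
ds = datastore(root, 'Type', 'tabulartext', ...
    'IncludeSubfolders', true, 'FileExtensions', '.txt');
ds.SelectedVariableNames = {'pressure', 'kts'};   % the dB column is never parsed

alldata = readall(ds);                   % a table: rows from every file, stacked
meanPressure = mean(alldata.pressure);   % each selected column is a numeric column vector

rmdir(root, 's');                        % remove the demo files
```

readall() returns a table, so alldata.pressure and alldata.kts are ordinary numeric columns you can compute with. It does not record which run folder each row came from; the file list is in ds.Files if you need to work per file instead.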
Lucia Lang on 26 Jun 2018
I'm making progress with tabularTextDatastore, but when I call readall, my script takes a really long time and then says that I have no memory left.
Is there any way to read only the specific columns I need from each file? And once MATLAB has "read" them, are they in a column form that I can do calculations with?
Guillaume on 26 Jun 2018
> my script takes a really long time
That is because readall parses all the files at once. With many files, that will take a long time.
> just says that I have no memory left
Then you would not have been able to hold all of that data in memory with a loop either.
> Is there any way to only read the specific columns
Yes. As shown in my answer, set the SelectedVariableNames property of the tabularTextDatastore to the columns that you want; MATLAB will then load only those columns into memory. readall returns a table, and each selected column is a numeric variable of that table (e.g. alldata.something) that you can do calculations with.
If you are still running out of memory, create a tall array from the datastore (tall arrays require R2016b or later). Perform your calculations on the tall array and, at the end, gather the results.
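A sketch of the tall-array route described above, using the same kind of hypothetical folder layout and column names as a stand-in for the real data. Tall arrays were introduced in R2016b, so this will not run in R2016a. mapreducer(0) keeps evaluation in the current MATLAB session; without it, MATLAB may start a parallel pool if Parallel Computing Toolbox is installed.

```matlab
% Hypothetical demo data: <root>/runXX/pointY.txt, comma-separated with a header row
root = tempname;
mkdir(root);
for r = 1:2
    runDir = fullfile(root, sprintf('run%02d', r));
    mkdir(runDir);
    for p = 1:3
        fid = fopen(fullfile(runDir, sprintf('point%d.txt', p)), 'w');
        fprintf(fid, 'pressure,dB,kts\n');
        fprintf(fid, '%g,%g,%g\n', [100*r + p, 90, 50; 100*r + p, 91, 51]');
        fclose(fid);
    end
end

mapreducer(0);   % evaluate tall expressions in this MATLAB session (no parallel pool)

ds = datastore(root, 'Type', 'tabulartext', ...
    'IncludeSubfolders', true, 'FileExtensions', '.txt');
ds.SelectedVariableNames = {'pressure'};

t = tall(ds);                  % tall table backed by the files; nothing is read yet
maxP  = max(t.pressure);       % deferred
meanP = mean(t.pressure);      % deferred
[maxP, meanP] = gather(maxP, meanP);   % reads the data in chunks; only the results are kept

rmdir(root, 's');              % remove the demo files
```

Calling gather once with both results lets MATLAB compute them in a shared pass over the files instead of reading everything twice.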



Adam Danz on 24 Jun 2018
Edited: Adam Danz on 26 Jun 2018
Below is a bare-bones example that:
  1. lists all subdirectories of a given parent directory,
  2. finds all .txt files in each of those subdirectories, and
  3. opens and reads each .txt file and does something with a column of its data.
This bare-bones example makes some assumptions. It assumes that all subdirectories are relevant and contain .txt files, and that every .txt file is relevant and contains purely numeric data in matrix form. You'll have to step through it and adjust as needed. Regular expressions come in very handy if you need to select certain directories or .txt files. The dlmread() function opens the .txt file and reads the data, but it only handles numeric data, so it may or may not work for your files depending on how they are formatted. Read the documentation carefully; you may have to choose a different method of reading the .txt files (there are lots of them). But this should get you started.
% declare your parent directory here
% The parent directory contains all of your relevant subdirectories.
parentdir = 'C:\Users\blah\blah\MATLAB\blah\blah';
% get list of all subdirectories (thanks to comment by @Guillaume)
% Note: the recursive '**' wildcard and the .folder field of dir() require R2016b or later.
allsubs = dir(fullfile(parentdir, '**'));
% (named isSubdir to avoid shadowing the built-in isdir function)
isSubdir = [allsubs.isdir] & ~ismember({allsubs.name}, {'.', '..'});
allsubdirs = fullfile({allsubs(isSubdir).folder}, {allsubs(isSubdir).name});
% Loop through all subdirectories
for i = 1:numel(allsubdirs)
    % Get list of all files in subdirectory i
    dirContent = dir(allsubdirs{i});
    dirFiles = {dirContent.name};
    % Identify which files end in '.txt' (escaped dot, anchored to the end of the name)
    txtFilesIdx = ~cellfun(@isempty, regexp(dirFiles, '\.txt$', 'once'));
    % list the txt files
    txtFiles = dirFiles(txtFilesIdx);
    % loop through all txt files
    for j = 1:numel(txtFiles)
        % read data from text file j in directory i (dlmread reads numeric data only)
        a = dlmread(fullfile(allsubdirs{i}, txtFiles{j}));
        % Do whatever you want with the data
        mean(a(:,3))
    end
end
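The question also asks for a table comparing runs and pressures. One way to get there from the loop above is to store a per-run result instead of printing it. This sketch assumes numeric-only, comma-separated files in which column 1 is pressure, and that each run folder sits directly under the parent folder; the folder names, file names and values are hypothetical and are written to a temporary folder first. It lists files with dir(fullfile(folder, '*.txt')) instead of a regular expression, and it does not use the '**' wildcard, so it should also work in R2016a.

```matlab
% Hypothetical demo data: <root>/run01..run03, each with two numeric .txt files
root = tempname;
mkdir(root);
for r = 1:3
    runDir = fullfile(root, sprintf('run%02d', r));
    mkdir(runDir);
    for p = 1:2
        % column 1 = pressure, column 2 = some other quantity (assumed layout)
        dlmwrite(fullfile(runDir, sprintf('point%d.txt', p)), [r*10 + p, 80; r*10 + p, 82]);
    end
end

% Immediate subfolders of root are the runs (skip '.' and '..'); sort for a stable order
entries = dir(root);
isRun = [entries.isdir] & ~ismember({entries.name}, {'.', '..'});
runNames = sort({entries(isRun).name})';

nPoints = zeros(numel(runNames), 1);
meanPressure = zeros(numel(runNames), 1);
for i = 1:numel(runNames)
    files = dir(fullfile(root, runNames{i}, '*.txt'));
    pointMeans = zeros(numel(files), 1);
    for j = 1:numel(files)
        a = dlmread(fullfile(root, runNames{i}, files(j).name));
        pointMeans(j) = mean(a(:, 1));          % mean pressure of this point
    end
    nPoints(i) = numel(files);
    meanPressure(i) = mean(pointMeans);         % mean over the points in this run
end

runSummary = table(runNames, nPoints, meanPressure, ...
    'VariableNames', {'Run', 'NumPoints', 'MeanPressure'});
disp(runSummary)

rmdir(root, 's');   % remove the demo files
```

Preallocating one slot per run and filling it inside the loop keeps only the summary numbers in memory, not the raw data from every file.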
Adam Danz on 26 Jun 2018
Given your data sample, here is a rewrite of my solution. It replaces dlmread() with textscan() and does some cleaning before converting the text to numbers.
% declare your parent directory here
% The parent directory contains all of your relevant subdirectories.
parentdir = 'C:\Users\blah\blah\MATLAB\blah\blah';
% get list of all subdirectories (thanks to comment by @Guillaume)
% Note: dir(..., '**') and the .folder field require R2016b or later.
allsubs = dir(fullfile(parentdir, '**'));
isSubdir = [allsubs.isdir] & ~ismember({allsubs.name}, {'.', '..'});
allsubdirs = fullfile({allsubs(isSubdir).folder}, {allsubs(isSubdir).name});
% Loop through all subdirectories
for i = 1:numel(allsubdirs)
    % Get list of all files in subdirectory i
    dirContent = dir(allsubdirs{i});
    dirFiles = {dirContent.name};
    % Identify which files end in '.txt'
    txtFilesIdx = ~cellfun(@isempty, regexp(dirFiles, '\.txt$', 'once'));
    % list the txt files
    txtFiles = dirFiles(txtFilesIdx);
    % loop through all txt files
    for j = 1:numel(txtFiles)
        % 'open' the file
        fid = fopen(fullfile(allsubdirs{i}, txtFiles{j}));
        if fid == -1
            warning('Could not open %s', txtFiles{j});
            continue
        end
        % Read the full file, one line per cell, then close it
        txtCell = textscan(fid, '%s', 'delimiter', '\n');
        fclose(fid);
        % Find the first row of data by searching for '1 ' (assumes all files have this feature!)
        % (strncmp also works in releases older than startsWith, which arrived in R2016b)
        firstRow = find(strncmp(txtCell{1}, '1 ', 2), 1);
        if isempty(firstRow)
            warning('No data rows found in %s', txtFiles{j});
            continue
        end
        dataStr = txtCell{1}(firstRow:end);
        % collapse runs of spaces/tabs to a single space and trim the ends
        dataStrClean = strtrim(regexprep(dataStr, '\s+', ' '));
        % Convert each line to a numeric row vector, then stack the rows into a matrix
        data = cell2mat(cellfun(@(s) str2double(strsplit(s, ' ')), dataStrClean, 'UniformOutput', false));
        % Do whatever you want with the data
        mean(data(:,3))
    end
end
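If the data rows form a whitespace-separated numeric block after some header lines, textscan() can also skip the header itself through its 'HeaderLines' option, which avoids converting every line from text by hand. This sketch writes a hypothetical sample file (three header lines, then space-aligned rows whose first field is a point index starting at 1) and, like the code above, assumes the first data row is the first line that starts with '1 '. The column order (index, pressure, dB, kts) is an assumption for illustration.

```matlab
% --- Hypothetical sample file (format assumed for illustration) ---
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'Wind tunnel run 07\n');
fprintf(fid, 'Operator notes go here\n');
fprintf(fid, '  idx   pressure     dB    kts\n');
fprintf(fid, '%d   %8.3f  %6.1f  %5.1f\n', ...
    [1 101.325 88.0 50.0; 2 101.410 88.5 55.0; 3 101.502 89.1 60.0]');
fclose(fid);

% --- Locate the first data row: the first line that starts with '1 ' ---
lines = regexp(fileread(fname), '\r?\n', 'split')';
firstRow = find(strncmp(strtrim(lines), '1 ', 2), 1);
if isempty(firstRow)
    error('No line starting with "1 " found in %s', fname);
end

% --- Let textscan skip the header and parse the numeric block in one call ---
nCols = numel(strsplit(strtrim(lines{firstRow})));   % number of columns in the first data row
fid = fopen(fname, 'r');
c = textscan(fid, repmat('%f', 1, nCols), ...
    'HeaderLines', firstRow - 1, 'CollectOutput', true);
fclose(fid);
data = c{1};                       % numeric matrix, one row per data line
delete(fname);                     % remove the demo file

meanPressure = mean(data(:, 2));   % column 2 is pressure in this sample
```

The file is scanned twice here (once as text to find the header length, once as numbers), which is usually cheap compared with converting every line through str2double.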


Release: R2016a
