Fastest way to search for files by name pattern
I have a main folder with a lot of subfolders (thousands). I want to load files only from specific subfolders, which can be identified by a pattern in the subfolder name. Each of those subfolders contains tens of sub-subfolders, and again I only need specific ones, which can also be identified by a pattern in their names. To extract the needed files, I have implemented two approaches using the dir function: 1) a single call using the whole path, with wildcards for both the subfolders and the sub-subfolders; 2) first searching for all matching subfolders, then searching for the sub-subfolders in a for loop over those subfolders. It turns out that the latter is much faster. Could you explain why?
% first way
files = dir(fullfile(main_folder,'*_data/*_file_to_load/file1.mat'));
% second way
subfolders = dir(fullfile(main_folder,'*_data'));
subfolders = subfolders([subfolders.isdir]); % keep folders only
files = cell(1,numel(subfolders));
for i = 1:numel(subfolders)
    files{i} = dir(fullfile(subfolders(i).folder,subfolders(i).name,'*_file_to_load/file1.mat'));
end
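To compare the two approaches under the same conditions (every line terminated with a semicolon, as Rik suggests in the comments), here is a minimal, self-contained timing sketch. It builds a small hypothetical folder tree under tempname, so the names s1_data, t1_file_to_load, s1_other and the saved variable x are illustrative assumptions, not the poster's data. It also flattens the loop's cell array into a single struct array with vertcat. On a tree this small the timings mean little. To measure the real case, set root to main_folder and remove both the tree-building loop and the final rmdir call, because rmdir(root, 's') deletes root and everything in it.

```matlab
% Build a small hypothetical tree: root/sN_data/tM_file_to_load/file1.mat,
% plus sN_other folders that neither approach should descend into.
root = tempname;                       % unique, not-yet-existing temp folder
x = 1;                                 % dummy variable to save
for s = 1:3
    for t = 1:2
        d = fullfile(root, sprintf('s%d_data', s), sprintf('t%d_file_to_load', t));
        mkdir(d);                      % also creates the intermediate folders
        save(fullfile(d, 'file1.mat'), 'x');
    end
    mkdir(fullfile(root, sprintf('s%d_other', s)));
end

% First way: one dir call with wildcards at both folder levels.
tic
filesA = dir(fullfile(root, '*_data', '*_file_to_load', 'file1.mat'));
tA = toc;

% Second way: list the matching subfolders, then search each one.
tic
subfolders = dir(fullfile(root, '*_data'));
subfolders = subfolders([subfolders.isdir]);      % keep folders only
files = cell(1, numel(subfolders));
for i = 1:numel(subfolders)
    files{i} = dir(fullfile(subfolders(i).folder, subfolders(i).name, ...
        '*_file_to_load', 'file1.mat'));
end
filesB = vertcat(files{:});            % flatten into one struct array
tB = toc;

fprintf('one call: %d files in %.4f s; loop: %d files in %.4f s\n', ...
    numel(filesA), tA, numel(filesB), tB);
rmdir(root, 's');                      % delete the temporary tree
```

Both approaches should return the same six files here; only the timing is expected to differ.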
Rik
on 16 Apr 2023
I don't have the actual answer, but are you aware that you're overwriting the result on every iteration?
I wouldn't trust the timing of any code that writes significant amounts of text to the console.
Anton Baranikov
on 16 Apr 2023
Rik
on 16 Apr 2023
I was indeed referring to the lack of semicolons in the original code.
I would not trust timings on this:
for n=1:100
    x = rand
end
Instead, you should be timing this:
for n=1:100
    x = rand; % no output to command window
end
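Along the lines Rik describes, timeit is one way to get a timing that is not skewed by console output or one-off effects: it calls a function handle several times and returns the median of the measurements. A minimal sketch; the dir call on tempdir is only a hypothetical stand-in for the real folder search.

```matlab
% Wrap the search in a function handle; nothing is printed while timing.
searchFcn = @() dir(fullfile(tempdir, '*'));
t = timeit(searchFcn);                 % median run time in seconds
fprintf('about %.3g s per call\n', t);
```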
Anton Baranikov
on 16 Apr 2023
Image Analyst
on 16 Apr 2023
@Anton Baranikov did you overlook the Answer below in the official Answer section of the page? Did you only see the comments up here at the top where people are not giving answers but are asking for clarification of the question? If you saw my Answer below, then explain why it doesn't work, or let me know that it did work.
Image Analyst
on 16 Apr 2023
Use contains to check whether the pattern occurs in the folder or file name. Process the ones you want, and skip the ones you don't want by calling continue:
if contains(thisSubFolderName, 'patternIDoNotWant')
    continue % Skip to bottom of for loop
end
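A fuller version of that loop, as a minimal sketch: the cell array names is a hypothetical stand-in for {subfolders.name}, and 'old_' is a made-up pattern to skip.

```matlab
% Hypothetical folder names standing in for {subfolders.name}.
names = {'a_data', 'b_data', 'old_a_data', 'c_data'};
kept = {};
for i = 1:numel(names)
    thisSubFolderName = names{i};
    if contains(thisSubFolderName, 'old_')   % pattern to skip
        continue                             % go straight to the next iteration
    end
    kept{end+1} = thisSubFolderName; %#ok<AGROW>
end
disp(kept)
```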
Anton Baranikov
on 17 Apr 2023
Image Analyst
on 17 Apr 2023
To process only names that match any of a set of patterns, here is one way:
for i = 1:numel(subfolders)
    thisSubFolderName = subfolders(i).name;
    if contains(thisSubFolderName, 'patternIWant1') || contains(thisSubFolderName, 'patternIWant2') || contains(thisSubFolderName, 'patternIWant3')
        % Process this folder
    end
end
Alternatively, you could try ismember, which checks for exact matches of whole names rather than substrings.
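To illustrate the difference, here is a minimal sketch with hypothetical folder names: contains does substring matching against any of several patterns, while ismember only accepts whole-name matches. Both operate on a whole cell array at once, so no loop is needed to build the mask.

```matlab
% Hypothetical folder names, e.g. from {subfolders.name}.
names  = {'run1_data', 'run2_data', 'run10_data', 'calib_data'};
wanted = {'run1_data', 'run2_data'};

% contains: true if ANY pattern occurs anywhere in the name (substring match).
subMask = contains(names, {'run1', 'run2'});

% ismember: true only if the whole name equals one of the wanted names.
exactMask = ismember(names, wanted);

disp(names(subMask))     % also picks up 'run10_data'
disp(names(exactMask))   % only the two exact names
```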
Anton Baranikov
on 17 Apr 2023
Actually, contains (and related functions such as startsWith and endsWith) accept several patterns at once, so
if contains(thisSubFolderName, 'patternIWant1') || contains(thisSubFolderName, 'patternIWant2') || contains(thisSubFolderName, 'patternIWant3')
could be written as
if contains(thisSubFolderName, {'patternIWant1','patternIWant2','patternIWant3'})
You do have to be careful with contains, however, that substring matching is really the comparison you want, because it matches the pattern anywhere within the searched string.
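If substring matching is too loose (for example, 'run1' also matching 'run10_data'), an anchored regular expression or startsWith gives tighter control. A minimal sketch with made-up folder names and patterns:

```matlab
names = {'run1_data', 'run2_data', 'run10_data', 'calib_data'};

% Anchored regexp: the whole name must be exactly run1_data or run2_data.
hits = regexp(names, '^run[12]_data$', 'once');   % empty where there is no match
mask = ~cellfun(@isempty, hits);

% startsWith is another option when only the prefix matters.
prefixMask = startsWith(names, {'run1_', 'run2_'});
```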
Austin Fite
on 21 Apr 2025
This is an old thread at this point, but I have a File Exchange utility, "fsfind", that is purpose-built for this application.
files = fsfind(main_folder, 'file1.mat', 'DepthwisePattern', {'.*_data','.*_file_to_load'})
The inputs support regular expressions (see documentation for "regexp") and only subfolders that match the pattern will be searched. I use it to efficiently search very deep directory structures (10+ levels).
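The depth-wise idea (one pattern per folder level, descending only into folders that match) can also be sketched with built-in functions alone. This is not fsfind's implementation; it is a minimal sketch assuming one regexp pattern per level, run on a hypothetical temporary tree (a_data, x_file_to_load, y_skip and so on are made-up names). At each level it lists the candidate folders with dir, keeps subfolders whose names match that level's pattern, and only at the end looks for file1.mat.

```matlab
% Hypothetical test tree: root/<level1>/<level2>/file1.mat
root = tempname;
level1 = {'a_data', 'b_data', 'c_misc'};
level2 = {'x_file_to_load', 'y_skip'};
x = 1;                                            % dummy variable to save
for s = 1:numel(level1)
    for t = 1:numel(level2)
        d = fullfile(root, level1{s}, level2{t});
        mkdir(d);
        save(fullfile(d, 'file1.mat'), 'x');
    end
end

levelPatterns = {'^.*_data$', '^.*_file_to_load$'};   % one regexp per depth
current = {root};                                       % folders reached so far
for k = 1:numel(levelPatterns)
    nextLevel = {};
    for j = 1:numel(current)
        entries = dir(current{j});
        % keep real subfolders only (drop files and the '.'/'..' entries)
        entries = entries([entries.isdir] & ~ismember({entries.name}, {'.', '..'}));
        keep = ~cellfun(@isempty, regexp({entries.name}, levelPatterns{k}, 'once'));
        if any(keep)
            nextLevel = [nextLevel, fullfile(current{j}, {entries(keep).name})]; %#ok<AGROW>
        end
    end
    current = nextLevel;                                % descend one level
end

% Collect file1.mat from every folder that matched all levels.
found = {};
for j = 1:numel(current)
    f = fullfile(current{j}, 'file1.mat');
    if isfile(f)
        found{end+1} = f; %#ok<AGROW>
    end
end
rmdir(root, 's');                                       % delete the test tree
```

Folders such as c_misc and y_skip are never searched, which is the point of pruning level by level.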