Index for subfolders without *.pdf files

Hi guys. I have one folder which contains several hundred subfolders. I need to find the indices of the subfolders which do not contain a PDF file, and locate these folders. How can I do this? Thanks a lot.

 Accepted Answer

Stephen23 on 11 Oct 2019
Edited: Stephen23 on 11 Oct 2019
Simpler and more robust:
D = 'path to the main folder';
S = dir(fullfile(D,'*'));
N = setdiff({S([S.isdir]).name},{'.','..'});
F = @(s)isempty(dir(fullfile(D,s,'*.pdf')));
X = cellfun(F,N)
X is a logical index into N (true where the subfolder contains no .PDF files). Simply use FIND if you need subscript indices.
The names of the folders without .PDF files:
N(X)
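To try this out without touching real data, here is a minimal sketch that builds a small throwaway folder tree and applies the same approach. The folder and file names (A, B, C, doc.pdf, notes.txt) and the use of tempname are assumptions made purely for illustration; they are not part of the original question.

```matlab
% Build a small temporary folder tree (hypothetical names, illustration only).
D = tempname;                        % unique temporary "main folder"
mkdir(D);
mkdir(fullfile(D,'A'));              % will contain a PDF
mkdir(fullfile(D,'B'));              % stays empty
mkdir(fullfile(D,'C'));              % contains only a non-PDF file
fclose(fopen(fullfile(D,'A','doc.pdf'),'wt'));
fclose(fopen(fullfile(D,'C','notes.txt'),'wt'));

% Same approach as in the answer: list the subfolders, test each for *.pdf.
S = dir(fullfile(D,'*'));
N = setdiff({S([S.isdir]).name},{'.','..'});
X = cellfun(@(s)isempty(dir(fullfile(D,s,'*.pdf'))),N);

idx = find(X);                                              % subscript indices into N
P = cellfun(@(s)fullfile(D,s),N(X),'UniformOutput',false);  % full paths of those folders

rmdir(D,'s');                        % remove the temporary tree again
```

Here idx indexes into N (the sorted list of subfolder names returned by setdiff), not into the original dir output S.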

5 Comments

Hi Stephen Cobeldick, thanks a lot. Is it possible to look for *pdf in the third sub-level only? For instance in Mainfolder -> subfolder -> subsubfolder -> subsubsubfolder, even if there are several subfolders and subsubfolders? I hope this makes sense.
Stephen23 on 11 Oct 2019
Edited: Stephen23 on 11 Oct 2019
" Is it possible to look for *pdf in the third sub-level only?"
Of course. Most likely it would be easier to implement using nested loops.
How many subfolders are there? (constant or variable?)
How many subsubfolders per subfolder? (constant or variable?)
How many subsubsubfolders per subsubfolder? (constant or variable?)
What do you expect the output to look like? (please give an exact example)
Hi Stephen Cobeldick, I really appreciate your help.
The data structure is as shown below (and also attached as a .zip file).
There will always be 3 Subfolders. The number of SubSubFolders and SubSubSubFolders is variable. So what I would like to do is:
1) Always start from the MainFolder
2) Search all SubSubSubFolders for pdf files whose names always start with 'Target'. It should NOT search for pdf files in the SubSubSubSubFolders
3) The final output should be a list of those SubSubSubFolders that do NOT contain a 'Target' pdf file. In this case:
  • MainFolder\Subfolder1\SubSubFolder1_1_1\SubSubSubFolder1_1_3
  • MainFolder\Subfolder3\SubSubFolder3_1_1\SubSubSubFolder3_1_1
  • MainFolder\Subfolder3\SubSubFolder3_1_2\SubSubSubFolder3_1_2
DataStructure.png
This should get you started, please adjust it to fit your exact structure and needs:
D = './MainFolder'; % path to the main folder.
out = {};
ds1 = dir(fullfile(D,'*'));
dn1 = setdiff({ds1([ds1.isdir]).name},{'.','..'});
for k1 = 1:numel(dn1) % loop over subfolders.
    ds2 = dir(fullfile(D,dn1{k1},'*'));
    dn2 = setdiff({ds2([ds2.isdir]).name},{'.','..'});
    for k2 = 1:numel(dn2) % loop over subsubfolders.
        ds3 = dir(fullfile(D,dn1{k1},dn2{k2},'*'));
        dn3 = setdiff({ds3([ds3.isdir]).name},{'.','..'});
        for k3 = 1:numel(dn3) % loop over subsubsubfolders.
            tmp = fullfile(D,dn1{k1},dn2{k2},dn3{k3});
            fnm = dir(fullfile(tmp,'Target*.pdf'));
            if isempty(fnm)
                out{end+1} = tmp;
            end
        end
    end
end
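The three nested loops all repeat the same "list the subfolders" step, so the same search can also be sketched as one loop over depth levels. This is only an alternative formulation under stated assumptions, not the code from the answer: the temporary folder tree and its names (S1, SS1, SSS1, SSS2, Deeper, Target_a.pdf, Other.pdf) are hypothetical and exist only so the sketch can be run on its own.

```matlab
% Hypothetical temporary tree: only SSS1 holds a 'Target' PDF (illustration only).
D = tempname;
mkdir(fullfile(D,'S1','SS1','SSS1','Deeper'));  % 'Deeper' is a 4th level, never searched
mkdir(fullfile(D,'S1','SS1','SSS2'));
fclose(fopen(fullfile(D,'S1','SS1','SSS1','Target_a.pdf'),'wt'));
fclose(fopen(fullfile(D,'S1','SS1','SSS2','Other.pdf'),'wt'));

% Descend exactly three levels, collecting the full folder paths at each level.
level = {D};
for depth = 1:3
    next = {};
    for k = 1:numel(level)
        s = dir(fullfile(level{k},'*'));
        names = setdiff({s([s.isdir]).name},{'.','..'});  % drop the dot-folders by name
        next = [next, cellfun(@(n)fullfile(level{k},n),names,'UniformOutput',false)];
    end
    level = next;   % after the last pass: all subsubsubfolders
end

% Keep the third-level folders that do NOT contain a 'Target*.pdf' file.
hasTarget = cellfun(@(p)~isempty(dir(fullfile(p,'Target*.pdf'))),level);
out = level(~hasTarget);

rmdir(D,'s');       % remove the temporary tree again
```

Changing the 3 in the depth loop is then the only adjustment needed to search a different level.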
Hi Stephen Cobeldick, your code works perfectly. Thanks a lot, you saved me a lot of time.


More Answers (1)

This should work.
% specify path to the source folder (in your case the one which contains 100 subfolders)
rootFolderPath = './RootFolder';
% get all subfolders and files (if any) inside root folder
allFolders = dir(rootFolderPath);
% initialise an empty variable to store indices of folders without PDF file
foldersWithoutPDF = [];
% for each element of allFolders
for i = 3:length(allFolders)
    % check whether it is a folder and it does not contain any pdf file
    if ( isdir([rootFolderPath filesep allFolders(i).name]) && ...
            isempty(dir([rootFolderPath filesep allFolders(i).name filesep '*.pdf'])) )
        foldersWithoutPDF = [foldersWithoutPDF ; i-2];
    end
end
The variable "foldersWithoutPDF" should contain the indices of all subfolders without a PDF file.

1 Comment

Note that this line is fragile/buggy:
for i = 3:length(allFolders)
because its author incorrectly assumed that the first two elements of allFolders are always the folder shortcuts '.' and '..'. In fact:
  1. there is no guarantee that any particular OS will return those shortcuts.
  2. there is no guarantee that they will be returned as the first two names. In fact it is trivial to create some file/folder names which demonstrate that they are not always the first two returned names:
>> fclose(fopen('+test.txt','wt'));
>> fclose(fopen('-test.txt','wt'));
>> fclose(fopen('@test.txt','wt'));
>> S = dir('*');
>> S.name
ans = +test.txt
ans = -test.txt
ans = .
ans = ..
ans = @test.txt
Also note that fullfile is recommended for creating file paths, rather than string concatenation.
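Both points can be addressed with small changes to the loop: remove '.' and '..' by name instead of by position, and build every path with fullfile. The following is a minimal sketch of that idea, not the answer author's code; the temporary folder tree and its names (A, B, C, +test.txt, doc.pdf) are assumptions used only to make it runnable on its own.

```matlab
% Hypothetical temporary tree (illustration only).
rootFolderPath = tempname;
mkdir(rootFolderPath);
mkdir(fullfile(rootFolderPath,'A'));                        % will contain a PDF
mkdir(fullfile(rootFolderPath,'B'));                        % stays empty
mkdir(fullfile(rootFolderPath,'C'));                        % stays empty
fclose(fopen(fullfile(rootFolderPath,'A','doc.pdf'),'wt'));
fclose(fopen(fullfile(rootFolderPath,'+test.txt'),'wt'));   % a name that may sort before '.'

% Select subfolders by name and type, not by their position in the list.
allItems = dir(rootFolderPath);
isSub = [allItems.isdir] & ~ismember({allItems.name},{'.','..'});
subNames = {allItems(isSub).name};

% Indices (into subNames) of the subfolders without any PDF file.
foldersWithoutPDF = [];
for k = 1:numel(subNames)
    if isempty(dir(fullfile(rootFolderPath,subNames{k},'*.pdf')))
        foldersWithoutPDF(end+1,1) = k;
    end
end
namesWithoutPDF = subNames(foldersWithoutPDF);

rmdir(rootFolderPath,'s');   % remove the temporary tree again
```

Note that the indices now refer to subNames, so they remain meaningful even when files in the root folder are listed before the dot-folders.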

Release

R2019a
