How to scan PDF documents in a folder for a string and output the file name
Hi.
I'm working on code that scans the PDF files in a folder for a string and outputs the names of the files that contain it, but my code doesn't quite work yet. I would be really grateful if someone could check it and suggest a fix. Would it also be possible to output the page number, not just the file name?
filepath = input('Please specify the file path to scan as a string:\n');
files = dir(fullfile(filepath,'*.pdf'));
search = input('\nYour search term (as string):\n');
Filenames = {};
n = 1;
for k = 1:numel(files)
    fid = fopen(fullfile(filepath,files(k).name),'r');
    fgetl(fid); % skip first row
    fullstr = fscanf(fid,'%s'); % NOT SURE IF THIS IS CORRECT
    fclose(fid);
    found = ismember(fullstr,search);
    found = find(found);
    %This part is supposed to find out if all characters are next to each other and not just spread around the document randomly
    ct = 0; %counter
    for i = 2:length(found)
        if found(i) == found(i-1)+1
            ct = ct + 1;
        end
    end
    %If all characters are next to each other, the counter should be the same length as "found"
    if ct == length(found)
        Filenames{n,1} = files(k).name;
        n = n+1;
    end
end
n = 2;
if length(Filenames) == 0
    Filenames{1,1} = 'Could not find any files.';
end
%Create string containing file names:
FilenamesSTR = [];
for k = 1:length(Filenames)
    FilenamesSTR = [FilenamesSTR, Filenames{k,1}, '\n' ];
end
fprintf(FilenamesSTR)
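A note on the matching step, as a minimal sketch rather than a complete fix: ismember compares each character of the text against the set of characters in the search term, so it cannot tell whether the term occurs as one contiguous substring. strfind does test for a contiguous match, and it is case-sensitive. The variables sample, term and pages below are made up for illustration. pages stands in for the text of each page of one PDF, which I am assuming has already been extracted by some text extraction tool (for example extractFileText from Text Analytics Toolbox, if that is available), because reading a PDF with fopen usually returns compressed streams rather than readable text.

```matlab
% Hypothetical sample text and search term, for illustration only.
sample = 'stare at the door';
term = 'read';

% ismember works per character: it marks every character of sample that
% appears anywhere in term, regardless of order or adjacency.
charHits = ismember(sample, term);

% strfind returns the start indices of contiguous matches; an empty
% result means the term does not occur as a substring.
hasSubstring = ~isempty(strfind(sample, term));

% Hypothetical per-page text of one document, one cell per page.
pages = {'Introduction and scope', 'Please read the manual', 'Appendix: read me'};

% For a cell array, strfind returns one index vector per cell, so the
% non-empty cells are the pages that contain the term.
pagesWithTerm = find(~cellfun(@isempty, strfind(pages, term)));
```

Here charHits has several true entries although 'read' never occurs in sample, while hasSubstring is false. The same strfind test applied page by page gives the page numbers, provided the text is available per page.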