How to extract text from .json files and combine them?
20 views (last 30 days)
Show older comments
Hello everyone,
I've got some questions and any inputs would be greatly appreciated. I have bunch of .json files, say 1000. To read each files I run the following code
fname = 'C:\Users\...\d90f3c62681e.json';
val = jsondecode(fileread(fname));
the output is as follows. For each file the paper_id, the size of abstract, and the size of body_text changes. I am interested in the text data in the "abstract" and the "body text". How can I extract text file in the abstract and body_text, and combine all these .json files into one file?
val =
struct with fields:
paper_id: 'd90f3c62681e'
metadata: [1×1 struct]
abstract: [1×1 struct]
body_text: [4×1 struct]
bib_entries: [1×1 struct]
ref_entries: [1×1 struct]
back_matter: []
val.abstract =
struct with fields:
text: '300 words)
cite_spans: []
ref_spans: []
section: 'Abstract'
val.body_text =
4×1 struct array with fields:
text
cite_spans
ref_spans
section
4 Comments
Walter Roberson
on 28 Mar 2020
Which release are you using? When I try in R2020a, I get
paper_id: '0a43046c154d0e521a6c425df215d90f3c62681e'
>> val.abstract
struct with fields:
text: '300 words) 33 Quantification of aerosolized influenza virus [and a bunch more]
Accepted Answer
Ameer Hamza
on 31 Mar 2020
Edited: Ameer Hamza
on 31 Mar 2020
As I answered in the comment on your other question, the following code will create a struct by combining the fields from individual files. It will then create a combined JSON file
files = dir('JSON files/*.json');
s = struct('abstract', [], 'body_text', []);
for i=1:numel(files)
filename = fullfile(files(i).folder, files(i).name);
data = jsondecode(fileread(filename));
if ~isempty(data.abstract)
s.abstract = [s.abstract; cell2struct({data.abstract.text}, 'text', 1)];
end
if ~isempty(data.body_text)
s.body_text = [s.body_text; cell2struct({data.body_text.text}, 'text', 1)];
end
end
str = jsonencode(s);
f = fopen('filename.json', 'w');
fprintf(f, '%s', str);
fclose(f);
More Answers (1)
Mohammad Sami
on 30 Mar 2020
You can import your data into cell arrays
filelist = {};
vals = cell(length(filelist),1);
haveabstract = false(length(filelist),1);
havebody = false(length(filelist),1);
data = cell(length(filelist),3);
% first col paper_id, second_col abstract, third col body
for i=1:length(filelist)
vals{i} = jsondecode(fileread(filelist{i}));
haveabstract(i) = ~isempty(vals{i}.abstract);
havebody(i) = ~isempty(vals{i}.body_text);
data{i,1} = vals{i}.paper_id;
if haveabstract(i)
data{i,2} = vals{i}.abstract;
end
if havebody(i)
data{i,3} = vals{i}.body_text
end
end
See Also
Categories
Find more on JSON Format in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!