How should XPath be set in TableSelector for htmlImportOptions so readtable( ) can output the first three tables in an html file?
I would like to read the first three tables in an HTML file with a single call to readtable( ), to reduce the time spent reading the file. However, the readtable( ) function from the database toolbox seems to read only one table at a time. I have tried setting TableSelector to several XPath expressions, but they either return an error message or only one table. For example, the one below returns the first table, but not tables 2 or 3.
opts = detectImportOptions(htmlfile);  % returns HTML import options for an .html file
opts.TableSelector = "//TABLE[position()<4]";
readtable(htmlfile, opts)
I was wondering whether, because the output argument of readtable( ) is a table, it can read only one table at a time.
Another related question.
% Why doesn't lowercase 'table' work?
opts.TableSelector = "//table[1]";
readtable(htmlfile, opts)
% ans=
% 0x1 empty table
% When TABLE[1] uses uppercase letters, readtable( ) outputs the first table correctly.
opts.TableSelector = "//TABLE[1]";
readtable(htmlfile, opts)
More Answers (1)
Christopher Creutzig
on 30 Sep 2022
readtable currently only returns a single table. There has been talk about a function returning multiple tables, but I don't know of any concrete plans. It may be worth letting support@mathworks.com know you are looking for something like that – given the time things can take from inception to release, it may not always be obvious, but user demand does influence priorities.
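Until such a function exists, one possible workaround is to call readtable once per table and collect the results in a cell array. The sketch below is only an illustration under stated assumptions: it assumes a MATLAB release in which readtable supports HTML files and the 'TableIndex' name-value argument, and it writes its own small test file (the file contents and the variable names htmlfile, nTables and tables are made up for the example).

```matlab
% Build a small HTML file with three tables so the sketch is self-contained.
% (Hypothetical test data; replace htmlfile with your own file in practice.)
htmlfile = [tempname '.html'];
fid = fopen(htmlfile, 'w');
fprintf(fid, '<html><body>\n');
for k = 1:3
    fprintf(fid, '<table><tr><th>A</th><th>B</th></tr>');
    fprintf(fid, '<tr><td>%d</td><td>%d</td></tr></table>\n', k, 10*k);
end
fprintf(fid, '</body></html>\n');
fclose(fid);

% One readtable call per table. Assumption: 'TableIndex' selects the
% k-th table in the document. The file is parsed again on every call.
nTables = 3;
tables = cell(1, nTables);
for k = 1:nTables
    tables{k} = readtable(htmlfile, 'FileType', 'html', 'TableIndex', k);
end

delete(htmlfile);  % remove the temporary test file
```

This keeps the calling code short, but it does not reduce the time spent reading the file, because each call parses the HTML again.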
As for lowercase table selectors … table selectors are XPath expressions, and XPath is, well, case-sensitive. Most HTML versions/variants (maybe in practice all of them) are case agnostic, although their standards differ in what they regard as the “right” casing to use. htmlTree normalizes to uppercase. But that doesn't mean we could simply treat the XPath expression as case agnostic, as it can contain parts where case matters. I'm not sure if your question is simple curiosity or if this is actually a bump in the road to solving your problems. If the latter, please let us know more.
Nitpick: readtable does not require Database Toolbox, it is in core MATLAB.