htmlImportOptions
Description
An HTMLImportOptions object enables you to specify how
MATLAB® imports structured, tabular data from HTML files. The object contains properties
that control the data import process, including the handling of errors and missing
data.
Creation
You can create an HTMLImportOptions object using either the
htmlImportOptions function (described here) or the detectImportOptions function:
Use
htmlImportOptionsto define the import properties based on your import requirements.Use
detectImportOptionsto detect and populate the import properties based on the contents of the HTML file specified infilename.opts = detectImportOptions(filename)
Syntax
Description
opts = htmlImportOptions creates an
HTMLImportOptions object with one variable.
opts = htmlImportOptions('NumVariables',
creates the object with the number of variables specified in
numVars)numVars.
opts = htmlImportOptions(___,
specifies additional properties for an
Name,Value)HTMLImportOptions object using one or more name-value
arguments.
Input Arguments
Number of variables, specified as a positive scalar integer.
Properties
Variable Properties
Variable names, specified as a cell array of character vectors or string array. The
VariableNames property contains the names to use when importing
variables.
If the data contains N variables, but no variable names are specified, then
the VariableNames property contains
{'Var1','Var2',...,'VarN'}.
To support invalid MATLAB identifiers as variable names, such as variable names containing spaces
and non-ASCII characters, set the value of VariableNamingRule to
'preserve'.
Example: opts.VariableNames returns the current
(detected) variable names.
Example: opts.VariableNames(3)
= {'Height'} changes the name of the third variable to Height.
Data Types: char | string | cell
Flag to preserve variable names, specified as either "modify" or
"preserve".
"modify"— Convert invalid variable names (as determined by theisvarnamefunction) to valid MATLAB identifiers."preserve"— Preserve variable names that are not valid MATLAB identifiers such as variable names that include spaces and non-ASCII characters.
Starting in R2019b, variable names and row names can include any characters, including spaces and non-ASCII characters. Also, they can start with any characters, not just letters. Variable and row names do not have to be valid MATLAB identifiers (as determined by the isvarname function). To preserve these variable names and row names, set the value of VariableNamingRule to "preserve". Variable names are not refreshed when the value of VariableNamingRule is changed from "modify" to "preserve".
Data type of variable, specified as a cell array of character vectors, or string array
containing a set of valid data type names. The VariableTypes property
designates the data types to use when importing variables.
To update the VariableTypes property, use the setvartype function.
Example: opts.VariableTypes returns the current variable data
types.
Example: opts = setvartype(opts,'Height',{'double'}) changes the
data type of the variable Height to
double.
Subset of variables to import, specified as a character vector, string scalar, cell array of character vectors, string array or an array of numeric indices.
SelectedVariableNames must be a subset of
names contained in the VariableNames property.
By default, SelectedVariableNames contains all
the variable names from the VariableNames property,
which means that all variables are imported.
Use the SelectedVariableNames property to
import only the variables of interest. Specify a subset of variables
using the SelectedVariableNames property and use readtable to import only that subset.
To support invalid MATLAB identifiers as variable names, such as variable names
containing spaces and non-ASCII characters, set the value of
VariableNamingRule to
'preserve'.
Example: opts.SelectedVariableNames = {'Height','LastName'} selects
only two variables, Height and LastName,
for the import operation.
Example: opts.SelectedVariableNames
= [1 5] selects only two variables, the first variable and
the fifth variable, for the import operation.
Example: T = readtable(filename,opts) returns
a table containing only the variables specified in the SelectedVariableNames property
of the opts object.
Data Types: uint16 | uint32 | uint64 | char | string | cell
Type specific variable import options, returned as an array
of variable import options objects. The array contains an object corresponding
to each variable specified in the VariableNames property.
Each object in the array contains properties that support the importing
of data with a specific data type.
Variable options support these data types: numeric, text, logical, datetime,
or categorical.
To query the current (or detected) options for a variable, use
the getvaropts function.
To set and customize options for a variable, use the setvaropts function.
Example: opts.VariableOptions returns a collection
of VariableImportOptions objects, one corresponding
to each variable in the data.
Example: getvaropts(opts,'Height') returns
the VariableImportOptions object for the Height variable.
Example: opts = setvaropts(opts,'Height','FillValue',0) sets
the FillValue property for the variable Height to 0.
Location Properties
Table data XPath expression, specified as a character vector or string scalar that
the reading function uses to select the output table data. You must specify
TableSelector as a valid XPath version 1.0 expression.
This table shows some example XPath expressions for selecting tables in HTML files.
| Description | TableSelector |
|---|---|
| Table containing the text "Cash dividends" | "//TABLE[contains(.,'Cash dividends')]" |
| Second table with more than 10 rows | "//TABLE[count(TR)>10][2]" |
| Table with a header cell exactly matching "Description" | "//TABLE[.//TH='Description']" |
| Table with caption containing "Report" | "//TABLE[contains(CAPTION,'Report')]" |
| First table after section header "Summary" | "H1[.='Summary']/following-sibling::TABLE[1]" |
Table with id attribute matching
'income' | "//TABLE[@id='income']" |
Example: 'TableSelector',"//TABLE[contains(.,'Cash
dividends')]"
Data location, specified as a positive scalar integer or a N-by-2 array of positive scalar integers. Specify DataRows using one of these forms.
Specify as | Description |
|---|---|
| Specify the first row that contains the data. Specifying the value using
|
| Specify the row range that contains the data. Values in the array |
| Specify multiple row ranges to read with an
A valid array of multiple row ranges must:
When specifying multiple row ranges,
use |
Example: opts.DataRows = 5 sets the DataRows property to the value [5 inf]. Read all rows of data starting from row 5 to the end-of-file.
Example: opts.DataRows = [2 6] sets the property to read rows 2 through 6.
Example: opts.DataRows = [1 3; 5 6; 8 inf] sets the property to read rows 1, 2, 3, 5, 6, and all rows between 8, and the end-of-file.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
Row names location, specified as a positive scalar integer.
The RowNamesColumn property specifies the location
of the column containing the row names.
If RowNamesColumn is specified as 0, then
do not import the row names. Otherwise, import the row names from
the specified column.
Example: opts.RowNamesColumn = 2;
Data Types: single | double | uint8 | uint16 | uint32 | uint64
Row containing variable names, specified as a nonnegative integer. The
VariableNamesRow property specifies the row number where variable
names are located.
If VariableNamesRow is 0, then do not import the
variable names. Otherwise, import the variable names from the specified row.
Example: opts.VariableNamesRow = 6;
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
Row containing variable units, specified as a nonnegative integer.
If VariableUnitsRow is 0, then the software does
not import the variable units. Otherwise, the software imports the variable units from
the specified row.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
Row containing variable descriptions, specified as a nonnegative integer.
If VariableDescriptionsRow is 0, then the
software does not import the variable descriptions. Otherwise, the software imports the
variable descriptions from the specified row.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
Replacement Rules
Procedure to manage missing data, specified as one of the values in this table.
| Missing Rule | Behavior |
|---|---|
'fill' | Replace missing data with the contents of the The |
'error' | Stop importing and display an error message showing the missing record and field. |
'omitrow' | Omit rows that contain missing data. |
'omitvar' | Omit variables that contain missing data. |
Example: opts.MissingRule = 'omitrow';
Data Types: char | string
Procedure to handle empty rows in the data, specified as "skip",
"read", or "error". The importing function
interprets white space as empty.
| Empty Row Rule | Behavior |
|---|---|
"skip" | Skip the empty rows. |
"read" | Import the empty rows. The importing function parses the empty row using the values specified in VariableOptions, MissingRule, and other relevant properties. |
"error" | Display an error message and abort the import operation. |
Procedure to handle import errors, specified as one of the values in this table.
| Import Error Rule | Behavior |
|---|---|
'fill' | Replace the data where the error occurred with the contents of the
The
|
'error' | Stop importing and display an error message showing the error-causing record and field. |
'omitrow' | Omit rows where errors occur. |
'omitvar' | Omit variables where errors occur. |
Example: opts.ImportErrorRule = 'omitvar';
Data Types: char | string
Procedure to handle extra columns in the data, specified as one of the values in this table.
| Extra Columns Rule | Behavior |
|---|---|
'addvars' | To import extra columns, create new variables. If there are |
'ignore' | Ignore the extra columns of data. |
'wrap' | Wrap the extra columns of data to new records. This action does not change the number of variables. |
'error' | Display an error message and abort the import operation. |
Data Types: char | string
Procedure to handle cells with merged columns, specified as one of the values in this table.
| Import Rule | Behavior |
|---|---|
"placeleft" | Place the data in the left-most cell and fill the remaining cells
with the contents of the The |
"placeright" | Place the data in the right-most cell and fill the remaining cells
with the contents of the The |
"duplicate" | Duplicate the data in all cells. |
"omitrow" | Omit rows where merged cells occur. |
"error" | Display an error message and abort the import operation. |
Example: "MergedCellColumnRule","placeright"
Procedure to handle cells with merged rows, specified as one of the values in this table.
| Import Rule | Behavior |
|---|---|
"placetop" | Place the data in the top cell and fill the remaining cells with
the contents of the The
|
"placebottom" | Place the data in the bottom cell and fill the remaining cells with
the contents of the The
|
"duplicate" | Duplicate the data in all cells. |
"omitvar" | Omit variables where merged rows occur. |
"error" | Display an error message and abort the import operation. |
Example: "MergedCellRowRule","duplicate"
Object Functions
Examples
Create import options for an HTML file, specify the table to import, and then read the data.
Create an HTMLDocumentImportOptions object. Read from the first table containing the word "readtable" using the XPath query "//TABLE[contains(.,'readtable')]".
opts = htmlImportOptions('TableSelector',"//TABLE[contains(.,'readtable')]")
opts =
HTMLImportOptions with properties:
Replacement Properties:
MissingRule: "fill"
ImportErrorRule: "fill"
EmptyRowRule: "skip"
MergedCellColumnRule: "placeleft"
MergedCellRowRule: "placetop"
ExtraColumnsRule: "addvars"
Variable Import Properties: Set types by name using setvartype
VariableNames: "Var1"
VariableTypes: "char"
SelectedVariableNames: "Var1"
VariableOptions: Show all 1 VariableOptions
Access VariableOptions sub-properties using setvaropts/getvaropts
VariableNamingRule: "preserve"
Location Properties:
TableSelector: "//TABLE[contains(.,'readtable')]"
DataRows: [1 Inf]
VariableNamesRow: 0
VariableUnitsRow: 0
VariableDescriptionsRow: 0
RowNamesColumn: 0
Read the table from the URL https://www.mathworks.com/help/matlab/text-files.html using the readtable function with the options object.
url = "https://www.mathworks.com/help/matlab/text-files.html";
T = readtable(url,opts)T=4×2 table
Var1 ExtraVar1
__________________ ______________________________
{'readtable' } {'Create table from file' }
{'writetable' } {'Write table to file' }
{'readtimetable' } {'Create timetable from file'}
{'writetimetable'} {'Write timetable to file' }
Version History
Introduced in R2021b
MATLAB Command
You clicked a link that corresponds to this MATLAB command:
Run the command by entering it in the MATLAB Command Window. Web browsers do not support MATLAB commands.
Select a Web Site
Choose a web site to get translated content where available and see local events and offers. Based on your location, we recommend that you select: .
You can also select a web site from the following list
How to Get Best Site Performance
Select the China site (in Chinese or English) for best site performance. Other MathWorks country sites are not optimized for visits from your location.
Americas
- América Latina (Español)
- Canada (English)
- United States (English)
Europe
- Belgium (English)
- Denmark (English)
- Deutschland (Deutsch)
- España (Español)
- Finland (English)
- France (Français)
- Ireland (English)
- Italia (Italiano)
- Luxembourg (English)
- Netherlands (English)
- Norway (English)
- Österreich (Deutsch)
- Portugal (English)
- Sweden (English)
- Switzerland
- United Kingdom (English)