parquetinfo
Get information about Parquet file
Description
ParquetInfo objects contain information about a Parquet file, such as file size, variable names and types, encoding, and compression schemes. To get information about a Parquet file, create a ParquetInfo object using the parquetinfo function.
Creation
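Description
info = parquetinfo(filename) creates a ParquetInfo object containing information about the Parquet file specified by filename.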
Input Arguments
filename
— Name of Parquet file
character vector | string scalar
Name of Parquet file, specified as a character vector or string scalar.
ParquetInfo works with Parquet 1.0 or Parquet 2.0 files.
Depending on the location of the file, filename can take one of these forms.
Location | Form
---|---
Current folder or folder on the MATLAB® path | Specify the name of the file in filename.
File in a folder | If the file is not in the current folder or in a folder on the MATLAB path, then specify the full or relative path name.
Internet URL | If the file is specified as an internet uniform resource locator (URL), then filename must contain the protocol type, such as http:// or https://.
Remote Location | If the file is stored at a remote location, then filename must contain the full path of the file, including the scheme name for that location. For more information, see Work with Remote Data.
Data Types: char | string
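The sketch below illustrates passing a full path to parquetinfo, once as a string scalar and once as a character vector. It assumes write access to the folder returned by tempdir; the table contents and the file name example_info.parquet are hypothetical, and parquetwrite is used only to create a file to inspect.

```matlab
% Build a small table and write it to a hypothetical file in the temp folder.
T = table([1; 2; 3], ["a"; "b"; "c"], 'VariableNames', {'x', 'label'});
fname = fullfile(tempdir, "example_info.parquet");   % hypothetical file name
parquetwrite(fname, T);

% Specify the full path as a string scalar ...
infoFromString = parquetinfo(fname);

% ... or as a character vector; both refer to the same file.
infoFromChar = parquetinfo(char(fname));

% Both objects report the same absolute path.
disp(infoFromString.Filename == infoFromChar.Filename)

% Remove the temporary file; the ParquetInfo objects remain in the workspace.
delete(fname);
```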
Properties
Filename
— Absolute path to Parquet file
string scalar
This property is read-only.
Absolute path to Parquet file, specified as a string scalar.
Data Types: string
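As a small illustration, the file outages.parquet used in the example later on this page is on the MATLAB path, so it can be referred to by name alone; the Filename property then holds the absolute path the name resolved to. The checks below are a sketch, not a documented guarantee about how the path is formatted.

```matlab
% outages.parquet is on the MATLAB path, so a bare file name is enough.
info = parquetinfo("outages.parquet");

% Filename contains the absolute path that the bare name resolved to.
fprintf("Resolved to: %s\n", info.Filename);

% The resolved path points at an existing file.
fileExists = isfile(info.Filename);
```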
FileSize
— File size in bytes
double
This property is read-only.
File size in bytes, specified as a double.
Data Types: double
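A quick consistency check, assuming FileSize reports the on-disk size of the file named by Filename: the value can be compared with the byte count that dir returns for the same path.

```matlab
info = parquetinfo("outages.parquet");

% Ask the file system for the size of the same file.
listing = dir(char(info.Filename));

% The two byte counts are expected to agree.
fprintf("parquetinfo: %d bytes, dir: %d bytes\n", info.FileSize, listing.bytes);
```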
NumRowGroups
— Number of row groups
double
This property is read-only.
Number of row groups, specified as a double.
Data Types: double
RowGroupHeights
— Number of rows in each row group
double
This property is read-only.
Number of rows in each row group, specified as a double array with one element per row group.
Data Types: double
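Because RowGroupHeights has one element per row group, its length matches NumRowGroups and its sum is the total number of rows in the file. The sketch below checks this for outages.parquet against the height of the table that parquetread returns.

```matlab
info = parquetinfo("outages.parquet");

% One element of RowGroupHeights per row group ...
nGroups = numel(info.RowGroupHeights);

% ... so the sum is the total number of rows stored in the file.
totalRows = sum(info.RowGroupHeights);

% Cross-check against the number of rows parquetread imports.
T = parquetread("outages.parquet");
fprintf("%d row group(s), %d rows; parquetread returned %d rows\n", ...
    nGroups, totalRows, height(T));
```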
VariableNames
— Variable names
string array
This property is read-only.
Variable names, specified as a string array. If the Parquet file contains N variables, then VariableNames is a 1-by-N array containing the names of the variables.
Data Types: string
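Since VariableNames is a string array, ordinary string comparisons work on it. This sketch checks whether outages.parquet contains a variable named Loss and finds its position, which is the same index used in the example later on this page.

```matlab
info = parquetinfo("outages.parquet");

% Test whether a variable exists in the file.
hasLoss = ismember("Loss", info.VariableNames);

% Find the position of that variable among the 1-by-N names.
lossIdx = find(info.VariableNames == "Loss");
```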
VariableTypes
— Variable data types
string array
This property is read-only.
Variable data types, specified as a string array. If the Parquet file contains N variables, then VariableTypes is a 1-by-N array containing data type names for each variable. Each element in the array is the name of the MATLAB data type to which the corresponding variable in the Parquet file maps.
Data Types: string
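One way to use VariableTypes together with VariableNames is to choose which variables to import before reading the file. This sketch selects the variables that map to datetime and reads only those, assuming the SelectedVariableNames name-value argument of parquetread accepts a string array.

```matlab
info = parquetinfo("outages.parquet");

% Pick the variables whose MATLAB data type is datetime ...
dtVars = info.VariableNames(info.VariableTypes == "datetime");

% ... and import only those variables.
T = parquetread("outages.parquet", "SelectedVariableNames", dtVars);
head(T)
```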
VariableCompression
— Variable compression algorithm
string array
This property is read-only.
Variable compression algorithm, specified as a string array. If the Parquet file contains N variables, then VariableCompression is a 1-by-N array containing compression algorithm names. Each element in the array corresponds to the compression algorithm used to compress that variable in the Parquet file. See parquetwrite for a list of supported compression algorithms.
Data Types: string
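A minimal sketch of how VariableCompression reflects the writer's settings. It assumes that the VariableCompression name-value argument of parquetwrite accepts a single algorithm name and applies it to every variable, and that parquetinfo reports that algorithm as "gzip". The table and the file name example_gzip.parquet are hypothetical.

```matlab
% Hypothetical two-variable table.
T = table((1:5)', ["a"; "b"; "c"; "d"; "e"], 'VariableNames', {'id', 'tag'});
fname = fullfile(tempdir, "example_gzip.parquet");   % hypothetical file name

% Compress every variable with gzip instead of the default algorithm.
parquetwrite(fname, T, "VariableCompression", "gzip");

% Read back the per-variable compression names.
info = parquetinfo(fname);
disp(info.VariableCompression)

delete(fname);
```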
VariableEncoding
— Variable encoding
string array
This property is read-only.
Variable encoding, specified as a string array. If the Parquet file contains N variables, then VariableEncoding is a 1-by-N array containing encoding scheme names. Each element in the array corresponds to the encoding scheme used to encode that variable in the Parquet file. See parquetwrite for a list of supported encodings.
Data Types: string
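Similarly, VariableEncoding can be checked against the encoding requested at write time. This sketch assumes that the VariableEncoding name-value argument of parquetwrite accepts "plain" for all variables and that parquetinfo then reports "plain" for each one, as it does for outages.parquet. The table and the file name example_plain.parquet are hypothetical.

```matlab
% Hypothetical table with a numeric and a text variable.
T = table([10; 20; 30], ["x"; "y"; "z"], 'VariableNames', {'value', 'code'});
fname = fullfile(tempdir, "example_plain.parquet");   % hypothetical file name

% Request plain encoding for every variable.
parquetwrite(fname, T, "VariableEncoding", "plain");

% Inspect the per-variable encoding names.
info = parquetinfo(fname);
disp(info.VariableEncoding)

delete(fname);
```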
Version
— Parquet version
"1.0" | "2.0"
This property is read-only.
Parquet version, specified as either "1.0" or "2.0".
Data Types: string
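To see the Version property change, a file can be written in the older format. This sketch assumes the Version name-value argument of parquetwrite accepts "1.0" and that parquetinfo reports the same value back; the table and the file name example_v1.parquet are hypothetical.

```matlab
% Hypothetical single-variable table.
T = table([1.5; 2.5; 3.5], 'VariableNames', {'reading'});
fname = fullfile(tempdir, "example_v1.parquet");   % hypothetical file name

% Write the file using Parquet format version 1.0.
parquetwrite(fname, T, "Version", "1.0");

% Check which version parquetinfo reports.
info = parquetinfo(fname);
fprintf("Parquet version: %s\n", info.Version);

delete(fname);
```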
Examples
Get Information About Parquet File
Use the parquetinfo function to create a ParquetInfo object containing information about the file.
info = parquetinfo('outages.parquet')
info = 
  ParquetInfo with properties:

               Filename: "/mathworks/devel/bat/filer/batfs2561-0/Bdoc24b.2679053/build/runnable/matlab/toolbox/matlab/demos/outages.parquet"
               FileSize: 44202
           NumRowGroups: 1
        RowGroupHeights: 1468
          VariableNames: ["Region" "OutageTime" "Loss" "Customers" "RestorationTime" "Cause"]
          VariableTypes: ["string" "datetime" "double" "double" "datetime" "string"]
    VariableCompression: ["snappy" "snappy" "snappy" "snappy" "snappy" "snappy"]
       VariableEncoding: ["plain" "plain" "plain" "plain" "plain" "plain"]
                Version: "2.0"
Display the name, type, and compression scheme for the third variable in the file.
disp([info.VariableNames(3) info.VariableTypes(3) info.VariableCompression(3)])
"Loss" "double" "snappy"
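Instead of indexing one variable at a time, the 1-by-N property arrays can be transposed into the columns of a table that summarizes every variable in the file. This is a sketch of one convenient layout, not a built-in display of ParquetInfo.

```matlab
info = parquetinfo("outages.parquet");

% Turn each 1-by-N string array into a column of a summary table.
summary = table(info.VariableNames', info.VariableTypes', ...
    info.VariableCompression', info.VariableEncoding', ...
    'VariableNames', {'Name', 'Type', 'Compression', 'Encoding'});
disp(summary)
```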
Extended Capabilities
Thread-Based Environment
Run code in the background using MATLAB® backgroundPool or accelerate code with Parallel Computing Toolbox™ ThreadPool.
This function fully supports thread-based environments. For more information, see Run MATLAB Functions in Thread-Based Environment.
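A sketch of calling parquetinfo on a thread worker with backgroundPool and parfeval, assuming R2022b or later. Resolving the full path with which before submitting the call is a precaution chosen for this sketch, not a documented requirement.

```matlab
% Resolve the full path on the client before handing it to the worker.
fname = which('outages.parquet');

% Run parquetinfo in the background, requesting one output argument.
future = parfeval(backgroundPool, @parquetinfo, 1, fname);

% fetchOutputs waits for the call to finish and returns the ParquetInfo object.
info = fetchOutputs(future);
disp(info.VariableNames)
```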
Version History
Introduced in R2019a
R2022b: Use function in thread-based environments
This function supports thread-based environments.
See Also