summary
Data summary
Description
summary(A) displays a summary that includes the properties of and statistics for the input data A.
summary(___,Name=Value) specifies options using one or more name-value arguments in addition to any of the input combinations in the previous syntaxes. For example, summary(A,Statistics="std") includes only the standard deviation of the input data A.
Examples
Summary of Matrix
Create a matrix of type double, and display a summary of the matrix that includes the default statistics for each matrix column.
A = rand(5,3);
summary(A)
A: 5x3 double

    NumMissing         0         0         0
    Min           0.1270    0.0975    0.1576
    Median        0.8147    0.5469    0.8003
    Max           0.9134    0.9649    0.9706
    Mean          0.6786    0.5691    0.6742
    Std           0.3285    0.3921    0.3487
Display a summary that includes statistics for each matrix row.
summary(A,2)
A: 5x3 double

    NumMissing      Min       Median       Max        Mean        Std
        0         0.09754    0.15761    0.81472    0.35663    0.39786
        0         0.2785     0.90579    0.97059    0.71829    0.38225
        0         0.12699    0.54688    0.95717    0.54368    0.4151
        0         0.48538    0.91338    0.95751    0.78542    0.26078
        0         0.63236    0.80028    0.96489    0.79918    0.16627
Summary of Categorical Vector
Create a categorical vector containing three categories.
A = categorical(["A";"B";"C";"A";"C"])
A = 5x1 categorical
A
B
C
A
C
Display a summary of the vector that includes the number of occurrences of each category.
summary(A)
A: 5x1 categorical

    A                2
    B                1
    C                2
    <undefined>      0
Specify Additional Statistics
Create a matrix of type double, and display a summary of the matrix that includes the variance and sum of each matrix column in addition to the default statistics.
A = rand(5,3);
summary(A,Statistics=["default" "var" "sum"])
A: 5x3 double

    NumMissing         0         0         0
    Min           0.1270    0.0975    0.1576
    Median        0.8147    0.5469    0.8003
    Max           0.9134    0.9649    0.9706
    Mean          0.6786    0.5691    0.6742
    Std           0.3285    0.3921    0.3487
    Var           0.1079    0.1537    0.1216
    Sum           3.3932    2.8453    3.3710
Summary of Table
Create a table with four variables of different data types.
num = rand(6,1);
num2 = single(rand(6,1));
cat = categorical(["a";"a";"b";"a";"b";"c"]);
dt = datetime(2016:2021,1,1)';
T = table(num,num2,cat,dt)
T=6×4 table
num num2 cat dt
_______ _______ ___ ___________
0.81472 0.2785 a 01-Jan-2016
0.90579 0.54688 a 01-Jan-2017
0.12699 0.95751 b 01-Jan-2018
0.91338 0.96489 a 01-Jan-2019
0.63236 0.15761 b 01-Jan-2020
0.09754 0.97059 c 01-Jan-2021
Display a summary of the table.
summary(T)
T: 6x4 table

Variables:

    num: double
    num2: single
    cat: categorical (3 categories)
    dt: datetime

Statistics for applicable variables:

            NumMissing        Min                Median                Max                 Mean                Std

    num         0              0.0975                  0.7235         0.9134                  0.5818         0.3776
    num2        0              0.1576                  0.7522         0.9706                  0.6460         0.3708
    cat         0
    dt          0         01-Jan-2016    02-Jul-2018 12:00:00    01-Jan-2021    02-Jul-2018 12:00:00    16401:17:23
Display All Table Metadata
Load a table of data from the provided file.
load T
Display a summary of the table with additional table and variable metadata, including custom metadata. Omit statistics from the summary.
summary(T,Detail="high",Statistics="none")
T: 100x4 table

Description: Simulated patient data

Variables:

    Status: categorical
        Instrument: [1x1 cell]
    Age: double (Yrs)
        Instrument: height rod
    Smoker: logical
        Instrument: [1x1 cell]
    BloodPressure: 2-column double (mm Hg)
        Description: Systolic/Diastolic
        Instrument: bloodp pressure cuff
The summary includes the metadata properties that describe the table and its variables. Access the properties.
T.Properties
ans =
  TableProperties with properties:

             Description: 'Simulated patient data'
                UserData: []
          DimensionNames: {'Row' 'Variables'}
           VariableNames: {'Status' 'Age' 'Smoker' 'BloodPressure'}
           VariableTypes: ["categorical" "double" "logical" "double"]
    VariableDescriptions: {'' '' '' 'Systolic/Diastolic'}
           VariableUnits: {'' 'Yrs' '' 'mm Hg'}
      VariableContinuity: []
                RowNames: {100x1 cell}

   Custom Properties (access using t.Properties.CustomProperties.<name>):
              Instrument: {'' 'height rod' '' 'bloodp pressure cuff'}
Return Summary as Structure
Create a timetable.
MeasurementTime = datetime(["2024-01-01";"2024-02-01";"2024-03-01"]);
Temp = [37;39;42];
TT = timetable(MeasurementTime,Temp)
TT=3×1 timetable
MeasurementTime Temp
_______________ ____
01-Jan-2024 37
01-Feb-2024 39
01-Mar-2024 42
Return a summary of the timetable.
s = summary(TT)
s = struct with fields:
MeasurementTime: [1x1 struct]
Temp: [1x1 struct]
The MeasurementTime field of the structure contains a summary of the row times.
s.MeasurementTime
ans = struct with fields:
Size: [3 1]
Type: 'datetime'
TimeZone: ''
SampleRate: NaN
StartTime: 01-Jan-2024
NumMissing: 0
Min: 01-Jan-2024
Median: 01-Feb-2024
Max: 01-Mar-2024
Mean: 31-Jan-2024 08:00:00
Std: 720:07:59
TimeStep: 1mo
The Temp field of the structure contains a summary of the Temp variable. Access the median.
s.Temp.Median
ans = 39
Input Arguments
A
— Input data
array | table | timetable
Input data, specified as an array, table, or timetable.
dim
— Operating dimension for array
positive integer scalar | vector of positive integers | "all"
Operating dimension for array, specified as a positive integer scalar, a vector of positive integers, or "all". If you do not specify dim, then the default is the first array dimension whose size does not equal 1.
If the input array is categorical, then dim must be a scalar.
Consider an input matrix, A:
summary(A,1) displays statistics for each column of A.
summary(A,2) displays statistics for each row of A.
Specifying dim is not supported when the input data is a table or timetable.
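As a minimal sketch of the dim argument, assuming MATLAB R2024b or later (the release in which summary accepts general array input), the following calls summarize a small matrix along different dimensions. The matrix values are illustrative only.

```matlab
% Illustrative 3-by-4 matrix; the values are arbitrary.
A = [1 2 3 4; 5 6 7 8; 9 10 11 12];

summary(A)          % default: first dimension whose size is not 1 (each column)
summary(A,1)        % statistics for each column of A
summary(A,2)        % statistics for each row of A
summary(A,[1 2])    % vector dim: operate along the first and second dimensions
summary(A,"all")    % statistics over all elements of A
```

Because A here is numeric, a vector dim is allowed; for a categorical array, only the scalar forms apply.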
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Example: summary(A,Statistics="allstats")
Detail
— Level of detail to display
"low" (default) | "high"
Level of detail to display for table or timetable input data, specified as one of these values:
"low": Provide a concise summary. Display the variable name, type, unit, and description for each table variable.
"high": Provide a verbose summary. Display all table and variable metadata in addition to the details in "low". For categorical variables, "high" also displays the categories and counts.
summary accesses metadata that describes a table and its variables through the Properties property of the table.
The Detail name-value argument does not configure the summary when you return the summary as a scalar structure. The summary structure always includes all table and variable metadata.
Example: summary(A,Detail="high") displays table and variable metadata in addition to the variable names, types, units, and descriptions.
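The difference between the two detail levels can be sketched as follows. The table, its description, and its units are hypothetical and exist only to give summary some metadata to display.

```matlab
% Hypothetical table with a description and units attached as metadata.
Temp = [36.6; 37.2; 38.1];
Site = categorical(["north"; "south"; "north"]);
T = table(Temp,Site);
T.Properties.Description = 'Hypothetical sample readings';
T.Properties.VariableUnits = {'degC',''};

summary(T)                  % Detail="low" is the default
summary(T,Detail="high")    % adds table and variable metadata, plus category counts

% Detail is documented as not configuring the structure output,
% so the two structures are expected to match.
sLow = summary(T,Detail="low");
sHigh = summary(T,Detail="high");
isequaln(sLow,sHigh)
```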
Statistics
— Statistics to compute
"default" (default) | "nummissing" | "min" | "allstats" | "none" | function handle | string array | cell array | ...
Statistics to compute, specified as one or more of the following values. For table and timetable data, the specified statistics are computed for all applicable variables, including row times for timetable data.
For the "default" value, the statistics to compute depend on the data type of the input data.
Data Type | Statistics to Compute |
---|---|
 | |
Integer | |
logical | |
Non-ordinal categorical | |
Ordinal categorical | |
 | "nummissing" |
To compute a different set of statistics, you can specify one or more of these values. To specify multiple statistics, list the options in a string array or cell array.
Statistic | Description |
---|---|
"nummissing" | Number of missing elements |
"min" | Minimum |
"median" | Median |
"max" | Maximum |
"q1" | First quartile or 25th percentile |
"q3" | Third quartile or 75th percentile |
"mean" | Mean |
"std" | Standard deviation |
"var" | Variance |
"mode" | Mode |
"range" | Maximum minus minimum |
"sum" | Sum |
"numunique" | Number of distinct nonmissing elements |
"nnz" | Number of nonzero and nonmissing elements |
"counts" | Number of occurrences of each category |
"allstats" | All statistics previously listed |
"none" | No statistics |
You can also specify Statistics as a function handle that must:
Accept one input data argument.
Return one output that is scalar or has the same size as the input data in all dimensions except for a size of 1 along the first dimension.
For table or timetable input data, operate along each variable separately.
When summary computes a statistic:
If the function encounters an error, the summary does not include that statistic.
If the function encounters missing values, it omits those values from the computation, with the exception of the "nummissing" statistic. To include missing values, use a function handle, such as @sum instead of "sum".
Example: summary(A,Statistics=["mean" "var" "mode"]) computes the mean, variance, and mode.
Example: summary(A,Statistics={"default",myFun1}) computes the result of myFun1 in addition to the default statistics.
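The options above can be combined as in this rough sketch, assuming R2024b or later for array input. The helper countAbove3 is a hypothetical custom statistic, not part of MATLAB; it is written to satisfy the function handle requirements listed above.

```matlab
% Illustrative matrix with one missing value.
A = [1 2; NaN 4; 5 6];

% Hypothetical custom statistic: number of elements greater than 3 in each column.
% It accepts one input and returns a result with size 1 along the first dimension.
countAbove3 = @(x) sum(x > 3,1);

summary(A,Statistics=["min" "max" "range"])      % named statistics only
summary(A,Statistics={"default",countAbove3})    % defaults plus the custom statistic
summary(A,Statistics="sum")                      % named statistic: missing values are omitted
summary(A,Statistics=@sum)                       % function handle: missing values are included
```

Comparing the last two calls shows the documented difference in how missing values are handled by a named statistic and by a function handle.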
DataVariables
— Table or timetable variables to summarize
scalar | vector | cell array | pattern | function handle | table vartype subscript
Table or timetable variables to summarize, specified as one of the values in this table. Variables in the table or timetable not specified by the DataVariables name-value argument are not included in the summary.
Indexing Scheme | Values to Specify | Examples |
---|---|---|
Variable names | | |
Variable index | | |
Function handle | | |
Variable type | | |
Example: summary(A,DataVariables=["Var1" "Var2" "Var4"]) displays a summary of Var1, Var2, and Var4.
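Each indexing scheme named above might be used as in the following sketch. The table and its variable names are placeholders chosen for illustration.

```matlab
% Hypothetical table; the variable names are placeholders.
Var1 = [1; 2; 3];
Var2 = ["a"; "b"; "c"];
Var3 = [true; false; true];
Var4 = [0.5; 1.5; 2.5];
T = table(Var1,Var2,Var3,Var4);

summary(T,DataVariables=["Var1" "Var4"])       % by variable name
summary(T,DataVariables=[1 4])                 % by variable index
summary(T,DataVariables=@isnumeric)            % by function handle
summary(T,DataVariables=vartype("numeric"))    % by variable type
```

In this table, all four calls select the same two numeric variables, so the displayed summaries should agree.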
Output Arguments
s
— Summary of input data
scalar structure
Summary of input data, returned as a scalar structure.
If the input data is a table or timetable, then each field in s contains a summary of one of the variables. If A is a timetable, s also contains a field with the summary of the row times.
If the input data is an array, then each field in s contains a property or statistic.
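A short sketch of working with the structure output for both kinds of input, assuming R2024b or later for the array case. The data values are made up, and the field names are inspected with fieldnames rather than assumed.

```matlab
% Array input: fields hold properties and statistics of the array itself.
A = [4 8; 15 16; 23 42];
sA = summary(A);
disp(fieldnames(sA))      % list the fields that the structure contains

% Table input: one field per variable, each of which is itself a structure.
Height = [1.62; 1.75; 1.80];
Weight = [58; 72; 81];
T = table(Height,Weight);
sT = summary(T);
sT.Height.Max             % access one statistic for one variable
```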
Extended Capabilities
Tall Arrays
Calculate with arrays that have more rows than fit in memory.
The summary function supports tall arrays with the following usage notes and limitations:
Only tall tables and tall timetables are supported.
Name-value arguments Detail, Statistics, and DataVariables are not supported.
Some calculations that might be slow to complete with large data sets, such as the median and standard deviation, are not included in the summary.
For more information, see Tall Arrays.
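A minimal sketch of summarizing a tall table follows. It converts a small, hypothetical in-memory table to a tall table purely for illustration; real tall tables are usually backed by a datastore.

```matlab
% Run tall-array calculations in the local MATLAB session.
mapreducer(0);

% Hypothetical in-memory table converted to a tall table for illustration.
Temp = [37; 39; 42];
Humidity = [0.41; 0.38; 0.52];
tt = tall(table(Temp,Humidity));

summary(tt)    % call without name-value arguments; they are not supported for tall input
```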
Thread-Based Environment
Run code in the background using MATLAB® backgroundPool or accelerate code with Parallel Computing Toolbox™ ThreadPool.
This function fully supports thread-based environments. For more information, see Run MATLAB Functions in Thread-Based Environment.
Distributed Arrays
Partition large arrays across the combined memory of your cluster using Parallel Computing Toolbox™.
Usage notes and limitations:
Only distributed tables are supported.
Name-value arguments Detail, Statistics, and DataVariables are not supported.
Some calculations that might be slow to complete with large data sets, such as the median and standard deviation, are not included in the summary.
For more information, see Run MATLAB Functions with Distributed Arrays (Parallel Computing Toolbox).
Version History
Introduced in R2013b
R2024b: Summarize array data and configure summary contents
You can now summarize array data, including numeric, logical, datetime, duration, and calendarDuration types. Previously, the function supported array data only when it was categorical.
You can configure the summary contents using one or more name-value arguments:
Statistics: Specify which statistics to compute.
Detail: For table or timetable data only, specify the level of table metadata detail to display in the summary.
DataVariables: For table or timetable data only, specify the variables to summarize.
R2024b: Categorical summary includes number of undefined elements
When you display a summary of a categorical array, the summary now always includes the number of undefined elements. Previously, the summary omitted the number of undefined elements if the array contained no missing values.
If you want to omit the number of undefined elements from the summary, specify the Statistics name-value argument. For example, summary(A,Statistics="counts") displays only the number of elements in each category.
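The behavior change can be seen with a small sketch, assuming R2024b or later. The categorical vector is the same kind of illustrative data used in the examples.

```matlab
% Categorical vector with no undefined elements.
A = categorical(["A";"B";"C";"A";"C"]);

summary(A)                        % category counts plus the <undefined> count
summary(A,Statistics="counts")    % category counts only
```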