summary
Summarize cross-validation partition with stratification or grouping variable
Since R2025a
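Syntax
summaryTbl = summary(c)
Description
summaryTbl = summary(c) returns a summary table describing the validation partition c. The IsGrouped or IsStratified property of c must be 1 (true).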
Examples
Create a cvpartition object using a grouping variable. Display a summary of the cross-validation.
Load data on tsunami occurrences, and create a table from the data. Display the first eight observations in the table.
Tbl = readtable("tsunamis.xlsx");
head(Tbl)
Latitude Longitude Year Month Day Hour Minute Second ValidityCode Validity CauseCode Cause EarthquakeMagnitude Country Location MaxHeight IidaMagnitude Intensity NumDeaths DescDeaths
________ _________ ____ _____ ___ ____ ______ ______ ____________ _________________________ _________ __________________ ___________________ ___________________ __________________________ _________ _____________ _________ _________ __________
-3.8 128.3 1950 10 8 3 23 NaN 2 {'questionable tsunami' } 1 {'Earthquake' } 7.6 {'INDONESIA' } {'JAVA TRENCH, INDONESIA'} 2.8 1.5 1.5 NaN NaN
19.5 -156 1951 8 21 10 57 NaN 4 {'definite tsunami' } 1 {'Earthquake' } 6.9 {'USA' } {'HAWAII' } 3.6 1.8 NaN NaN NaN
-9.02 157.95 1951 12 22 NaN NaN NaN 2 {'questionable tsunami' } 6 {'Volcano' } NaN {'SOLOMON ISLANDS'} {'KAVACHI' } 6 2.6 NaN NaN NaN
42.15 143.85 1952 3 4 1 22 41 4 {'definite tsunami' } 1 {'Earthquake' } 8.1 {'JAPAN' } {'SE. HOKKAIDO ISLAND' } 6.5 2.7 2 33 1
19.1 -155 1952 3 17 3 58 NaN 4 {'definite tsunami' } 1 {'Earthquake' } 4.5 {'USA' } {'HAWAII' } 1 NaN NaN NaN NaN
43.1 -82.4 1952 5 6 NaN NaN NaN 1 {'very doubtful tsunami'} 9 {'Meteorological'} NaN {'USA' } {'LAKE HURON, MI' } 1.52 NaN NaN NaN NaN
52.75 159.5 1952 11 4 16 58 NaN 4 {'definite tsunami' } 1 {'Earthquake' } 9 {'RUSSIA' } {'KAMCHATKA' } 18 4.2 4 2236 3
50 156.5 1953 3 18 NaN NaN NaN 3 {'probable tsunami' } 1 {'Earthquake' } 5.8 {'RUSSIA' } {'N. KURIL ISLANDS' } 1.5 0.6 NaN NaN NaN
Create a random nonstratified partition for 5-fold cross-validation on the observations in Tbl. Ensure that observations with the same Country value are in the same fold by using the GroupingVariables name-value argument.
rng(0,"twister") % For reproducibility
c = cvpartition(size(Tbl,1),KFold=5, ...
    GroupingVariables=Tbl.Country)
c =
Group k-fold cross validation partition
NumObservations: 162
NumTestSets: 5
TrainSize: [126 130 130 131 131]
TestSize: [36 32 32 31 31]
IsCustom: 0
IsGrouped: 1
IsStratified: 0
Properties, Methods
c is a cvpartition object. The IsGrouped property value is 1 (true), indicating that at least one grouping variable was used to create the object.
Display a summary of the cvpartition object c.
summaryTbl = summary(c)
summaryTbl=150×5 table
Set SetSize GroupLabel GroupCount PercentInSet
________ _______ ___________________ __________ ____________
"train1" 126 {'INDONESIA' } 25 19.841
"train1" 126 {'USA' } 15 11.905
"train1" 126 {'SOLOMON ISLANDS'} 10 7.9365
"train1" 126 {'JAPAN' } 19 15.079
"train1" 126 {'RUSSIA' } 19 15.079
"train1" 126 {'FIJI' } 1 0.79365
"train1" 126 {'GREENLAND' } 1 0.79365
"train1" 126 {'CHILE' } 6 4.7619
"train1" 126 {'GREECE' } 5 3.9683
"train1" 126 {'ECUADOR' } 1 0.79365
"train1" 126 {'VANUATU' } 5 3.9683
"train1" 126 {'TONGA' } 1 0.79365
"train1" 126 {'PHILIPPINES' } 7 5.5556
"train1" 126 {'CANADA' } 1 0.79365
"train1" 126 {'ATLANTIC OCEAN' } 1 0.79365
"train1" 126 {'FRANCE' } 1 0.79365
⋮
The first row in summaryTbl shows that 25 of the 126 observations in the first training set Tbl(training(c,1),:) (approximately 20%) have the Country value INDONESIA. The software ensures that the first test set Tbl(test(c,1),:) does not contain any observations with this value.
Check the Country values for the observations in the first test set.
summaryTest1 = summaryTbl(summaryTbl.Set=="test1",:)
summaryTest1=6×5 table
Set SetSize GroupLabel GroupCount PercentInSet
_______ _______ ____________________ __________ ____________
"test1" 36 {'PAPUA NEW GUINEA'} 13 36.111
"test1" 36 {'MEXICO' } 8 22.222
"test1" 36 {'PERU' } 9 25
"test1" 36 {'JAPAN SEA' } 1 2.7778
"test1" 36 {'MONTSERRAT' } 4 11.111
"test1" 36 {'TURKEY' } 1 2.7778
As expected, the first test set does not contain any observations with the Country value INDONESIA.
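The same guarantee can be checked for every fold directly from the partition indices, without the summary table. The following is a minimal sketch that reuses the data file and partition settings from this example. The variable names sharedCounts, trainCountries, and testCountries are illustrative and are not part of the original example.

```matlab
% Recreate the table and grouped partition from the example.
Tbl = readtable("tsunamis.xlsx");
rng(0,"twister") % For reproducibility
c = cvpartition(size(Tbl,1),KFold=5, ...
    GroupingVariables=Tbl.Country);

% For each fold, count the Country values that appear in both the
% training set and the test set.
sharedCounts = zeros(1,c.NumTestSets);
for k = 1:c.NumTestSets
    trainCountries = unique(Tbl.Country(training(c,k)));
    testCountries = unique(Tbl.Country(test(c,k)));
    sharedCounts(k) = numel(intersect(trainCountries,testCountries));
end
disp(sharedCounts)
```

Because the partition is grouped by Country, every entry of sharedCounts is expected to be 0.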
Create a cvpartition object using a stratification variable. Display a summary of the cross-validation, and then modify the summary display.
Load the fisheriris data set. The matrix meas contains flower measurements for 150 different flowers. The variable species lists the species for each flower.
load fisheriris
Create a random stratified partition for 3-fold cross-validation. Use the species variable as the stratification variable.
rng(0,"twister") % For reproducibility
c = cvpartition(species,KFold=3)
c =
K-fold cross validation partition
NumObservations: 150
NumTestSets: 3
TrainSize: [100 100 100]
TestSize: [50 50 50]
IsCustom: 0
IsGrouped: 0
IsStratified: 1
Properties, Methods
c is a cvpartition object. The IsStratified property value is 1 (true), indicating that a stratification variable was used to create the object.
Display a summary of the cvpartition object c.
summaryTbl = summary(c)
summaryTbl=21×5 table
Set SetSize StratificationLabel StratificationCount PercentInSet
________ _______ ___________________ ___________________ ____________
"all" 150 {'setosa' } 50 33.333
"all" 150 {'versicolor'} 50 33.333
"all" 150 {'virginica' } 50 33.333
"train1" 100 {'setosa' } 34 34
"train1" 100 {'versicolor'} 33 33
"train1" 100 {'virginica' } 33 33
"test1" 50 {'setosa' } 16 32
"test1" 50 {'versicolor'} 17 34
"test1" 50 {'virginica' } 17 34
"train2" 100 {'setosa' } 33 33
"train2" 100 {'versicolor'} 33 33
"train2" 100 {'virginica' } 34 34
"test2" 50 {'setosa' } 17 34
"test2" 50 {'versicolor'} 17 34
"test2" 50 {'virginica' } 16 32
"train3" 100 {'setosa' } 33 33
⋮
The first row in summaryTbl shows that 50 of the 150 flowers in the data set (approximately 33%) are setosa flowers.
Modify the summary display to include test set information only.
testSummaryTbl = summaryTbl(contains(summaryTbl.Set,"test"),:)
testSummaryTbl=9×5 table
Set SetSize StratificationLabel StratificationCount PercentInSet
_______ _______ ___________________ ___________________ ____________
"test1" 50 {'setosa' } 16 32
"test1" 50 {'versicolor'} 17 34
"test1" 50 {'virginica' } 17 34
"test2" 50 {'setosa' } 17 34
"test2" 50 {'versicolor'} 17 34
"test2" 50 {'virginica' } 16 32
"test3" 50 {'setosa' } 17 34
"test3" 50 {'versicolor'} 16 32
"test3" 50 {'virginica' } 17 34
The first row in testSummaryTbl shows that 16 of the 50 flowers in the first test set (approximately 32%) are setosa flowers.
Modify summaryTbl to include setosa information only.
setosaSummaryTbl = summaryTbl(summaryTbl.StratificationLabel=="setosa",:)
setosaSummaryTbl=7×5 table
Set SetSize StratificationLabel StratificationCount PercentInSet
________ _______ ___________________ ___________________ ____________
"all" 150 {'setosa'} 50 33.333
"train1" 100 {'setosa'} 34 34
"test1" 50 {'setosa'} 16 32
"train2" 100 {'setosa'} 33 33
"test2" 50 {'setosa'} 17 34
"train3" 100 {'setosa'} 33 33
"test3" 50 {'setosa'} 17 34
The second row in setosaSummaryTbl shows that 34 of the 100 flowers in the first training set are setosa flowers.
Display summary information with a separate column for each of the three flower species.
speciesSummaryTbl = unstack(summaryTbl(:,1:4), ...
    "StratificationCount","StratificationLabel")
speciesSummaryTbl=7×5 table
Set SetSize setosa versicolor virginica
________ _______ ______ __________ _________
"all" 150 50 50 50
"train1" 100 34 33 33
"test1" 50 16 17 17
"train2" 100 33 33 34
"test2" 50 17 17 16
"train3" 100 33 34 33
"test3" 50 17 16 17
The second row in speciesSummaryTbl shows that of the 100 flowers in the first training set, 34 are setosa flowers, 33 are versicolor flowers, and 33 are virginica flowers.
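The counts in the summary table can also be cross-checked against the partition indices. The sketch below recomputes the per-species counts and percentages for the first training set by using training and countcats. The names trainSpecies, counts, percentInSet, and checkTbl are illustrative, and the exact counts depend on the random partition.

```matlab
load fisheriris
rng(0,"twister") % For reproducibility
c = cvpartition(species,KFold=3);

% Species labels of the observations in the first training set.
trainSpecies = categorical(species(training(c,1)));

% Count the observations per species and convert to percentages.
counts = countcats(trainSpecies);
percentInSet = 100*counts/numel(trainSpecies);

checkTbl = table(categories(trainSpecies),counts,percentInSet, ...
    VariableNames=["Label","Count","PercentInSet"])
```

The rows of checkTbl should correspond to the "train1" rows of summaryTbl.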
Input Arguments
Validation partition, specified as a cvpartition object. The validation partition type of c, c.Type, must be 'kfold' or 'holdout'. The IsGrouped or IsStratified property of c must be 1 (true).
summary does not support validation partitions created using tall arrays.
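These requirements can be tested before calling summary. The following is a minimal sketch that assumes the fisheriris data set as example input. The variables hasSupportedType and canSummarize are illustrative names, and the sketch does not check for partitions created from tall arrays.

```matlab
load fisheriris
c = cvpartition(species,KFold=3);

% summary requires a k-fold or holdout partition that is grouped or
% stratified.
hasSupportedType = any(strcmp(c.Type,["kfold","holdout"]));
canSummarize = hasSupportedType && (c.IsGrouped || c.IsStratified);

if canSummarize
    summaryTbl = summary(c);
else
    disp("summary does not support this partition.")
end
```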
Output Arguments
Summary table describing the validation partition c, returned as a table.
- The first column, Set, describes the specific data set for which information is displayed. Possible values include "all" (the full data set), "train1" (the first training set), "test1" (the first test set), and so on.
- The second column, SetSize, describes the size of each data set listed in Set.
- The remaining columns depend on the properties of c.
  - If c.IsStratified is 1 (true), then the remaining columns are StratificationLabel, StratificationCount, and PercentInSet. StratificationLabel describes the label of interest in the stratification variable. StratificationCount describes the number of observations in the data set Set with the label StratificationLabel. PercentInSet describes the percentage of observations in the data set Set with the label StratificationLabel.
  - If c.IsGrouped is 1 (true), then the number of remaining columns varies based on the number of grouping variables. For two or more grouping variables, GroupLabel1 describes the label in the first grouping variable, GroupLabel2 describes the label in the second grouping variable, and so on. GroupCount describes the number of observations in the data set Set with the combination of labels in GroupLabel1, GroupLabel2, and so on. PercentInSet is the percentage of observations in the data set Set with the combination of labels in GroupLabel1, GroupLabel2, and so on. For one grouping variable, the columns are similar, with only one GroupLabel column.
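Judging from the example output on this page, PercentInSet equals 100 times the count column divided by SetSize (for example, 100*25/126 is approximately 19.841). The sketch below checks that relationship for a stratified partition. The names recomputed and maxDiff are illustrative, and the relationship is inferred from the displayed values rather than stated as a formula in this page.

```matlab
load fisheriris
rng(0,"twister") % For reproducibility
c = cvpartition(species,KFold=3);
summaryTbl = summary(c);

% Recompute the percentage from the count and set size columns.
recomputed = 100*summaryTbl.StratificationCount./summaryTbl.SetSize;

% Largest absolute difference from the reported percentages.
maxDiff = max(abs(recomputed - summaryTbl.PercentInSet))
```

For a grouped partition, the same check would use the GroupCount column in place of StratificationCount.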
Version History
Introduced in R2025a
See Also
cvpartition | test | training