Access Data Using Categorical Arrays
Select Data By Category
Selecting data based on its values is often useful. This type of data selection can involve creating a logical vector based on values in one variable, and then using that logical vector to select a subset of values in other variables. You can create a logical vector for selecting data by finding values in a numeric array that fall within a certain range. Additionally, you can create the logical vector by finding specific discrete values. When using categorical arrays, you can easily:
- Select elements from particular categories. For categorical arrays, use the logical operators == or ~= to select data that is in, or not in, a particular category. To select data in a particular group of categories, use the ismember function. For ordinal categorical arrays, use the inequalities >, >=, <, or <= to find data in categories above or below a particular category.
- Delete data that is in a particular category. Use logical operators to include or exclude data from particular categories.
- Find elements that are not in a defined category. Categorical arrays indicate which elements do not belong to a defined category by <undefined>. Use the isundefined function to find observations without a defined value.
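The operations in the list above can be sketched on a small array. This is only an illustration: the variable sizes and its categories are hypothetical and are not part of the patients data used later. The array is created as ordinal so that the inequality operators apply.

```matlab
% Hypothetical data: shirt sizes as an ordinal categorical array.
% The order of the value set defines the order small < medium < large.
% The empty string is not in the value set, so that element is <undefined>.
sizes = categorical(["medium";"large";"small";"";"large"], ...
    ["small","medium","large"],"Ordinal",true);

isLarge   = sizes == "large";                    % elements in one category
smallOrMd = ismember(sizes,["small","medium"]);  % elements in a group of categories
atLeastMd = sizes >= "medium";                   % ordinal comparison
noValue   = isundefined(sizes);                  % elements with no category
```

Each result is a 5-by-1 logical vector that can be used to index into sizes or into any other array with the same number of rows.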
Common Ways to Access Data Using Categorical Arrays
This example shows how to index and search using categorical arrays. You can access data using categorical arrays stored within a table in a similar manner.
Load Sample Data
Load data about 100 patients from the sample patients.mat MAT-file.
load patients.mat
whos
  Name                            Size            Bytes  Class      Attributes

  Age                           100x1               800  double
  Diastolic                     100x1               800  double
  Gender                        100x1             13012  cell
  Height                        100x1               800  double
  LastName                      100x1             13216  cell
  Location                      100x1             15808  cell
  SelfAssessedHealthStatus      100x1             13140  cell
  Smoker                        100x1               100  logical
  Systolic                      100x1               800  double
  Weight                        100x1               800  double
Create Categorical Arrays
The arrays Location and SelfAssessedHealthStatus contain data that belong in categories. Each array contains text taken from a small set of unique values (indicating three locations and four health statuses, respectively). To convert Location and SelfAssessedHealthStatus to categorical arrays, use the categorical function. On the other hand, the array LastName has a list of last names that are not categories. So, convert LastName to a string array using the string function.
Location = categorical(Location);
SelfAssessedHealthStatus = categorical(SelfAssessedHealthStatus);
LastName = string(LastName);
Search for Members of a Single Category
For categorical arrays, you can use the logical operators == and ~= to find the data that is in, or not in, a particular category.
Determine whether any patients were observed at the location Rampart General Hospital.
any(Location == "Rampart General Hospital")
ans = logical
0
There are no patients observed at Rampart General Hospital.
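A false result from == does not say whether the name is a category at all. As a minimal sketch, assuming a hypothetical array wards with a declared but unused category, the iscategory function can separate "no elements in this category" from "no such category":

```matlab
% Hypothetical data: three wards are declared, but only two are used.
wards = categorical(["East";"West";"East"],["East","West","North"]);

anyNorth   = any(wards == "North");       % false: the category exists but is empty
northIsCat = iscategory(wards,"North");   % true
anySouth   = any(wards == "South");       % false: no element can match
southIsCat = iscategory(wards,"South");   % false: "South" is not a category
```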
Search for Members of a Group of Categories
You can use ismember to find data in a particular group of categories. For example, call ismember using Location as input data. Create a logical vector that identifies patients observed at either County General Hospital or VA Hospital.
Location
Location = 100x1 categorical
County General Hospital
VA Hospital
St. Mary's Medical Center
VA Hospital
County General Hospital
St. Mary's Medical Center
VA Hospital
VA Hospital
St. Mary's Medical Center
County General Hospital
County General Hospital
St. Mary's Medical Center
VA Hospital
VA Hospital
St. Mary's Medical Center
VA Hospital
St. Mary's Medical Center
VA Hospital
County General Hospital
County General Hospital
VA Hospital
VA Hospital
VA Hospital
County General Hospital
County General Hospital
VA Hospital
VA Hospital
County General Hospital
County General Hospital
County General Hospital
⋮
VA_CountyGenIndex = ...
    ismember(Location,["County General Hospital","VA Hospital"])
VA_CountyGenIndex = 100x1 logical array
1
1
0
1
1
0
1
1
0
1
⋮
VA_CountyGenIndex is a 100-by-1 logical array containing logical true (1) for each element in Location that is a member of the categories County General Hospital or VA Hospital. The output, VA_CountyGenIndex, contains 76 nonzero elements.
Use the logical vector, VA_CountyGenIndex, to select the LastName of the patients observed at either County General Hospital or VA Hospital.
VA_CountyGenPatients = LastName(VA_CountyGenIndex)
VA_CountyGenPatients = 76x1 string
"Smith"
"Johnson"
"Jones"
"Brown"
"Miller"
"Wilson"
"Taylor"
"Anderson"
"Jackson"
"White"
"Martin"
"Garcia"
"Martinez"
"Robinson"
"Clark"
"Rodriguez"
"Lewis"
"Lee"
"Walker"
"Hall"
"Allen"
"Young"
"Hernandez"
"King"
"Wright"
"Lopez"
"Green"
"Adams"
"Baker"
"Mitchell"
⋮
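For a short list of categories, the same index can also be built by combining == comparisons with the | operator. The sketch below uses hypothetical arrays clinics and names, not the patients data, to show that both approaches give the same logical vector and how nnz counts the selected elements.

```matlab
% Hypothetical data: five visits at clinics "A", "B", and "C".
clinics = categorical(["A";"B";"C";"A";"B"]);
names   = ["Ann";"Bo";"Cy";"Di";"Ed"];

idxMember = ismember(clinics,["A","B"]);        % group membership
idxOr     = clinics == "A" | clinics == "B";    % equivalent combination of ==

sameIndex   = isequal(idxMember,idxOr);         % both approaches agree
numSelected = nnz(idxMember);                   % number of true elements
selected    = names(idxMember);                 % names for the selected rows
```

With more than two or three categories, ismember is usually easier to read than a long chain of comparisons.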
Select Elements in a Particular Category to Plot
Use the summary function to print a summary containing the category names and the number of elements in each category.
summary(Location)
Location: 100x1 categorical
     County General Hospital        39
     St. Mary's Medical Center      24
     VA Hospital                    37
     <undefined>                     0
Location is a 100-by-1 categorical array with three categories. County General Hospital occurs in 39 elements, St. Mary's Medical Center in 24 elements, and VA Hospital in 37 elements.
Use the summary function to print a summary of SelfAssessedHealthStatus.
summary(SelfAssessedHealthStatus)
SelfAssessedHealthStatus: 100x1 categorical
     Excellent        34
     Fair             15
     Good             40
     Poor             11
     <undefined>       0
SelfAssessedHealthStatus is a 100-by-1 categorical array with four categories.
Use the logical operator == to access the ages of patients who assess their own health status as Good. Then plot a histogram of this data.
figure()
histogram(Age(SelfAssessedHealthStatus == "Good"))
title("Ages of Patients with Good Health Status")
histogram(Age(SelfAssessedHealthStatus == "Good")) plots the age data for the 40 patients who reported Good as their health status.
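The same kind of index works for computations other than plotting. As a minimal sketch with hypothetical arrays status and age (stand-ins for SelfAssessedHealthStatus and Age), the selected values can be summarized directly, and countcats returns the per-category counts that summary prints:

```matlab
% Hypothetical data: health status and age for five people.
status = categorical(["Good";"Poor";"Good";"Fair";"Good"]);
age    = [30;52;41;47;35];

goodAges    = age(status == "Good");   % ages of the people in the Good category
meanGoodAge = mean(goodAges);          % summary statistic for that subset

% Counts follow the order of categories(status): Fair, Good, Poor.
counts = countcats(status);
```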
Delete Data from a Particular Category
You can use logical operators to include or exclude data from particular categories. Delete all patients observed at VA Hospital from the workspace variables Age and Location.
Age = Age(Location ~= "VA Hospital");
Location = Location(Location ~= "VA Hospital")
Location = 63x1 categorical
County General Hospital
St. Mary's Medical Center
County General Hospital
St. Mary's Medical Center
St. Mary's Medical Center
County General Hospital
County General Hospital
St. Mary's Medical Center
St. Mary's Medical Center
St. Mary's Medical Center
County General Hospital
County General Hospital
County General Hospital
County General Hospital
County General Hospital
County General Hospital
County General Hospital
St. Mary's Medical Center
St. Mary's Medical Center
County General Hospital
St. Mary's Medical Center
St. Mary's Medical Center
St. Mary's Medical Center
County General Hospital
County General Hospital
County General Hospital
County General Hospital
County General Hospital
County General Hospital
St. Mary's Medical Center
⋮
Now, Age is a 63-by-1 numeric array, and Location is a 63-by-1 categorical array.
List the categories of Location, as well as the number of elements in each category.
summary(Location)
Location: 63x1 categorical
     County General Hospital        39
     St. Mary's Medical Center      24
     VA Hospital                     0
     <undefined>                     0
The patients observed at VA Hospital are deleted from Location, but VA Hospital is still a category.
Use the removecats function to remove VA Hospital from the categories of Location.
Location = removecats(Location,"VA Hospital");
Verify that the category VA Hospital was removed.
categories(Location)
ans = 2x1 cell
{'County General Hospital' }
{'St. Mary's Medical Center'}
Location is a 63-by-1 categorical array that has two categories.
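When you do not want to name the unused categories one by one, removecats can be called with the array alone, in which case it removes every category that has no elements. A minimal sketch, using a hypothetical array site rather than Location:

```matlab
% Hypothetical data: four visits across three sites.
site = categorical(["North";"South";"East";"South"]);

site = site(site ~= "East");        % the East element is gone, the category remains
catsBefore = categories(site);      % still lists East, North, South

site = removecats(site);            % drop all categories that have no elements
catsAfter = categories(site);       % only North and South remain
```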
Delete Element
You can delete elements by indexing. For example, you can remove the first element of Location by selecting the rest of the elements with Location(2:end). However, an easier way to delete elements is to use [].
Location(1) = [];
summary(Location)
Location: 62x1 categorical
     County General Hospital        38
     St. Mary's Medical Center      24
     <undefined>                     0
Location is a 62-by-1 categorical array that has two categories. Deleting the first element has no effect on other elements from the same category and does not delete the category itself.
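Assignment of [] also accepts a logical index, which is convenient when several arrays must stay aligned row by row. The sketch below uses hypothetical arrays site and visits and computes the index once, before either array changes size:

```matlab
% Hypothetical data: a site label and a visit count for each row.
site   = categorical(["North";"South";"North";"East"]);
visits = [3;5;2;7];

drop = site == "North";   % compute the index once, while both arrays have 4 rows
visits(drop) = [];        % delete the matching rows from the numeric array
site(drop)   = [];        % delete the same rows from the categorical array
```

As with the single-element deletion above, the North category itself is still listed in categories(site) afterward.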
Test for Undefined Elements
Remove the category County General Hospital from Location.
Location = removecats(Location,"County General Hospital");
Display the first eight elements of the categorical array Location.
Location(1:8)
ans = 8x1 categorical
St. Mary's Medical Center
<undefined>
St. Mary's Medical Center
St. Mary's Medical Center
<undefined>
<undefined>
St. Mary's Medical Center
St. Mary's Medical Center
After you remove the category County General Hospital, elements that previously belonged to that category no longer belong to any category defined for Location. The categorical elements that do not belong to any category are undefined, and display <undefined> as their values.
Use the function isundefined to find elements of a categorical array that do not belong to any category.
undefinedIndex = isundefined(Location);
undefinedIndex is a 62-by-1 logical array containing logical true (1) for all undefined elements in Location.
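Because the result of isundefined is an ordinary logical vector, the usual tools for logical arrays apply to it. A minimal sketch, assuming a hypothetical array site in place of Location:

```matlab
% Hypothetical data: removing a category leaves its elements undefined.
site = categorical(["North";"South";"North"]);
site = removecats(site,"South");

undef        = isundefined(site);   % logical index of undefined elements
numUndefined = nnz(undef);          % how many elements are undefined
positions    = find(undef);         % where they are
definedOnly  = site(~undef);        % keep only the elements that have a category
```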
Set Undefined Elements
Use the summary function to print the number of undefined elements in Location. Then display the first five elements of Location.
summary(Location)
Location: 62x1 categorical
     St. Mary's Medical Center      24
     <undefined>                    38
Location(1:5)
ans = 5x1 categorical
St. Mary's Medical Center
<undefined>
St. Mary's Medical Center
St. Mary's Medical Center
<undefined>
The first element of Location belongs to the category St. Mary's Medical Center. Set the first element to be an undefined value so that it no longer belongs to any category. The recommended way is to use the missing function to create undefined values. Another way is to assign '' or "" to elements of the array, as is done here for the third element. When you assign such values to elements of a categorical array, it converts them to undefined values.
Location(1) = missing;
Location(3) = '';
Location(1:5)
ans = 5x1 categorical
<undefined>
<undefined>
<undefined>
St. Mary's Medical Center
<undefined>
The summary function shows that these assignments increased the number of undefined elements.
summary(Location)
Location: 62x1 categorical
     St. Mary's Medical Center      22
     <undefined>                    40
You can make selected elements undefined without removing a category or changing the categories of other elements. Set undefined elements to indicate elements with values that are unknown.
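The reverse step, giving undefined elements an explicit label, can be sketched as follows. The array site and the category name "Unknown" are hypothetical, and adding the category with addcats before assigning is a cautious choice made here, not something the example above requires. The sketch also uses ismisssing's categorical counterpart behavior: ismissing reports undefined elements of a categorical array as missing.

```matlab
% Hypothetical data: mark one element as undefined, then label it explicitly.
site = categorical(["North";"South";"North"]);
site(2) = missing;                     % element 2 becomes <undefined>

wasMissing = ismissing(site);          % true for the undefined element

site = addcats(site,"Unknown");        % add the placeholder category first
site(isundefined(site)) = "Unknown";   % then assign it to the undefined elements
```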
Preallocate Categorical Arrays with Undefined Elements
You can use undefined elements to preallocate the size of a categorical array for better performance. Create a categorical array that has elements with known locations only.
definedIndex = ~isundefined(Location);
newLocation = Location(definedIndex);
summary(newLocation)
newLocation: 22x1 categorical
     St. Mary's Medical Center      22
     <undefined>                     0
Expand the size of newLocation so that it is a 200-by-1 categorical array. Set the last new element to be an undefined element. All of the other new elements are also assigned undefined values. The 22 original elements keep the values that they had.
newLocation(200) = missing;
summary(newLocation)
newLocation: 200x1 categorical
     St. Mary's Medical Center       22
     <undefined>                    178
newLocation has room for values you plan to store in the array later.
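Filling the preallocated elements afterward is ordinary indexed assignment. The following sketch uses a hypothetical array labels with two made-up categories; it grows the array once and then assigns values into the reserved slots:

```matlab
% Hypothetical data: start from one element with two declared categories.
labels = categorical("Clinic A",["Clinic A","Clinic B"]);

labels(5,1) = missing;    % grow to 5-by-1; elements 2 through 5 are <undefined>

labels(2) = "Clinic B";   % fill reserved slots as values become known
labels(3) = "Clinic A";

remaining = nnz(isundefined(labels));   % slots that are still unfilled
```

Growing the array once and then assigning into it avoids resizing the array on every new value.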
See Also
categorical | categories | summary | any | histogram | removecats | isundefined
Related Examples
- Create Categorical Arrays
- Convert Text in Table Variables to Categorical
- Plot Categorical Data
- Compare Categorical Array Elements
- Work with Protected Categorical Arrays