categorical

Array that contains values assigned to categories

Description

categorical is a data type that assigns values to a finite set of discrete categories, such as High, Med, and Low. These categories can have a mathematical ordering that you specify, such as High > Med > Low, but it is not required. A categorical array provides efficient storage and convenient manipulation of nonnumeric data, while also maintaining meaningful names for the values. A common use of categorical arrays is to specify groups of rows in a table.

Creation

Description

B = categorical(A) creates a categorical array from the array A. The categories of B are the sorted unique values from A.
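
For text input, "sorted" means alphabetical order, not any meaning the labels might carry. The following minimal sketch uses an illustrative variable name (levels) to show this. It is one reason to pass valueset when the order matters.

```matlab
% Default categories are the sorted unique values of the input.
% For character data the sort is alphabetical, so 'high' comes first.
levels = categorical({'med','high','low','med'});
cats = categories(levels)   % 3x1 cell: 'high', 'low', 'med'
```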

B = categorical(A,valueset) creates one category for each value in valueset. The categories of B are in the same order as the values of valueset.

You can use valueset to include categories for values not present in A. Conversely, if A contains any values not present in valueset, then the corresponding elements of B are undefined.
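
Both effects can be seen in one small array. In this sketch the values are illustrative: valueset adds the unused category 3, and the value 5 in A, which is not in valueset, becomes undefined.

```matlab
A = [1 2 5 2];
valueset = [1 2 3];            % 3 does not occur in A; 5 is not in valueset
B = categorical(A,valueset);
categories(B)                  % {'1';'2';'3'}: includes the unused category 3
isundefined(B)                 % [0 0 1 0]: the element 5 has no category
```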

B = categorical(A,valueset,catnames) names the categories in B by matching the category values in valueset with the names in catnames.

B = categorical(A,___,Name,Value) creates a categorical array with additional options specified by one or more name-value arguments. You can use this syntax with any of the input arguments in the previous syntaxes.

For example, to indicate that the categories have a mathematical ordering, specify 'Ordinal',true.

Input Arguments

A - Input array

Input array, specified as a numeric array, logical array, categorical array, datetime array, duration array, string array, or cell array of character vectors.

categorical removes leading and trailing spaces from input values that are strings or character vectors.
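
Because of this trimming, labels that differ only in surrounding spaces end up in the same category. The following is a minimal sketch with illustrative labels. It relies on the trimming behavior described above.

```matlab
% ' east' and 'east ' differ only in leading/trailing spaces,
% so after trimming they map to the same category 'east'.
dirs = categorical({' east','east ','west'});
categories(dirs)   % {'east';'west'}
```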

If A contains missing values, then the corresponding element of B is undefined and displays as <undefined>. The categorical function converts the following values to undefined categorical values:

  • NaN in numeric and duration arrays

  • The missing string (<missing>) or the empty string ("") in string arrays

  • The empty character vector ('') in cell arrays of character vectors

  • NaT in datetime arrays

  • Undefined values (<undefined>) in categorical arrays
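
The conversions in this list can be checked with isundefined. The following sketch builds one small array per input type. The variable names are illustrative only.

```matlab
n = categorical([1 NaN 3]);                       % NaN in a numeric array
s = categorical(["a" missing "" "b"]);            % <missing> and "" in a string array
c = categorical({'x','','y'});                    % '' in a cell array
d = categorical([datetime(2017,4,17) NaT]);       % NaT in a datetime array

isundefined(n)     % [0 1 0]
isundefined(s)     % [0 1 1 0]
isundefined(c)     % [0 1 0]
isundefined(d)     % [0 1]
categories(n)      % {'1';'3'}: no category is created for NaN
```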

B does not have a category for undefined values. To create an explicit category for missing or undefined values, you must include the desired category name in catnames, and a missing value as the corresponding value in valueset.
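
A sketch of that rule for numeric data might look like the following, assuming that a NaN in valueset matches the NaN elements of A as described above. The category name 'missing' is an arbitrary, illustrative choice.

```matlab
A = [1 NaN 2 NaN];
% Pair a missing value (NaN) in valueset with an explicit name in catnames.
B = categorical(A,[1 2 NaN],{'one','two','missing'});
categories(B)      % {'one';'two';'missing'}
isundefined(B)     % all false: NaN now belongs to the 'missing' category
```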

A can also be an array of objects whose class defines the following methods:

  • unique

  • eq

valueset - Categories

Categories, specified as a vector of unique values. The data type of valueset must match the data type of A, except when A is a string array. In that case, valueset can be either a string array or a cell array of character vectors.

categorical removes leading and trailing spaces from elements of valueset that are strings or character vectors.
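
For example, a string array A can be paired with a cell array valueset, and the order of valueset sets the order of the categories. This sketch uses illustrative labels.

```matlab
A = ["low" "high" "low"];                       % string array input
B = categorical(A,{'low','medium','high'});     % cell array valueset is allowed here
categories(B)   % {'low';'medium';'high'}: same order as valueset
```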

catnames - Category names

Category names, specified as a cell array of character vectors or a string array. If you do not specify catnames, then categorical uses the values in valueset as category names.

To merge multiple distinct values in A into a single category in B, include duplicate names corresponding to those values.
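
In the following minimal sketch, four numeric codes are merged into two categories by repeating names in catnames. The codes and names are illustrative.

```matlab
A = [1 2 3 4 2];
% Codes 1 and 2 share the name 'low'; codes 3 and 4 share 'high'.
B = categorical(A,[1 2 3 4],{'low','low','high','high'});
categories(B)   % {'low';'high'}
```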

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: 'Ordinal',true specifies that the categories have a mathematical ordering.

Ordinal - Ordinal variable indicator

Ordinal variable indicator, specified as false (0) or true (1).

false (0)

categorical creates a categorical array that is not ordinal, which is the default behavior.

The categories of B have no mathematical ordering. Therefore, you can compare the values in B for equality only. You cannot compare the values using any other relational operator.

true (1)

categorical creates an ordinal categorical array.

The categories of B have a mathematical ordering, such that the first category specified is the smallest and the last category is the largest. You can compare the values in B using relational operators, such as less than and greater than, in addition to comparing the values for equality. You also can use the min and max functions on an ordinal categorical array.

For more information, see Ordinal Categorical Arrays.
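
The difference between the two settings shows up as soon as you use a relational operator. This sketch creates the same data with and without 'Ordinal',true. The variable names are illustrative.

```matlab
vals = {'Low','High','Med'};
catOrder = {'Low','Med','High'};

plain = categorical(vals,catOrder);            % not ordinal
try
    tf = plain < 'High';    % relational comparison on a nonordinal array
    canCompare = true;
catch
    canCompare = false;     % nonordinal arrays support only == and ~=
end

ranked = categorical(vals,catOrder,'Ordinal',true);
above = ranked > 'Low'      % [0 1 1]
top = max(ranked)           % High, the last category in catOrder
```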

Protected - Protected categories indicator

Protected categories indicator, specified as false (0) or true (1). The categories of ordinal categorical arrays are always protected, so the default value is true when you specify 'Ordinal',true. Otherwise, the default value is false.

false (0)

When you assign new values to B, the categories update automatically. Therefore, you can combine (nonordinal) categorical arrays that have different categories, and the categories of the result include the categories from both arrays.

true (1)

When you assign new values to B, the values must belong to one of the existing categories. Therefore, you can only combine arrays that have the same categories. To add new categories to B, you must use the function addcats.
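
The following sketch contrasts the two settings. Concatenation with an unprotected array adds categories, while assignment to a protected array fails until addcats adds the missing category. The labels are illustrative.

```matlab
% Unprotected (default for nonordinal arrays): categories grow as needed.
a = categorical({'x','y'});
b = categorical({'z'});
c = [a b];
categories(c)             % contains 'x', 'y', and 'z'

% Protected: assigning a value outside the categories is an error.
p = categorical({'x','y'},{'x','y'},'Protected',true);
try
    p(3) = 'w';
    added = true;
catch
    added = false;        % 'w' is not an existing category
end
p = addcats(p,'w');       % add the category explicitly
p(3) = 'w';               % succeeds now that 'w' is a category
```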

Examples

Create a categorical array that has weather station labels. Add it to a table of temperature readings. Then use the categories to select temperature readings by station.

First, create arrays containing temperature readings, dates, and station labels.

Temps = [58; 72; 56; 90; 76];
Dates = {'2017-04-17';'2017-04-18';'2017-04-30';'2017-05-01';'2017-04-27'};
Stations = {'S1';'S2';'S1';'S3';'S2'};

Convert Stations to a categorical array.

Stations = categorical(Stations)
Stations = 5x1 categorical
     S1 
     S2 
     S1 
     S3 
     S2 

Display the categories. The three station labels are the categories.

categories(Stations)
ans = 3x1 cell
    {'S1'}
    {'S2'}
    {'S3'}

Create a table that contains the temperatures, dates, and station labels.

T = table(Temps,Dates,Stations)
T=5×3 table
    Temps        Dates         Stations
    _____    ______________    ________

     58      {'2017-04-17'}       S1   
     72      {'2017-04-18'}       S2   
     56      {'2017-04-30'}       S1   
     90      {'2017-05-01'}       S3   
     76      {'2017-04-27'}       S2   

Display the readings taken from station S2. Use the == operator to find the values of Stations that equal S2, and then use logical indexing to select the table rows that have data from station S2.

TF = (T.Stations == 'S2');
T(TF,:)
ans=2×3 table
    Temps        Dates         Stations
    _____    ______________    ________

     72      {'2017-04-18'}       S2   
     76      {'2017-04-27'}       S2   

Convert the cell array of character vectors A to a categorical array. Specify a list of categories that includes values that are not present in A.

Create a cell array of character vectors.

A = {'republican' 'democrat'; 'democrat' 'democrat'; 'democrat' 'republican'};

Convert A to a categorical array. Add a category for independent.

valueset = {'democrat' 'republican' 'independent'};
B = categorical(A,valueset)
B = 3x2 categorical
     republican      democrat   
     democrat        democrat   
     democrat        republican 

Display the categories of B.

categories(B)
ans = 3x1 cell
    {'democrat'   }
    {'republican' }
    {'independent'}

Create a numeric array.

A = [1 3 2; 2 1 3; 3 1 2]
A = 3×3

     1     3     2
     2     1     3
     3     1     2

Convert A to categorical array B and specify category names.

B = categorical(A,[1 2 3],{'red' 'green' 'blue'})
B = 3x3 categorical
     red        blue      green 
     green      red       blue  
     blue       red       green 

Display the categories of B.

categories(B)
ans = 3x1 cell
    {'red'  }
    {'green'}
    {'blue' }

B is not an ordinal categorical array. Therefore, you can compare the values in B only by using the equality operators, == and ~=.

Find the elements that belong to the category 'red'. Access those elements using logical indexing.

TF = (B == 'red');
B(TF)
ans = 3x1 categorical
     red 
     red 
     red 

Create a 5-by-2 numeric array.

A = [3 2;3 3;3 2;2 1;3 2]
A = 5×2

     3     2
     3     3
     3     2
     2     1
     3     2

Convert A to an ordinal categorical array in which 1, 2, and 3 represent the categories child, adult, and senior, respectively.

valueset = 1:3;
catnames = {'child' 'adult' 'senior'};

B = categorical(A,valueset,catnames,'Ordinal',true)
B = 5x2 categorical
     senior      adult  
     senior      senior 
     senior      adult  
     adult       child  
     senior      adult  

Since B is ordinal, the categories of B have a mathematical ordering, child < adult < senior.
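
Because of this ordering, you can use relational operators and min or max directly on B. This sketch rebuilds the same B so that it runs on its own. The variable names atLeastAdult, numSeniors, and youngest are illustrative.

```matlab
A = [3 2;3 3;3 2;2 1;3 2];
B = categorical(A,1:3,{'child','adult','senior'},'Ordinal',true);

atLeastAdult = B >= 'adult';       % logical 5-by-2 array
numSeniors = nnz(B == 'senior')    % 5
youngest = min(B(:))               % child
```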

You can preallocate a categorical array of any size by creating an array of NaNs and converting it to a categorical array. After you preallocate the array, you can initialize its categories by specifying category names and adding the categories to the array.

First, create an array of NaNs. The array can have any size. For example, create a 2-by-4 array of NaNs.

A = NaN(2,4)
A = 2×4

   NaN   NaN   NaN   NaN
   NaN   NaN   NaN   NaN

Then preallocate a categorical array by converting the array of NaNs. The categorical function converts NaNs to undefined categorical values. Just as a NaN represents "not a number", <undefined> represents a categorical value that does not belong to a category.

A = categorical(A)
A = 2x4 categorical
     <undefined>      <undefined>      <undefined>      <undefined> 
     <undefined>      <undefined>      <undefined>      <undefined> 

In fact, at this point A has no categories.

categories(A)
ans =

  0x0 empty cell array

To initialize the categories of A, specify category names and add them to A by using the addcats function. For example, add small, medium, and large as three categories of A.

A = addcats(A,["small","medium","large"])
A = 2x4 categorical
     <undefined>      <undefined>      <undefined>      <undefined> 
     <undefined>      <undefined>      <undefined>      <undefined> 

Although the elements of A are still undefined, addcats has initialized the categories.

categories(A)
ans = 3x1 cell
    {'small' }
    {'medium'}
    {'large' }

Now that A has categories, you can assign defined categorical values as elements of A.

A(1) = "medium";
A(8) = "small";
A(3:5) = "large"
A = 2x4 categorical
     medium           large      large            <undefined> 
     <undefined>      large      <undefined>      small       

Starting in R2017a, you can create string arrays using double quotes. Also, a string array can have missing values, displayed as <missing>, without quotation marks.

str = ["plane","jet","plane","helicopter",missing,"jet"]
str = 1x6 string
    "plane"    "jet"    "plane"    "helicopter"    <missing>    "jet"

Convert string array str to a categorical array. The categorical function converts missing strings to undefined categorical values, displayed as <undefined>.

C = categorical(str)
C = 1x6 categorical
     plane      jet      plane      helicopter      <undefined>      jet 

Use the discretize function (instead of categorical) to bin 100 random numbers into three categories. Because x contains random values, your counts can differ from the ones shown.

x = rand(100,1);
y = discretize(x,[0 .25 .75 1],'categorical',{'small','medium','large'});
summary(y)
     small       22 
     medium      46 
     large       32 

Tips

  • For a list of functions that accept or return categorical arrays, see Categorical Arrays.

  • If the input array has numeric, datetime, or duration values that are too close together, then the categorical function truncates them to duplicate values. For example, categorical([1 1.00001]) truncates the second element of the input array. To create categories from numeric data, use the discretize function.
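
In the following minimal sketch, discretize groups nearly equal numbers by explicit bin edges instead of relying on their exact values. The edges and category names are illustrative.

```matlab
x = [1 1.00001 2.5];
edges = [0 1.5 3];                                    % bins [0,1.5) and [1.5,3]
y = discretize(x,edges,'categorical',{'low','high'});
categories(y)   % {'low';'high'}
string(y)       % "low"  "low"  "high"
```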

Alternatives

You also can group numeric data into categories using discretize.

Extended Capabilities

Thread-Based Environment
Run code in the background using MATLAB® backgroundPool or accelerate code with Parallel Computing Toolbox™ ThreadPool.

Version History

Introduced in R2013b