dummyvar

Create dummy variables

Description

D = dummyvar(group) returns a matrix D containing zeros and ones, whose columns are dummy variables for the grouping variables in group. Each column of group is a single grouping variable, with values indicating category levels. The rows of group represent observations across all variables.

Examples

Create a column vector of categorical data specifying color types.

Colors = {'Red';'Blue';'Green';'Red';'Green';'Blue'};
Colors = categorical(Colors);

Create dummy variables for each color type.

D = dummyvar(Colors)
D = 6×3

     0     0     1
     1     0     0
     0     1     0
     0     0     1
     0     1     0
     1     0     0

The columns in D correspond to the levels in Colors, in the order that the categories function returns them. For example, the first column of D corresponds to the first level, 'Blue', in Colors.

Display the category levels of Colors.

categories(Colors)
ans = 3x1 cell
    {'Blue' }
    {'Green'}
    {'Red'  }
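To make the column-to-level mapping explicit, one option is to label the columns of D with the category names. The following is a minimal sketch, not part of the original example; the variable names levels and T are illustrative.

```matlab
% Recreate the data from the example above.
Colors = categorical({'Red';'Blue';'Green';'Red';'Green';'Blue'});
D = dummyvar(Colors);

% categories returns the levels in the same order as the columns of D.
levels = categories(Colors);

% Label each dummy variable with its level name (illustrative only).
T = array2table(D,'VariableNames',levels')
```

Each variable in T is then the indicator column for the level it is named after.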

Create a matrix group of data containing the effects of two machines and three operators on a process.

machine = [1 1 1 1 2 2 2 2]';
operator = [1 2 3 1 2 3 1 2]';
group = [machine operator]
group = 8×2

     1     1
     1     2
     1     3
     1     1
     2     2
     2     3
     2     1
     2     2

Create dummy variables of the data in group.

D = dummyvar(group)
D = 8×5

     1     0     1     0     0
     1     0     0     1     0
     1     0     0     0     1
     1     0     1     0     0
     0     1     0     1     0
     0     1     0     0     1
     0     1     1     0     0
     0     1     0     1     0

The first two columns of D represent observations of machine 1 and machine 2, respectively. The remaining columns represent observations of the three operators.
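If you need the dummy variables for each grouping variable separately, you can split D by the number of levels per column of group. The sketch below assumes that every column of group uses the levels 1:max, as dummyvar does for numeric input; the names nLevels, edges, Dmachine, and Doperator are illustrative.

```matlab
machine = [1 1 1 1 2 2 2 2]';
operator = [1 2 3 1 2 3 1 2]';
group = [machine operator];
D = dummyvar(group);

% Number of levels per grouping variable (assumes levels are 1:max).
nLevels = max(group,[],1);
edges = [0 cumsum(nLevels)];

% Columns of D that belong to each grouping variable.
Dmachine  = D(:,edges(1)+1:edges(2));
Doperator = D(:,edges(2)+1:edges(3));

% Each block contains exactly one 1 per observation.
rowSums = [sum(Dmachine,2) sum(Doperator,2)]
```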

Create a cell array of phone types and a numeric vector of area codes.

phone = {'mobile';'landline';'mobile';'mobile';'mobile';'landline';'landline'};
codes = [802 802 603 603 802 603 802]';

Because the area code data has two levels (rather than 802 levels corresponding to the integers 1:802), convert codes to a categorical vector.

newcodes = categorical(codes);

Combine the phone and newcodes grouping variables into the cell array group.

group = {phone,newcodes};

Create dummy variables for the groups in group.

D = dummyvar(group)
D = 7×4

     1     0     0     1
     0     1     0     1
     1     0     1     0
     1     0     1     0
     1     0     0     1
     0     1     1     0
     0     1     0     1

The first two columns of D correspond to the phone types, and the last two columns correspond to the area codes.
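To see which level each column stands for, you can collect the level names in column order. This is a hedged sketch: it assumes that dummyvar orders the levels of a cell array of character vectors as grp2idx does (order of first appearance) and the levels of a categorical vector as categories does, which is consistent with the output shown above. The names phoneLevels, codeLevels, and labels are illustrative.

```matlab
phone = {'mobile';'landline';'mobile';'mobile';'mobile';'landline';'landline'};
codes = [802 802 603 603 802 603 802]';
newcodes = categorical(codes);
D = dummyvar({phone,newcodes});

% Level names for each grouping variable, in column order.
[~,phoneLevels] = grp2idx(phone);     % levels of the cell array
codeLevels = categories(newcodes);    % levels of the categorical vector

% One label per column of D.
labels = [phoneLevels; codeLevels]'
```

The number of labels equals the number of columns of D, that is, the sum of the number of levels over all grouping variables.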

Create dummy variables, and then decode them back into the original data.

Create a column vector of categorical data specifying color types.

colorsOriginal = ["red";"blue";"red";"green";"yellow";"blue"];
colorsOriginal = categorical(colorsOriginal)
colorsOriginal = 6x1 categorical
     red 
     blue 
     red 
     green 
     yellow 
     blue 

Determine the classes in the categorical vector.

classes = categories(colorsOriginal);

Create dummy variables for each color type by using the dummyvar function.

dummyColors = dummyvar(colorsOriginal)
dummyColors = 6×4

     0     0     1     0
     1     0     0     0
     0     0     1     0
     0     1     0     0
     0     0     0     1
     1     0     0     0

Decode the dummy variables along the second dimension (across columns) by using the onehotdecode function.

colorsDecoded = onehotdecode(dummyColors,classes,2)
colorsDecoded = 6x1 categorical
     red 
     blue 
     red 
     green 
     yellow 
     blue 

The decoded variables match the original color types.
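If onehotdecode is not available in your installation, the same decoding can be sketched with max. This alternative is an illustration, not the documented workflow; it assumes that every row of the dummy matrix contains exactly one 1 and no NaN values. The names idx and colorsManual are illustrative.

```matlab
colorsOriginal = categorical(["red";"blue";"red";"green";"yellow";"blue"]);
classes = categories(colorsOriginal);
dummyColors = dummyvar(colorsOriginal);

% Column index of the single 1 in each row.
[~,idx] = max(dummyColors,[],2);

% Map the indices back to labels, keeping the original category order.
colorsManual = categorical(classes(idx),classes);

% Compare with the original data.
tf = isequal(colorsManual,colorsOriginal)
```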

Input Arguments

group

Grouping variables, specified as one of the following:

  • A positive integer vector or categorical column vector representing levels within a single variable

  • A positive integer matrix representing levels within multiple variables, one variable per column

  • A cell array containing one or more grouping variables

If group is a categorical vector, then the groups and their order match the output of the categories function applied to group. If group is a numeric vector, then dummyvar assumes that the groups and their order are 1:max(group). In this respect, dummyvar treats a numeric grouping variable differently from grp2idx. For information on the order of groups within grouping variables, see Grouping Variables.
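The difference between numeric and categorical input can be illustrated with a vector that skips a level. This is a small sketch with made-up data; the names g, idx, and Dcat are illustrative.

```matlab
% Numeric grouping variable with no observations at level 2.
g = [1 3 3]';

% dummyvar creates one column per integer in 1:max(g),
% so the column for level 2 contains only zeros.
D = dummyvar(g);

% grp2idx indexes only the values that actually occur.
idx = grp2idx(g);

% Converting to categorical first yields one column per observed value.
Dcat = dummyvar(categorical(g));

nCols = [size(D,2) max(idx) size(Dcat,2)]
```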

Example: [2 1 1 1 2 3 3 2]'

Example: {Origin,Cylinders}

Data Types: single | double | categorical | cell

Output Arguments

D

Dummy variables, returned as an n-by-s numeric matrix, where n is the number of rows of group and s is the sum of the number of levels over all columns of group. From left to right, the columns of D are dummy variables created from the first column of group, followed by dummy variables created from the second column of group, and so on.

Data Types: single | double

Tips

  • Use dummy variables in regression analysis and ANOVA to indicate values of categorical predictors.

  • dummyvar treats NaN values and undefined categorical levels in group as missing data and returns NaN values in D.

  • If you add a column of ones to D (for example, as an intercept term), then the resulting matrix X = [ones(size(D,1),1) D] is rank deficient, because the dummy variables produced from any column of group always sum to a column of ones. For the same reason, if group has multiple columns, then the matrix D itself is rank deficient. Regression and ANOVA calculations often address this issue by eliminating one dummy variable from each group of dummy variables produced by a column of group, which implicitly sets the coefficients of the dropped columns to zero.

  • If group is a numeric vector with levels that do not correspond exactly to the integers 1:max(group), first convert the data to a categorical vector by using categorical. You can then pass the result to dummyvar. For an example, see Create Dummy Variables from Multiple Grouping Variables.
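Two of the tips above, rank deficiency and missing data, can be checked with a short sketch. The data is made up for illustration, and dropping the first dummy variable is only one possible choice of reference level; the names X, Xref, and Dmiss are illustrative.

```matlab
Colors = categorical({'Red';'Blue';'Green';'Red';'Green';'Blue'});
D = dummyvar(Colors);

% Adding an intercept column makes the design matrix rank deficient,
% because the three dummy columns sum to the column of ones.
X = [ones(size(D,1),1) D];
rankX = rank(X);

% Dropping one dummy variable (here the first level) restores full column rank.
Xref = [ones(size(D,1),1) D(:,2:end)];
rankXref = rank(Xref);

% Missing values in group produce NaN values in the corresponding row.
Dmiss = dummyvar([1 NaN 2]');

ranks = [rankX size(X,2); rankXref size(Xref,2)]
```

In ranks, each row lists the rank of a matrix next to its number of columns; the matrix has full column rank when the two values are equal.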

Alternative Functionality

Alternatively, use onehotencode to encode data labels. Consider using onehotencode instead of dummyvar in these cases:

  • To encode a table of categorical data labels

  • To specify the dimension to expand for encoding the data labels
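As a rough comparison, the following sketch encodes the same categorical vector with both functions. It assumes that onehotencode is available in your installation (it might require an additional toolbox) and that it orders the classes as categories does; the names D1, D2, D3, and T are illustrative.

```matlab
Colors = categorical({'Red';'Blue';'Green';'Red';'Green';'Blue'});

% dummyvar always expands the levels across columns.
D1 = dummyvar(Colors);

% onehotencode lets you choose the dimension to expand.
D2 = onehotencode(Colors,2);     % observations in rows, classes in columns
D3 = onehotencode(Colors',1);    % classes in rows, observations in columns

% onehotencode also accepts a table with a single categorical variable.
T = onehotencode(table(Colors));

sameAsDummyvar = isequal(D1,D2)
```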

Version History

Introduced before R2006a