Create Dummy Variables for Categorical Predictors and Generate C/C++ Code
This example shows how to generate code for classifying data using a support vector machine (SVM) model. Train the model using numeric and encoded categorical predictors. Use dummyvar to convert categorical predictors to numeric dummy variables before fitting an SVM classifier. When passing new data to your trained model, you must preprocess the data in the same way, using the same categories in the same order.
Alternatively, if a trained model identifies categorical predictors in the CategoricalPredictors property, then you do not need to create dummy variables manually to generate code. The software handles categorical predictors automatically. For an example, see Generate Code to Classify Data in Table.
Preprocess Data and Train SVM Classifier
Load the patients data set. Create a table using the Diastolic and Systolic numeric variables. Each row of the table corresponds to a different patient.
load patients
tbl = table(Diastolic,Systolic);
head(tbl)
    Diastolic    Systolic
    _________    ________
       93          124
       77          109
       83          125
       75          117
       80          122
       70          121
       88          130
       82          115
Convert the Gender variable to a categorical variable. The order of the categories in categoricalGender is important because it determines the order of the columns in the predictor data. Use dummyvar to convert the categorical variable to a matrix of zeros and ones, where a 1 value in the (i,j)th entry indicates that the ith patient belongs to the jth category.
categoricalGender = categorical(Gender);
orderGender = categories(categoricalGender)
orderGender = 2x1 cell
{'Female'}
{'Male' }
dummyGender = dummyvar(categoricalGender);
Note: The columns of the resulting dummyGender matrix sum to a column of ones, so the matrix is linearly dependent on an intercept (constant) term. Depending on the type of model you train, this rank deficiency can be problematic. For example, when training linear models, remove the first column of the dummy variables.
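The rank deficiency mentioned in the note can be illustrated with a small sketch. The data below is a hypothetical toy vector, not the patients data set, and the variable names are chosen for illustration only. The sketch assumes a design matrix that includes a constant column, as a linear model with an intercept would.

```matlab
% Toy data (hypothetical, not from the patients data set)
g = categorical({'Female';'Male';'Female';'Male';'Male'});
D = dummyvar(g);                    % 5-by-2 indicator matrix; each row sums to 1

% Constant column plus both indicator columns: the columns are linearly dependent
withIntercept = [ones(5,1) D];
rankFull = rank(withIntercept);     % smaller than the number of columns (3)

% Drop the first category so that it acts as the reference level
Dreduced = D(:,2:end);
reducedDesign = [ones(5,1) Dreduced];
rankReduced = rank(reducedDesign);  % equals the number of columns
```

Dropping one column loses no information, because membership in the first category is implied when all remaining indicators are zero.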
Create a table that contains the dummy variable dummyGender with the corresponding variable headings. Combine this new table with tbl.
tblGender = array2table(dummyGender,'VariableNames',orderGender);
tbl = [tbl tblGender];
head(tbl)
    Diastolic    Systolic    Female    Male
    _________    ________    ______    ____
       93          124         0        1
       77          109         0        1
       83          125         1        0
       75          117         1        0
       80          122         1        0
       70          121         1        0
       88          130         1        0
       82          115         0        1
Convert the SelfAssessedHealthStatus variable to a categorical variable. Note the order of the categories in categoricalHealth, and convert the variable to a numeric matrix using dummyvar.
categoricalHealth = categorical(SelfAssessedHealthStatus);
orderHealth = categories(categoricalHealth)
orderHealth = 4x1 cell
{'Excellent'}
{'Fair' }
{'Good' }
{'Poor' }
dummyHealth = dummyvar(categoricalHealth);
Create a table that contains dummyHealth with the corresponding variable headings. Combine this new table with tbl.
tblHealth = array2table(dummyHealth,'VariableNames',orderHealth);
tbl = [tbl tblHealth];
head(tbl)
    Diastolic    Systolic    Female    Male    Excellent    Fair    Good    Poor
    _________    ________    ______    ____    _________    ____    ____    ____
       93          124         0        1          1         0       0       0
       77          109         0        1          0         1       0       0
       83          125         1        0          0         0       1       0
       75          117         1        0          0         1       0       0
       80          122         1        0          0         0       1       0
       70          121         1        0          0         0       1       0
       88          130         1        0          0         0       1       0
       82          115         0        1          0         0       1       0
The third row of tbl, for example, corresponds to a patient with these characteristics: diastolic blood pressure of 83, systolic blood pressure of 125, female, and good self-assessed health status.
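As a quick sanity check, an encoded row can be decoded back into category names by using the stored category orders as lookup tables. The following is a minimal sketch that hard-codes the third row and the two category orders shown above; the variable names row, gender, and health are illustrative.

```matlab
% Category orders as displayed earlier in this example
orderGender = {'Female';'Male'};
orderHealth = {'Excellent';'Fair';'Good';'Poor'};

% Third row of tbl: Diastolic, Systolic, 2 gender columns, 4 health columns
row = [83 125 1 0 0 0 1 0];

% The position of the 1 in each dummy block selects the category name
gender = orderGender{logical(row(3:4))};
health = orderHealth{logical(row(5:8))};
```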
Because all the values in tbl are numeric, you can convert the table to a matrix X.
X = table2array(tbl);
Train an SVM classifier using X and a Gaussian kernel function with an automatic kernel scale. Specify the Smoker variable as the response.
Y = Smoker;
Mdl = fitcsvm(X,Y, ...
    'KernelFunction','gaussian','KernelScale','auto');
Generate C/C++ Code
Generate code that loads the SVM classifier, takes new predictor data as an input argument, and then classifies the new data.
Save the SVM classifier to a file using saveLearnerForCoder.
saveLearnerForCoder(Mdl,'SVMClassifier')
saveLearnerForCoder saves the classifier to the MATLAB® binary file SVMClassifier.mat as a structure array in the current folder.
Define the entry-point function mySVMPredict, which takes new predictor data as an input argument. Within the function, load the SVM classifier by using loadLearnerForCoder, and then pass the loaded classifier to predict.
function label = mySVMPredict(X) %#codegen
Mdl = loadLearnerForCoder('SVMClassifier');
label = predict(Mdl,X);
end
Generate code for mySVMPredict by using codegen. Specify the data type and dimensions of the new predictor data by using coder.typeof so that the generated code accepts a variable-size array with any number of rows and exactly eight columns.
codegen mySVMPredict -args {coder.typeof(X,[Inf 8],[1 0])}
Code generation successful.
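To see what the coder.typeof call in the codegen command specifies, you can create an equivalent type object and inspect it. This is a minimal sketch that requires MATLAB Coder; it uses a scalar double as the example value in place of X, and the variable name t is illustrative.

```matlab
% Type for a double matrix: unbounded number of rows, exactly 8 columns.
% The third argument flags which dimensions are variable size.
t = coder.typeof(0,[Inf 8],[1 0]);

disp(t.ClassName)      % underlying data type
disp(t.SizeVector)     % upper bound of each dimension
disp(t.VariableDims)   % true where the dimension can vary
```

Keeping the column count fixed at eight matches the number of predictor columns in the training matrix, while the variable row count lets the generated code classify any number of observations in one call.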
Verify that mySVMPredict and the MEX file return the same results for the training data.
label = predict(Mdl,X);
mylabel = mySVMPredict(X);
mylabel_mex = mySVMPredict_mex(X);
verifyMEX = isequal(label,mylabel,mylabel_mex)
verifyMEX = logical
1
Predict Labels for New Data
To predict labels for new data, you must first preprocess the new data. If you run the generated code in the MATLAB environment, you can follow the preprocessing steps described in this section. If you deploy the generated code outside the MATLAB environment, the preprocessing steps can differ. In either case, you must ensure that the new data has the same columns, in the same order, as the training data X.
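When dummyvar is not available in the deployment environment, the same encoding can be produced by comparing each category name against the training category order. The sketch below is one possible approach, not part of the original example: encodeCategory is a hypothetical helper, and the category orders are hard-coded to match the ones displayed during training.

```matlab
% Hypothetical helper: one-hot encode a single category name against the
% training category order, without calling dummyvar
encodeCategory = @(value,order) double(strcmp(value,order(:)'));

% Category orders recorded at training time
orderGender = {'Female';'Male'};
orderHealth = {'Excellent';'Fair';'Good';'Poor'};

% One new observation: diastolic 83, systolic 125, female, good health
newRow = [83 125 encodeCategory('Female',orderGender) ...
    encodeCategory('Good',orderHealth)];
```

The resulting row has the same eight-column layout as a row of the training matrix, so it can be passed to the prediction function directly.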
In this example, take the third, fourth, and fifth patients in the patients data set. Preprocess the data for these patients so that the resulting numeric matrix matches the form of the training data.
Convert the categorical variables to dummy variables. Because the new observations might not include values from all categories, you need to specify the same categories as the ones used during training and maintain the same category order. In MATLAB, pass the ordered cell array of category names associated with the corresponding training data variable (in this example, orderGender for gender values and orderHealth for self-assessed health status values).
newcategoricalGender = categorical(Gender(3:5),orderGender);
newdummyGender = dummyvar(newcategoricalGender);
newcategoricalHealth = categorical(SelfAssessedHealthStatus(3:5),orderHealth);
newdummyHealth = dummyvar(newcategoricalHealth);
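The following sketch shows what can go wrong when the category list is omitted. It uses hypothetical new data that contains only one of the two gender categories; the variable names order, newGender, Dwrong, and Dright are illustrative.

```matlab
order = {'Female';'Male'};         % category order used for training
newGender = {'Female';'Female'};   % hypothetical new data with no 'Male' values

% Categories inferred from the new data alone: only one dummy column
Dwrong = dummyvar(categorical(newGender));

% Categories fixed to the training order: two columns, as in training
Dright = dummyvar(categorical(newGender,order));

size(Dwrong)
size(Dright)
```

Without the explicit category list, the dummy matrix has fewer columns than the training data, so the combined predictor matrix would no longer line up with the columns that the model expects.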
Combine all the new data into a numeric matrix.
newX = [Diastolic(3:5) Systolic(3:5) newdummyGender newdummyHealth]
newX = 3×8
83 125 1 0 0 0 1 0
75 117 1 0 0 1 0 0
80 122 1 0 0 0 1 0
Note that newX corresponds exactly to the third, fourth, and fifth rows of the matrix X.
Verify that mySVMPredict and the MEX file return the same results for the new data.
newlabel = predict(Mdl,newX);
newmylabel = mySVMPredict(newX);
newmylabel_mex = mySVMPredict_mex(newX);
newverifyMEX = isequal(newlabel,newmylabel,newmylabel_mex)
newverifyMEX = logical
1
See Also
dummyvar | categorical | ClassificationSVM | codegen (MATLAB Coder) | coder.typeof (MATLAB Coder) | loadLearnerForCoder | coder.Constant (MATLAB Coder) | saveLearnerForCoder