Chapter 4
Applying Supervised Learning
When to Consider Supervised Learning
A supervised learning algorithm takes a known set of input data (the training set) and known responses to the data (output), and trains a model to generate reasonable predictions for the response to new input data. Use supervised learning if you have existing data for the output you are trying to predict.
All supervised learning techniques are a form of classification or regression.

Classification techniques predict discrete responses—for example, whether an email is genuine or spam, or whether a tumor is small, medium, or large. Classification models are trained to classify data into categories. Applications include medical imaging, speech recognition, and credit scoring.
Regression techniques predict continuous responses—for example, changes in temperature or fluctuations in electricity demand. Applications include forecasting stock prices, handwriting recognition, and acoustic signal processing.
Selecting the Right Algorithm
As we saw in chapter 1, selecting a machine learning algorithm is a process of trial and error. It’s also a tradeoff between specific characteristics of the algorithms, such as:
- Speed of training
- Memory usage
- Predictive accuracy on new data
- Transparency or interpretability (how easily you can understand the reasons an algorithm makes its predictions)
Common Classification Algorithms
Decision Tree
HOW IT WORKS
A decision tree lets you predict responses to data by following the decisions in the tree from the root (beginning) down to a leaf node. A tree consists of branching conditions where the value of a predictor is compared to a trained weight. The number of branches and the values of weights are determined in the training process. Additional modification, or pruning, may be used to simplify the model.
BEST USED...
- When you need an algorithm that is easy to interpret and fast to fit
- To minimize memory usage
- When high predictive accuracy is not a requirement
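The chapter stays library-agnostic; as one possible sketch, here is a decision tree fit with scikit-learn (an assumed library choice) on made-up toy data:

```python
# Minimal decision tree sketch (illustrative data, not from the chapter).
from sklearn.tree import DecisionTreeClassifier

# Two predictors; class 1 when the first predictor is large
X = [[1, 0], [2, 1], [3, 0], [6, 1], [7, 0], [8, 1]]
y = [0, 0, 0, 1, 1, 1]

# max_depth caps tree growth -- a simple stand-in for pruning
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

print(tree.predict([[2, 1], [7, 1]]))  # expect class 0, then class 1
```

Limiting `max_depth` keeps the model small and interpretable, in line with the low-memory, easy-to-interpret use cases above.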
Bagged and Boosted Decision Trees
HOW IT WORKS
In these ensemble methods, several “weaker” decision trees are combined into a “stronger” ensemble. A bagged decision tree consists of trees that are trained independently on data that is bootstrapped from the input data.
Boosting involves creating a strong learner by iteratively adding “weak” learners and adjusting the weight of each weak learner to focus on misclassified examples.
BEST USED...
- When predictors are categorical (discrete) or behave nonlinearly
- When the time taken to train a model is less of a concern
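Both ensemble styles can be sketched with scikit-learn (an assumed library choice) on a synthetic data set:

```python
# Bagging vs. boosting sketch on synthetic classification data.
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, random_state=0)

# Bagging: trees trained independently on bootstrapped samples
bagged = BaggingClassifier(n_estimators=50, random_state=0).fit(X, y)

# Boosting: weak learners added iteratively, reweighting mistakes
boosted = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)

print(bagged.score(X, y), boosted.score(X, y))
```

Training the 50-tree ensembles takes noticeably longer than a single tree, which is the tradeoff the bullets above describe.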
Logistic Regression
HOW IT WORKS
Fits a model that can predict the probability of a binary response belonging to one class or the other. Because of its simplicity, logistic regression is commonly used as a starting point for binary classification problems.
BEST USED...
- When data can be clearly separated by a single, linear boundary
- As a baseline for evaluating more complex classification methods
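As a baseline sketch, a logistic regression on linearly separable toy data (scikit-learn is an assumed library choice):

```python
# Logistic regression baseline on made-up, linearly separable data.
from sklearn.linear_model import LogisticRegression

X = [[0.5], [1.0], [1.5], [3.5], [4.0], [4.5]]
y = [0, 0, 0, 1, 1, 1]

clf = LogisticRegression().fit(X, y)
print(clf.predict([[1.0], [4.0]]))       # expect class 0, then class 1
print(clf.predict_proba([[4.0]])[0, 1])  # probability of class 1
```

`predict_proba` exposes the class probability the description mentions, not just a hard label.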
k-Nearest Neighbor (kNN)
HOW IT WORKS
kNN categorizes objects based on the classes of their nearest neighbors in the data set. kNN predictions assume that objects near each other are similar. Distance metrics, such as Euclidean, city block, cosine, and Chebyshev, are used to find the nearest neighbors.
BEST USED...
- When you need a simple algorithm to establish benchmark learning rules
- When memory usage of the trained model is a lesser concern
- When prediction speed of the trained model is a lesser concern
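A minimal kNN sketch with scikit-learn (an assumed library choice), using the Euclidean metric named above:

```python
# kNN sketch: classify by majority vote of the k nearest points.
from sklearn.neighbors import KNeighborsClassifier

X = [[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]]
y = [0, 0, 0, 1, 1, 1]

# metric could also be 'manhattan' (city block), 'cosine', or 'chebyshev'
knn = KNeighborsClassifier(n_neighbors=3, metric='euclidean').fit(X, y)
print(knn.predict([[0.5, 0.5], [5.5, 5.5]]))  # expect class 0, then class 1
```

Note that kNN stores the whole training set and searches it at prediction time, which is why memory and prediction speed appear in the caveats above.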
Support Vector Machine (SVM)
HOW IT WORKS
Classifies data by finding the linear decision boundary (hyperplane) that separates all data points of one class from those of the other class. The best hyperplane for an SVM is the one with the largest margin between the two classes, when the data is linearly separable. If the data is not linearly separable, a loss function is used to penalize points on the wrong side of the hyperplane. SVMs sometimes use a kernel transform to map nonlinearly separable data into higher dimensions where a linear decision boundary can be found.
BEST USED...
- For data that has exactly two classes (you can also use it for multiclass classification with a technique called error-correcting output codes)
- For high-dimensional, nonlinearly separable data
- When you need a classifier that’s simple, easy to interpret, and accurate
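A linear-kernel SVM sketch on separable toy data (scikit-learn is an assumed library choice; swapping in `kernel='rbf'` would illustrate the kernel transform for nonlinearly separable data):

```python
# SVM sketch: maximum-margin hyperplane on made-up separable data.
from sklearn.svm import SVC

X = [[0, 0], [1, 1], [1, 0], [4, 4], [5, 5], [4, 5]]
y = [0, 0, 0, 1, 1, 1]

svm = SVC(kernel='linear').fit(X, y)
print(svm.predict([[0.5, 0.5], [4.5, 4.5]]))  # expect class 0, then class 1
```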
Neural Network
HOW IT WORKS
Inspired by the human brain, a neural network consists of highly connected networks of neurons that relate the inputs to the desired outputs. The network is trained by iteratively modifying the strengths of the connections so that given inputs map to the correct response.
BEST USED...
- For modeling highly nonlinear systems
- When data is available incrementally and you wish to constantly update the model
- When there could be unexpected changes in your input data
- When model interpretability is not a key concern
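A small feedforward network on a nonlinear two-class problem, sketched with scikit-learn's multilayer perceptron (an assumed library choice and data set):

```python
# Neural network sketch: MLP on a nonlinear ("two moons") problem.
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import make_moons

X, y = make_moons(n_samples=200, noise=0.1, random_state=0)

# Two hidden layers; training iteratively adjusts connection weights
net = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=2000,
                    random_state=0).fit(X, y)
print(net.score(X, y))
```

The curved class boundary here is exactly the kind of highly nonlinear system the bullets above refer to.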
Naive Bayes
HOW IT WORKS
A naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. It classifies new data based on the highest probability of its belonging to a particular class.
BEST USED...
- For a small data set containing many parameters
- When you need a classifier that’s easy to interpret
- When the model will encounter scenarios that weren’t in the training data, as is the case with many financial and medical applications
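A Gaussian naive Bayes sketch on made-up data (scikit-learn is an assumed library choice):

```python
# Naive Bayes sketch: features treated as conditionally independent.
from sklearn.naive_bayes import GaussianNB

X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2],
     [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]]
y = [0, 0, 0, 1, 1, 1]

nb = GaussianNB().fit(X, y)
print(nb.predict([[1.0, 2.0], [5.0, 6.0]]))  # expect class 0, then class 1
```

Because each class is summarized by simple per-feature statistics, the model still produces sensible probabilities for inputs unlike anything in the training set.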
Discriminant Analysis
HOW IT WORKS
Discriminant analysis classifies data by finding linear combinations of features. Discriminant analysis assumes that different classes generate data based on Gaussian distributions. Training a discriminant analysis model involves finding the parameters for a Gaussian distribution for each class. The distribution parameters are used to calculate boundaries, which can be linear or quadratic functions. These boundaries are used to determine the class of new data.
BEST USED...
- When you need a simple model that is easy to interpret
- When memory usage during training is a concern
- When you need a model that is fast to predict
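A linear discriminant analysis sketch (scikit-learn is an assumed library choice; a shared covariance across classes yields the linear boundary described above):

```python
# LDA sketch: one Gaussian per class, linear decision boundary.
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X = [[1, 1], [1, 2], [2, 1], [6, 6], [6, 7], [7, 6]]
y = [0, 0, 0, 1, 1, 1]

lda = LinearDiscriminantAnalysis().fit(X, y)
print(lda.predict([[1.5, 1.5], [6.5, 6.5]]))  # expect class 0, then class 1
```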
Common Regression Algorithms
Generalized Linear Model
HOW IT WORKS
A generalized linear model is a special case of nonlinear models that uses linear methods. It involves fitting a linear combination of the inputs to a nonlinear function (the link function) of the outputs.
BEST USED...
- When the response variables have nonnormal distributions, such as a response variable that is always expected to be positive
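One concrete GLM is Poisson regression, which suits a response that is always positive; a sketch with scikit-learn (an assumed library choice) on synthetic count data:

```python
# GLM sketch: Poisson regression with a log link on synthetic counts.
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 2, size=(100, 1))
y = rng.poisson(np.exp(1.0 + 0.5 * X[:, 0]))  # log-mean linear in x

glm = PoissonRegressor().fit(X, y)
print(glm.coef_, glm.intercept_)  # predictions are always positive
```

The log link guarantees positive predicted means, which an ordinary linear model would not.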
Regression Tree
HOW IT WORKS
Decision trees for regression are similar to decision trees for classification, but they are modified to be able to predict continuous responses.
BEST USED...
- When predictors are categorical (discrete) or behave nonlinearly
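A regression tree sketch with scikit-learn (an assumed library choice): each leaf predicts the mean response of its training samples.

```python
# Regression tree sketch on made-up data with two flat regimes.
from sklearn.tree import DecisionTreeRegressor

X = [[1], [2], [3], [6], [7], [8]]
y = [1.0, 1.1, 0.9, 5.0, 5.1, 4.9]

reg_tree = DecisionTreeRegressor(max_depth=1).fit(X, y)
print(reg_tree.predict([[2], [7]]))  # leaf means: about 1.0 and 5.0
```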
Linear Regression
HOW IT WORKS
Linear regression is a statistical modeling technique used to describe a continuous response variable as a linear function of one or more predictor variables. Because linear regression models are simple to interpret and easy to train, they are often the first model to be fitted to a new data set.
BEST USED...
- When you need an algorithm that is easy to interpret and fast to fit
- As a baseline for evaluating other, more complex, regression models
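An ordinary least squares sketch (scikit-learn is an assumed library choice) on exact toy data:

```python
# Linear regression sketch: recover y = 1 + 2x from noise-free data.
from sklearn.linear_model import LinearRegression

X = [[0], [1], [2], [3]]
y = [1.0, 3.0, 5.0, 7.0]  # exactly y = 1 + 2x

lin = LinearRegression().fit(X, y)
print(lin.intercept_, lin.coef_[0])  # intercept ~1.0, slope ~2.0
```

The fitted intercept and slope are directly readable, which is what makes linear regression such an interpretable baseline.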
Nonlinear Regression
HOW IT WORKS
Nonlinear regression is a statistical modeling technique that helps describe nonlinear relationships in experimental data. Nonlinear regression models are generally assumed to be parametric, where the model is described as a nonlinear equation. “Nonlinear” refers to a fit function that is a nonlinear function of the parameters. For example, if the fitting parameters are b0, b1, and b2: the equation y = b0 + b1x + b2x^2 is a linear function of the fitting parameters, whereas y = (b0x^b1)/(x + b2) is a nonlinear function of the fitting parameters.
BEST USED...
- When data has strong nonlinear trends and cannot be easily transformed into a linear space
- For fitting custom models to data
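The chapter's example model y = (b0x^b1)/(x + b2) can be fit by least squares; a sketch using SciPy's `curve_fit` (an assumed library choice) on synthetic, noise-free data:

```python
# Nonlinear regression sketch: fit y = b0*x^b1 / (x + b2) by least squares.
import numpy as np
from scipy.optimize import curve_fit

def model(x, b0, b1, b2):
    return b0 * x**b1 / (x + b2)

x = np.linspace(0.5, 10, 50)
y = model(x, 2.0, 1.5, 3.0)  # data generated from known parameters

params, _ = curve_fit(model, x, y, p0=[1.0, 1.0, 1.0])
print(params)  # should recover approximately [2.0, 1.5, 3.0]
```

This is the "custom model" use case: the functional form comes from you, and the optimizer finds the parameters.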
Gaussian Process Regression Model
HOW IT WORKS
Gaussian process regression (GPR) models are nonparametric models that are used for predicting the value of a continuous response variable. They are widely used in the field of spatial analysis for interpolation in the presence of uncertainty. GPR is also referred to as Kriging.
BEST USED...
- For interpolating spatial data, such as hydrogeological data for the distribution of ground water
- As a surrogate model to facilitate optimization of complex designs such as automotive engines
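A GPR sketch with scikit-learn (an assumed library choice), interpolating a few samples of a sine curve and reporting the uncertainty of the interpolated value:

```python
# GPR sketch: nonparametric interpolation with uncertainty estimates.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.sin(X).ravel()

gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0)).fit(X, y)
mean, std = gpr.predict(np.array([[1.5]]), return_std=True)
print(mean[0], std[0])  # interpolated value with its standard deviation
```

The per-point standard deviation is what makes GPR useful as a surrogate model: it tells an optimizer where the surrogate is least trustworthy.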
SVM Regression
HOW IT WORKS
SVM regression algorithms work like SVM classification algorithms, but are modified to be able to predict a continuous response. Instead of finding a hyperplane that separates data, SVM regression algorithms find a model that deviates from the measured data by a value no greater than a small amount, with parameter values that are as small as possible (to minimize sensitivity to error).
BEST USED...
- For high-dimensional data (where there will be a large number of predictor variables)
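An SVM regression sketch with scikit-learn (an assumed library choice): the `epsilon` parameter is the "small amount" of allowed deviation described above.

```python
# SVR sketch: fit within an epsilon-wide tube around the data.
from sklearn.svm import SVR

X = [[0], [1], [2], [3], [4]]
y = [0.0, 1.0, 2.0, 3.0, 4.0]  # made-up linear trend

svr = SVR(kernel='linear', epsilon=0.1, C=10.0).fit(X, y)
print(svr.predict([[2.5]]))  # close to 2.5, within the epsilon tube
```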