templateLinear
Linear learner template
Description
t = templateLinear returns a linear learner template suitable for training a linear classification or regression model on high-dimensional data.
t = templateLinear(Name,Value) returns a template with additional options specified by one or more name-value arguments. For example, you can specify the regularization type or strength, or specify the solver to use for objective-function minimization. If you do not specify the learner, then the default value "svm" is used.
If you specify the type of model by using the Type name-value argument, then the display of t in the Command Window shows all options as empty ([]), except those that you specify using name-value arguments. If you do not specify the type of model, then the display suppresses the empty options. During training, the software uses default values for empty options.
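As a minimal sketch of the second syntax, the following call creates a template for a lasso-penalized logistic regression learner. The option values are illustrative choices, not recommendations.

```matlab
% Create a template with options set by name-value arguments.
% The values below are illustrative only.
t = templateLinear("Learner","logistic", ...
    "Regularization","lasso", ...
    "Solver","sparsa");

% Options that you do not set stay empty until a training function
% (for example, fitcecoc) fills in their default values.
disp(t)
```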
Examples
Train Multiclass Linear Classification Model
Create a default linear learner template, and then use it to train an ECOC model containing multiple binary linear classification models.
Load the NLP data set.
load nlpdata
X is a sparse matrix of predictor data, and Y is a categorical vector of class labels. The data contains 13 classes.
Create a default linear learner template.
t = templateLinear
t = 
Fit template for Linear.
    Learner: 'svm'
t is a template object for a linear learner. All of the properties of t are empty. When you pass t to a training function, such as fitcecoc for ECOC multiclass classification, the software sets the empty properties to their respective default values. For example, the software sets Type to "classification". To modify the default values, see the name-value arguments for templateLinear.
Train an ECOC model consisting of multiple binary linear classification models that identify the software product given the frequency distribution of words on a documentation web page. For faster training time, transpose the predictor data, and specify that observations correspond to columns.
X = X';
rng(1); % For reproducibility
Mdl = fitcecoc(X,Y,'Learners',t,'ObservationsIn','columns')
Mdl = 
  CompactClassificationECOC
      ResponseName: 'Y'
        ClassNames: [comm dsp ecoder fixedpoint hdlcoder phased physmod simulink stats supportpkg symbolic vision xpc]
    ScoreTransform: 'none'
    BinaryLearners: {78x1 cell}
      CodingMatrix: [13x78 double]
Alternatively, you can train an ECOC model containing default linear classification models by specifying "Learners","Linear".
To conserve memory, fitcecoc returns trained ECOC models containing linear classification learners in CompactClassificationECOC model objects.
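The alternative mentioned above can be sketched as follows. The variable name Mdl2 is introduced here only for illustration, and the data preparation repeats the steps of the example.

```matlab
load nlpdata
X = X';     % observations in columns, as in the example above
rng(1);     % for reproducibility

% Train with default linear learners instead of an explicit template.
Mdl2 = fitcecoc(X,Y,"Learners","Linear","ObservationsIn","columns");
disp(class(Mdl2))
```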
Input Arguments
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: 'Learner','logistic','Regularization','lasso','CrossVal','on' specifies to implement logistic regression with a lasso penalty, and to implement 10-fold cross-validation.
Lambda
— Regularization term strength
'auto' (default) | nonnegative scalar | vector of nonnegative values
Regularization term strength, specified as the comma-separated pair consisting of 'Lambda' and 'auto', a nonnegative scalar, or a vector of nonnegative values.
- For 'auto', Lambda = 1/n.
  - If you specify a cross-validation name-value pair argument (for example, CrossVal), then n is the number of in-fold observations.
  - Otherwise, n is the training sample size.
- For a vector of nonnegative values, templateLinear sequentially optimizes the objective function for each distinct value in Lambda in ascending order.
  - If Solver is 'sgd' or 'asgd' and Regularization is 'lasso', templateLinear does not use the previous coefficient estimates as a warm start for the next optimization iteration. Otherwise, templateLinear uses warm starts.
  - If Regularization is 'lasso', then any coefficient estimate of 0 retains its value when templateLinear optimizes using subsequent values in Lambda.
  - templateLinear returns coefficient estimates for each specified regularization strength.
Example: 'Lambda',10.^(-(10:-2:2))
Data Types: char | string | double | single
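As a sketch of how a regularization path might be passed to the template, the following code builds a vector of strengths and also evaluates the 'auto' rule for a made-up sample size. The grid, the learner settings, and n are assumptions chosen for illustration.

```matlab
% Regularization path: 11 logarithmically spaced strengths (illustrative).
Lambda = logspace(-6,-0.5,11);

t = templateLinear("Learner","logistic", ...
    "Regularization","lasso", ...
    "Solver","sparsa", ...
    "Lambda",Lambda);

% For comparison, the 'auto' rule gives 1/n for n observations.
n = 5000;              % hypothetical training sample size
lambdaAuto = 1/n;      % 2e-4
```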
Learner
— Linear learner type
"svm"
(default) | "logistic"
| "leastsquares"
Linear learner type, specified as "svm", "logistic", or "leastsquares".
In this table, f(x) = xβ + b, where:
- β is a vector of p coefficients.
- x is an observation from p predictor variables.
- b is the scalar bias.
Value | Algorithm | Response Range | Loss Function |
---|---|---|---|
"svm" | Support vector machine (classification or regression) | Classification: y ∊ {–1,1}; 1 for the positive class and –1 otherwise Regression: y ∊ (-∞,∞) | Classification: Hinge Regression: Epsilon-insensitive |
"logistic" | Logistic regression (classification only) | y ∊ {–1,1}; 1 for the positive class and –1 otherwise | Deviance (logistic) |
"leastsquares" | Linear regression via ordinary least squares (regression only) | y ∊ (-∞,∞) | Mean squared error (MSE) |
Example: "Learner","logistic"
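For reference, the loss functions named in the table are commonly written as follows, where f(x) = xβ + b is the linear score and ε is the value of Epsilon. These are the standard textbook forms; the exact scaling that the software uses is an assumption here.

```latex
\begin{aligned}
\text{Hinge:} \quad & \ell[y, f(x)] = \max[0,\, 1 - y f(x)] \\
\text{Epsilon-insensitive:} \quad & \ell[y, f(x)] = \max[0,\, \lvert y - f(x) \rvert - \varepsilon] \\
\text{Deviance (logistic):} \quad & \ell[y, f(x)] = \log\{1 + \exp[-y f(x)]\} \\
\text{Mean squared error:} \quad & \ell[y, f(x)] = \tfrac{1}{2}\,[y - f(x)]^2
\end{aligned}
```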
Regularization
— Complexity penalty type
'lasso' | 'ridge'
Complexity penalty type, specified as the comma-separated pair consisting of 'Regularization' and 'lasso' or 'ridge'.
The software composes the objective function for minimization from the sum of the average loss function (see Learner) and the regularization term in this table.
Value | Description |
---|---|
'lasso' | Lasso (L1) penalty: $\lambda \sum_{j=1}^{p} \lvert \beta_j \rvert$ |
'ridge' | Ridge (L2) penalty: $\frac{\lambda}{2} \sum_{j=1}^{p} \beta_j^2$ |
To specify the regularization term strength, which is λ in the expressions, use Lambda.
The software excludes the bias term (β0) from the regularization penalty.
If Solver is 'sparsa', then the default value of Regularization is 'lasso'. Otherwise, the default is 'ridge'.
Tip
- For predictor variable selection, specify 'lasso'. For more on variable selection, see Introduction to Feature Selection.
- For optimization accuracy, specify 'ridge'.
Example: 'Regularization','lasso'
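To make the two penalty types concrete, this sketch evaluates each one for a small made-up coefficient vector. It assumes the conventional forms, λ times the sum of absolute coefficients for lasso and λ/2 times the sum of squared coefficients for ridge. The variable names are illustrative.

```matlab
beta   = [0.5; -2; 0];   % example coefficients (bias term excluded)
lambda = 0.1;            % regularization strength (Lambda)

lassoPenalty = lambda * sum(abs(beta));      % L1: lambda * sum |beta_j|
ridgePenalty = (lambda/2) * sum(beta.^2);    % L2: (lambda/2) * sum beta_j^2
```

With these values, the lasso penalty is 0.25 and the ridge penalty is 0.2125.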
Solver
— Objective function minimization technique
'sgd' | 'asgd' | 'dual' | 'bfgs' | 'lbfgs' | 'sparsa' | string array | cell array of character vectors
Objective function minimization technique, specified as the comma-separated pair consisting of 'Solver' and a character vector or string scalar, a string array, or a cell array of character vectors with values from this table.
Value | Description | Restrictions |
---|---|---|
'sgd' | Stochastic gradient descent (SGD) [4][2] | |
'asgd' | Average stochastic gradient descent (ASGD) [7] | |
'dual' | Dual SGD for SVM [1][6] | Regularization must be 'ridge' and Learner must be 'svm'. |
'bfgs' | Broyden-Fletcher-Goldfarb-Shanno quasi-Newton algorithm (BFGS) [3] | Inefficient if X is very high-dimensional. Regularization must be 'ridge'. |
'lbfgs' | Limited-memory BFGS (LBFGS) [3] | Regularization must be 'ridge'. |
'sparsa' | Sparse Reconstruction by Separable Approximation (SpaRSA) [5] | Regularization must be 'lasso'. |
If you specify:
- A ridge penalty (see Regularization) and the predictor data set contains 100 or fewer predictor variables, then the default solver is 'bfgs'.
- An SVM model (see Learner), a ridge penalty, and the predictor data set contains more than 100 predictor variables, then the default solver is 'dual'.
- A lasso penalty and the predictor data set contains 100 or fewer predictor variables, then the default solver is 'sparsa'.
Otherwise, the default solver is 'sgd'. Note that the default solver can change when you perform hyperparameter optimization. For more information, see Regularization method determines the linear learner solver used during hyperparameter optimization.
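The default-solver rules above can be sketched as a short decision chain. This is only an illustration of the documented rules, not the software's internal code. The variables learner, regularization, and p are hypothetical inputs.

```matlab
% Hypothetical inputs describing the problem.
learner        = "svm";     % "svm", "logistic", or "leastsquares"
regularization = "ridge";   % "ridge" or "lasso"
p              = 250;       % number of predictor variables

if regularization == "ridge" && p <= 100
    solver = "bfgs";
elseif learner == "svm" && regularization == "ridge" && p > 100
    solver = "dual";
elseif regularization == "lasso" && p <= 100
    solver = "sparsa";
else
    solver = "sgd";
end
disp(solver)   % displays dual for the inputs above
```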
If you specify a string array or cell array of solver names, then, for each value in Lambda, the software uses the solutions of solver j as a warm start for solver j + 1.
Example: {'sgd' 'lbfgs'} applies SGD to solve the objective, and uses the solution as a warm start for LBFGS.
Tip
- SGD and ASGD can solve the objective function more quickly than other solvers, whereas LBFGS and SpaRSA can yield more accurate solutions than other solvers. Solver combinations like {'sgd' 'lbfgs'} and {'sgd' 'sparsa'} can balance optimization speed and accuracy.
- When choosing between SGD and ASGD, consider that:
  - SGD takes less time per iteration, but requires more iterations to converge.
  - ASGD requires fewer iterations to converge, but takes more time per iteration.
- If the predictor data is high dimensional and Regularization is 'ridge', set Solver to any of these combinations:
  - 'sgd'
  - 'asgd'
  - 'dual' if Learner is 'svm'
  - 'lbfgs'
  - {'sgd','lbfgs'}
  - {'asgd','lbfgs'}
  - {'dual','lbfgs'} if Learner is 'svm'
  Although you can set other combinations, they often lead to solutions with poor accuracy.
- If the predictor data is moderate through low dimensional and Regularization is 'ridge', set Solver to 'bfgs'.
- If Regularization is 'lasso', set Solver to any of these combinations:
  - 'sgd'
  - 'asgd'
  - 'sparsa'
  - {'sgd','sparsa'}
  - {'asgd','sparsa'}
Example: 'Solver',{'sgd','lbfgs'}
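A solver combination can be set on the template as in this sketch, which chains SGD and LBFGS for a ridge-penalized SVM learner. The choice of learner and penalty is an assumption made for illustration.

```matlab
% SGD runs first; its solution is the warm start for LBFGS.
t = templateLinear("Learner","svm", ...
    "Regularization","ridge", ...
    "Solver",{'sgd','lbfgs'});
```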
Beta
— Initial linear coefficient estimates
zeros(p,1) (default) | numeric vector | numeric matrix
Initial linear coefficient estimates (β), specified as the comma-separated pair consisting of 'Beta' and a p-dimensional numeric vector or a p-by-L numeric matrix. p is the number of predictor variables after dummy variables are created for categorical variables (for more details, see CategoricalPredictors), and L is the number of regularization-strength values (for more details, see Lambda).
- If you specify a p-dimensional vector, then the software optimizes the objective function L times using this process.
  1. The software optimizes using Beta as the initial value and the minimum value of Lambda as the regularization strength.
  2. The software optimizes again using the resulting estimate from the previous optimization as a warm start, and the next smallest value in Lambda as the regularization strength.
  3. The software implements step 2 until it exhausts all values in Lambda.
- If you specify a p-by-L matrix, then the software optimizes the objective function L times. At iteration j, the software uses Beta(:,j) as the initial value and, after it sorts Lambda in ascending order, uses Lambda(j) as the regularization strength.
If you set 'Solver','dual', then the software ignores Beta.
Data Types: single | double
Bias
— Initial intercept estimate
numeric scalar | numeric vector
Initial intercept estimate (b), specified as the comma-separated pair consisting of 'Bias' and a numeric scalar or an L-dimensional numeric vector. L is the number of regularization-strength values (for more details, see Lambda).
- If you specify a scalar, then the software optimizes the objective function L times using this process.
  1. The software optimizes using Bias as the initial value and the minimum value of Lambda as the regularization strength.
  2. The software uses the resulting estimate as a warm start to the next optimization iteration, and uses the next smallest value in Lambda as the regularization strength.
  3. The software implements step 2 until it exhausts all values in Lambda.
- If you specify an L-dimensional vector, then the software optimizes the objective function L times. At iteration j, the software uses Bias(j) as the initial value and, after it sorts Lambda in ascending order, uses Lambda(j) as the regularization strength.
- By default:
  - If Learner is 'logistic', then let gj be 1 if Y(j) is the positive class, and -1 otherwise. Bias is the weighted average of the g for training or, for cross-validation, in-fold observations.
  - If Learner is 'svm', then Bias is 0.
Data Types: single | double
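The matrix and vector forms of the initial values can be sketched as follows, with one column of Beta and one element of Bias per regularization strength. The dimensions p and L and all values are hypothetical. In practice, p must match the predictor data that you later pass to the training function.

```matlab
p      = 4;                  % hypothetical number of predictors
Lambda = [1e-4 1e-3 1e-2];   % L = 3 regularization strengths
L      = numel(Lambda);

Beta0 = zeros(p,L);          % column j is the starting point for Lambda(j)
Bias0 = zeros(1,L);          % element j is the starting intercept for Lambda(j)

t = templateLinear("Solver","sgd", ...
    "Lambda",Lambda, ...
    "Beta",Beta0, ...
    "Bias",Bias0);
```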
FitBias
— Linear model intercept inclusion flag
true (default) | false
Linear model intercept inclusion flag, specified as the comma-separated pair consisting of 'FitBias' and true or false.
Value | Description |
---|---|
true | The software includes the bias term b in the linear model, and then estimates it. |
false | The software sets b = 0 during estimation. |
Example: 'FitBias',false
Data Types: logical
PostFitBias
— Flag to fit linear model intercept after optimization
false (default) | true
Flag to fit the linear model intercept after optimization, specified as the comma-separated pair consisting of 'PostFitBias' and true or false.
Value | Description |
---|---|
false | The software estimates the bias term b and the coefficients β during optimization. |
true | The software estimates b in a separate step after optimization. |
If you specify true, then FitBias must be true.
Example: 'PostFitBias',true
Data Types: logical
Type
— Linear model type
"classification"
| "regression"
Since R2023b
Linear model type, specified as "classification" or "regression".
Value | Description |
---|---|
"classification" | Create a classification linear learner template. If you do not specify Type as "classification", the fitting functions fitcecoc, testckfold, and fitsemigraph set this value when you pass t to them. |
"regression" | Create a regression linear learner template. If you do not specify Type as "regression", the fitting function directforecaster sets this value when you pass t to it. |
Example: "Type","classification"
Data Types: char | string
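As a brief sketch, you can set Type explicitly for either kind of template. The learner choices and variable names are illustrative, and the Type argument requires R2023b or later.

```matlab
% Regression template using least-squares learners.
tReg = templateLinear("Type","regression","Learner","leastsquares");

% Classification template using logistic regression learners.
tCls = templateLinear("Type","classification","Learner","logistic");
```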
Verbose
— Verbosity level
0 (default) | 1
Verbosity level, specified as the comma-separated pair consisting of 'Verbose' and either 0 or 1. Verbose controls the display of diagnostic information at the command line.
Value | Description |
---|---|
0 | templateLinear does not display diagnostic information. |
1 | templateLinear periodically displays the value of the objective function, gradient magnitude, and other diagnostic information. |
Example: 'Verbose',1
Data Types: single
| double
Epsilon
— Half the width of epsilon-insensitive band
iqr(Y)/13.49 (default) | nonnegative scalar value
Half the width of the epsilon-insensitive band, specified as a nonnegative scalar value. This argument applies to support vector machine learners only.
The default Epsilon value is iqr(Y)/13.49, which is an estimate of a tenth of the standard deviation using the interquartile range of the response variable Y. If iqr(Y) is equal to zero, then the default Epsilon value is 0.1.
Example: "Epsilon",0.3
Data Types: single | double
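The default rule can be reproduced by hand, as in this sketch. The response values are made up, and passing the result back through "Epsilon" is shown only to illustrate the argument.

```matlab
Y = [2.1; 3.4; 1.9; 5.0; 4.2; 3.3];   % made-up response values

epsilon = iqr(Y)/13.49;                % default rule described above
if epsilon == 0
    epsilon = 0.1;                     % fallback when iqr(Y) is zero
end

t = templateLinear("Type","regression","Learner","svm","Epsilon",epsilon);
```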
SGD and ASGD Solver Options
BatchSize
— Mini-batch size
positive integer
Mini-batch size, specified as the comma-separated pair consisting of 'BatchSize' and a positive integer. At each iteration, the software estimates the gradient using BatchSize observations from the training data.
- If the predictor data is a numeric matrix, then the default value is 10.
- If the predictor data is a sparse matrix, then the default value is max([10,ceil(sqrt(ff))]), where ff = numel(X)/nnz(X), that is, the fullness factor of X.
Example: 'BatchSize',100
Data Types: single | double
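The sparse-matrix default can be evaluated directly, as in this sketch. The random sparse matrix stands in for real predictor data and is an assumption for illustration.

```matlab
rng(0);                          % for reproducibility
X  = sprandn(1000,500,0.01);     % sparse 1000-by-500 matrix, about 1% nonzeros

ff = numel(X)/nnz(X);            % ratio defined in the text above
batchSize = max([10,ceil(sqrt(ff))]);
```

Because of the max operation, the result is never smaller than 10.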
LearnRate
— Learning rate
positive scalar
Learning rate, specified as the comma-separated pair consisting of 'LearnRate' and a positive scalar. LearnRate controls the optimization step size by scaling the subgradient.
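- If Regularization is 'ridge', then LearnRate specifies the initial learning rate γ0. templateLinear determines the learning rate for iteration t, γt, using $\gamma_t = \dfrac{\gamma_0}{(1 + \lambda \gamma_0 t)^c}$, where λ is the value of Lambda and c is an exponent that depends on the solver.
- If Regularization is 'lasso', then, for all iterations, LearnRate is constant.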
By default, LearnRate is 1/sqrt(1+max((sum(X.^2,obsDim)))), where obsDim is 1 if the observations compose the columns of the predictor data X, and 2 otherwise.
Example: 'LearnRate',0.01
Data Types: single | double
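The default learning rate can be computed by hand, as in this sketch. It uses random data with observations in columns, so obsDim is 1. The data and variable names are illustrative.

```matlab
rng(0);
X = randn(20,200);     % 20 predictors (rows), 200 observations (columns)
obsDim = 1;            % observations are columns, so sum over dimension 1

% Largest squared observation norm determines the default rate.
learnRate0 = 1/sqrt(1 + max(sum(X.^2,obsDim)));
```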
OptimizeLearnRate
— Flag to decrease learning rate
true (default) | false
Flag to decrease the learning rate when the software detects divergence (that is, over-stepping the minimum), specified as the comma-separated pair consisting of 'OptimizeLearnRate' and true or false.
If OptimizeLearnRate is true, then:
1. For the first few optimization iterations, the software starts optimization using LearnRate as the learning rate.
2. If the value of the objective function increases, then the software restarts and uses half of the current value of the learning rate.
3. The software iterates step 2 until the objective function decreases.
Example: 'OptimizeLearnRate',true
Data Types: logical
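The halving idea can be illustrated on a toy one-dimensional problem. This sketch is not the software's internal routine. The objective, starting point, and initial rate are made up so that the first step overshoots.

```matlab
f  = @(w) (w - 3).^2;     % toy objective with its minimum at w = 3
df = @(w) 2*(w - 3);      % gradient of the toy objective

w0 = 0;
learnRate = 1.5;          % deliberately too large: the first step overshoots

% Halve the rate until one gradient step lowers the objective.
while f(w0 - learnRate*df(w0)) >= f(w0)
    learnRate = learnRate/2;
end
```

For this toy problem, a single halving is enough, and the loop ends with learnRate equal to 0.75.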
TruncationPeriod
— Number of mini-batches between lasso truncation runs
10 (default) | positive integer
Number of mini-batches between lasso truncation runs, specified as the comma-separated pair consisting of 'TruncationPeriod' and a positive integer.
After a truncation run, the software applies a soft threshold to the linear coefficients. That is, after processing k = TruncationPeriod mini-batches, the software truncates the estimated coefficient j using
$$\hat{\beta}_j^{*} = \begin{cases} \hat{\beta}_j - u_t & \text{if } \hat{\beta}_j > u_t, \\ 0 & \text{if } \lvert \hat{\beta}_j \rvert \le u_t, \\ \hat{\beta}_j + u_t & \text{if } \hat{\beta}_j < -u_t. \end{cases}$$
- For SGD, $\hat{\beta}_j$ is the estimate of coefficient j after processing k mini-batches. $u_t = k \gamma_t \lambda$. γt is the learning rate at iteration t. λ is the value of Lambda.
- For ASGD, $\hat{\beta}_j$ is the averaged estimate of coefficient j after processing k mini-batches, and $u_t = k \lambda$.
If Regularization is 'ridge', then the software ignores TruncationPeriod.
Example: 'TruncationPeriod',100
Data Types: single | double
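Soft thresholding in its usual form can be sketched with a one-line function handle. The coefficient values and the threshold u are made up, and softThreshold is a name introduced here for illustration.

```matlab
% Shrink each coefficient toward zero by u; set small ones to exactly zero.
softThreshold = @(b,u) sign(b).*max(abs(b) - u, 0);

beta = [0.30; -0.05; 0.02; -0.40];   % made-up coefficient estimates
u    = 0.10;                          % made-up threshold

betaTruncated = softThreshold(beta,u);   % approximately [0.20; 0; 0; -0.30]
```

Coefficients whose magnitude is at most u become exactly zero, which is how the lasso penalty produces sparse solutions.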
SGD and ASGD Convergence Controls
BatchLimit
— Maximal number of batches
positive integer
Maximal number of batches to process, specified as the comma-separated pair consisting of 'BatchLimit' and a positive integer. When the software processes BatchLimit batches, it terminates optimization.
By default, the software passes through the data PassLimit times.
If you specify BatchLimit, then templateLinear uses the argument that results in processing the fewest observations, either BatchLimit or PassLimit.
Example: 'BatchLimit',100
Data Types: single | double
BetaTolerance
— Relative tolerance on linear coefficients and bias term
1e-4 (default) | nonnegative scalar
Relative tolerance on the linear coefficients and the bias term (intercept), specified as the comma-separated pair consisting of 'BetaTolerance' and a nonnegative scalar.
Let $B_t = [\beta_t' \; b_t]$, that is, the vector of the coefficients and the bias term at optimization iteration t. If $\left\lVert \dfrac{B_t - B_{t-1}}{B_t} \right\rVert_2 < \text{BetaTolerance}$, then optimization terminates.
If the software converges for the last solver specified in Solver, then optimization terminates. Otherwise, the software uses the next solver specified in Solver.
Example: 'BetaTolerance',1e-6
Data Types: single | double
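A relative-change stopping test of this kind can be sketched as follows. The sketch assumes one common reading of the criterion, the 2-norm of the change divided by the 2-norm of the current estimate. The software's exact computation may differ. All values are made up.

```matlab
Bprev = [0.50; -1.20; 0.10];            % [beta; b] at iteration t-1 (made up)
Bcurr = [0.50005; -1.20004; 0.10001];   % [beta; b] at iteration t (made up)
betaTolerance = 1e-4;

relChange = norm(Bcurr - Bprev)/norm(Bcurr);   % relative change in the estimates
converged = relChange < betaTolerance;         % true for these values
```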
NumCheckConvergence
— Number of batches to process before next convergence check
positive integer
Number of batches to process before next convergence check, specified as the comma-separated pair consisting of 'NumCheckConvergence' and a positive integer.
To specify the batch size, see BatchSize.
The software checks for convergence about 10 times per pass through the entire data set by default.
Example: 'NumCheckConvergence',100
Data Types: single | double
PassLimit
— Maximal number of passes
1 (default) | positive integer
Maximal number of passes through the data, specified as the comma-separated pair consisting of 'PassLimit' and a positive integer.
The software processes all observations when it completes one pass through the data.
When the software passes through the data PassLimit times, it terminates optimization.
If you specify BatchLimit, then templateLinear uses the argument that results in processing the fewest observations, either BatchLimit or PassLimit.
Example: 'PassLimit',5
Data Types: single | double
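To see which limit applies when you set both BatchLimit and PassLimit, you can compare the number of observations each one allows. This sketch assumes that one batch processes BatchSize observations and one pass processes all n observations. All numbers are hypothetical.

```matlab
n          = 10000;   % hypothetical number of training observations
batchSize  = 10;
batchLimit = 2000;
passLimit  = 5;

obsByBatchLimit = batchLimit*batchSize;   % 20000 observations
obsByPassLimit  = passLimit*n;            % 50000 observations

% The limit that processes fewer observations is the one that applies.
if obsByBatchLimit < obsByPassLimit
    binding = "BatchLimit";
else
    binding = "PassLimit";
end
```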
Dual SGD Convergence Controls
BetaTolerance
— Relative tolerance on linear coefficients and bias term
1e-4 (default) | nonnegative scalar
Relative tolerance on the linear coefficients and the bias term (intercept), specified as the comma-separated pair consisting of 'BetaTolerance' and a nonnegative scalar.
Let $B_t = [\beta_t' \; b_t]$, that is, the vector of the coefficients and the bias term at optimization iteration t. If $\left\lVert \dfrac{B_t - B_{t-1}}{B_t} \right\rVert_2 < \text{BetaTolerance}$, then optimization terminates.
If you also specify DeltaGradientTolerance, then optimization terminates when the software satisfies either stopping criterion.
If the software converges for the last solver specified in Solver, then optimization terminates. Otherwise, the software uses the next solver specified in Solver.
Example: 'BetaTolerance',1e-6
Data Types: single | double
DeltaGradientTolerance
— Gradient-difference tolerance
1 (default) | nonnegative scalar
Gradient-difference tolerance between upper and lower pool Karush-Kuhn-Tucker (KKT) complementarity conditions violators, specified as a nonnegative scalar.
- If the magnitude of the KKT violators is less than DeltaGradientTolerance, then the software terminates optimization.
- If the software converges for the last solver specified in Solver, then optimization terminates. Otherwise, the software uses the next solver specified in Solver.
Example: 'DeltaGradientTolerance',1e-2
Data Types: double | single
NumCheckConvergence
— Number of passes through entire data set to process before next convergence check
5 (default) | positive integer
Number of passes through entire data set to process before next convergence check, specified as the comma-separated pair consisting of 'NumCheckConvergence' and a positive integer.
Example: 'NumCheckConvergence',100
Data Types: single | double
PassLimit
— Maximal number of passes
10 (default) | positive integer
Maximal number of passes through the data, specified as the comma-separated pair consisting of 'PassLimit' and a positive integer.
When the software completes one pass through the data, it has processed all observations.
When the software passes through the data PassLimit times, it terminates optimization.
Example: 'PassLimit',5
Data Types: single | double
BFGS, LBFGS, and SpaRSA Convergence Controls
BetaTolerance
— Relative tolerance on linear coefficients and bias term
1e-4 (default) | nonnegative scalar
Relative tolerance on the linear coefficients and the bias term (intercept), specified as a nonnegative scalar.
Let $B_t = [\beta_t' \; b_t]$, that is, the vector of the coefficients and the bias term at optimization iteration t. If $\left\lVert \dfrac{B_t - B_{t-1}}{B_t} \right\rVert_2 < \text{BetaTolerance}$, then optimization terminates.
If you also specify GradientTolerance, then optimization terminates when the software satisfies either stopping criterion.
If the software converges for the last solver specified in Solver, then optimization terminates. Otherwise, the software uses the next solver specified in Solver.
Example: 'BetaTolerance',1e-6
Data Types: single | double
GradientTolerance
— Absolute gradient tolerance
1e-6 (default) | nonnegative scalar
Absolute gradient tolerance, specified as a nonnegative scalar.
Let $\nabla \mathcal{L}_t$ be the gradient vector of the objective function with respect to the coefficients and bias term at optimization iteration t. If $\lVert \nabla \mathcal{L}_t \rVert_\infty = \max \lvert \nabla \mathcal{L}_t \rvert < \text{GradientTolerance}$, then optimization terminates.
If you also specify BetaTolerance, then optimization terminates when the software satisfies either stopping criterion.
If the software converges for the last solver specified in Solver, then optimization terminates. Otherwise, the software uses the next solver specified in Solver.
Example: 'GradientTolerance',1e-5
Data Types: single | double
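An absolute gradient test of this kind can be sketched as a comparison of the largest absolute gradient component against the tolerance. The gradient values are made up, and treating the criterion as an infinity norm is an assumption here.

```matlab
gradL = [2e-7; -8e-7; 5e-7];     % made-up gradient at the current iteration
gradientTolerance = 1e-6;

% norm(...,Inf) returns the largest absolute component.
converged = norm(gradL,Inf) < gradientTolerance;   % true for these values
```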
HessianHistorySize
— Size of history buffer for Hessian approximation
15 (default) | positive integer
Size of history buffer for Hessian approximation, specified as the comma-separated pair consisting of 'HessianHistorySize' and a positive integer. That is, at each iteration, the software composes the Hessian using statistics from the latest HessianHistorySize iterations.
The software does not support 'HessianHistorySize' for SpaRSA.
Example: 'HessianHistorySize',10
Data Types: single | double
IterationLimit
— Maximal number of optimization iterations
1000 (default) | positive integer
Maximal number of optimization iterations, specified as the comma-separated pair consisting of 'IterationLimit' and a positive integer. IterationLimit applies to these values of Solver: 'bfgs', 'lbfgs', and 'sparsa'.
Example: 'IterationLimit',500
Data Types: single | double
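The convergence controls for the quasi-Newton solvers can be combined in one template, as in this sketch. The specific values are illustrative, not tuned recommendations.

```matlab
% LBFGS with a tighter gradient tolerance, a shorter Hessian history,
% and a lower iteration cap than the defaults.
t = templateLinear("Regularization","ridge", ...
    "Solver","lbfgs", ...
    "GradientTolerance",1e-8, ...
    "HessianHistorySize",10, ...
    "IterationLimit",500);
```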
Output Arguments
t
— Linear learner template
template object
Linear learner template suitable for training linear classification or regression models, returned as a template object. During training, the software uses default values for empty options.
More About
Warm Start
A warm start is initial estimates of the beta coefficients and bias term supplied to an optimization routine for quicker convergence.
Tips
- It is a best practice to orient your predictor matrix so that observations correspond to columns and to specify 'ObservationsIn','columns'. As a result, you can experience a significant reduction in optimization-execution time.
- If the predictor data has few observations, but many predictor variables, then:
  - Specify 'PostFitBias',true.
  - For SGD or ASGD solvers, set PassLimit to a positive integer that is greater than 1, for example, 5 or 10. This setting often results in better accuracy.
- For SGD and ASGD solvers, BatchSize affects the rate of convergence.
  - If BatchSize is too small, then the software achieves the minimum in many iterations, but computes the gradient per iteration quickly.
  - If BatchSize is too large, then the software achieves the minimum in fewer iterations, but computes the gradient per iteration slowly.
- Large learning rates (see LearnRate) speed up convergence to the minimum, but can lead to divergence (that is, over-stepping the minimum). Small learning rates ensure convergence to the minimum, but can lead to slow termination.
- If Regularization is 'lasso', then experiment with various values of TruncationPeriod. For example, set TruncationPeriod to 1, 10, and then 100.
- For efficiency, the software does not standardize predictor data. To standardize the predictor data (X) where you orient the observations as the columns, enter
  X = normalize(X,2);
  If you orient the observations as the rows, enter
  X = normalize(X);
  For memory-usage economy, the code replaces the original predictor data with the standardized data.
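The standardization tip can be checked on a small dense matrix, as in this sketch. The random data is a stand-in for real predictors, and Xstd is a name introduced here so that the original matrix stays available for comparison.

```matlab
rng(1);
X = randn(50,1000);       % 50 predictors in rows, 1000 observations in columns

% Standardize each predictor (row) across the observations (dimension 2).
Xstd = normalize(X,2);

rowMeans = mean(Xstd,2);      % each close to 0
rowStds  = std(Xstd,0,2);     % each close to 1
```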
References
[1] Hsieh, C. J., K. W. Chang, C. J. Lin, S. S. Keerthi, and S. Sundararajan. “A Dual Coordinate Descent Method for Large-Scale Linear SVM.” Proceedings of the 25th International Conference on Machine Learning, ICML ’08, 2008, pp. 408–415.
[2] Langford, J., L. Li, and T. Zhang. “Sparse Online Learning Via Truncated Gradient.” J. Mach. Learn. Res., Vol. 10, 2009, pp. 777–801.
[3] Nocedal, J. and S. J. Wright. Numerical Optimization, 2nd ed., New York: Springer, 2006.
[4] Shalev-Shwartz, S., Y. Singer, and N. Srebro. “Pegasos: Primal Estimated Sub-Gradient Solver for SVM.” Proceedings of the 24th International Conference on Machine Learning, ICML ’07, 2007, pp. 807–814.
[5] Wright, S. J., R. D. Nowak, and M. A. T. Figueiredo. “Sparse Reconstruction by Separable Approximation.” Trans. Sig. Proc., Vol. 57, No 7, 2009, pp. 2479–2493.
[6] Xiao, Lin. “Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization.” J. Mach. Learn. Res., Vol. 11, 2010, pp. 2543–2596.
[7] Xu, Wei. “Towards Optimal One Pass Large Scale Learning with Averaged Stochastic Gradient Descent.” CoRR, abs/1107.2490, 2011.
Extended Capabilities
Tall Arrays
Calculate with arrays that have more rows than fit in memory.
Usage notes and limitations when you train a model by passing a linear model template and tall arrays to fitcecoc:
- The default values for these name-value pair arguments are different when you work with tall arrays.
  - 'Lambda': Can be 'auto' (default) or a scalar
  - 'Regularization': Supports only 'ridge'
  - 'Solver': Supports only 'lbfgs'
  - 'FitBias': Supports only true
  - 'Verbose': Default value is 1
  - 'BetaTolerance': Default value is relaxed to 1e-3
  - 'GradientTolerance': Default value is relaxed to 1e-3
  - 'IterationLimit': Default value is relaxed to 20
- When fitcecoc uses a templateLinear object with tall arrays, the only available solver is LBFGS. The software implements LBFGS by distributing the calculation of the loss and gradient among different parts of the tall array at each iteration. If you do not specify initial values for Beta and Bias, the software refines the initial estimates of the parameters by fitting the model locally to parts of the data and combining the coefficients by averaging.
For more information, see Tall Arrays.
Version History
Introduced in R2016a
R2023b: Support for regression learner templates
templateLinear supports the creation of regression learner templates. Specify the Type name-value argument as "regression" in the call to the function. When creating a regression learner template, you can additionally specify the Epsilon name-value argument for support vector machine learners.
See Also
ClassificationLinear | RegressionLinear | fitclinear | fitrlinear | fitcecoc