First-order optimality is a measure of how close a point *x* is
to optimal. Most Optimization
Toolbox™ solvers use this measure,
though it has different definitions for different algorithms. First-order
optimality is a necessary condition, but it is not a sufficient condition.
In other words:

The first-order optimality measure must be zero at a minimum.

A point with first-order optimality equal to zero is not necessarily a minimum.

For general information about first-order optimality, see Nocedal and Wright [31]. For specifics about the first-order optimality measures for Optimization Toolbox solvers, see Unconstrained Optimality, Constrained Optimality Theory, and Constrained Optimality in Solver Form.

The `OptimalityTolerance`

tolerance relates
to the first-order optimality measure. Typically, if the first-order
optimality measure is less than `OptimalityTolerance`

,
solver iterations end.

Some solvers or algorithms use *relative* first-order
optimality as a stopping criterion. Solver iterations end if the first-order
optimality measure is less than *μ* times `OptimalityTolerance`

,
where *μ* is either:

The infinity norm (maximum) of the gradient of the objective function at

`x0`

The infinity norm (maximum) of inputs to the solver, such as

`f`

or`b`

in`linprog`

or`H`

in`quadprog`

A relative measure attempts to account for the scale of a problem. Multiplying an objective function by a very large or small number does not change the stopping condition for a relative stopping criterion, but does change it for an unscaled one.

Solvers with enhanced exit messages state, in the stopping criteria details, when they use relative first-order optimality.

For a smooth unconstrained problem,

$$\underset{x}{\mathrm{min}}f(x),$$

the first-order optimality
measure is the infinity norm (meaning maximum absolute value) of ∇*f*(*x*),
which is:

$$\text{first-orderoptimalitymeasure=}\underset{i}{\mathrm{max}}\left|{\left(\nabla f(x)\right)}_{i}\right|={\Vert \nabla f(x)\Vert}_{\infty}.$$

This measure of optimality is based on the familiar condition for a smooth function to achieve a minimum: its gradient must be zero. For unconstrained problems, when the first-order optimality measure is nearly zero, the objective function has gradient nearly zero, so the objective function could be near a minimum. If the first-order optimality measure is not small, the objective function is not minimal.

This section summarizes the theory behind the definition of first-order optimality measure for constrained problems. The definition as used in Optimization Toolbox functions is in Constrained Optimality in Solver Form.

For a smooth constrained problem, let *g* and *h* be
vector functions representing all inequality and equality constraints
respectively (meaning bound, linear, and nonlinear constraints):

$$\underset{x}{\mathrm{min}}f(x)\text{subjectto}g(x)\le 0,\text{}h(x)=0.$$

The meaning of first-order optimality in this case is more complex than for unconstrained problems. The definition is based on the Karush-Kuhn-Tucker (KKT) conditions. The KKT conditions are analogous to the condition that the gradient must be zero at a minimum, modified to take constraints into account. The difference is that the KKT conditions hold for constrained problems.

The KKT conditions use the auxiliary Lagrangian function:

$$L(x,\lambda )=f(x)+{\displaystyle \sum {\lambda}_{g,i}{g}_{i}(x)}+{\displaystyle \sum {\lambda}_{h,i}{h}_{i}(x)}.$$ | (1) |

The KKT conditions are:

$${\nabla}_{x}L(x,\lambda )=0,$$ | (2) |

$${\lambda}_{g,i}{g}_{i}(x)=0\text{}\forall i,$$ | (3) |

$$\{\begin{array}{c}g(x)\le 0,\\ h(x)=0,\\ {\lambda}_{g,i}\ge 0.\end{array}$$ | (4) |

The optimality measure associated with Equation 2 is

$$\Vert {\nabla}_{x}L(x,\lambda \Vert =\Vert \nabla f(x)+{\displaystyle \sum {\lambda}_{g,i}\nabla {g}_{i}(x)+{\displaystyle \sum {\lambda}_{h,i}\nabla {h}_{h,i}(x)}}\Vert .$$ | (5) |

$$\Vert \overrightarrow{{\lambda}_{g}g}(x)\Vert ,$$ | (6) |

The combined optimality measure is the maximum of the values
calculated in Equation 5 and Equation 6. Solvers that
accept nonlinear constraint functions report constraint violations *g*(*x*) > 0 or |*h*(*x*)| > 0 as `ConstraintTolerance`

violations.
See Tolerances and Stopping Criteria.

Most constrained toolbox solvers separate their calculation of first-order optimality measure into bounds, linear functions, and nonlinear functions. The measure is the maximum of the following two norms, which correspond to Equation 5 and Equation 6:

$$\begin{array}{l}\Vert {\nabla}_{x}L(x,\lambda \Vert =\Vert \nabla f(x)+{A}^{T}{\lambda}_{ineqlin}+Ae{q}^{T}{\lambda}_{eqlin}\\ \text{}+{\displaystyle \sum {\lambda}_{ineqnonlin,i}\nabla {c}_{i}(x)+{\displaystyle \sum {\lambda}_{eqnonlin,i}\nabla ce{q}_{i}(x)}}\Vert ,\end{array}$$ | (7) |

$$\Vert \overrightarrow{\left|{l}_{i}-{x}_{i}\right|{\lambda}_{lower,i}},\overrightarrow{\left|{x}_{i}-{u}_{i}\right|{\lambda}_{upper,i}},\overrightarrow{\left|{(Ax-b)}_{i}\right|{\lambda}_{ineqlin,i}},\overrightarrow{\left|{c}_{i}(x)\right|{\lambda}_{ineqnonlin,i}}\Vert ,$$ | (8) |

where the norm of the
vectors in Equation 7 and Equation 8 is the infinity
norm (maximum). The subscripts on the Lagrange multipliers correspond
to solver Lagrange multiplier structures. See Lagrange Multiplier Structures. The summations in Equation 7 range over all
constraints. If a bound is ±`Inf`

, that term
is not constrained, so it is not part of the summation.

For some large-scale problems with only linear equalities, the
first-order optimality measure is the infinity norm of the *projected* gradient.
In other words, the first-order optimality measure is the size of
the gradient projected onto the null space of `Aeq`

.

For least-squares solvers and trust-region-reflective algorithms,
in problems with bounds alone, the first-order optimality measure
is the maximum over *i* of |*v _{i}**

$${v}_{i}=\{\begin{array}{ll}\left|{x}_{i}-{b}_{i}\right|\hfill & \text{ifthenegativegradientpointstowardbound}{b}_{i}\hfill \\ 1\hfill & \text{otherwise}\text{.}\hfill \end{array}$$

If *x _{i}* is at a bound,