## Kernel (Covariance) Function Options

In supervised learning, it is expected that points with similar predictor values ${x}_{i}$ naturally have close response (target) values ${y}_{i}$. In Gaussian processes, the covariance function expresses this similarity [1]. It specifies the covariance between the two latent variables $f\left({x}_{i}\right)$ and $f\left({x}_{j}\right)$, where both ${x}_{i}$ and ${x}_{j}$ are d-by-1 vectors. In other words, it determines how the response at one point ${x}_{i}$ is affected by responses at other points ${x}_{j}$, $i\ne j$, i = 1, 2, ..., n. The covariance function $k\left({x}_{i},{x}_{j}\right)$ can be defined by various kernel functions. It can be parameterized in terms of the kernel parameters in a vector $\theta$. Hence, it is possible to express the covariance function as $k\left({x}_{i},{x}_{j}|\theta \right)$.

For many standard kernel functions, the kernel parameters are based on the signal standard deviation ${\sigma }_{f}$ and the characteristic length scale ${\sigma }_{l}$. The characteristic length scale briefly defines how far apart the input values ${x}_{i}$ can be for the response values to become uncorrelated. Both ${\sigma }_{l}$ and ${\sigma }_{f}$ need to be greater than 0, and this can be enforced by the unconstrained parametrization vector $\theta$, such that

`${\theta }_{1}=\mathrm{log}{\sigma }_{l},\text{ }{\theta }_{2}=\mathrm{log}{\sigma }_{f}.$`
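This log transform can be sketched in a few lines of Python (illustrative only; the function names are my own, not part of `fitrgp`). Because `exp` maps any real number to a strictly positive one, optimizing over $\theta$ keeps ${\sigma }_{l}$ and ${\sigma }_{f}$ positive without explicit constraints:

```python
import math

def to_unconstrained(sigma_l, sigma_f):
    # theta_1 = log(sigma_l), theta_2 = log(sigma_f); requires both > 0
    return [math.log(sigma_l), math.log(sigma_f)]

def from_unconstrained(theta):
    # exp maps any real theta back to strictly positive sigma values
    return math.exp(theta[0]), math.exp(theta[1])
```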

The built-in kernel (covariance) functions with the same length scale for each predictor are:

• Squared Exponential Kernel

This is one of the most commonly used covariance functions and is the default option for `fitrgp`. The squared exponential kernel function is defined as

`$k\left({x}_{i},{x}_{j}|\theta \right)={\sigma }_{f}^{2}\mathrm{exp}\left(-\frac{{r}^{2}}{2{\sigma }_{l}^{2}}\right),$`

where ${\sigma }_{l}$ is the characteristic length scale, ${\sigma }_{f}$ is the signal standard deviation, and

`$r=\sqrt{{\left({x}_{i}-{x}_{j}\right)}^{T}\left({x}_{i}-{x}_{j}\right)}$`

is the Euclidean distance between ${x}_{i}$ and ${x}_{j}$.
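As a minimal sketch (Python, illustrative; not the `fitrgp` implementation), the squared exponential kernel can be evaluated directly from the formula:

```python
import math

def squared_exponential(xi, xj, sigma_l=1.0, sigma_f=1.0):
    # k(xi, xj | theta) = sigma_f^2 * exp(-r^2 / (2 * sigma_l^2)),
    # with r the Euclidean distance between xi and xj
    r2 = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return sigma_f ** 2 * math.exp(-r2 / (2 * sigma_l ** 2))
```

Note that the kernel equals ${\sigma }_{f}^{2}$ at zero distance and decays smoothly as the points move apart.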

• Exponential Kernel

You can specify the exponential kernel function using the `'KernelFunction','exponential'` name-value pair argument. This covariance function is defined by

`$k\left({x}_{i},{x}_{j}|\theta \right)={\sigma }_{f}^{2}\mathrm{exp}\left(-\frac{r}{{\sigma }_{l}}\right),$`

where ${\sigma }_{l}$ is the characteristic length scale and

`$r=\sqrt{{\left({x}_{i}-{x}_{j}\right)}^{T}\left({x}_{i}-{x}_{j}\right)}$`

is the Euclidean distance between ${x}_{i}$ and ${x}_{j}$.
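A corresponding sketch in Python (illustrative; function name is my own). Unlike the squared exponential, the exponent is linear in $r$, so the kernel is not differentiable at $r=0$:

```python
import math

def exponential_kernel(xi, xj, sigma_l=1.0, sigma_f=1.0):
    # k(xi, xj | theta) = sigma_f^2 * exp(-r / sigma_l)
    r = math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))
    return sigma_f ** 2 * math.exp(-r / sigma_l)
```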

• Matern 3/2

You can specify the Matern 3/2 kernel function using the `'KernelFunction','matern32'` name-value pair argument. This covariance function is defined by

`$\begin{array}{l}k\left({x}_{i},{x}_{j}|\theta \right)={\sigma }_{f}^{2}\left(1+\frac{\sqrt{3}r}{{\sigma }_{l}}\right)\text{exp}\left(-\frac{\sqrt{3}r}{{\sigma }_{l}}\right)\hfill \end{array},$`

where

`$r=\sqrt{{\left({x}_{i}-{x}_{j}\right)}^{T}\left({x}_{i}-{x}_{j}\right)}$`

is the Euclidean distance between ${x}_{i}$ and ${x}_{j}$.
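An illustrative Python sketch (names are my own), introducing the scaled distance $s=\sqrt{3}\,r/{\sigma }_{l}$ so the code matches the formula term by term:

```python
import math

def matern32(xi, xj, sigma_l=1.0, sigma_f=1.0):
    # k(xi, xj | theta) = sigma_f^2 * (1 + s) * exp(-s), s = sqrt(3) * r / sigma_l
    r = math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))
    s = math.sqrt(3.0) * r / sigma_l
    return sigma_f ** 2 * (1.0 + s) * math.exp(-s)
```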

• Matern 5/2

You can specify the Matern 5/2 kernel function using the `'KernelFunction','matern52'` name-value pair argument. The Matern 5/2 covariance function is defined as

`$\begin{array}{l}k\left({x}_{i},{x}_{j}|\theta \right)={\sigma }_{f}^{2}\left(1+\frac{\sqrt{5}r}{{\sigma }_{l}}+\frac{5{r}^{2}}{3{\sigma }_{l}^{2}}\right)\text{exp}\left(-\frac{\sqrt{5}r}{{\sigma }_{l}}\right)\hfill \end{array},$`

where

`$r=\sqrt{{\left({x}_{i}-{x}_{j}\right)}^{T}\left({x}_{i}-{x}_{j}\right)}$`

is the Euclidean distance between ${x}_{i}$ and ${x}_{j}$.
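The same idea in Python (illustrative; names are my own). With $s=\sqrt{5}\,r/{\sigma }_{l}$, the polynomial factor $1+\sqrt{5}r/{\sigma }_{l}+5{r}^{2}/(3{\sigma }_{l}^{2})$ becomes $1+s+{s}^{2}/3$:

```python
import math

def matern52(xi, xj, sigma_l=1.0, sigma_f=1.0):
    # k(xi, xj | theta) = sigma_f^2 * (1 + s + s^2/3) * exp(-s),
    # with s = sqrt(5) * r / sigma_l
    r = math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))
    s = math.sqrt(5.0) * r / sigma_l
    return sigma_f ** 2 * (1.0 + s + s ** 2 / 3.0) * math.exp(-s)
```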

• Rational Quadratic Kernel

You can specify the rational quadratic kernel function using the `'KernelFunction','rationalquadratic'` name-value pair argument. This covariance function is defined by

`$k\left({x}_{i},{x}_{j}|\theta \right)={\sigma }_{f}^{2}{\left(1+\frac{{r}^{2}}{2\alpha {\sigma }_{l}^{2}}\right)}^{-\alpha },$`

where ${\sigma }_{l}$ is the characteristic length scale, $\alpha$ is a positive-valued scale-mixture parameter, and

`$r=\sqrt{{\left({x}_{i}-{x}_{j}\right)}^{T}\left({x}_{i}-{x}_{j}\right)}$`

is the Euclidean distance between ${x}_{i}$ and ${x}_{j}$.
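An illustrative Python sketch (names are my own). A useful sanity check: as $\alpha \to \infty$, the rational quadratic kernel approaches the squared exponential kernel with the same ${\sigma }_{l}$ and ${\sigma }_{f}$:

```python
import math

def rational_quadratic(xi, xj, sigma_l=1.0, sigma_f=1.0, alpha=1.0):
    # k(xi, xj | theta) = sigma_f^2 * (1 + r^2 / (2 * alpha * sigma_l^2))^(-alpha)
    r2 = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return sigma_f ** 2 * (1.0 + r2 / (2.0 * alpha * sigma_l ** 2)) ** (-alpha)
```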

It is possible to use a separate length scale ${\sigma }_{m}$ for each predictor m, m = 1, 2, ..., d. The built-in kernel (covariance) functions with a separate length scale for each predictor implement automatic relevance determination (ARD) [2]. The unconstrained parametrization $\theta$ in this case is

`$\begin{array}{l}{\theta }_{m}=\mathrm{log}{\sigma }_{m},\text{ }\text{for}\text{\hspace{0.17em}}m=1,2,...,d\text{ }\\ {\theta }_{d+1}=\mathrm{log}{\sigma }_{f}.\end{array}$`

The built-in kernel (covariance) functions with a separate length scale for each predictor are:

• ARD Squared Exponential Kernel

You can specify this kernel function using the `'KernelFunction','ardsquaredexponential'` name-value pair argument. This covariance function is the squared exponential kernel function, with a separate length scale for each predictor. It is defined as

`$k\left({x}_{i},{x}_{j}|\theta \right)={\sigma }_{f}^{2}\text{exp}\left[-\frac{1}{2}\sum _{m=1}^{d}\frac{{\left({x}_{im}-{x}_{jm}\right)}^{2}}{{\sigma }_{m}^{2}}\right].$`
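An illustrative Python sketch (names are my own). When every ${\sigma }_{m}$ is equal, the ARD form reduces to the isotropic squared exponential kernel, which makes a convenient consistency check:

```python
import math

def ard_squared_exponential(xi, xj, sigma_m, sigma_f=1.0):
    # k = sigma_f^2 * exp(-0.5 * sum_m (x_im - x_jm)^2 / sigma_m^2),
    # with one length scale sigma_m[m] per predictor
    s = sum((a - b) ** 2 / sm ** 2 for a, b, sm in zip(xi, xj, sigma_m))
    return sigma_f ** 2 * math.exp(-0.5 * s)
```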

• ARD Exponential Kernel

You can specify this kernel function using the `'KernelFunction','ardexponential'` name-value pair argument. This covariance function is the exponential kernel function, with a separate length scale for each predictor. It is defined as

`$k\left({x}_{i},{x}_{j}|\theta \right)={\sigma }_{f}^{2}\mathrm{exp}\left(-r\right),$`

where

`$r=\sqrt{\sum _{m=1}^{d}\frac{{\left({x}_{im}-{x}_{jm}\right)}^{2}}{{\sigma }_{m}^{2}}}.$`
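A Python sketch of this scaled distance and the resulting ARD exponential kernel (illustrative; names are my own). With all length scales equal to 1, $r$ reduces to the ordinary Euclidean distance:

```python
import math

def ard_distance(xi, xj, sigma_m):
    # r = sqrt(sum_m (x_im - x_jm)^2 / sigma_m^2)
    return math.sqrt(sum((a - b) ** 2 / sm ** 2
                         for a, b, sm in zip(xi, xj, sigma_m)))

def ard_exponential(xi, xj, sigma_m, sigma_f=1.0):
    # k(xi, xj | theta) = sigma_f^2 * exp(-r)
    return sigma_f ** 2 * math.exp(-ard_distance(xi, xj, sigma_m))
```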

• ARD Matern 3/2

You can specify this kernel function using the `'KernelFunction','ardmatern32'` name-value pair argument. This covariance function is the Matern 3/2 kernel function, with a different length scale for each predictor. It is defined as

`$k\left({x}_{i},{x}_{j}|\theta \right)={\sigma }_{f}^{2}\left(1+\sqrt{3}\text{\hspace{0.17em}}r\right)\text{exp}\left(-\sqrt{3}\text{\hspace{0.17em}}r\right),$`

where

`$r=\sqrt{\sum _{m=1}^{d}\frac{{\left({x}_{im}-{x}_{jm}\right)}^{2}}{{\sigma }_{m}^{2}}}.$`

• ARD Matern 5/2

You can specify this kernel function using the `'KernelFunction','ardmatern52'` name-value pair argument. This covariance function is the Matern 5/2 kernel function, with a different length scale for each predictor. It is defined as

`$\begin{array}{l}k\left({x}_{i},{x}_{j}|\theta \right)={\sigma }_{f}^{2}\left(1+\sqrt{5}\text{\hspace{0.17em}}r+\frac{5}{3}\text{\hspace{0.17em}}{r}^{2}\right)\text{exp}\left(-\sqrt{5}\text{\hspace{0.17em}}r\right)\hfill \end{array},$`

where

`$r=\sqrt{\sum _{m=1}^{d}\frac{{\left({x}_{im}-{x}_{jm}\right)}^{2}}{{\sigma }_{m}^{2}}}.$`

• ARD Rational Quadratic Kernel

You can specify this kernel function using the `'KernelFunction','ardrationalquadratic'` name-value pair argument. This covariance function is the rational quadratic kernel function, with a separate length scale for each predictor. It is defined as

`$k\left({x}_{i},{x}_{j}|\theta \right)={\sigma }_{f}^{2}{\left(1+\frac{1}{2\alpha }\sum _{m=1}^{d}\frac{{\left({x}_{im}-{x}_{jm}\right)}^{2}}{{\sigma }_{m}^{2}}\right)}^{-\alpha }.$`
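An illustrative Python sketch (names are my own). As in the isotropic case, letting $\alpha \to \infty$ recovers the ARD squared exponential kernel:

```python
import math

def ard_rational_quadratic(xi, xj, sigma_m, sigma_f=1.0, alpha=1.0):
    # k = sigma_f^2 * (1 + s / (2 * alpha))^(-alpha),
    # with s = sum_m (x_im - x_jm)^2 / sigma_m^2
    s = sum((a - b) ** 2 / sm ** 2 for a, b, sm in zip(xi, xj, sigma_m))
    return sigma_f ** 2 * (1.0 + s / (2.0 * alpha)) ** (-alpha)
```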

You can specify the kernel function using the `KernelFunction` name-value pair argument in a call to `fitrgp`. You can either specify one of the built-in kernel function options, or specify a custom function. When providing the initial kernel parameter values for a built-in kernel function, input the initial values for the signal standard deviation and the characteristic length scale(s) as a numeric vector. When providing the initial kernel parameter values for a custom kernel function, input the initial values for the unconstrained parametrization vector $\theta$. `fitrgp` uses analytical derivatives to estimate parameters when using a built-in kernel function, whereas it uses numerical derivatives when using a custom kernel function.
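Whichever kernel you choose, the Gaussian process regression model ultimately evaluates it pairwise over the training inputs to form a covariance (Gram) matrix. A minimal Python sketch of that step (illustrative only; this is not how `fitrgp` is implemented internally, and the names are my own):

```python
import math

def se_kernel(xi, xj, sigma_l=1.0, sigma_f=1.0):
    # squared exponential kernel, used here as an example kernel
    r2 = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return sigma_f ** 2 * math.exp(-r2 / (2 * sigma_l ** 2))

def gram_matrix(X, kernel):
    # K[i][j] = k(x_i, x_j); symmetric, with k(x, x) = sigma_f^2 on the diagonal
    n = len(X)
    return [[kernel(X[i], X[j]) for j in range(n)] for i in range(n)]

X = [[0.0], [0.5], [2.0]]
K = gram_matrix(X, se_kernel)
```

Points that are close in predictor space get large off-diagonal entries, encoding the assumption that their responses are strongly correlated.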

## References

[1] Rasmussen, C. E., and C. K. I. Williams. Gaussian Processes for Machine Learning. Cambridge, MA: MIT Press, 2006.

[2] Neal, R. M. Bayesian Learning for Neural Networks. Lecture Notes in Statistics, Vol. 118. New York: Springer, 1996.