## Kernel (Covariance) Function Options

In supervised learning, it is expected that points with similar predictor values ${x}_{i}$ naturally have close response (target) values ${y}_{i}$. In Gaussian processes, the covariance function expresses this similarity [1]. It specifies the covariance between the two latent variables $f\left({x}_{i}\right)$ and $f\left({x}_{j}\right)$, where both ${x}_{i}$ and ${x}_{j}$ are d-by-1 vectors. In other words, it determines how the response at one point ${x}_{i}$ is affected by responses at other points ${x}_{j}$, $i\ne j$, i = 1, 2, ..., n. The covariance function $k\left({x}_{i},{x}_{j}\right)$ can be defined by various kernel functions. It can be parameterized in terms of the kernel parameters in a vector $\theta$. Hence, it is possible to express the covariance function as $k\left({x}_{i},{x}_{j}|\theta \right)$.

For many standard kernel functions, the kernel parameters are based on the signal standard deviation ${\sigma }_{f}$ and the characteristic length scale ${\sigma }_{l}$. The characteristic length scale briefly defines how far apart the input values ${x}_{i}$ can be for the response values to become uncorrelated. Both ${\sigma }_{l}$ and ${\sigma }_{f}$ need to be greater than 0, and this can be enforced by the unconstrained parametrization vector $\theta$, such that

`${\theta }_{1}=\mathrm{log}{\sigma }_{l},\text{ }{\theta }_{2}=\mathrm{log}{\sigma }_{f}.$`
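This log transform can be sketched in a few lines of Python (illustrative only; the function names are my own, not part of `fitrgp`). Because `exp` maps any real number to a strictly positive one, optimizing over $\theta$ keeps ${\sigma }_{l}$ and ${\sigma }_{f}$ positive without explicit constraints:

```python
import math

def to_unconstrained(sigma_l, sigma_f):
    # theta_1 = log(sigma_l), theta_2 = log(sigma_f); requires both > 0
    return [math.log(sigma_l), math.log(sigma_f)]

def from_unconstrained(theta):
    # exp maps any real theta back to strictly positive sigma values
    return math.exp(theta[0]), math.exp(theta[1])
```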

The built-in kernel (covariance) functions with the same length scale for each predictor are:

• Squared Exponential Kernel

This is one of the most commonly used covariance functions and is the default option for `fitrgp`. The squared exponential kernel function is defined as

`$k\left({x}_{i},{x}_{j}|\theta \right)={\sigma }_{f}^{2}\mathrm{exp}\left(-\frac{{r}^{2}}{2{\sigma }_{l}^{2}}\right),$`

where ${\sigma }_{l}$ is the characteristic length scale, ${\sigma }_{f}$ is the signal standard deviation, and

`$r=\sqrt{{\left({x}_{i}-{x}_{j}\right)}^{T}\left({x}_{i}-{x}_{j}\right)}$`

is the Euclidean distance between ${x}_{i}$ and ${x}_{j}$.
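As a minimal sketch (Python, illustrative; not the `fitrgp` implementation), the squared exponential kernel can be evaluated directly from the formula:

```python
import math

def squared_exponential(xi, xj, sigma_l=1.0, sigma_f=1.0):
    # k(xi, xj | theta) = sigma_f^2 * exp(-r^2 / (2 * sigma_l^2)),
    # with r the Euclidean distance between xi and xj
    r2 = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return sigma_f ** 2 * math.exp(-r2 / (2 * sigma_l ** 2))
```

Note that the kernel equals ${\sigma }_{f}^{2}$ at zero distance and decays smoothly as the points move apart.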

• Exponential Kernel

You can specify the exponential kernel function using the `'KernelFunction','exponential'` name-value pair argument. This covariance function is defined by

`$k\left({x}_{i},{x}_{j}|\theta \right)={\sigma }_{f}^{2}\mathrm{exp}\left(-\frac{r}{{\sigma }_{l}}\right),$`

where ${\sigma }_{l}$ is the characteristic length scale and

`$r=\sqrt{{\left({x}_{i}-{x}_{j}\right)}^{T}\left({x}_{i}-{x}_{j}\right)}$`

is the Euclidean distance between ${x}_{i}$ and ${x}_{j}$.
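A corresponding sketch in Python (illustrative; function name is my own). Unlike the squared exponential, the exponent is linear in $r$, so the kernel is not differentiable at $r=0$:

```python
import math

def exponential_kernel(xi, xj, sigma_l=1.0, sigma_f=1.0):
    # k(xi, xj | theta) = sigma_f^2 * exp(-r / sigma_l)
    r = math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))
    return sigma_f ** 2 * math.exp(-r / sigma_l)
```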

• Matern 3/2

You can specify the Matern 3/2 kernel function using the `'KernelFunction','matern32'` name-value pair argument. This covariance function is defined by

`$\begin{array}{l}k\left({x}_{i},{x}_{j}|\theta \right)={\sigma }_{f}^{2}\left(1+\frac{\sqrt{3}r}{{\sigma }_{l}}\right)\text{exp}\left(-\frac{\sqrt{3}r}{{\sigma }_{l}}\right)\hfill \end{array},$`

where

`$r=\sqrt{{\left({x}_{i}-{x}_{j}\right)}^{T}\left({x}_{i}-{x}_{j}\right)}$`

is the Euclidean distance between ${x}_{i}$ and ${x}_{j}$.
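An illustrative Python sketch (names are my own), introducing the scaled distance $s=\sqrt{3}\,r/{\sigma }_{l}$ so the code matches the formula term by term:

```python
import math

def matern32(xi, xj, sigma_l=1.0, sigma_f=1.0):
    # k(xi, xj | theta) = sigma_f^2 * (1 + s) * exp(-s), s = sqrt(3) * r / sigma_l
    r = math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))
    s = math.sqrt(3.0) * r / sigma_l
    return sigma_f ** 2 * (1.0 + s) * math.exp(-s)
```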

• Matern 5/2

You can specify the Matern 5/2 kernel function using the `'KernelFunction','matern52'` name-value pair argument. The Matern 5/2 covariance function is defined as

`$\begin{array}{l}k\left({x}_{i},{x}_{j}|\theta \right)={\sigma }_{f}^{2}\left(1+\frac{\sqrt{5}r}{{\sigma }_{l}}+\frac{5{r}^{2}}{3{\sigma }_{l}^{2}}\right)\text{exp}\left(-\frac{\sqrt{5}r}{{\sigma }_{l}}\right)\hfill \end{array},$`

where

`$r=\sqrt{{\left({x}_{i}-{x}_{j}\right)}^{T}\left({x}_{i}-{x}_{j}\right)}$`

is the Euclidean distance between ${x}_{i}$ and ${x}_{j}$.
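The same idea in Python (illustrative; names are my own). With $s=\sqrt{5}\,r/{\sigma }_{l}$, the polynomial factor $1+\sqrt{5}r/{\sigma }_{l}+5{r}^{2}/(3{\sigma }_{l}^{2})$ becomes $1+s+{s}^{2}/3$:

```python
import math

def matern52(xi, xj, sigma_l=1.0, sigma_f=1.0):
    # k(xi, xj | theta) = sigma_f^2 * (1 + s + s^2/3) * exp(-s),
    # with s = sqrt(5) * r / sigma_l
    r = math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))
    s = math.sqrt(5.0) * r / sigma_l
    return sigma_f ** 2 * (1.0 + s + s ** 2 / 3.0) * math.exp(-s)
```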

• Rational Quadratic Kernel

You can specify the rational quadratic kernel function using the `'KernelFunction','rationalquadratic'` name-value pair argument. This covariance function is defined by

`$k\left({x}_{i},{x}_{j}|\theta \right)={\sigma }_{f}^{2}{\left(1+\frac{{r}^{2}}{2\alpha {\sigma }_{l}^{2}}\right)}^{-\alpha },$`

where ${\sigma }_{l}$ is the characteristic length scale, $\alpha$ is a positive-valued scale-mixture parameter, and

`$r=\sqrt{{\left({x}_{i}-{x}_{j}\right)}^{T}\left({x}_{i}-{x}_{j}\right)}$`

is the Euclidean distance between ${x}_{i}$ and ${x}_{j}$.
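An illustrative Python sketch (names are my own). A useful sanity check: as $\alpha \to \infty$, the rational quadratic kernel approaches the squared exponential kernel with the same ${\sigma }_{l}$ and ${\sigma }_{f}$:

```python
import math

def rational_quadratic(xi, xj, sigma_l=1.0, sigma_f=1.0, alpha=1.0):
    # k(xi, xj | theta) = sigma_f^2 * (1 + r^2 / (2 * alpha * sigma_l^2))^(-alpha)
    r2 = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return sigma_f ** 2 * (1.0 + r2 / (2.0 * alpha * sigma_l ** 2)) ** (-alpha)
```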

It is possible to use a separate length scale ${\sigma }_{m}$ for each predictor m, m = 1, 2, ..., d. The built-in kernel (covariance) functions with a separate length scale for each predictor implement automatic relevance determination (ARD) [2]. The unconstrained parametrization $\theta$ in this case is

`$\begin{array}{l}{\theta }_{m}=\mathrm{log}{\sigma }_{m},\text{ }\text{for}\text{\hspace{0.17em}}m=1,2,...,d\text{ }\\ {\theta }_{d+1}=\mathrm{log}{\sigma }_{f}.\end{array}$`

The built-in kernel (covariance) functions with a separate length scale for each predictor are:

• ARD Squared Exponential Kernel

You can specify this kernel function using the `'KernelFunction','ardsquaredexponential'` name-value pair argument. This covariance function is the squared exponential kernel function, with a separate length scale for each predictor. It is defined as

`$k\left({x}_{i},{x}_{j}|\theta \right)={\sigma }_{f}^{2}\text{exp}\left[-\frac{1}{2}\sum _{m=1}^{d}\frac{{\left({x}_{im}-{x}_{jm}\right)}^{2}}{{\sigma }_{m}^{2}}\right].$`
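An illustrative Python sketch (names are my own). When every ${\sigma }_{m}$ is equal, the ARD form reduces to the isotropic squared exponential kernel, which makes a convenient consistency check:

```python
import math

def ard_squared_exponential(xi, xj, sigma_m, sigma_f=1.0):
    # k = sigma_f^2 * exp(-0.5 * sum_m (x_im - x_jm)^2 / sigma_m^2),
    # with one length scale sigma_m[m] per predictor
    s = sum((a - b) ** 2 / sm ** 2 for a, b, sm in zip(xi, xj, sigma_m))
    return sigma_f ** 2 * math.exp(-0.5 * s)
```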

• ARD Exponential Kernel

You can specify this kernel function using the `'KernelFunction','ardexponential'` name-value pair argument. This covariance function is the exponential kernel function, with a separate length scale for each predictor. It is defined as

`$k\left({x}_{i},{x}_{j}|\theta \right)={\sigma }_{f}^{2}\mathrm{exp}\left(-r\right),$`

where

`$r=\sqrt{\sum _{m=1}^{d}\frac{{\left({x}_{im}-{x}_{jm}\right)}^{2}}{{\sigma }_{m}^{2}}}.$`
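A Python sketch of this scaled distance and the resulting ARD exponential kernel (illustrative; names are my own). With all length scales equal to 1, $r$ reduces to the ordinary Euclidean distance:

```python
import math

def ard_distance(xi, xj, sigma_m):
    # r = sqrt(sum_m (x_im - x_jm)^2 / sigma_m^2)
    return math.sqrt(sum((a - b) ** 2 / sm ** 2
                         for a, b, sm in zip(xi, xj, sigma_m)))

def ard_exponential(xi, xj, sigma_m, sigma_f=1.0):
    # k(xi, xj | theta) = sigma_f^2 * exp(-r)
    return sigma_f ** 2 * math.exp(-ard_distance(xi, xj, sigma_m))
```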

• ARD Matern 3/2

You can specify this kernel function using the `'KernelFunction','ardmatern32'` name-value pair argument. This covariance function is the Matern 3/2 kernel function, with a different length scale for each predictor. It is defined as

`$k\left({x}_{i},{x}_{j}|\theta \right)={\sigma }_{f}^{2}\left(1+\sqrt{3}\text{\hspace{0.17em}}r\right)\text{exp}\left(-\sqrt{3}\text{\hspace{0.17em}}r\right),$`

where

`$r=\sqrt{\sum _{m=1}^{d}\frac{{\left({x}_{im}-{x}_{jm}\right)}^{2}}{{\sigma }_{m}^{2}}}.$`

• ARD Matern 5/2

You can specify this kernel function using the `'KernelFunction','ardmatern52'` name-value pair argument. This covariance function is the Matern 5/2 kernel function, with a different length scale for each predictor. It is defined as

`$\begin{array}{l}k\left({x}_{i},{x}_{j}|\theta \right)={\sigma }_{f}^{2}\left(1+\sqrt{5}\text{\hspace{0.17em}}r+\frac{5}{3}\text{\hspace{0.17em}}{r}^{2}\right)\text{exp}\left(-\sqrt{5}\text{\hspace{0.17em}}r\right)\hfill \end{array},$`

where

`$r=\sqrt{\sum _{m=1}^{d}\frac{{\left({x}_{im}-{x}_{jm}\right)}^{2}}{{\sigma }_{m}^{2}}}.$`

• ARD Rational Quadratic Kernel

You can specify this kernel function using the `'KernelFunction','ardrationalquadratic'` name-value pair argument. This covariance function is the rational quadratic kernel function, with a separate length scale for each predictor. It is defined as

`$k\left({x}_{i},{x}_{j}|\theta \right)={\sigma }_{f}^{2}{\left(1+\frac{1}{2\alpha }\sum _{m=1}^{d}\frac{{\left({x}_{im}-{x}_{jm}\right)}^{2}}{{\sigma }_{m}^{2}}\right)}^{-\alpha }.$`
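An illustrative Python sketch (names are my own). As in the isotropic case, letting $\alpha \to \infty$ recovers the ARD squared exponential kernel:

```python
import math

def ard_rational_quadratic(xi, xj, sigma_m, sigma_f=1.0, alpha=1.0):
    # k = sigma_f^2 * (1 + s / (2 * alpha))^(-alpha),
    # with s = sum_m (x_im - x_jm)^2 / sigma_m^2
    s = sum((a - b) ** 2 / sm ** 2 for a, b, sm in zip(xi, xj, sigma_m))
    return sigma_f ** 2 * (1.0 + s / (2.0 * alpha)) ** (-alpha)
```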

You can specify the kernel function using the `KernelFunction` name-value pair argument in a call to `fitrgp`. You can either specify one of the built-in kernel function options, or specify a custom function. When providing the initial kernel parameter values for a built-in kernel function, input the initial values for the signal standard deviation and the characteristic length scale(s) as a numeric vector. When providing the initial kernel parameter values for a custom kernel function, input the initial values for the unconstrained parametrization vector $\theta$. `fitrgp` uses analytical derivatives to estimate parameters when using a built-in kernel function, whereas it uses numerical derivatives when using a custom kernel function.
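Whichever kernel you choose, the Gaussian process regression model ultimately evaluates it pairwise over the training inputs to form a covariance (Gram) matrix. A minimal Python sketch of that step (illustrative only; this is not how `fitrgp` is implemented internally, and the names are my own):

```python
import math

def se_kernel(xi, xj, sigma_l=1.0, sigma_f=1.0):
    # squared exponential kernel, used here as an example kernel
    r2 = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return sigma_f ** 2 * math.exp(-r2 / (2 * sigma_l ** 2))

def gram_matrix(X, kernel):
    # K[i][j] = k(x_i, x_j); symmetric, with k(x, x) = sigma_f^2 on the diagonal
    n = len(X)
    return [[kernel(X[i], X[j]) for j in range(n)] for i in range(n)]

X = [[0.0], [0.5], [2.0]]
K = gram_matrix(X, se_kernel)
```

Points that are close in predictor space get large off-diagonal entries, encoding the assumption that their responses are strongly correlated.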

## References

[1] Rasmussen, C. E., and C. K. I. Williams. Gaussian Processes for Machine Learning. Cambridge, MA: MIT Press, 2006.

[2] Neal, R. M. Bayesian Learning for Neural Networks. Lecture Notes in Statistics, Vol. 118. New York: Springer, 1996.