# ksdensity

Kernel smoothing function estimate for univariate and bivariate data

## Syntax

• `[f,xi] = ksdensity(x)`
• `[f,xi] = ksdensity(x,pts)`
• `[f,xi] = ksdensity(x,pts,Name,Value)`
• `[f,xi,bw] = ksdensity(___)`
• `ksdensity(___)`
• `ksdensity(ax,___)`

## Description

`[f,xi] = ksdensity(x)` returns a probability density estimate, `f`, for the sample data in the vector or two-column matrix `x`. The estimate is based on a normal kernel function and is evaluated at equally spaced points, `xi`, that cover the range of the data in `x`. `ksdensity` estimates the density at 100 points for univariate data, or 900 points for bivariate data. `ksdensity` works best with continuously distributed samples.
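As a rough illustration of what a normal-kernel estimate computes, the following sketch evaluates the textbook estimator, an average of normal kernels centered at each observation and scaled by the bandwidth, and compares it with the output of `ksdensity`. The names `fManual` and `maxDiff` are illustrative only, and close agreement with `ksdensity` is an expectation under the assumption that no internal approximation is applied for a sample this small.

```matlab
rng default % for reproducibility
x = [randn(30,1); 5+randn(30,1)];

% Let ksdensity choose the evaluation points and the bandwidth.
[f,xi,bw] = ksdensity(x);

% Scaled distances between every evaluation point (columns) and
% every observation (rows).
n = numel(x);
u = bsxfun(@minus, xi(:).', x(:)) / bw;

% Textbook estimator: sum of standard normal kernels divided by n*bw.
fManual = sum(normpdf(u), 1) / (n*bw);

% Largest absolute difference between the two estimates.
maxDiff = max(abs(f(:).' - fManual))
```

If `maxDiff` is not close to zero in your release, the assumption above does not hold and the sketch should be read only as a description of the general technique.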

`[f,xi] = ksdensity(x,pts)` returns a probability density estimate, `f`, for the sample data in the vector or two-column matrix `x`, evaluated at the specified values in vector `pts`. Here, `xi` and `pts` contain identical values.

`[f,xi] = ksdensity(x,pts,Name,Value)` returns a probability density estimate, `f`, for the sample data in the vector or two-column matrix `x`, with additional options specified by one or more `Name,Value` pair arguments. For example, you can define the function type that `ksdensity` evaluates, such as probability density, cumulative probability, or survivor function. You can also specify the bandwidth of the smoothing window.

`[f,xi,bw] = ksdensity(___)` also returns the bandwidth of the kernel smoothing window, `bw`. The default bandwidth is optimal for normal densities.

`ksdensity(___)` plots the kernel smoothing function estimate.

`ksdensity(ax,___)` plots the results using axes with the handle, `ax`, instead of the current axes returned by `gca`.
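A minimal sketch of the `ksdensity(ax,___)` syntax, assuming two axes created with `subplot` in the same figure. The variable names `ax1` and `ax2` and the titles are illustrative.

```matlab
rng default % for reproducibility
x = [randn(30,1); 5+randn(30,1)];

figure
ax1 = subplot(1,2,1); % left axes
ax2 = subplot(1,2,2); % right axes

% Plot the density estimate into the left axes.
ksdensity(ax1,x)
title(ax1,'pdf')

% Plot the cdf estimate into the right axes.
ksdensity(ax2,x,'function','cdf')
title(ax2,'cdf')
```

Passing the handle explicitly avoids depending on which axes happen to be current when `ksdensity` is called.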

## Examples


Generate a sample data set from a mixture of two normal distributions.

```
rng default % for reproducibility
x = [randn(30,1); 5+randn(30,1)];
```

Plot the estimated density.

```
[f,xi] = ksdensity(x);
figure
plot(xi,f);
```

The density estimate shows the bimodality of the sample.

Load the sample data.

```
load hospital
```

Compute and plot the estimated cdf evaluated at a specified set of values.

```
pts = (min(hospital.Weight):2:max(hospital.Weight));
figure()
ecdf(hospital.Weight)
hold on
[f,xi,bw] = ksdensity(hospital.Weight,pts,'support','positive',...
    'function','cdf');
plot(xi,f,'-g','LineWidth',2)
legend('empirical cdf','kernel-bw:default','Location','NorthWest')
xlabel('Patient weights')
ylabel('Estimated cdf')
```

`ksdensity` seems to smooth the cumulative distribution function estimate too much. An estimate with a smaller bandwidth might produce a closer estimate to the empirical cumulative distribution function.

Return the bandwidth of the smoothing window.

```
bw
```
```
bw = 0.1070
```

Plot the cumulative distribution function estimate using a smaller bandwidth.

```
[f,xi] = ksdensity(hospital.Weight,pts,'support','positive',...
    'function','cdf','bandwidth',0.05);
plot(xi,f,'--r','LineWidth',2)
legend('empirical cdf','kernel-bw:default','kernel-bw:0.05',...
    'Location','NorthWest')
hold off
```

The `ksdensity` estimate with a smaller bandwidth matches the empirical cumulative distribution function better.

Load the sample data.

```
load hospital
```

Plot the estimated cdf evaluated at 50 equally spaced points.

```
figure()
ksdensity(hospital.Weight,'support','positive','function','cdf',...
    'npoints',50)
xlabel('Patient weights')
ylabel('Estimated cdf')
```

Generate sample data from an exponential distribution with mean 3.

```
rng default % for reproducibility
x = random('exp',3,100,1);
```

Create a logical vector that indicates censoring. Here, observations with lifetimes longer than 10 are censored.

```
T = 10;          % censoring threshold
cens = (x > T);  % true where the lifetime exceeds the threshold
```

Compute and plot the estimated density function.

```
figure
ksdensity(x,'support','positive','censoring',cens);
```

Compute and plot the survivor function.

```
figure
ksdensity(x,'support','positive','censoring',cens,...
    'function','survivor');
```

Compute and plot the cumulative hazard function.

```
figure
ksdensity(x,'support','positive','censoring',cens,...
    'function','cumhazard');
```

Generate a mixture of two normal distributions, and plot the estimated inverse cumulative distribution function at a specified set of probability values.

```
rng default % for reproducibility
x = [randn(30,1); 5+randn(30,1)];
pi = linspace(.01,.99,99);
figure
ksdensity(x,pi,'function','icdf');
```

Generate a mixture of two normal distributions.

```
rng default % For reproducibility
x = [randn(30,1); 5+randn(30,1)];
```

Return the bandwidth of the smoothing window for the probability density estimate.

```
[f,xi,bw] = ksdensity(x);
bw
```
```
bw = 1.5141
```

The default bandwidth is optimal for normal densities.
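One common rule of thumb for a normal kernel, optimal when the underlying density is itself normal, sets the bandwidth to a scale estimate multiplied by `(4/(3*n))^(1/5)`. The sketch below assumes that `ksdensity` applies this rule with a robust scale estimate based on the median absolute deviation. That is an assumption about the implementation, not something stated here, so a mismatch between `bwRule` and `bw` would mean the assumption does not hold for your release. The names `sigmaRobust` and `bwRule` are illustrative.

```matlab
rng default % For reproducibility
x = [randn(30,1); 5+randn(30,1)];
n = numel(x);

% Robust scale estimate: median absolute deviation, rescaled so that
% it estimates the standard deviation for normal data.
sigmaRobust = median(abs(x - median(x))) / 0.6745;

% Rule-of-thumb bandwidth for a normal kernel (assumed, see above).
bwRule = sigmaRobust * (4/(3*n))^(1/5);

% Bandwidth that ksdensity actually selects by default.
[~,~,bw] = ksdensity(x);

fprintf('rule of thumb: %.4f, ksdensity default: %.4f\n', bwRule, bw)
```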

Plot the estimated density.

```
figure
plot(xi,f);
xlabel('xi')
ylabel('f')
hold on
```

Plot the density using an increased bandwidth value.

```
[f,xi] = ksdensity(x,'Bandwidth',1.8);
plot(xi,f,'--r','LineWidth',1.5)
```

A higher bandwidth further smooths the density estimate, which might mask some characteristics of the distribution.

Now, plot the density using a decreased bandwidth value.

```
[f,xi] = ksdensity(x,'Bandwidth',0.8);
plot(xi,f,'-.k','LineWidth',1.5)
legend('bw = default','bw = 1.8','bw = 0.8')
hold off
```

A smaller bandwidth smooths the density estimate less, which exaggerates some characteristics of the sample.

Create a two-column matrix of points at which to evaluate the density.

```
gridx1 = -0.25:.05:1.25;
gridx2 = 0:.1:15;
[x1,x2] = meshgrid(gridx1, gridx2);
x1 = x1(:);
x2 = x2(:);
xi = [x1 x2];
```

Generate a 30-by-2 matrix containing random numbers from a mixture of two bivariate distributions. The code draws the values with `rand`, so each mixture component is uniform over a rectangle.

```
rng default % For reproducibility
x = [0+.5*rand(20,1) 5+2.5*rand(20,1);
    .75+.25*rand(10,1) 8.75+1.25*rand(10,1)];
```

Plot the estimated density of the sample data.

```
figure
ksdensity(x,xi);
```

## Input Arguments


Sample data for which `ksdensity` returns `f` values, specified as a column vector or two-column matrix. Use a column vector for univariate data, and a two-column matrix for bivariate data.

Example: `[f,xi] = ksdensity(x)`

Data Types: `single` | `double`

Points at which to evaluate `f`, specified as a vector or two-column matrix. For univariate data, `pts` can be a row or column vector. The returned output `f` has the same dimensions as `pts`.

Example: `pts = (0:1:25); ksdensity(x,pts);`

Data Types: `single` | `double`

Axes that `ksdensity` plots to, specified as an axes handle.

For example, if `h` is a handle for a set of axes, then `ksdensity` can plot to those axes as follows.

Example: `ksdensity(h,x)`

### Name-Value Pair Arguments

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside single quotes (`' '`). You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

Example: `'censoring',cens,'kernel','triangle','npoints',20,'function','cdf'` specifies that `ksdensity` estimates the cdf by evaluating at 20 equally spaced points that cover the range of data, using the triangle kernel smoothing function and accounting for the censored data information in vector `cens`.


The bandwidth of the kernel-smoothing window, which is a function of the number of points in `x`, specified as the comma-separated pair consisting of `'Bandwidth'` and a scalar value. If the sample data is bivariate, `Bandwidth` can also be a two-element vector. The default is optimal for estimating normal densities, but you might want to choose a larger or smaller value to smooth more or less.

Example: `'Bandwidth',0.8`

Data Types: `single` | `double`

Logical vector indicating which entries are censored, specified as the comma-separated pair consisting of `'Censoring'` and a vector of binary values. A value of 0 indicates that the observation is not censored, and a value of 1 indicates that it is censored. By default, no observations are censored. This name-value pair is only valid for univariate data.

Example: `'Censoring',censdata`

Data Types: `logical`

Function to estimate, specified as the comma-separated pair consisting of `'Function'` and one of the following.

| Value | Description |
| --- | --- |
| `'pdf'` | Probability density function. |
| `'cdf'` | Cumulative distribution function. |
| `'icdf'` | Inverse cumulative distribution function. For `'icdf'`, `f = ksdensity(x,pi,'function','icdf')` computes the estimated inverse cdf of the values in `x`, and evaluates it at the probability values specified in `pi`. |
| `'survivor'` | Survivor function. |
| `'cumhazard'` | Cumulative hazard function. |

Example: `'Function','icdf'`

Data Types: `char`

Type of kernel smoother, specified as the comma-separated pair consisting of `'Kernel'` and one of the following.

• `'normal'` (default)

• `'box'`

• `'triangle'`

• `'epanechnikov'`

• You can also specify a custom kernel function, as a function handle or as a character vector, e.g., `@normpdf` or `'normpdf'`. This calls the function with one argument that is an array of distances between data values and locations where the density is evaluated. The function must return an array of the same size containing corresponding values of the kernel function.

When `'Function'` is `'pdf'`, this kernel function returns density values. Otherwise, it returns cumulative probability values.

Specifying a custom kernel when `'Function'` is `'icdf'` returns an error.

For bivariate data, `ksdensity` applies the same kernel to each dimension.

Example: `'Kernel','box'`

Data Types: `char` | `function_handle`
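A minimal sketch of passing a custom kernel as a function handle, following the description above: the handle receives an array of distances and returns kernel density values of the same size. The Laplace (double exponential) kernel and the name `laplaceKernel` are hypothetical choices for illustration, not kernels that `ksdensity` provides by name.

```matlab
rng default % for reproducibility
x = [randn(30,1); 5+randn(30,1)];

% Hypothetical custom kernel: Laplace density, which integrates to 1.
laplaceKernel = @(u) 0.5*exp(-abs(u));

% Estimate the pdf with the custom kernel.
[fCustom,xi] = ksdensity(x,'Kernel',laplaceKernel);

% Estimate the pdf with the default normal kernel at the same points.
fNormal = ksdensity(x,xi);

figure
plot(xi,fNormal,'-b',xi,fCustom,'--r')
legend('normal kernel','custom Laplace kernel')
```

Because `'Function'` is left at its default of `'pdf'`, the handle returns density values. For other function types it would need to return cumulative probability values instead.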

Number of equally spaced points in `xi`, specified as the comma-separated pair consisting of `'NumPoints'` and a scalar value. This name-value pair is only valid for univariate data.

For example, to obtain a kernel smoothing estimate of a specified function at 80 equally spaced points within the range of the sample data, specify the following.

Example: `'NumPoints',80`

Data Types: `single` | `double`

Support for the density, specified as the comma-separated pair consisting of `'Support'` and one of the following.

| Value | Description |
| --- | --- |
| `'unbounded'` | Default. Allow the density to extend over the whole real line. |
| `'positive'` | Restrict the density to positive values. |
| Two-element vector, `[L U]` | Give the finite lower and upper bounds for the support of the density. This option is only valid for univariate sample data. |
| Two-by-two matrix, `[L1 L2 ; U1 U2]` | Give the finite lower and upper bounds for the support of the density. The first row contains the lower limits and the second row contains the upper limits. This option is only valid for bivariate sample data. |

For univariate data, if `'Support'` is `'positive'`, then `ksdensity` transforms `x` using a log function, estimates the density of the transformed values, and transforms back to the original scale. If `'Support'` is a vector `[L U]`, then `ksdensity` uses the transformation `log((X-L)/(U-X))`. The `'Bandwidth'` parameter and `bw` output are on the scale of the transformed values.

For bivariate data, `'Support'` can be a combination of positive, unbounded, or bounded variables specified as `[0 -Inf ; Inf Inf]` or `[0 L ; Inf U]`. `ksdensity` transforms each dimension of `x` in the same way as the univariate data. The `'Bandwidth'` parameter and `bw` output are on the scale of the transformed values.

Example: `'Support','positive'`

Example: `'Support',[0 10]`

Data Types: `single` | `double` | `char`
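The log transformation described above for `'Support','positive'` can be sketched by hand: estimate the density of `log(x)` on an unbounded scale, then convert back with the change-of-variables factor `1/x`. The names `pts`, `bwLog`, `fManual`, and `maxDiff` are illustrative, and exact agreement with `ksdensity` is an expectation rather than a documented guarantee.

```matlab
rng default % for reproducibility
x = random('exp',3,100,1);        % strictly positive sample
pts = linspace(0.1,15,100);       % strictly positive evaluation points

% Estimate with positive support; bwLog is on the log scale.
[fPos,~,bwLog] = ksdensity(x,pts,'Support','positive');

% Manual version: unbounded estimate for log(x) with the same bandwidth.
fy = ksdensity(log(x),log(pts),'Bandwidth',bwLog);

% Transform back to the original scale. If Y = log(X), then
% f_X(x) = f_Y(log(x)) / x.
fManual = fy ./ pts;

maxDiff = max(abs(fPos - fManual))
```

This also shows why a bandwidth reported with positive support is not directly comparable to one reported with unbounded support: the two are measured on different scales.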

Function used to create kernel density plot, specified as the comma-separated pair consisting of `'PlotFcn'` and one of the following.

| Name | Value |
| --- | --- |
| `'surf'` | 3-D shaded surface plot, created using `surf` |
| `'contour'` | Contour plot, created using `contour` |
| `'plot3'` | 3-D line plot, created using `plot3` |
| `'surfc'` | Contour plot under a 3-D shaded surface plot, created using `surfc` |

This name-value pair is only valid for bivariate sample data.

Example: `'PlotFcn','contour'`

Weights for each `x` value, specified as the comma-separated pair consisting of `'Weights'` and a vector of the same length as `x`.

For instance, if the weights for the data values are in vector `xw`, then you can specify the weights as follows.

Example: `'Weights',xw`

Data Types: `single` | `double`
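A short sketch of the `'Weights'` option, assuming a hypothetical weight vector `xw` that counts each observation in the second mixture component three times as heavily as each observation in the first. The weights are deliberately not normalized to sum to 1. Whether `ksdensity` rescales them internally is an assumption here.

```matlab
rng default % for reproducibility
x = [randn(30,1); 5+randn(30,1)];

% Hypothetical weights: second component weighted three times as much.
xw = [ones(30,1); 3*ones(30,1)];

% Weighted and unweighted estimates at the same evaluation points.
[fWeighted,xi] = ksdensity(x,'Weights',xw);
fUnweighted = ksdensity(x,xi);

figure
plot(xi,fUnweighted,'-b',xi,fWeighted,'--r')
legend('equal weights','weighted')
```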

## Output Arguments


Estimated function values, returned as a vector of the same dimension as `xi` or `pts`.

Evaluation points at which `ksdensity` calculates `f`, returned as a vector for univariate data or a two-column matrix for bivariate data. For univariate data, the default is 100 equally spaced points that cover the range of data in `x`. For bivariate data, the default is 900 equally spaced points created using `meshgrid` from 30 equally spaced points in each dimension.

Bandwidth of smoothing window, returned as a scalar value.
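For bivariate data, the default evaluation grid described above can be reshaped for plotting with surface functions. The sketch below assumes the 30-by-30 `meshgrid` layout stated for the default bivariate case. The sample data and the names `X1`, `X2`, and `F` are illustrative.

```matlab
rng default % for reproducibility
x = [randn(50,1), 2+0.5*randn(50,1)]; % illustrative bivariate sample

% Default bivariate call: 900 evaluation points.
[f,xi] = ksdensity(x);

% Reshape the columns of xi and the estimate into 30-by-30 grids.
X1 = reshape(xi(:,1),30,30);
X2 = reshape(xi(:,2),30,30);
F  = reshape(f,30,30);

figure
surf(X1,X2,F)
xlabel('x1')
ylabel('x2')
zlabel('Estimated density')
```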


### Tall Array Support

This function supports tall arrays for out-of-memory data with some limitations.

• Some options that require extra passes or sorting of the input data are not supported:

    • `'Censoring'`

    • `'Support'` (support is always unbounded).

• Uses standard deviation (instead of median absolute deviation) to compute the bandwidth.
