fitdist
Fit probability distribution object to data
Syntax
pd = fitdist(x,distname)
pd = fitdist(x,distname,Name,Value)
[pdca,gn,gl] = fitdist(x,distname,'By',groupvar)
Description
pd = fitdist(x,distname) creates a probability distribution object by fitting the distribution specified by distname to the data in column vector x.
pd = fitdist(x,distname,Name,Value) creates the probability distribution object with additional options specified by one or more name-value pair arguments. For example, you can indicate censored data or specify control parameters for the iterative fitting algorithm.
[pdca,gn,gl] = fitdist(x,distname,'By',groupvar) creates probability distribution objects by fitting the distribution specified by distname to the data in x based on the grouping variable groupvar. It returns a cell array of fitted probability distribution objects, pdca, a cell array of group labels, gn, and a cell array of grouping variable levels, gl.
Examples
Fit Normal Distribution to Data
Fit a normal distribution to sample data, and examine the fit by using a histogram and a quantile-quantile plot.
Load patient weights from the data file patients.mat.
load patients
x = Weight;
Create a normal distribution object by fitting it to the data.
pd = fitdist(x,'Normal')
pd = 
  NormalDistribution
  Normal distribution
       mu =     154   [148.728, 159.272]
    sigma = 26.5714   [23.3299, 30.8674]
The distribution object display includes the parameter estimates for the mean (mu) and standard deviation (sigma), and the 95% confidence intervals for the parameters.
You can use the object functions of pd to evaluate the distribution and generate random numbers. Display the supported object functions.
methods(pd)
Methods for class prob.NormalDistribution:
cdf  gather  icdf  iqr  mean  median  negloglik  paramci  pdf  plot  proflik  random  std  truncate  var
For example, obtain the 95% confidence intervals by using the paramci function.
ci95 = paramci(pd)
ci95 = 2×2
148.7277 23.3299
159.2723 30.8674
Specify the significance level (Alpha) to obtain confidence intervals with a different confidence level. Compute the 99% confidence intervals.
ci99 = paramci(pd,'Alpha',.01)
ci99 = 2×2
147.0213 22.4257
160.9787 32.4182
Evaluate and plot the pdf values of the distribution.
x_values = 50:1:250;
y = pdf(pd,x_values);
plot(x_values,y)
Create a histogram with the normal distribution fit by using the histfit function. histfit uses fitdist to fit a distribution to data.
histfit(x)
The histogram shows that the data has two modes, and that the mode of the normal distribution fit is between those two modes.
Use qqplot to create a quantile-quantile plot of the quantiles of the sample data x versus the theoretical quantile values of the fitted distribution.
qqplot(x,pd)
The plot is not a straight line, suggesting that the data does not follow a normal distribution.
Fit Kernel Distribution to Data
Load patient weights from the data file patients.mat.
load patients
x = Weight;
Create a kernel distribution object by fitting it to the data. Use the Epanechnikov kernel function.
pd = fitdist(x,'Kernel','Kernel','epanechnikov')
pd = 
  KernelDistribution
    Kernel = epanechnikov
    Bandwidth = 14.3792
    Support = unbounded
Plot the pdf of the distribution.
x_values = 50:1:250;
y = pdf(pd,x_values);
plot(x_values,y)
Fit Normal Distributions to Grouped Data
Load patient weights and genders from the data file patients.mat.
load patients
x = Weight;
Create normal distribution objects by fitting them to the data, grouped by patient gender.
[pdca,gn,gl] = fitdist(x,'Normal','By',Gender)
pdca=1×2 cell array
{1x1 prob.NormalDistribution} {1x1 prob.NormalDistribution}
gn = 2x1 cell
{'Male' }
{'Female'}
gl = 2x1 cell
{'Male' }
{'Female'}
The cell array pdca contains two probability distribution objects, one for each gender group. The cell array gn contains two group labels. The cell array gl contains two group levels.
View each distribution in the cell array pdca to compare the mean, mu, and the standard deviation, sigma, grouped by patient gender.
male = pdca{1} % Distribution for males (gn{1} is 'Male')
male = 
  NormalDistribution
  Normal distribution
       mu = 180.532   [177.833, 183.231]
    sigma = 9.19322   [7.63933, 11.5466]
female = pdca{2} % Distribution for females (gn{2} is 'Female')
female = 
  NormalDistribution
  Normal distribution
       mu = 130.472   [128.183, 132.76]
    sigma = 8.30339   [6.96947, 10.2736]
Compute the pdf of each distribution.
x_values = 50:1:250;
malepdf = pdf(male,x_values);
femalepdf = pdf(female,x_values);
Plot the pdfs for a visual comparison of weight distribution by gender. Plot the curves in the same order as the labels in gn so that the legend entries match.
figure
plot(x_values,malepdf,'LineWidth',2)
hold on
plot(x_values,femalepdf,'Color','r','LineStyle',':','LineWidth',2)
legend(gn,'Location','NorthEast')
hold off
Fit Kernel Distributions to Grouped Data
Load patient weights and genders from the data file patients.mat.
load patients
x = Weight;
Create kernel distribution objects by fitting them to the data, grouped by patient gender. Use a triangular kernel function.
[pdca,gn,gl] = fitdist(x,'Kernel','By',Gender,'Kernel','triangle');
View each distribution in the cell array pdca to see the kernel distributions for each gender. The elements of pdca follow the order of the group labels in gn, which for this data is 'Male' and then 'Female'.
male = pdca{1} % Distribution for males (gn{1})
male = 
  KernelDistribution
    Kernel = triangle
    Bandwidth = 5.08961
    Support = unbounded
female = pdca{2} % Distribution for females (gn{2})
female = 
  KernelDistribution
    Kernel = triangle
    Bandwidth = 4.25894
    Support = unbounded
Compute the pdf of each distribution.
x_values = 50:1:250;
malepdf = pdf(male,x_values);
femalepdf = pdf(female,x_values);
Plot the pdfs for a visual comparison of weight distribution by gender. Plot the curves in the same order as the labels in gn so that the legend entries match.
figure
plot(x_values,malepdf,'LineWidth',2)
hold on
plot(x_values,femalepdf,'Color','r','LineStyle',':','LineWidth',2)
legend(gn,'Location','NorthEast')
hold off
Input Arguments
x
— Input data
column vector
Input data, specified as a column vector. fitdist ignores NaN values in x. Additionally, any NaN values in the censoring vector or frequency vector cause fitdist to ignore the corresponding values in x.
Data Types: double
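The NaN handling described above can be checked with a small sketch. The sample values are hypothetical, and the comparison assumes only the documented behavior that fitdist ignores NaN values in x.

```matlab
% Hypothetical sample with one missing value.
x = [4.1; 5.3; NaN; 6.2; 5.8];

pdAll   = fitdist(x,'Normal');              % fitdist skips the NaN entry
pdClean = fitdist(x(~isnan(x)),'Normal');   % NaN removed by hand

% Both fits are expected to give the same parameter estimates.
sameFit = abs(pdAll.mu - pdClean.mu) < 1e-12 && ...
          abs(pdAll.sigma - pdClean.sigma) < 1e-12;
```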
distname
— Distribution name
character vector | string scalar
Distribution name, specified as one of the following character vectors or string scalars. The distribution specified by distname determines the type of the returned probability distribution object.
Distribution Name | Description | Distribution Object |
---|---|---|
'Beta' | Beta distribution | BetaDistribution |
'Binomial' | Binomial distribution | BinomialDistribution |
'BirnbaumSaunders' | Birnbaum-Saunders distribution | BirnbaumSaundersDistribution |
'Burr' | Burr distribution | BurrDistribution |
'Exponential' | Exponential distribution | ExponentialDistribution |
'Extreme Value' or 'ev' | Extreme Value distribution | ExtremeValueDistribution |
'Gamma' | Gamma distribution | GammaDistribution |
'Generalized Extreme Value' or 'gev' | Generalized Extreme Value distribution | GeneralizedExtremeValueDistribution |
'Generalized Pareto' or 'gp' | Generalized Pareto distribution | GeneralizedParetoDistribution |
'Half Normal' or 'hn' | Half-normal distribution | HalfNormalDistribution |
'InverseGaussian' | Inverse Gaussian distribution | InverseGaussianDistribution |
'Kernel' | Kernel distribution | KernelDistribution |
'Logistic' | Logistic distribution | LogisticDistribution |
'Loglogistic' | Loglogistic distribution | LoglogisticDistribution |
'Lognormal' | Lognormal distribution | LognormalDistribution |
'Nakagami' | Nakagami distribution | NakagamiDistribution |
'Negative Binomial' or 'nbin' | Negative Binomial distribution | NegativeBinomialDistribution |
'Normal' | Normal distribution | NormalDistribution |
'Poisson' | Poisson distribution | PoissonDistribution |
'Rayleigh' | Rayleigh distribution | RayleighDistribution |
'Rician' | Rician distribution | RicianDistribution |
'Stable' | Stable distribution | StableDistribution |
'tLocationScale' | t Location-Scale distribution | tLocationScaleDistribution |
'Weibull' or 'wbl' | Weibull distribution | WeibullDistribution |
groupvar
— Grouping variable
categorical array | logical or numeric vector | character array | string array | cell array of character vectors
Grouping variable, specified as a categorical array, logical or numeric vector, character array, string array, or cell array of character vectors. Each unique value in a grouping variable defines a group.
For example, if Gender is a cell array of character vectors with values 'Male' and 'Female', you can use Gender as a grouping variable to fit a distribution to your data by gender.
To use more than one grouping variable, specify a cell array of grouping variables. Observations are placed in the same group if they have common values of all specified grouping variables.
For example, if Smoker is a logical vector with values 0 for nonsmokers and 1 for smokers, then specifying the cell array {Gender,Smoker} divides observations into four groups: Male Smoker, Male Nonsmoker, Female Smoker, and Female Nonsmoker.
Example: {Gender,Smoker}
Data Types: categorical | logical | single | double | char | string | cell
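As a minimal sketch of the {Gender,Smoker} case, the following fits one normal distribution per group using the patients.mat sample data from the examples above. The variable names nGroups and nLevelCols are introduced here only for illustration.

```matlab
load patients                        % sample data used in the examples above

% Group the weights by gender and smoking status at the same time.
[pdca,gn,gl] = fitdist(Weight,'Normal','By',{Gender,Smoker});

nGroups    = numel(pdca);            % one fitted object per group
nLevelCols = size(gl,2);             % one column of levels per grouping variable
```

Each element of pdca is a fitted distribution for one combination of the two grouping variables, in the order given by gn.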
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: fitdist(x,'Kernel','Kernel','triangle') fits a kernel distribution object to the data in x using a triangular kernel function.
Censoring
— Logical flag for censored data
0 (default) | vector of logical values
Logical flag for censored data, specified as a vector of logical values that is the same size as input vector x. The value is 1 when the corresponding element in x is a right-censored observation and 0 when the corresponding element is an exact observation. The default is a vector of 0s, indicating that all observations are exact.
fitdist ignores any NaN values in this censoring vector. Additionally, any NaN values in x or the frequency vector cause fitdist to ignore the corresponding values in the censoring vector.
This argument is valid only if distname is 'BirnbaumSaunders', 'Burr', 'Exponential', 'ExtremeValue', 'Gamma', 'InverseGaussian', 'Kernel', 'Logistic', 'Loglogistic', 'Lognormal', 'Nakagami', 'Normal', 'Rician', 'tLocationScale', or 'Weibull'.
Data Types: logical
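The following sketch shows one way to build a censoring vector for right-censored data. The lifetimes, the cutoff, and the variable names are hypothetical. For the exponential distribution, the right-censored maximum likelihood estimate of the mean has a well-known closed form (total observed time divided by the number of exact observations), which is used here only as a sanity check.

```matlab
rng(0)                               % reproducible hypothetical data
t = exprnd(10,200,1);                % true lifetimes (not observed in practice)
cutoff = 15;                         % assumed end of the observation window

c = t > cutoff;                      % 1 = right-censored, 0 = exact
x = min(t,cutoff);                   % observed values

pd = fitdist(x,'Exponential','Censoring',c);

% Closed-form check: total time on test divided by number of exact observations.
muClosedForm = sum(x)/sum(~c);
```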
Frequency
— Observation frequency
1 (default) | vector of nonnegative integer values
Observation frequency, specified as a vector of nonnegative integer values that is the same size as input vector x. Each element of the frequency vector specifies the frequency of the corresponding element in x. The default is a vector of 1s, indicating that each value in x appears only once.
fitdist ignores any NaN values in this frequency vector. Additionally, any NaN values in x or the censoring vector cause fitdist to ignore the corresponding values in the frequency vector.
Data Types: single | double
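As an illustration, fitting with a frequency vector should agree with fitting the data after each value is repeated the stated number of times. The values and counts below are hypothetical, and repelem is used only to build the expanded sample.

```matlab
values = [10; 12; 15; 20];           % distinct observed values (hypothetical)
counts = [3; 1; 4; 2];               % how many times each value occurs

pdFreq = fitdist(values,'Normal','Frequency',counts);
pdFull = fitdist(repelem(values,counts),'Normal');   % same data, written out

% Differences between the two sets of estimates.
muGap    = abs(pdFreq.mu - pdFull.mu);
sigmaGap = abs(pdFreq.sigma - pdFull.sigma);
```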
Options
— Control parameters
structure
Control parameters for the iterative fitting algorithm, specified as a structure you create using statset.
Data Types: struct
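A minimal sketch of passing control parameters follows. The data and the particular statset settings ('MaxIter' and 'TolX') are arbitrary choices for illustration, not recommended values.

```matlab
rng(1)                               % reproducible hypothetical data
x = gamrnd(2,3,300,1);               % positive sample for a gamma fit

% Raise the iteration limit and tighten the parameter tolerance.
opts = statset('MaxIter',500,'TolX',1e-8);

pd = fitdist(x,'Gamma','Options',opts);
```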
NTrials
— Number of trials for binomial distribution
1 (default) | positive integer value
Number of trials for the binomial distribution, specified as a positive integer value.
This argument is valid only when distname is 'Binomial' (binomial distribution).
Example: 'NTrials',10
Data Types: single | double
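For example, the following sketch fits a binomial distribution to hypothetical counts of successes out of 10 trials. The comparison value pHat is the standard maximum likelihood estimate for a known number of trials and is included only as a check.

```matlab
rng(2)                               % reproducible hypothetical data
n = 10;                              % known number of trials per observation
x = binornd(n,0.3,100,1);            % successes out of n trials

pd = fitdist(x,'Binomial','NTrials',n);

% Total successes divided by total trials.
pHat = sum(x)/(n*numel(x));
```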
theta
— Location (threshold) parameter for generalized Pareto distribution
scalar value
Location (threshold) parameter for the generalized Pareto distribution, specified as a scalar.
This argument is valid only when distname is 'Generalized Pareto' (generalized Pareto distribution).
The default value is 0 when the sample data x includes only nonnegative values. You must specify theta if x includes negative values.
Example: 'theta',1
Data Types: single | double
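A short sketch of fixing the threshold follows. The data are simulated with gprnd and a hypothetical threshold of 2, and the short name 'gp' from the distname table is used.

```matlab
rng(3)                               % reproducible hypothetical data
threshold = 2;                       % assumed known location parameter
x = gprnd(0.1,1,threshold,500,1);    % exceedances above the threshold

% theta is held fixed; the shape k and scale sigma are estimated.
pd = fitdist(x,'gp','theta',threshold);
```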
mu
— Location parameter for half-normal distribution
scalar value
Location parameter for the half-normal distribution, specified as a scalar.
This argument is valid only when distname is 'Half Normal' (half-normal distribution).
The default value is 0 when the sample data x includes only nonnegative values. You must specify mu if x includes negative values.
Example: 'mu',1
Data Types: single | double
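The same pattern applies to the half-normal location parameter. In this sketch the data are simulated as a hypothetical location of 3 plus the absolute value of standard normal draws, and the short name 'hn' from the distname table is used.

```matlab
rng(4)                               % reproducible hypothetical data
loc = 3;                             % assumed known location parameter
x = loc + abs(randn(300,1));         % half-normal sample shifted to start at loc

% mu is held fixed; only sigma is estimated.
pd = fitdist(x,'hn','mu',loc);
```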
Kernel
— Kernel smoother type for kernel distribution
'normal' (default) | 'box' | 'triangle' | 'epanechnikov'
Kernel smoother type for the kernel distribution, specified as one of the following:
'normal'
'box'
'triangle'
'epanechnikov'
You must specify distname as 'Kernel' to use this option.
Support
— Kernel density support for kernel distribution
'unbounded' (default) | 'positive' | two-element vector
Kernel density support for the kernel distribution, specified as 'unbounded', 'positive', or a two-element vector.
Value | Description |
---|---|
'unbounded' | Density can extend over the whole real line. |
'positive' | Density is restricted to positive values. |
Alternatively, you can specify a two-element vector giving finite lower and upper limits for the support of the density.
You must specify distname as 'Kernel' to use this option.
Data Types: single | double | char | string
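To illustrate the effect of the support setting, this sketch fits the same hypothetical positive-valued sample twice and evaluates both densities at a negative point. With 'positive' support the density there is expected to be zero, while the unbounded fit places some density below zero.

```matlab
rng(5)                               % reproducible hypothetical data
x = exprnd(5,300,1);                 % strictly positive sample

pdUnb = fitdist(x,'Kernel');                         % default: 'unbounded'
pdPos = fitdist(x,'Kernel','Support','positive');    % restrict to x > 0

% Density at a point outside the positive half-line.
densUnb = pdf(pdUnb,-1);
densPos = pdf(pdPos,-1);
```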
Width
— Bandwidth of kernel smoothing window for kernel distribution
scalar value
Bandwidth of the kernel smoothing window for the kernel distribution, specified as a scalar value. The default value used by fitdist is optimal for estimating normal densities, but you might want to choose a smaller value to reveal features such as multiple modes. You must specify distname as 'Kernel' to use this option.
Data Types: single | double
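As a sketch of the trade-off described above, the following fits the patient weights with the default bandwidth and with a narrower one. The width of 4 is an arbitrary value chosen for illustration.

```matlab
load patients                        % sample data used in the examples above
x = Weight;

pdDefault = fitdist(x,'Kernel');               % default bandwidth
pdNarrow  = fitdist(x,'Kernel','Width',4);     % arbitrary smaller bandwidth

% Evaluate both densities on a common grid for comparison.
xGrid    = 50:1:250;
yDefault = pdf(pdDefault,xGrid);
yNarrow  = pdf(pdNarrow,xGrid);
```

Plotting yDefault and yNarrow against xGrid shows how the narrower window follows local features of the data more closely.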
Output Arguments
pd
— Probability distribution
probability distribution object
Probability distribution, returned as a probability distribution object. The distribution specified by distname determines the class type of the returned probability distribution object. For the list of distname values and corresponding probability distribution objects, see distname.
pdca
— Probability distribution objects
cell array
Probability distribution objects of the type specified by distname, returned as a cell array. For the list of distname values and corresponding probability distribution objects, see distname.
gn
— Group labels
cell array of character vectors
Group labels, returned as a cell array of character vectors.
gl
— Grouping variable levels
cell array of character vectors
Grouping variable levels, returned as a cell array of character vectors containing one column for each grouping variable.
Algorithms
The fitdist function fits most distributions using maximum likelihood estimation. Two exceptions are the normal and lognormal distributions with uncensored data.
For the uncensored normal distribution, the estimated value of the sigma parameter is the square root of the unbiased estimate of the variance.
For the uncensored lognormal distribution, the estimated value of the sigma parameter is the square root of the unbiased estimate of the variance of the log of the data.
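The two exceptions can be checked numerically. In this sketch, with hypothetical simulated data, std(x) is the square root of the unbiased variance estimate (normalized by n-1), and std(x,1) is the version normalized by n that a pure maximum likelihood fit would give.

```matlab
rng(6)                               % reproducible hypothetical data
x = 5 + 2*randn(50,1);

pdN = fitdist(x,'Normal');
sigmaUnbiased = std(x);              % normalized by n-1
sigmaMLE      = std(x,1);            % normalized by n

pdL = fitdist(exp(x),'Lognormal');   % exp(x) is lognormal by construction
sigmaLogUnbiased = std(log(exp(x))); % unbiased estimate on the log scale

normalGap    = abs(pdN.sigma - sigmaUnbiased);
lognormalGap = abs(pdL.sigma - sigmaLogUnbiased);
```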
Alternative Functionality
The Distribution Fitter app opens a graphical user interface for you to import data from the workspace and interactively fit a probability distribution to that data. You can then save the distribution to the workspace as a probability distribution object. Open the Distribution Fitter app using distributionFitter, or click Distribution Fitter on the Apps tab.
To fit a distribution to left-censored, double-censored, or interval-censored data, use mle. You can find the maximum likelihood estimates by using the mle function, and create a probability distribution object by using the makedist function. For an example, see Find MLEs for Double-Censored Data.
Extended Capabilities
C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.
Usage notes and limitations:
Supported syntaxes are:
pd = fitdist(x,distname)
pd = fitdist(x,distname,Name,Value)
Code generation does not support the syntaxes that include the grouping variable 'By',groupvar and the related output arguments pdca, gn, and gl.
fitdist supports code generation for beta, exponential, extreme value, lognormal, normal, and Weibull distributions.
The value of distname can be 'Beta', 'Exponential', 'ExtremeValue', 'Lognormal', 'Normal', or 'Weibull'.
The value of distname must be a compile-time constant.
The values of x, 'Censoring', and 'Frequency' must not contain NaN values.
Code generation ignores the 'Frequency' value for the beta distribution. Instead of specifying the 'Frequency' value, manually add duplicated values to x so that the values in x have the frequency you want.
Code generation does not support these input arguments: groupvar, NTrials, Theta, mu, Kernel, Support, and Width.
Names in name-value pair arguments must be compile-time constants.
These object functions of pd support code generation: cdf, icdf, iqr, mean, median, pdf, std, truncate, and var.
For more information on code generation, see Introduction to Code Generation and Code Generation for Probability Distribution Objects.
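A minimal sketch of an entry-point function that stays within these limitations is shown below. The function name fitNormalMean is hypothetical. It uses a supported syntax, a constant distname, and a supported object function.

```matlab
function m = fitNormalMean(x) %#codegen
% Hypothetical entry-point function: fit a normal distribution to the
% column vector x (assumed to contain no NaN values) and return its mean.
pd = fitdist(x,'Normal');    % distname is a compile-time constant
m = mean(pd);                % mean is listed as a supported object function
end
```

Assuming MATLAB Coder is installed, a command such as codegen fitNormalMean -args {coder.typeof(0,[Inf 1])} would be one way to generate code for a variable-length column vector input.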
GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.
Usage notes and limitations:
You cannot specify the input argument distname as 'Rician' or 'Stable'.
For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
Version History
Introduced in R2009a