fitcox
Description
The fitcox function creates a Cox proportional hazards model for lifetime data. The basic Cox model includes a hazard function h_{0}(t) and model coefficients b such that, for predictor X, the hazard rate at time t is
$$h\left({X}_{i},t\right)={h}_{0}(t)\mathrm{exp}\left[{\displaystyle \sum _{j=1}^{p}{x}_{ij}{b}_{j}}\right],$$
where the b coefficients do not depend on time.
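As a short worked step showing why the model is called proportional, divide the hazard rates of two observations. The row labels 1 and 2 are introduced here only for illustration; everything else uses the notation above.

$$\frac{h\left({X}_{1},t\right)}{h\left({X}_{2},t\right)}=\frac{{h}_{0}(t)\mathrm{exp}\left[\sum _{j=1}^{p}{x}_{1j}{b}_{j}\right]}{{h}_{0}(t)\mathrm{exp}\left[\sum _{j=1}^{p}{x}_{2j}{b}_{j}\right]}=\mathrm{exp}\left[\sum _{j=1}^{p}\left({x}_{1j}-{x}_{2j}\right){b}_{j}\right]$$

The baseline hazard cancels, so the ratio depends only on the predictor values and the coefficients, not on t.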
fitcox infers both the model coefficients b and the hazard rate h_{0}(t), and stores them as properties in the resulting CoxModel object.
The full Cox model includes extensions to the basic model, such as hazards with respect to different baselines or the inclusion of stratification variables. See Extension of Cox Proportional Hazards Model.
coxMdl = fitcox(X,T,Name,Value) modifies the fit using one or more Name,Value arguments. For example, when the data includes censoring (values that are not observed), the Censoring argument specifies the censored data.
Examples
Estimate Cox Proportional Hazard Regression
Weibull random variables with the same shape parameter have proportional hazard rates; see Weibull Distribution. The hazard rate with scale parameter $$a$$ and shape parameter $$b$$ at time $$t$$ is
$$\frac{b}{{a}^{b}}{t}^{b-1}$$.
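As a worked step, the proportionality can be checked directly from this formula. The symbols $$a_1$$ and $$a_2$$ are introduced here for illustration and denote the scale parameters of two Weibull variables that share the shape parameter $$b$$:

$$\frac{\frac{b}{{a}_{1}^{b}}{t}^{b-1}}{\frac{b}{{a}_{2}^{b}}{t}^{b-1}}={\left(\frac{{a}_{2}}{{a}_{1}}\right)}^{b}$$

The factor $$b\,t^{b-1}$$ cancels, so the ratio of the two hazard rates is constant in time.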
Generate pseudorandom samples from the Weibull distribution with scale parameters 1, 5, and 1/3, and with the same shape parameter B.
rng default % For reproducibility
B = 2;
A = ones(100,1);
data1 = wblrnd(A,B);
A2 = 5*A;
data2 = wblrnd(A2,B);
A3 = A/3;
data3 = wblrnd(A3,B);
Create a table of data. The predictors are the three variable types, 1, 2, or 3.
predictors = categorical([A;2*A;3*A]);
data = table(predictors,[data1;data2;data3],'VariableNames',["Predictors" "Times"]);
Fit a Cox regression to the data.
mdl = fitcox(data,"Times")
mdl = 
Cox Proportional Hazards regression model

                     Beta        SE        zStat       pValue  
                    _______    _______    _______    __________

    Predictors_2    -3.5834    0.33187    -10.798    3.5299e-27
    Predictors_3     2.1668    0.20802     10.416    2.0899e-25

Log-likelihood: -1197.917
rates = exp(mdl.Coefficients.Beta)
rates = 2×1
0.0278
8.7301
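The estimated ratios can be compared with the values implied by the Weibull hazard formula. The following is a minimal sketch, assuming that category 1 (scale 1) is the reference level, which is consistent with the two dummy coefficients Predictors_2 and Predictors_3 in the display. The variable names scales, theoreticalRates, and theoreticalBeta are introduced here for illustration.

```matlab
B = 2;                 % common shape parameter used to generate the data
scales = [1 5 1/3];    % scale parameters for categories 1, 2, and 3

% For a shared shape B, the hazard ratio of category k relative to
% category 1 is (a1/ak)^B.
theoreticalRates = (scales(1)./scales(2:3)').^B
theoreticalBeta = log(theoreticalRates)
```

The theoretical ratios are 0.04 and 9, so the fitted values 0.0278 and 8.7301 are of the same order, as one might expect from 100 samples per group.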
Fit Cox Proportional Hazards Model to Lifetime Data
Perform a Cox proportional hazards regression on the lightbulb data set, which contains simulated lifetimes of light bulbs. The first column of the light bulb data contains the lifetime (in hours) of two different types of bulbs. The second column contains a binary variable indicating whether the bulb is fluorescent or incandescent; 0 indicates the bulb is fluorescent, and 1 indicates it is incandescent. The third column contains the censoring information, where 0 indicates the bulb was observed until failure, and 1 indicates the observation was censored.
Load the lightbulb data set.
load lightbulb
Fit a Cox proportional hazards model for the lifetime of the light bulbs, accounting for censoring. The predictor variable is the type of bulb.
coxMdl = fitcox(lightbulb(:,2),lightbulb(:,1), ...
    'Censoring',lightbulb(:,3))
coxMdl = 
Cox Proportional Hazards regression model

           Beta       SE      zStat       pValue  
          ______    ______    ______    __________

    X1    4.7262    1.0372    4.5568    5.1936e-06

Log-likelihood: -212.638
Find the hazard ratio of incandescent bulbs relative to fluorescent bulbs by evaluating $$\mathrm{exp}(Beta)$$.
hr = exp(coxMdl.Coefficients.Beta)
hr = 112.8646
The estimate of the hazard ratio is $${e}^{Beta}$$ = 112.8646, which means that the estimated hazard for the incandescent bulbs is 112.86 times the hazard for the fluorescent bulbs. The small value of coxMdl.Coefficients.pValue indicates there is a negligible chance that the two types of light bulbs have identical hazard rates, which would mean Beta = 0.
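The zStat and pValue columns can be reproduced by hand. The following is a minimal sketch, assuming that zStat is Beta divided by SE and that pValue is the two-sided normal p-value (the standard Wald computation, which agrees with the displayed numbers up to rounding). The variable names beta, se, z, and p are introduced here for illustration, and the inputs are the rounded values copied from the display.

```matlab
beta = 4.7262;   % coefficient estimate copied from the display
se   = 1.0372;   % standard error copied from the display

z = beta/se                  % Wald z-statistic
p = erfc(abs(z)/sqrt(2))     % two-sided normal p-value, 2*(1 - Phi(|z|))
```

Using erfc keeps the sketch in base MATLAB and avoids the loss of precision that comes from subtracting a cumulative probability close to 1.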
Input Arguments
X
— Predictor values
matrix | table
Predictor values, specified as a matrix or table.
A matrix contains one column for each predictor and one row for each observation.
A table contains one row for each observation. A table can also contain the time data as well as the predictors.
By default, if the predictor data is in a table, fitcox assumes that a variable is categorical if it is a logical vector, categorical vector, character array, string array, or cell array of character vectors. If the predictor data is a matrix, fitcox assumes that all predictors are continuous. To identify any other predictors as categorical predictors, specify them by using the CategoricalPredictors name-value argument.
If X, T, the value of 'Frequency', or the value of 'Stratification' contains NaN values, then fitcox removes rows with NaN values from all data when fitting a Cox model.
Data Types: double | table | categorical
T
— Event times
real column vector | real matrix with two columns | name of column in table X | formula in Wilkinson notation for table X
Event times, specified as one of the following:
Real column vector.
Real matrix with two columns representing the start and stop times.
Name of a column in the table X.
Formula in Wilkinson notation for the table X. For example, to specify that the table columns 'x' and 'y' are in the model, use 'T ~ x + y'. See Wilkinson Notation.
For vector or matrix entries, the number of rows of T must be the same as the number of rows of X.
Use the two-column form of T to fit a model with time-varying coefficients. See Cox Proportional Hazards Model with Time-Dependent Covariates.
Data Types: single | double | char | string
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: To fit data with censored values cens, specify 'Censoring',cens.
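The two ways of writing a name-value argument can be sketched side by side. This example reuses the lightbulb data set from the Examples section; the variable names X, T, cens, mdlA, and mdlB are introduced here for illustration.

```matlab
load lightbulb                % columns: lifetime, bulb type, censoring flag
X = lightbulb(:,2);           % predictor: bulb type
T = lightbulb(:,1);           % event times: lifetime in hours
cens = lightbulb(:,3);        % 1 marks a censored observation

% Name=Value syntax
mdlA = fitcox(X,T,Censoring=cens);

% Comma-separated syntax with the name in quotes
mdlB = fitcox(X,T,'Censoring',cens);

% Both calls request the same fit, so the estimates should agree
isequal(mdlA.Coefficients.Beta,mdlB.Coefficients.Beta)
```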
Baseline
— X values at which to compute baseline hazard
mean(X), the default for continuous predictors | 0, the default for categorical predictors | real scalar | real row vector
X values at which to compute the baseline hazard, specified as a real scalar or row vector. If Baseline is a row vector, its length is the number of predictors, so there is one baseline for each predictor.
The default baseline for continuous predictors is mean(X), so the default hazard rate at X for these predictors is h(t)*exp((X - mean(X))*b). The default baseline for categorical predictors is 0. Enter 0 to compute the baseline for all predictors relative to 0, so the hazard rate at X is h(t)*exp(X*b). Changing the baseline changes the hazard ratio, but does not affect the coefficient estimates.
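A short worked step shows why the coefficient estimates are unaffected. Here $$X_0$$ denotes the baseline row vector; this symbol is notation introduced for illustration.

$$h(t)\,\mathrm{exp}\left[\left(X-{X}_{0}\right)b\right]=\left[h(t)\,\mathrm{exp}\left(-{X}_{0}b\right)\right]\mathrm{exp}\left(Xb\right)$$

Shifting the baseline multiplies the baseline hazard by the constant factor $$\mathrm{exp}(-X_0 b)$$ and leaves the dependence on X, and therefore b, unchanged.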
For the identified categorical predictors, fitcox creates dummy variables. fitcox creates one less dummy variable than the number of categories. For details, see Automatic Creation of Dummy Variables.
Example: 'Baseline',0
Data Types: double
Beta
— Coefficient initial values
0.01/std(X) (default) | numeric vector
Coefficient initial values, specified as a numeric vector of coefficient values. These values initiate the likelihood maximization iterations performed by fitcox.
Data Types: double
CategoricalPredictors
— Categorical predictors list
vector of positive integers | logical vector | character matrix | string array | cell array of character vectors | 'all'
Categorical predictors list, specified as one of the values in this table.
Value | Description
Vector of positive integers | Each entry in the vector is an index value corresponding to the column of the predictor data (X) that contains a categorical variable.
Logical vector | A true entry means that the corresponding column of predictor data (X) is a categorical variable.
Character matrix | Each row of the matrix is the name of a predictor variable in the table X. The names must match the entries in PredictorNames. Pad the names with extra blanks so each row of the character matrix has the same length.
String array or cell array of character vectors | Each element in the array is the name of a predictor variable in the table X. The names must match the entries in PredictorNames.
'all' | All predictors are categorical.
By default, if the predictor data is in a table, fitcox assumes that a variable is categorical if it is a logical vector, categorical vector, character array, string array, or cell array of character vectors. If the predictor data is a matrix, fitcox assumes that all predictors are continuous. To identify any other predictors as categorical predictors, specify them by using the 'CategoricalPredictors' name-value argument.
For the identified categorical predictors, fitcox creates dummy variables. fitcox creates one less dummy variable than the number of categories. For details, see Automatic Creation of Dummy Variables.
Example: 'CategoricalPredictors','all'
Data Types: single | double | logical | char | string | cell
Censoring
— Indicator for censoring
array of 0s (default) | array of 0s and 1s | name of a column in table X
Indicator for censoring, specified as a Boolean vector with the same number of rows as X or the name of a column in the table X. Use 1 for observations that are right censored and 0 for observations that are fully observed. By default, all observations are fully observed. For an example, see Cox Proportional Hazards Model for Censored Data.
Example: 'Censoring',cens
Data Types: logical
Frequency
— Frequency or weights of observations
array of 1s (default) | vector of nonnegative scalar values
Frequency or weights of observations, specified as an array the same size as T containing nonnegative scalar values. The array can contain integer values corresponding to frequencies of observations or nonnegative values corresponding to observation weights.
The default is 1 per row of X and T.
If X, T, the value of 'Frequency', or the value of 'Stratification' contains NaN values, then fitcox removes rows with NaN values from all data when fitting a Cox model.
Example: 'Frequency',w
Data Types: double
OptimizationOptions
— Algorithm control parameters
structure
Algorithm control parameters for the iterative algorithm fitcox uses to estimate the solution, specified as a structure. Create this structure using statset. For parameter names and default values, see the following table or enter statset('fitcox').
In the table, "termination tolerance" means that if the internal iterations cause a change in the stated value less than the tolerance, the iterations stop.
Field in Structure | Description | Values
Display | Amount of information returned to the command line |
MaxFunEvals | Maximum number of function evaluations | Positive integer; default is 200
MaxIter | Maximum number of iterations | Positive integer; default is 100
TolFun | Termination tolerance on change in likelihood; see Cox Proportional Hazards Model | Positive scalar; default is 1e-8
TolX | Termination tolerance for parameter (predictor estimate) change | Positive scalar; default is 1e-8
Example: 'OptimizationOptions',statset('TolX',1e-6,'MaxIter',200)
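As a usage sketch, the options structure can be built from the fitcox defaults and then passed to the fit. This example reuses the lightbulb data set from the Examples section; the variable name opts and the chosen tolerance and iteration values are illustrative only.

```matlab
opts = statset('fitcox');   % start from the default options for fitcox
opts.TolX = 1e-6;           % looser tolerance on the coefficient change
opts.MaxIter = 200;         % allow more iterations than the default

load lightbulb
coxMdl = fitcox(lightbulb(:,2),lightbulb(:,1), ...
    'Censoring',lightbulb(:,3),'OptimizationOptions',opts);
```

Starting from statset('fitcox') keeps every field that is not overridden at its default value.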
PredictorNames
— Predictor variable names
string array of unique names | cell array of unique character vectors
Predictor variable names, specified as a string array of unique names or cell array of unique character vectors. The functionality of 'PredictorNames' depends on how you supply the training data.
If you supply X as a numeric array, then you can use 'PredictorNames' to assign names to the predictor variables in X.
The order of the names in PredictorNames must correspond to the column order of X. That is, PredictorNames{1} is the name of X(:,1), PredictorNames{2} is the name of X(:,2), and so on. Also, size(X,2) and numel(PredictorNames) must be equal.
By default, PredictorNames is {'X1','X2',...}.
If you supply X as a table, then you can use 'PredictorNames' to choose which predictor variables to use in training. That is, fitcox uses only the predictor variables in PredictorNames and the time variable during training.
PredictorNames must be a subset of X.Properties.VariableNames and cannot include the name of the time variable T.
By default, PredictorNames contains the names of all predictor variables.
Specify the predictors for training using either 'PredictorNames' or a formula in Wilkinson notation, but not both.
Example: 'PredictorNames',{'Sex','Age','Weight','Smoker'}
Data Types: string | cell
Stratification
— Stratification variables
[] (default) | matrix of real values | name of column in table X | array of categorical variables
Stratification variables, specified as a matrix of real values, the name of a column in table X, or an array of categorical variables. The matrix must have the same number of rows as T, with each row corresponding to an observation.
The default [] is no stratification variable.
If X, T, the value of 'Frequency', or the value of 'Stratification' contains NaN values, then fitcox removes rows with NaN values from all data when fitting a Cox model.
Example: 'Stratification',Gender
Data Types: single | double | char | string | categorical
TieBreakMethod
— Method to handle tied failure times
'breslow' (default) | 'efron'
Method to handle tied failure times, specified as 'breslow' (Breslow's method) or 'efron' (Efron's method). See Partial Likelihood Function for Tied Events.
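For orientation, the two methods differ in how they treat the denominator of the partial likelihood at a tied event time. The expressions below are the standard textbook forms, not a statement of the toolbox's exact implementation; see the referenced section for its definitions. The notation is introduced here for illustration: $$D_i$$ is the set of $$d_i$$ observations that fail at the tied time $$t_i$$, $$R_i$$ is the set of observations at risk at $$t_i$$, and $$\theta_k = \mathrm{exp}(x_k b)$$.

$$L_i^{\mathrm{Breslow}}=\frac{\prod _{j\in D_i}\theta_j}{\left(\sum _{k\in R_i}\theta_k\right)^{d_i}}$$

$$L_i^{\mathrm{Efron}}=\frac{\prod _{j\in D_i}\theta_j}{\prod _{l=0}^{d_i-1}\left[\sum _{k\in R_i}\theta_k-\frac{l}{d_i}\sum _{k\in D_i}\theta_k\right]}$$

When there are no ties ($$d_i = 1$$), the two expressions coincide.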
Example: 'TieBreakMethod','efron'
Data Types: char | string
Version History
Introduced in R2021a
See Also
CoxModel | hazardratio | survival | plotSurvival | linhyptest | coefci | coxphfit