survstan

R build status CRAN_Status_Badge Downloads Total Downloads The aim of the R package survstan is to provide a toolkit for fitting survival models using Stan.

The R package survstan can be used to fit right-censored survival data under independent censoring. The implemented models allow the fitting of survival data in the presence/absence of covariates. All inferential procedures are currently based on the maximum likelihood (ML) approach.

Installation

You can install the released version of survstan from CRAN with:

install.packages("survstan")

You can install the development version of survstan from GitHub with:

# install.packages("devtools")
devtools::install_github("fndemarqui/survstan")

Inference procedures

Let \((t_{i}, \delta_{i})\) be the observed survival time and its corresponding failure indicator, \(i=1, \cdots, n\), and \(\boldsymbol{\theta}\) be a \(k \times 1\) vector of parameters. Then, the likelihood function for right-censored survival data under independent censoring can be expressed as:

\[ L(\boldsymbol{\theta}) = \prod_{i=1}^{n}f(t_{i}|\boldsymbol{\theta})^{\delta_{i}}S(t_{i}|\boldsymbol{\theta})^{1-\delta_{i}}. \]

The maximum likelihood estimate (MLE) of \(\boldsymbol{\theta}\) is obtained by directly maximization of \(\log(L(\boldsymbol{\theta}))\) using the rstan::optimizing() function. The function rstan::optimizing() further provides the hessian matrix of \(\log(L(\boldsymbol{\theta}))\), needed to obtain the observed Fisher information matrix, which is given by:

\[ \mathscr{I}(\hat{\boldsymbol{\theta}}) = -\frac{\partial^2}{\partial \boldsymbol{\theta}\boldsymbol{\theta}'} \log L(\boldsymbol{\theta})\mid_{\boldsymbol{\theta}=\hat{\boldsymbol{\theta}}}, \]

Inferences on \(\boldsymbol{\theta}\) are then based on the asymptotic properties of the MLE, \(\hat{\boldsymbol{\theta}}\), that state that:

\[ \hat{\boldsymbol{\theta}} \asymp N_{k}(\boldsymbol{\theta}, \mathscr{I}^{-1}(\hat{\boldsymbol{\theta}})). \]

Baseline Distributions

Some of the most popular baseline survival distributions are implemented in the R package survstan. Such distributions include:

The parametrizations adopted in the package survstan are presented next.

Exponential Distribution

If \(T \sim \mbox{Exp}(\lambda)\), then

\[ f(t|\lambda) = \lambda\exp\left\{-\lambda t\right\}I_{[0, \infty)}(t), \] where \(\lambda>0\) is the rate parameter.

The survival and hazard functions in this case are given by:

\[ S(t|\lambda) = \exp\left\{-\lambda t\right\} \] and \[ h(t|\lambda) = \lambda. \]

Weibull Distribution

If \(T \sim \mbox{Weibull}(\alpha, \gamma)\), then

\[ f(t|\alpha, \gamma) = \frac{\alpha}{\gamma^{\alpha}}t^{\alpha-1}\exp\left\{-\left(\frac{t}{\gamma}\right)^{\alpha}\right\}I_{[0, \infty)}(t), \] where \(\alpha>0\) and \(\gamma>0\) are the shape and scale parameters, respectively.

The survival and hazard functions in this case are given by:

\[ S(t|\alpha, \gamma) = \exp\left\{-\left(\frac{t}{\gamma}\right)^{\alpha}\right\} \] and \[ h(t|\alpha, \gamma) = \frac{\alpha}{\gamma^{\alpha}}t^{\alpha-1}. \]

Lognormal Distribution

If \(T \sim \mbox{LN}(\mu, \sigma)\), then

\[ f(t|\mu, \sigma) = \frac{1}{\sqrt{2\pi}t\sigma}\exp\left\{-\frac{1}{2}\left(\frac{log(t)-\mu}{\sigma}\right)^2\right\}I_{[0, \infty)}(t), \] where \(-\infty < \mu < \infty\) and \(\sigma>0\) are the mean and standard deviation in the log scale of \(T\).

The survival and hazard functions in this case are given by:

\[S(t|\mu, \sigma) = \Phi\left(\frac{-log(t)+\mu}{\sigma}\right)\] and \[h(t|\mu, \sigma) = \frac{f(t|\mu, \sigma)}{S(t|\mu, \sigma)},\] where \(\Phi(\cdot)\) is the cumulative distribution function of the standard normal distribution.

Loglogistic Distribution

If \(T \sim \mbox{LL}(\alpha, \gamma)\), then

\[ f(t|\alpha, \gamma) = \frac{\frac{\alpha}{\gamma}\left(\frac{t}{\gamma}\right)^{\alpha-1}}{\left[1 + \left(\frac{t}{\gamma}\right)^{\alpha}\right]^2}I_{[0, \infty)}(t), ~ \alpha>0, \gamma>0, \]

where \(\alpha>0\) and \(\gamma>0\) are the shape and scale parameters, respectively.

The survival and hazard functions in this case are given by:

\[S(t|\alpha, \gamma) = \frac{1}{1+ \left(\frac{t}{\gamma}\right)^{\alpha}}\] and \[ h(t|\alpha, \gamma) = \frac{\frac{\alpha}{\gamma}\left(\frac{t}{\gamma}\right)^{\alpha-1}}{1 + \left(\frac{t}{\gamma}\right)^{\alpha}}. \]

Gamma Distribution

If \(T \sim \mbox{Gamma}(\alpha, \gamma)\), then

\[f(t|\alpha, \lambda) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}t^{\alpha-1}\exp\left\{-\lambda t\right\}I_{[0, \infty)}(t),\]

where \(\Gamma(\alpha) = \int_{0}^{\infty}u^{\alpha-1}\exp\{-u\}du\) is the gamma function.

The survival function is given by

\[S(t|\alpha, \lambda) = 1 - \frac{\gamma^{*}(\alpha, \gamma t)}{\Gamma(\alpha)},\] where \(\gamma^{*}(\alpha, \gamma t)\) is the lower incomplete gamma function, which is available only numerically. Finally, the hazard function is expressed as:

\[h(t|\alpha, \lambda) = \frac{f(t|\alpha, \gamma)}{S(t|\alpha, \gamma)}.\]

Rayleigh Distribution

Let \(T \sim \mbox{rayleigh}(\sigma)\), where \(\sigma>0\) is a scale parameter. Then, the density, survival and hazard functions are respectively given by:

\[f(t|\sigma) = \frac{x}{\sigma^2}\exp\left\{-\frac{x^2}{2\sigma^2}\right\},\] \[S(t|\sigma) = \exp\left\{-\frac{x^2}{2\sigma^2}\right\}\] and

\[h(t|\sigma) = \frac{x}{\sigma^2}.\]

Birnbaum-Saunders (fatigue) Distribution

If \(T \sim \mbox{fatigue}(\alpha, \gamma)\), then

\[ f(t|\alpha, \gamma) = \frac{\sqrt{\frac{t}{\gamma}}+\sqrt{\frac{\gamma}{t}}}{2 \alpha t}\phi\left(\sqrt{\frac{t}{\gamma}}+\sqrt{\frac{\gamma}{t}}\right)(t), ~ \alpha>0, \gamma>0, \]

where \(\phi(\cdot)\) is the probability density function of a standard normal distribution, \(\alpha>0\) and \(\gamma>0\) are the shape and scale parameters, respectively.

The survival function in this case is given by:

\[ S(t|\alpha, \gamma) =\Phi\left(\sqrt{\frac{t}{\gamma}}-\sqrt{\frac{\gamma}{t}}\right)(t) \],

where \(\Phi(\cdot)\) is the cumulative distribution function of a standard normal distribution. The hazard function is given by \[h(t|\mu, \sigma) = \frac{f(t|\alpha, \gamma)}{S(t|\alpha, \gamma)}. \]

Regression models

When covariates are available, it is possible to fit four different regression models with the R package survstan:

The regression survival models implemented in the R package survstan are briefly described in the sequel. Denote by \(\mathbf{x}\) a \(1\times p\) vector of covariates, and let \(\boldsymbol{\beta}\) and \(\boldsymbol{\phi}\) be \(p \times 1\) of regression coefficients, and \(\boldsymbol{\theta}\) a vector of parameters associated with some baseline survival distribution. To ensure identifiability of the implemented regression models, we shall assume that the linear predictors \(\mathbf{x} \boldsymbol{\beta}\) and \(\mathbf{x} \boldsymbol{\phi}\) do not include a intercept term.

Accelerate Failure Time Models

Accelerated failure time (AFT) models are defined as

\[ T = \exp\{\mathbf{x} \boldsymbol{\beta}\}\nu, \] where \(\nu\) follows a baseline distribution with survival function \(S_{0}(\cdot|\boldsymbol{\theta})\) so that

\[ f(t|\boldsymbol{\theta}, \boldsymbol{\beta}, \mathbf{x}) = e^{-\mathbf{x} \boldsymbol{\beta}}f_{0}(te^{-\mathbf{x} \boldsymbol{\beta}}|\boldsymbol{\theta}) \] and

\[ S(t|\boldsymbol{\theta}, \boldsymbol{\beta}, \mathbf{x}) = S_{0}(t e^{-\mathbf{x} \boldsymbol{\beta}}|\boldsymbol{\theta}). \]

Proportional Hazards Models

Proportional hazards (PH) models are defined as

\[ h(t|\boldsymbol{\theta}, \boldsymbol{\beta}, \mathbf{x}) = h_{0}(t|\boldsymbol{\theta})\exp\{\mathbf{x} \boldsymbol{\beta}\}, \] where \(h_{0}(t|\boldsymbol{\theta})\) is a baseline hazard function so that

\[ f(t|\boldsymbol{\theta}, \boldsymbol{\beta}, \mathbf{x}) = h_{0}(t|\boldsymbol{\theta})\exp\left\{\mathbf{x} \boldsymbol{\beta} - H_{0}(t|\boldsymbol{\theta})e^{\mathbf{x} \boldsymbol{\beta}}\right\}, \] and

\[ S(t|\boldsymbol{\theta}, \boldsymbol{\beta}, \mathbf{x}) = \exp\left\{ - H_{0}(t|\boldsymbol{\theta})e^{\mathbf{x} \boldsymbol{\beta}}\right\}. \]

Proportional Odds Models

Proportional Odds (PO) models are defined as

\[ R(t|\boldsymbol{\theta}, \boldsymbol{\beta}, \mathbf{x}) = R_{0}(t|\boldsymbol{\theta})\exp\{\mathbf{x} \boldsymbol{\beta}\}, \] where \(\displaystyle R_{0}(t|\boldsymbol{\theta}) = \frac{1-S_{0}(t|\boldsymbol{\theta})}{S_{0}(t|\boldsymbol{\theta})} = \exp\{H_{0}(t|\boldsymbol{\theta})\}-1\) is a baseline odds function so that

\[ f(t|\boldsymbol{\theta}, \boldsymbol{\beta}, \mathbf{x}) = \frac{h_{0}(t|\boldsymbol{\theta})\exp\{\mathbf{x} \boldsymbol{\beta} + H_{0}(t|\boldsymbol{\theta})\}}{[1 + R_{0}(t|\boldsymbol{\theta})e^{\mathbf{x} \boldsymbol{\beta}}]^2}. \]

and

\[ S(t|\boldsymbol{\theta}, \boldsymbol{\beta}, \mathbf{x}) = \frac{1}{1 + R_{0}(t|\boldsymbol{\theta})e^{\mathbf{x} \boldsymbol{\beta}}}. \]

Accelerated Hazard Models

Accelerated hazard (AH) models can be defined as

\[h(t|\boldsymbol{\theta}, \boldsymbol{\beta},\mathbf{x}) = h_{0}\left(t/e^{\mathbf{x}\boldsymbol{\beta}}|\boldsymbol{\theta}\right)\]

so that

\[S(t|\boldsymbol{\theta}, \boldsymbol{\beta},\mathbf{x}) = \exp\left\{- H_{0}\left(t/ e^{\mathbf{x}\boldsymbol{\beta}}|\boldsymbol{\theta}\right)e^{\mathbf{x}\boldsymbol{\beta}} \right\} \] and \[f(t|\boldsymbol{\theta}, \boldsymbol{\beta}, \mathbf{x}) = h_{0}\left(t/e^{\mathbf{x}\boldsymbol{\beta}}|\boldsymbol{\theta}\right)\exp\left\{- H_{0}\left(t/ e^{\mathbf{x}\boldsymbol{\beta}}|\boldsymbol{\theta}\right)e^{\mathbf{x}\boldsymbol{\beta}} \right\}. \]

Extended hazard Models

The survival function of the extended hazard (EH) model is given by:

\[S(t|\boldsymbol{\theta},\boldsymbol{\beta}, \boldsymbol{\phi}) = \exp\left\{-H_{0}(t/e^{\mathbf{x}\boldsymbol{\beta}}|\boldsymbol{\theta})\exp(\mathbf{x}(\boldsymbol{\beta} + \boldsymbol{\phi}))\right\}. \]

The hazard and the probability density functions are then expressed as:

\[h(t|\boldsymbol{\theta},\boldsymbol{\beta}, \boldsymbol{\phi}) = h_{0}(t/e^{\mathbf{x}\boldsymbol{\beta}}|\boldsymbol{\theta})\exp\{\mathbf{x}\boldsymbol{\phi}\} \] and

\[f(t|\boldsymbol{\theta},\boldsymbol{\beta}, \boldsymbol{\phi}) = h_{0}(t/e^{\mathbf{x}\boldsymbol{\beta}}|\boldsymbol{\theta})\exp\{\mathbf{x}\boldsymbol{\beta}\}\exp\left\{-H_{0}(t/e^{\mathbf{x}\boldsymbol{\beta}}|\boldsymbol{\theta})\exp(\mathbf{x}(\boldsymbol{\beta}+ \boldsymbol{\phi}))\right\}, \]

respectively.

The EH model includes the AH, AFT and PH models as particular cases when \(\boldsymbol{\phi} = \mathbf{0}\), \(\boldsymbol{\phi} = -\boldsymbol{\beta}\), and \(\boldsymbol{\beta} = \mathbf{0}\), respectively.

Yang and Prentice Models

The survival function of the Yang and Prentice (YP) model is given by:

\[S(t|\boldsymbol{\theta},\boldsymbol{\beta}, \boldsymbol{\phi}) = \left[1+\frac{\kappa_{S}}{\kappa_{L}}R_{0}(t|\boldsymbol{\theta})\right]^{-\kappa_{L}}. \]

The hazard and the probability density functions are then expressed as:

\[h(t|\boldsymbol{\theta},\boldsymbol{\beta}, \boldsymbol{\phi}) = \frac{\kappa_{S}h_{0}(t|\boldsymbol{\theta})\exp\{H_{0}(t|\boldsymbol{\theta})\}}{\left[1+\frac{\kappa_{S}}{\kappa_{L}}R_{0}(t|\boldsymbol{\theta})\right]} \] and

\[f(t|\boldsymbol{\theta},\boldsymbol{\beta}, \boldsymbol{\phi}) = \kappa_{S}h_{0}(t|\boldsymbol{\theta})\exp\{H_{0}(t|\boldsymbol{\theta})\}\left[1+\frac{\kappa_{S}}{\kappa_{L}}R_{0}(t|\boldsymbol{\theta})\right]^{-(1+\kappa_{L})}, \]

respectively, where \(\kappa_{S} = \exp\{\mathbf{x}\boldsymbol{\beta}\}\) and \(\kappa_{L} = \exp\{\mathbf{x}\boldsymbol{\phi}\}\).

The YO model includes the PH and PO models as particular cases when \(\boldsymbol{\phi} = \boldsymbol{\beta}\) and \(\boldsymbol{\phi} = \mathbf{0}\), respectively.