Chapter 4 Dirichlet Process for Survival Analysis
Dirichlet Process mixture models (DPMM) provide flexible data modeling by allowing for countably many mixtures of parametric distributions and have been widely used in different scenarios. In this project, we demonstrate the flexibility and robustness of DPMM by applying it to the setting of survival analysis. Despite the wide use of the Weibull model to fit survival data, its assumption that a finite collection of parameters (i.e., the shape and scale parameters in a Weibull model) could fully describe the data distribution might be naive. There could be a mixing of mechanisms that leads to different event times for different individuals, which motivates mixture models. However, this gives rise to the problem that, in real life, it could be hard to tell the number of components that compose the mixture distribution. To address this issue, in the following sections, we introduce a DP Weibull mixture model developed by Kottas (2006), where both the shape parameter α and β are mixed so that a wide range of distribution shapes could be well-modeled.
4.1 Model
Suppose we have a dataset with sample size n, then the Dirichlet process Weibull mixture model can be represented by the following hierarchical model using the notations introduced in Chapter 3:
ti∣ai,βii.i.d∼Weibull(αi,βi),i=1,2,…,n
αi,βi∣Gi.i.d∼G,i=1,2,…,n
G∣v,γ,ϕ∼DP(vG0)
v∼Gamma(v∣av,bv) G0(α,β∣ϕ,γ)=Uniform(α|0,ϕ)IGamma(β∣d=2,γ)
γ∼Gamma(γ∣aγ,bγ)
ϕ∼Pareto(ϕ∣aϕ,bϕ)
The parameters for the Weibull kernel, (αi,βi), are drawn from the Dirichlet process G∼DP(vG0), where v is the concentration parameter and the base distribution G0 is defined as the product of a uniform distribution and an inverse Gamma distribution. The shape parameter α follows the uniform distribution, while the scale parameter β follows the inverse Gamma distribution. In the inverse Gamma distribution, we choose d=2 such that λ could have a finite variance. That is,
f(β)=γdΓ(d)x−d−1e−γλ and
E[β]=γd−1=γ Further, the priors for ϕ,γ and v are chosen to facilitate Gibbs sampler.
The posterior distribution is then
F(t;G)=∫Weibull(t∣α,β)G(dα,dβ) where we marginalize over G by integrating out α and β.
Choice of G0
Ideally, we would like to select a base distribution G0 to facilitate posterior analysis. Although base distribution G0 that yields a closed form expression for the above formula is not available (Kottas, 2006), setting G0(α,β∣ϕ,γ)=Uniform(α|0,ϕ)IGamma(λ∣d=2,γ) facilitates the numerical integration involved in the computation of the posterior distribution.
However, the choice of G0 is not unique. For example, the R package DPWeibull
(Shi, 2020) implements a slightly different choice (Shi et al. 2019). We use the R package DPWeibull
for demonstration.
4.2 Simulation
We conducted simulation studies to compare a DP Weibull Mixture model with a traditional Weibull mixture model estimated by the expectation-maximization algorithm in order to demonstrate the flexibility of DPMM. We simulated data using an underlying mixture distribution consisting of three Weibull distributions, and compared the DP Weibull mixture model with a two-component Weibull mixture model that misspecifies the number of components, which is likely to occur in the reality where the correct number of components is unknown. Specifically, the three Weibull distributions contributing to the underlying truth are Weibull(2,2), Weibull(1.5,3), and Weibull(3,3.2), and each contributes equally with a weight of 1/31.
Figure 4.1 shows the survival distribution and hazard function of the mixture distribution we use for simulation. The hazard function shown in Figure 4.1, h(t)=f(t)S(t), is often used to characterize a survival distribution. Conceptually, the hazard function describes the conditional probability for the failure event (i.e., death in this case) to occur at time t given that the patient has survived up to time t. For our simulated data, the hazard first increases up to t=4 when it starts to decrease. However, it is well-known that the hazard function of Weibull distribution is monotone. Thus, although the simulated data is generated from a mixture of Weibull models, any single Weibull model would fail to fit the data adequately.

Figure 4.1: The survival distribution and hazard function of a mixture of three Weibull distributions used for simuation
We further chose three levels of random censoring, 10%, 20%, and 30%, to compare the performances of the two models under different censoring rates. Figure 4.2 summarizes the performance of the Dirichlet process (DP) Weibull mixture model and a Weibull mixture model with two components under the three levels of censoring rates. As the DP Weibull mixture model is free from misspecification of the number of components, it has smaller mean squared errors (MSEs) than a Weibull mixture model with two components. As the censoring rates increases, the proportion of exact observation decreases. For example, a 30% censoring means that, for 30% of observed data, we only know their survival time is longer than their censoring time. Thus, as censoring rates increase, we expect the mean squared errors to increase for both models, as confirmed by Figure 4.2.

Figure 4.2: MSE (in 10^-3) of the DP Weibull Mixture models and the two-component Weibull Mixture models under 10% ,20%, and 30% random censoring
Figures 4.3-4.5 show the estimated survival distribution and the 95% confidence interval (CI) of the two models. The estimates the DP mixture model produces are much closer to the underlying truth than a misspecified mixture model, especially when the censoring rate is high. That being said, we note that both models appear to overestimate the survival probability. We also note that at a high censoring rate, i.e., 30% random censoring, the 95% confidence interval of the DP mixture model barely captures the underlying truth, while the 95% confidence interval of the mixture model manages to do a slightly better job due to its wide width. Nonetheless, the DP Weibull mixture model can still produce a more accurate estimation than a conventional mixture model.

Figure 4.3: Estimates for a mixture of 3 Weibull distributions,10% censoring

Figure 4.4: Estimates for a mixture of 3 Weibull distributions,20% censoring

Figure 4.5: Estimates for a mixture of 3 Weibull distributions,30% censoring
4.3 Case Study
To demonstrate the use of the Dirichlet process Weibull mixture model, we consider a dataset by Haupt and Mansmann (1995) of the survival time in months for patients with liver metastases, which is readily available in the R package locfit
. This dataset is also used by Kottas (2006) for demonstration. Figure 4.6 shows the smoothed hazard function h(t) given by the kernel-based methods provided by the R package muhaz
. The hazard increases until around 20 months, and flattens during 20-35 months, while it appears to start to decrease after 35 months. Similar to our simulation setting, the non-monotone hazard indicates that fitting a single Weibull model would fail to characterize the data.

Figure 4.6: Smoothed kernel-based hazard function for liver metastases survival time
Figure 4.7 shows the estimated survival distribution and hazard of the liver metastases dataset along with their 95% posterior confidence interval (2006). We note that the difference in the choice of base distribution G0 by Kottas (2006) and the R package DPWeibull
leads to different quantitative hazard estimates. However, both demonstrated trends in the original study by Haupt and Mansmann (1995). The estimated hazard increases until roughly 15 months, which levels off and starts to decline after 35 months, which, as Kottas (2006) noted, might be attributed to the fact that roughly 96%2 of exact observations occurred before the 35th month.

Figure 4.7: Estimated survival distribution and hazard of liver metastases based on the DP Weibull mixture model
4.4 Appendix
4.4.1 Simulation Codes
#### Load Libraries
library(survival)
library(DPWeibull)
library(Rsolnp)
### Function to Generate Data from a Mixture of 3 Weibull Components
= function(n, coeff, weibpar1,weibpar2,weibpar3,cpar) {
full.weib.mixed.simulate =0;count2=0;count3=0
count1= 0
j = NULL
output while (j < n) { # iterate until a sample of size n is obtained
= which(rmultinom(1,1,coeff)==1)
mod.sel if (mod.sel == 1){
=count1+1
count1= rweibull(1, shape=weibpar1[1], scale=weibpar1[2])
t
}else if (mod.sel == 2) {
=count2+1
count2= rweibull(1, shape=weibpar2[1], scale=weibpar2[2])
t
}else{ #mod.sel == 2
=count3+1
count3= rweibull(1, shape=weibpar3[1], scale=weibpar3[2])
t
}= rexp(1,cpar) # generate a forward censoring time (this form of ltr + cens, ensures the censoring time is included in the cohort)
cens.t = ifelse(t <= cens.t, t, cens.t) # take the minimum to
surv.time = ifelse(t <= cens.t, 1, 0) # generate status indicator
delta = rbind(output, c(t, delta))
output
=j+1
j
}= data.frame(output)
output
dimnames(output)[[2]] = c("time","indicator") # give names to output columns for easy access
return(output)
}
# load libraries
library(survival)
library(DPWeibull)
library(Rsolnp)
### Likelihoods Function for Two-Component Weibull Mixture
.2Weib<-function(param,fwrd.dat){
full.mixture.likelihood
= param[1];weib2.coef= param[2];weib1.shp= param[3];weib1.scl= param[4];weib2.shp= param[5];weib2.scl= param[6]
weib1.coef#print(c(weib1.coef,weib2.coef,weib1.shp, weib1.scl, weib2.shp,weib2.scl))
<-function(x,weib1.coef,weib2.coef,weib1.shp, weib1.scl, weib2.shp,weib2.scl){
freturn(weib1.coef*dweibull(x,shape=weib1.shp,scale = weib1.scl)+weib2.coef*dweibull(x,shape=weib2.shp,scale = weib2.scl))
}
<-function(x,weib1.coef,weib2.coef,weib1.shp, weib1.scl, weib2.shp,weib2.scl){
Sreturn(weib1.coef*(1-pweibull(x,shape=weib1.shp,scale = weib1.scl))+weib2.coef*(1-pweibull(x,shape=weib2.shp,scale = weib2.scl)))
}
= log1p(f(fwrd.dat$time,weib1.coef=weib1.coef,weib2.coef=weib2.coef,weib1.shp=weib1.shp, weib1.scl=weib1.scl, weib2.shp=weib2.shp,weib2.scl=weib2.scl)-1)*fwrd.dat$indicator
fail.vals
= log1p(S(fwrd.dat$time,weib1.coef=weib1.coef,weib2.coef=weib2.coef,weib1.shp=weib1.shp, weib1.scl=weib1.scl, weib2.shp=weib2.shp,weib2.scl=weib2.scl)-1)*(1-fwrd.dat$indicator)
cens.vals
= -sum(fail.vals)-sum(cens.vals)
like.result
=exp(like.result)
like.result#print(c(sum(fail.vals),sum(cens.vals)))
return(like.result)
}
### Data Generation
# parameters for simulation
=rep(1/3,3)
coeff= seq(0,8,by=0.1)
MSE.times =c(2,2);weibpar2=c(1.5,3);weibpar3=c(3,3.2)
weibpar1# 30% censoring:cpar = 0.15
# 20% censoring:cpar = 0.09
# 10% censoring:cpar = 0.045
= 0.09
cpar
=50
msim=200
nsim
set.seed(2021)
=list()
full.weib.mixed.sim.datafor(i in 1:msim){
=full.weib.mixed.simulate(nsim,coeff =coeff,weibpar1 = weibpar1,weibpar2 = weibpar2,weibpar3=weibpar3,cpar=0.09)
full.weib.mixed.sim.data[[i]]
}sum(full.weib.mixed.sim.data[[1]]$indicator==1)/nsim
### two-component Weibull Mixture Estimates
=function(x,fwrd.dat){
eqn1return(x[1]+x[2]-1)
}
=0.5;weib2.coef=0.5;weib1.shp=1.8; weib1.scl=1.8; weib2.shp=3;weib2.scl=3
weib1.coef=c(weib1.coef,weib2.coef,weib1.shp, weib1.scl, weib2.shp,weib2.scl)
x0
.2Weib = list()
opt.mixturefor(i in 1:msim){
.2Weib[[i]]=solnp(c(weib1.coef,weib2.coef,weib1.shp, weib1.scl, weib2.shp,weib2.scl), fun=full.mixture.likelihood.2Weib,fwrd.dat=full.weib.mixed.sim.data[[i]], eqfun = eqn1,eqB = 0,control=list(trace=0), LB=c(0,0,0,0,0,0), UB=c(1,1,Inf,Inf,Inf,Inf))$pars
opt.mixture
}
= matrix(unlist(opt.mixture.2Weib),nrow =50,byrow = T)
res
res
## plot
plot(MSE.times, 1-(coeff[1]*pweibull(MSE.times, weibpar1[1],weibpar1[2])+coeff[2]*pweibull(MSE.times, weibpar2[1],weibpar2[2])+coeff[3]*pweibull(MSE.times, weibpar3[1],weibpar3[2])),type='l')
for (i in 1:msim) {
lines(MSE.times,opt.mixture.2Weib[[i]][1]*(1 - pweibull(MSE.times, shape=opt.mixture.2Weib[[i]][3], scale=opt.mixture.2Weib[[i]][4]))+opt.mixture.2Weib[[i]][2]*(1 - pweibull(MSE.times, shape = opt.mixture.2Weib[[i]][5], scale=opt.mixture.2Weib[[i]][6])),col='red')
}
### DP Weibull Mixture Estimates
=list() # no covariate
DP.lstfor(i in 1:msim){
=dpweib(Surv(time,indicator)~1,data=full.weib.mixed.sim.data[[i]],predtime=MSE.times)
DP.lst[[i]]
}
### MSE and 95% CI
.2weib.MSE = rep(0, msim);DPM.MSE = rep(0, msim)
mixture# a matrix to record the estimates by conventional mixture model for each simulated dataset
.2weib.est.matrix = matrix(NA,msim,length(MSE.times))
mixture# a matrix to record the estimates by DPM for each simulated dataset
= matrix(NA,msim,length(MSE.times))
DPM.est.matrix
# underlying truth
= 1-(coeff[1]*pweibull(MSE.times, weibpar1[1],weibpar1[2])+coeff[2]*pweibull(MSE.times, weibpar2[1],weibpar2[2])+coeff[3]*pweibull(MSE.times, weibpar3[1],weibpar3[2]))
truth
for (i in 1:msim) {
.2weib.est=opt.mixture.2Weib[[i]][1]*(1 - pweibull(MSE.times, shape=opt.mixture.2Weib[[i]][3], scale=opt.mixture.2Weib[[i]][4]))+opt.mixture.2Weib[[i]][2]*(1 - pweibull(MSE.times, shape = opt.mixture.2Weib[[i]][5], scale=opt.mixture.2Weib[[i]][6]))
mixture.2weib.est.matrix[i,]=mixture.2weib.est
mixture
.2weib.MSE[[i]] = sum((MSE.times[2] - MSE.times[1])*( mixture.2weib.est-truth)^2)
mixture
= DP.lst[[i]]$Spred
DPM.est =DPM.est
DPM.est.matrix[i,]= sum((MSE.times[2] - MSE.times[1])*( DPM.est -truth)^2)
DPM.MSE[[i]]
}
# MSE
mean(mixture.2weib.MSE); var(DPM.MSE)
# empirical 95% confidence intervals
.2weib.est.mean=apply(mixture.2weib.est.matrix,2,mean)
mixture=apply(DPM.est.matrix,2,mean)
DPM.est.mean.2weib.est.low=apply(mixture.2weib.est.matrix,2,function(x){
mixturequantile(x,0.025)
}).2weib.est.high=apply(mixture.2weib.est.matrix,2,function(x){
mixturequantile(x,0.975)
})=apply(DPM.est.matrix,2,function(x){
DPM.est.lowquantile(x,0.025)
})=apply(DPM.est.matrix,2,function(x){
DPM.est.highquantile(x,0.975)
})
# Plot
plot(MSE.times,truth,type='l',xlab='Time',ylab='Survival Probability',main="10% Censoring")
lines(MSE.times,mixture.2weib.est.mean,type='l',col='blue');lines(MSE.times,mixture.2weib.est.low, lty=2,col='blue');lines(MSE.times,mixture.2weib.est.high, lty=2,col='blue')
legend('topright',inset=0.02,legend=c("Underlying Truth", "Weibull Mixture Model with 2 Components","95% Confidence Interval"),
col=c("black", "blue","blue"), lty=c(1,1,2), cex=0.5,box.lty=0)
plot(MSE.times,truth,type='l',xlab='Time',ylab='Survival Probability',main="10% Censoring")
lines(MSE.times,DPM.est.mean,type='l',col='red');lines(MSE.times,DPM.est.low, lty=2,col='red');lines(MSE.times,DPM.est.high, lty=2,col='red')
legend('topright',inset=0.02, legend=c("Underlying Truth", "Dirichlet Process Weibull Mixture Model","95% Confidence Interval"),
col=c("black", "red","red"), lty=c(1,1,2), cex=0.5,box.lty=0)
4.4.2 Liver Metastases Dataset
library(dplyr)
library(locfit)
data(livmet)
# exact observation distribution
sum((livmet%>%filter(z==1))$t<35)/nrow(livmet%>%filter(z==1))
# fit model
=dpweib(Surv(t,z)~1,data=livmet,predtime=seq(0,50,by=0.1),a=3,b=0.05)
DP.livmet# plot
plot(seq(0,50,by=0.1),DP.livmet$Spredu,lty=2,type='l',xlab='Time(month)',ylab='Survival Probability',ylim=c(0,1))
lines(seq(0,50,by=0.1),DP.livmet$Spred,lty=1,type='l')
lines(seq(0,50,by=0.1),DP.livmet$Spredl,lty=2,type='l')
legend('topright',inset=0.02, legend=c("DP Weibull Mixture Estimates","95% Confidence Interval"), lty=c(1,2), cex=0.5,box.lty=0)
plot(seq(0,50,by=0.1),DP.livmet$hpredu,lty=2,type='l',xlab = 'Time', ylab='Hazard')
lines(seq(0,50,by=0.1),DP.livmet$hpredl,lty=2)
lines(seq(0,50,by=0.1),DP.livmet$hpred,lty=1)
legend('topright',inset=0.02, legend=c("Estimated Hazard","95% CI"), lty=c(1,2), cex=0.5,box.lty=0)
Reference
Haupt, G. and Mansmann, U. (1995). CART for Survival Data. Statlib Archive, http://ftp.uni-bayreuth.de/math/statlib/S/survcart
Kottas, A. (2006). Nonparametric Bayesian survival analysis using mixtures of Weibull distributions. Journal of Statistical Planning and Inference, 136(3), 578-596. doi:10.1016/j.jspi.2004.08.009
Shi, Y., Martens, M., Banerjee, A., & Laud, P. (2019). Low Information Omnibus (LIO) Priors for Dirichlet Process Mixture Models. Bayesian Analysis, 14(3). doi:10.1214/18-ba1119
Shi, Y. (2020). DPWeibull: Dirichlet Process Weibull Mixture Model for Survival Data . R package version 1.18 (unpublished)