Dirichlet Process – with Applications on Survival Analysis
2021-12-15
Chapter 1 Introduction
1.1 Survival Data
Survival analysis is a family of statistical techniques for time-to-event data, i.e., the time \(T\) from an onset event to an event of interest, such as the lifetime of a biological organism or the usage period of a mechanical system. In biomedical studies, a typical application is modeling and predicting patients’ survival time, i.e., the time to the failure event, after diagnosis of a specific severe disease. Time-to-event data has special characteristics:
The distributions for survival data are supported on the positive real line, \(\mathbb{R}^+\).
The observed data are often right censored. For example, in a medical study where the event of interest is death due to a certain disease, the time from the onset event (for example, the diagnosis of the disease) to death may not be fully observed because of loss of contact or termination of the study.
Figure 1.1 visualizes time-to-event data using three samples. We use a variable \(\delta_i\) to indicate whether observation \(i\) is censored or not. Conventionally, \(\delta_i=1\) if data point \(i\) is an exact observation, while \(\delta_i=0\) if data point \(i\) is a censored observation. In the figure, \(T_2\) is an exact observation, whereas both \(T_1\) and \(T_3\) are censored. \(T_1\) is censored due to, for example, loss of contact, while \(T_3\) is censored due to the end of the study.
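The censoring indicator can be made concrete with a small sketch. The three subjects, their event times, and their censoring times below are made-up values chosen only to mirror the three cases in the figure (loss of contact, exact observation, end of study); the helper name `observe` is hypothetical.

```python
def observe(event_time, censor_time):
    """Return (observed time, delta) under right censoring.

    delta = 1 for an exact observation, delta = 0 for a censored one.
    """
    if event_time <= censor_time:
        return event_time, 1
    return censor_time, 0

# Hypothetical subjects: (true event time, censoring time), both in months.
subjects = [
    (30.0, 12.0),  # contact lost at month 12, before the event: censored
    (18.0, 36.0),  # event happens before the study ends: exact
    (50.0, 36.0),  # still event-free when the study ends at month 36: censored
]

observations = [observe(t, c) for t, c in subjects]
for i, (y, delta) in enumerate(observations, start=1):
    print(f"subject {i}: observed time = {y}, delta = {delta}")
```

For a censored subject we only learn that the true time exceeds the recorded one, which is why the recorded time alone is not enough and \(\delta_i\) has to be stored alongside it.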
One of the most common parametric models for survival data with a unimodal density is the Weibull model (Qin et al., 2009), which is known for its flexibility and interpretability. A Weibull model is defined by a shape parameter \(\alpha\) and a scale parameter \(\beta\):
\[T\sim \text{Weibull}(\alpha,\beta)\] \[f(t) = \frac{\alpha}{\beta}\left(\frac{t}{\beta}\right)^{\alpha-1}e^{-\left(\frac{t}{\beta}\right)^\alpha},\qquad t\geq0 \] \[F(t) = 1-e^{-\left(\frac{t}{\beta}\right)^\alpha}, \qquad t\geq0 \]
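In survival analysis, however, the quantity of primary interest is the survival function, i.e., the probability of surviving beyond time \(t\):
\[S(t) = P(T> t) =1-F(t) = e^{-\left(\frac{t}{\beta}\right)^\alpha}, \qquad t\geq0 \]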
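As a minimal sketch, the Weibull density, distribution function, and survival function can be written directly from the formulas above. The function names and the parameter values \(\alpha=1.5\), \(\beta=2\) are illustrative assumptions, not values taken from this project.

```python
import math

def weibull_pdf(t, alpha, beta):
    """Density f(t) for t > 0, with shape alpha and scale beta."""
    z = t / beta
    return (alpha / beta) * z ** (alpha - 1) * math.exp(-(z ** alpha))

def weibull_cdf(t, alpha, beta):
    """Distribution function F(t) for t >= 0."""
    return 1.0 - math.exp(-((t / beta) ** alpha))

def weibull_survival(t, alpha, beta):
    """Survival function S(t) = 1 - F(t) for t >= 0."""
    return math.exp(-((t / beta) ** alpha))

alpha, beta = 1.5, 2.0  # illustrative values only
for t in (0.5, 1.0, 2.0, 4.0):
    print(t, weibull_pdf(t, alpha, beta), weibull_survival(t, alpha, beta))
```

One quick sanity check: at \(t=\beta\) the survival probability is \(e^{-1}\approx 0.368\) for every shape \(\alpha\), which gives the scale parameter a direct interpretation.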
There are cases, however, where a single Weibull model fails to describe the data. Despite efforts to model covariates such as age, gender, treatment received, and comorbidity, there can be heterogeneity in the failure mechanism that observed covariates alone cannot fully explain. This motivates the development of Weibull mixture models, such as the work of Jiang and Kececioglu (1992) and Razali and Al-Wakeel (2013). Even a mixture model with a finite number of components may not be adequate, though, because fixing the number of components in advance imposes assumptions on the data. In our project, we introduce a Bayesian non-parametric mixture model of Weibull distributions, built on the Dirichlet process, which allows for potentially infinitely many components and can therefore account for potentially hidden failure mechanisms.
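To illustrate what a finite Weibull mixture looks like, the sketch below defines a two-component mixture, evaluates its survival function, and draws samples by inverting the Weibull distribution function. The weights, shapes, scales, and the interpretation of the components are hypothetical; they are not estimates from any data set in this project.

```python
import math
import random

# Hypothetical two-component mixture: (weight, shape alpha, scale beta).
components = [
    (0.6, 0.8, 2.0),   # e.g. an early-failure mechanism
    (0.4, 3.0, 10.0),  # e.g. a wear-out mechanism
]

def mixture_survival(t, components):
    """S(t) = sum_k w_k * exp(-(t / beta_k) ** alpha_k)."""
    return sum(w * math.exp(-((t / b) ** a)) for w, a, b in components)

def sample_mixture(components, rng):
    """Pick a component by weight, then draw T by inverting its Weibull CDF."""
    weights = [w for w, _, _ in components]
    _, a, b = rng.choices(components, weights=weights, k=1)[0]
    u = rng.random()  # uniform on [0, 1)
    return b * (-math.log(1.0 - u)) ** (1.0 / a)

rng = random.Random(0)
draws = [sample_mixture(components, rng) for _ in range(1000)]
empirical = sum(d > 5.0 for d in draws) / len(draws)
print(mixture_survival(5.0, components), empirical)
```

The number of components is fixed at two here before any data are seen. That is exactly the restriction the Dirichlet process mixture is meant to relax.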
We first introduce non-parametric Bayesian models and the Dirichlet process, and then the Dirichlet process (DP) Weibull mixture model. In this chapter, we clarify the definition of non-parametric models. Chapters 2 and 3 focus on explaining the Dirichlet process and its Bayesian perspective. We present the DP Weibull mixture model, simulation results, and a case study in Chapter 4.
1.2 Non-Parametric Bayesian
Parametric Bayesian
\[\mathbb{P}(\text{parameters}\mid\text{data}) \propto \mathbb{P}(\text{data}\mid\text{parameters})\, \mathbb{P}(\text{parameters})\]
The formula above is the core of Bayesian statistics: the posterior is proportional to the likelihood times the prior. In parametric Bayesian models, a finite and fixed set of parameters is used to describe the model.
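A small grid approximation shows the formula at work in a parametric setting. As a simplifying assumption, the sketch uses a Weibull model with the shape fixed at \(\alpha=1\) (an exponential model), a flat prior on a grid of candidate scales \(\beta\), and five made-up, fully observed times. None of these choices comes from this project.

```python
import math

times = [2.0, 3.5, 1.2, 4.1, 2.7]        # made-up, fully observed times
grid = [0.1 * k for k in range(1, 101)]  # candidate scales beta in (0, 10]

def log_likelihood(beta, times):
    """Log-likelihood of Weibull(alpha=1, beta), i.e. an exponential model."""
    return sum(-math.log(beta) - t / beta for t in times)

log_prior = [0.0 for _ in grid]          # flat prior over the grid (assumption)
log_post = [log_likelihood(b, times) + lp for b, lp in zip(grid, log_prior)]

# Exponentiate stably, then normalise so the grid probabilities sum to one.
m = max(log_post)
unnorm = [math.exp(lp - m) for lp in log_post]
total = sum(unnorm)
posterior = [u / total for u in unnorm]

mode = grid[posterior.index(max(posterior))]
print(f"posterior mode of beta on the grid: {mode:.1f}")
```

Because the prior is flat, the posterior mode on the grid coincides with the maximum likelihood estimate, which for this model is the sample mean of the times. An informative prior would pull the mode away from it.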
Non-Parametric Bayesian
Non-parametric Bayesian means that the number of parameters is not fixed in advance and can grow as more data are observed. The “Wikipedia Phenomena” is a typical metaphor for this growing number of parameters.
1.2.1 Wikipedia Phenomena
Suppose we search for “signal processing” on Wikipedia. The page contains many links to related topics, and “optimization” may be one of them. If we go to the Wikipedia page for “optimization”, we find more links to topics related to “optimization”. As we explore Wikipedia, the number of topics we have found keeps increasing. There are three other important observations in this exploration process:
The next topic always depends on the previous topics.
Many of the topics belong to the same field, so we can assign topics to groups.
The more topics we have in a field, the higher the probability that the next topic will also be in that field.
These three observations reflect three important properties of generating samples from a Dirichlet distribution: the next sample depends on the previous samples; samples can be assigned to clusters; and the more samples a cluster already has, the more likely it is that the next sample we draw belongs to that cluster. In the following two chapters, we will discuss the Dirichlet process as a non-parametric Bayesian method with these three properties.
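The three properties can be mimicked with a simple sequential sampler, sketched below under stated assumptions. Each new item joins an existing cluster with probability proportional to that cluster's current size, or opens a new cluster with probability proportional to a positive constant. Schemes of this kind are commonly described as a Pólya urn or Chinese restaurant process. The function name, the parameter name `concentration`, and its value are illustrative choices, not definitions taken from the later chapters.

```python
import random

def sequential_clusters(n, concentration, rng):
    """Assign n items to clusters one at a time, 'rich get richer' style."""
    counts = []  # counts[k] = number of items already in cluster k
    labels = []  # labels[i] = cluster index of item i
    for i in range(n):
        # Existing clusters carry total weight i; a new cluster has weight
        # `concentration`. Draw a point on [0, i + concentration).
        r = rng.random() * (i + concentration)
        cumulative = 0.0
        for k, c in enumerate(counts):
            cumulative += c
            if r < cumulative:
                counts[k] += 1
                labels.append(k)
                break
        else:
            # No existing cluster was selected: open a new one.
            counts.append(1)
            labels.append(len(counts) - 1)
    return labels, counts

labels, counts = sequential_clusters(200, 1.0, random.Random(0))
print("number of clusters:", len(counts))
print("cluster sizes:", sorted(counts, reverse=True))
```

The number of clusters is not specified up front. It is a by-product of the sampling and tends to grow slowly as more items arrive, while a few large clusters absorb most of them.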
1.2.2 Cluster
If the objects in a group are more similar to each other than to objects outside the group, we say that the group forms a cluster. The idea of clustering is closely related to mixture models. Conceptually, each component of a mixture model can be viewed as a cluster. Figure 1.2 is an example of a mixture model with \(3\) normally distributed clusters. The colored squares represent clusters, and the black points around them are the data in each cluster. The idea of clustering is applied when drawing samples from a Dirichlet distribution and from the Dirichlet process, which will be demonstrated by examples, metaphors, and simulations in Chapters 2 and 3.
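The definition of a cluster can be checked numerically on simulated data. The sketch below draws points from three two-dimensional normal clusters and compares the average distance between points in the same cluster with the average distance between points in different clusters. The centres, the common standard deviation, and the sample sizes are hypothetical and are not the values behind Figure 1.2.

```python
import math
import random

rng = random.Random(1)
centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 6.0)]  # hypothetical cluster centres
sd = 0.5                                         # common spread (assumption)

# Draw 30 points per cluster and remember which cluster produced each one.
points = [(rng.gauss(cx, sd), rng.gauss(cy, sd), k)
          for k, (cx, cy) in enumerate(centers) for _ in range(30)]

def mean_distance(same_cluster):
    """Average pairwise distance over pairs inside (True) or across (False) clusters."""
    dists = [math.dist(p[:2], q[:2])
             for i, p in enumerate(points) for q in points[i + 1:]
             if (p[2] == q[2]) == same_cluster]
    return sum(dists) / len(dists)

within, between = mean_distance(True), mean_distance(False)
print(f"within: {within:.2f}, between: {between:.2f}")
```

Here the cluster labels are known because we generated the data. In a mixture model fitted to real data they are latent and have to be inferred.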
References
Broderick, T. (n.d.). Tutorial of Bayesian Nonparametric. Tamara Broderick. Retrieved November 13, 2021, from https://tamarabroderick.com/tutorials.html.
Jiang, S., & Kececioglu, D. (1992). Maximum likelihood estimates, from censored data, for mixed-Weibull distributions. IEEE Transactions on Reliability, 41(2), 248-255. doi:10.1109/24.257791
Qin, X., Zhang, J., & Yan, X. (2009). A Finite Mixture Three-Parameter Weibull Model for the Analysis of Wind Speed Data. 2009 International Conference on Computational Intelligence and Software Engineering. doi:10.1109/cise.2009.5362709
Razali, A. M., & Al-Wakeel, A. A. (2013). Mixture Weibull distributions for fitting failure times data. Applied Mathematics and Computation, 219(24), 11358-11364. doi:10.1016/j.amc.2013.05.062