A Bayesian Nonparametric Model for Disease Subtyping: Application to Emphysema Phenotypes
We introduce a novel Bayesian nonparametric model that uses the concept of disease trajectories to identify disease subtypes. Although the model is general, we demonstrate that by treating fractions of tissue patterns derived from medical images as compositional data, it can be applied to study distinct progression trends between population subgroups. Specifically, we apply our algorithm to quantitative emphysema measurements obtained from chest CT scans in the COPDGene Study and show several distinct progression patterns. Because emphysema is one of the major components of chronic obstructive pulmonary disease (COPD), the third leading cause of death in the United States, an improved definition of emphysema and COPD subtypes is of great interest. We investigate several models with our algorithm and show that one with $age$, $pack~years$ (a measure of cigarette exposure), and $smoking~status$ as predictors gives the best compromise between estimated predictive performance and model complexity. This model identified nine subtypes, which showed significant associations with seven single nucleotide polymorphisms (SNPs) known to associate with COPD. Additionally, this model gives better predictive accuracy than multiple, multivariate ordinary least squares regression, as demonstrated in a five-fold cross-validation analysis. We view our subtyping algorithm as a contribution that can bridge the gap between CT-level assessment of tissue composition and population-level analysis of compositional trends that vary between disease subtypes.
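To make the baseline comparison concrete, the following is a minimal sketch of how a multiple, multivariate ordinary least squares regression could be scored with five-fold cross-validation on compositional responses. It is not the authors' implementation: the data are synthetic, and the function name `five_fold_ols_mse`, the three-component composition, the predictor ranges, and the use of mean squared error as the score are all assumptions made for illustration.

```python
import numpy as np


def five_fold_ols_mse(X, Y, n_folds=5, seed=0):
    """Cross-validated mean squared error of multivariate OLS (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Shuffle subject indices and split them into roughly equal folds.
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    # Prepend an intercept column to the predictors.
    X1 = np.column_stack([np.ones(len(X)), X])
    errors = []
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        # One least-squares fit yields a coefficient column per response.
        B, *_ = np.linalg.lstsq(X1[train], Y[train], rcond=None)
        resid = Y[test] - X1[test] @ B
        errors.append(np.mean(resid ** 2))
    return float(np.mean(errors))


# Synthetic stand-ins for the predictors named in the abstract (assumed ranges).
rng = np.random.default_rng(42)
n = 200
age = rng.uniform(45, 80, n)
pack_years = rng.uniform(10, 100, n)
current_smoker = rng.integers(0, 2, n)  # 0/1 indicator for smoking status
X = np.column_stack([age, pack_years, current_smoker])

# Synthetic compositional responses: each row is a set of tissue fractions summing to 1.
Y = rng.dirichlet([8.0, 1.5, 0.5], size=n)

cv_mse = five_fold_ols_mse(X, Y)
print(f"five-fold CV mean squared error: {cv_mse:.5f}")
```

Note that plain OLS does not constrain its predictions to lie within [0, 1], which is one reason a model designed for compositional data may be preferable to this baseline.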