Every week on Statistics Views, we publish layman’s abstracts of new articles from our prestigious portfolio of journals in statistics. The aim is to highlight the latest research to a broader audience in an accessible format. The article featured today is from The Canadian Journal of Statistics: A comparative review of variable selection techniques for covariate dependent Dirichlet process mixture models.
Several applied fields, such as epidemiology, survival analysis, genomics, pharmacokinetics/pharmacodynamics and finance, have motivated the development of flexible Bayesian models to estimate the relationship between a response of interest and a set of potential predictors. These models do not rely on restrictive assumptions about the way covariate information is included, and they allow the observations to be clustered based on their covariate pattern. They are often specified as infinite mixtures of distributions in which the covariates enter through the mixing weights.
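To make the idea of covariate dependent mixing weights concrete, here is a minimal sketch, not the construction used in the article. It assumes a logistic stick-breaking form truncated at K components; the function name `stick_breaking_weights` and the parameters `alpha` and `beta` are hypothetical and chosen only for illustration.

```python
import numpy as np

def stick_breaking_weights(x, alpha, beta):
    """Covariate dependent mixing weights from a truncated stick-breaking scheme.

    Assumption (illustrative only): each stick proportion is a logistic
    function of the covariates, v_k(x) = sigmoid(alpha_k + x . beta_k).
    x: (n, p) covariates, alpha: (K,) intercepts, beta: (K, p) slopes.
    Returns an (n, K) array whose rows are non-negative and sum to one.
    """
    v = 1.0 / (1.0 + np.exp(-(alpha + x @ beta.T)))  # stick proportions, (n, K)
    v[:, -1] = 1.0  # truncation: the last component takes the remaining mass
    # Mass still unassigned before component k: prod_{j<k} (1 - v_j(x))
    leftover = np.cumprod(1.0 - v, axis=1)
    leftover = np.hstack([np.ones((x.shape[0], 1)), leftover[:, :-1]])
    return v * leftover

rng = np.random.default_rng(0)
n, p, K = 5, 2, 10
x = rng.normal(size=(n, p))
alpha = rng.normal(size=K)
beta = rng.normal(size=(K, p))
w = stick_breaking_weights(x, alpha, beta)  # one weight vector per observation
```

Each row of `w` is a different set of mixture weights, so observations with different covariate values are drawn towards different components. Truncating at a finite K is a common computational approximation to the infinite mixture.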
An alternative, although related, solution consists of specifying a model in which a partition of the observations is estimated by placing a prior distribution, which includes covariate information, over all possible partitions. In such a setting, observations within each cluster are modeled independently of those assigned to other clusters.
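As a rough illustration of a prior over partitions that includes covariate information, the sketch below scores a partition as a product over clusters of a cohesion term and a covariate similarity term. Both choices are assumptions made for this example and are not taken from the article: the cohesion is the Dirichlet process form M (n_j - 1)!, and the similarity is a hypothetical Gaussian-type kernel that penalizes within-cluster spread of the covariates. The names `log_partition_prior`, `mass` and `tau` are likewise hypothetical.

```python
import math
import numpy as np

def log_partition_prior(labels, x, mass=1.0, tau=1.0):
    """Unnormalized log prior of a partition given covariates x.

    Assumed form (illustrative): prod_j cohesion(S_j) * similarity(x_j), with
    cohesion(S_j) = mass * (n_j - 1)! and
    similarity(x_j) = exp(-sum of squared deviations from the cluster mean / (2 tau^2)).
    """
    labels = np.asarray(labels)
    total = 0.0
    for c in np.unique(labels):
        xc = x[labels == c]                       # covariates of cluster c
        size = xc.shape[0]
        log_cohesion = math.log(mass) + math.lgamma(size)  # log(mass * (size - 1)!)
        log_similarity = -np.sum((xc - xc.mean(axis=0)) ** 2) / (2.0 * tau ** 2)
        total += log_cohesion + log_similarity
    return float(total)

# Four observations with one covariate: two near 0 and two near 5.
x = np.array([[0.0], [0.1], [5.0], [5.1]])
matched = log_partition_prior([0, 0, 1, 1], x)  # clusters follow the covariates
mixed = log_partition_prior([0, 1, 0, 1], x)    # clusters ignore the covariates
```

Under these assumptions, `matched` receives a higher prior score than `mixed`, which is the sense in which such a prior favors clusters of observations with similar covariate patterns.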
In this work, the authors review the relevant literature on these classes of models, highlighting similarities and differences among them. In addition, the authors focus on covariate selection strategies that have been proposed for those models. Variable selection can be performed to improve the models' predictive power or to test hypotheses about the importance of the covariates. Available variable selection techniques are reviewed, and simulation studies are presented to show the performance of variable selection in different scenarios. Finally, the authors apply the most relevant methods to a data set containing levels of glycohemoglobin and twenty-two predictors. The aim is to identify the predictors most strongly associated with the levels of glycohemoglobin. Using this example, the authors illustrate different ways of summarizing and comparing variable selection output.
Read the full article:
Barcella, W., De Iorio, M., Baio, G. (2017). A comparative review of variable selection techniques for covariate dependent Dirichlet process mixture models. The Canadian Journal of Statistics, 45(3), 254-273, September 2017, https://doi.org/10.1002/cjs.11323