Open Access: Spying on the prior of the number of data clusters and the partition distribution in Bayesian cluster analysis

Each week, we select a recently published Open Access article to feature. This week’s article is an Invited Review from the Australian & New Zealand Journal of Statistics and looks at Bayesian cluster analysis. 

The article’s abstract is given below, with the full article available to read here.

Greve, J., Grün, B., Malsiner-Walli, G. and Frühwirth-Schnatter, S. (2022), Spying on the prior of the number of data clusters and the partition distribution in Bayesian cluster analysis. Aust. N. Z. J. Stat. https://doi.org/10.1111/anzs.12350
 
Cluster analysis aims at partitioning data into groups or clusters. In applications, it is common to deal with problems where the number of clusters is unknown. Bayesian mixture models employed in such applications usually specify a flexible prior that takes into account the uncertainty with respect to the number of clusters. However, a major empirical challenge involving the use of these models is in the characterisation of the induced prior on the partitions. This work introduces an approach to compute descriptive statistics of the prior on the partitions for three selected Bayesian mixture models developed in the areas of Bayesian finite mixtures and Bayesian nonparametrics. The proposed methodology involves computationally efficient enumeration of the prior on the number of clusters in-sample (termed as ‘data clusters’) and determining the first two prior moments of symmetric additive statistics characterising the partitions. The accompanying reference implementation is made available in the R package fipp. Finally, we illustrate the proposed methodology through comparisons and also discuss the implications for prior elicitation in applications.
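To give a feel for what a prior on the number of data clusters looks like, here is a minimal Python sketch for one well-known special case, the Dirichlet process mixture. It is not the method from the article and not the fipp API; it uses the standard Chinese restaurant process recursion, in which observation i + 1 opens a new cluster with probability alpha / (alpha + i). The function names and the parameter names `n` and `alpha` are hypothetical, chosen for illustration only.

```python
def dp_prior_num_clusters(n, alpha):
    """Prior probabilities [P(K+ = 1), ..., P(K+ = n)] of the number of
    data clusters among n observations under a Dirichlet process with
    concentration parameter alpha (illustrative sketch, not fipp)."""
    if n < 1 or alpha <= 0:
        raise ValueError("n must be >= 1 and alpha must be > 0")
    probs = [1.0]  # a single observation always forms exactly one cluster
    for i in range(1, n):  # i observations are already assigned
        p_new = alpha / (alpha + i)  # chance the next one opens a new cluster
        updated = [0.0] * (i + 1)
        for idx, p in enumerate(probs):  # idx + 1 is the current cluster count
            updated[idx] += p * (1.0 - p_new)  # joins an existing cluster
            updated[idx + 1] += p * p_new      # opens a new cluster
        probs = updated
    return probs


def prior_mean_num_clusters(probs):
    """Prior expectation of K+ given the probabilities returned above."""
    return sum((idx + 1) * p for idx, p in enumerate(probs))


if __name__ == "__main__":
    probs = dp_prior_num_clusters(n=100, alpha=1.0)
    print(round(prior_mean_num_clusters(probs), 3))
```

Varying `alpha` while holding `n` fixed shows how strongly this single hyperparameter shapes the prior on the number of data clusters, which is the kind of comparison the abstract refers to for prior elicitation.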
 