Each week, we publish layman’s abstracts of new articles from our prestigious portfolio of journals in statistics. The aim is to highlight the latest research to a broader audience in an accessible format.
The article featured today is from the recent Applied Stochastic Models in Business and Industry special issue on Recent Advances in Business Analytics and Data Science, with the full article now available to read here.
Network-based semisupervised clustering. Appl Stochastic Models Bus Ind. 2021; 37: 182–202. https://doi.org/10.1002/asmb.2618
In cluster analysis, observations are grouped into disjoint sets so that those belonging to the same cluster are similar. Sometimes one of the variables used in the analysis conveys important information about the grouping structure, and the clustering process does not always capture this information. In such cases, a semi-supervised approach uses this outcome variable to guide the clustering process and improve the quality of the results. Network-based Semi-Supervised Clustering (NeSSC) is proposed as a semi-supervised clustering model that uses the information carried by a specific outcome variable to identify groups that are as internally homogeneous as possible, mainly with respect to this outcome. NeSSC first uses machine learning models to create a complex network describing the relationships between observations, and then combines the observations into homogeneous clusters using community detection algorithms. The results are informative and easy to interpret through graphics and tables, allowing the user to investigate carefully the relationships between observations and the role of the different variables within the clusters. An illustrative example on house prices in Munich, together with several other examples on both real and simulated data, demonstrates the effectiveness of NeSSC.
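For readers who like to see an idea in code, the two-step logic described above (a model driven by the outcome builds a network, then the network is split into groups) can be sketched in a few lines of Python. This is a toy illustration under our own assumptions, not the authors' implementation: the one-split regression "stumps", the 0.5 edge threshold, the use of connected components as a crude stand-in for a community detection algorithm, and the six invented observations are all hypothetical choices made to keep the example short.

```python
import random


def best_split(values, outcome):
    """Return the threshold on `values` that minimises the squared error of `outcome`."""
    def sse(ys):
        if not ys:
            return 0.0
        mean = sum(ys) / len(ys)
        return sum((v - mean) ** 2 for v in ys)

    cuts = sorted(set(values))
    best_cut, best_err = None, float("inf")
    for lo, hi in zip(cuts, cuts[1:]):
        cut = (lo + hi) / 2
        left = [o for v, o in zip(values, outcome) if v <= cut]
        right = [o for v, o in zip(values, outcome) if v > cut]
        err = sse(left) + sse(right)
        if err < best_err:
            best_cut, best_err = cut, err
    return best_cut  # None when the sample holds a single distinct value


def proximity_matrix(X, y, rounds=200, seed=0):
    """Share of bootstrap stumps in which two observations land in the same leaf."""
    rng = random.Random(seed)
    n = len(X)
    together = [[0] * n for _ in range(n)]
    for _ in range(rounds):
        sample = [rng.randrange(n) for _ in range(n)]  # bootstrap resample
        feature = rng.randrange(len(X[0]))             # one randomly chosen feature
        cut = best_split([X[i][feature] for i in sample], [y[i] for i in sample])
        # Leaf membership of every observation (all False if no split was possible).
        leaf = [cut is not None and X[i][feature] > cut for i in range(n)]
        for i in range(n):
            for j in range(n):
                together[i][j] += leaf[i] == leaf[j]
    return [[count / rounds for count in row] for row in together]


def communities(proximity, threshold=0.5):
    """Connected components of the network keeping edges with proximity >= threshold.

    Assumption: a simple stand-in for a proper community detection algorithm.
    """
    n = len(proximity)
    label = [None] * n
    groups = []
    for start in range(n):
        if label[start] is not None:
            continue
        label[start] = len(groups)
        stack, members = [start], []
        while stack:
            i = stack.pop()
            members.append(i)
            for j in range(n):
                if label[j] is None and proximity[i][j] >= threshold:
                    label[j] = len(groups)
                    stack.append(j)
        groups.append(sorted(members))
    return groups


# Toy data, invented for illustration: two features and an outcome (say, a price).
X = [(1.0, 10.0), (1.2, 11.0), (0.9, 10.5), (5.0, 30.0), (5.3, 29.0), (4.8, 31.0)]
y = [100.0, 110.0, 105.0, 300.0, 310.0, 295.0]

proximity = proximity_matrix(X, y)
clusters = communities(proximity)
print(clusters)
```

Because every split is chosen to explain the outcome `y`, two observations end up strongly connected only when the model repeatedly treats them alike with respect to that outcome, which is the sense in which the clustering is "semi-supervised". A realistic analysis would replace the stumps with a full model and the connected components with a dedicated community detection method.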