Statistical Analysis and Data Mining: CLADAG 2021 special issue: Selected papers on classification and data analysis

This special issue of Statistical Analysis and Data Mining contains a selection of the papers presented at the 13th Scientific Meeting of the Classification and Data Analysis Group (CLADAG), originally scheduled to take place in Florence, Italy, on 9–11 September 2021. Owing to the COVID-19 pandemic, the conference was held online.

CLADAG is a Section of the Società Italiana di Statistica and a member of the International Federation of Classification Societies (IFCS). It was founded in 1997 to promote advanced methodological research in multivariate statistics, with a focus on Data Analysis and Classification. The Section organizes a biennial international scientific meeting, offers courses on classification and data analysis, publishes a newsletter, and collaborates with other IFCS member societies in planning conferences and meetings. The previous 12 CLADAG meetings were held in various locations throughout Italy: Pescara (1997), Roma (1999), Palermo (2001), Bologna (2003), Parma (2005), Macerata (2007), Catania (2009), Pavia (2011), Modena and Reggio Emilia (2013), Cagliari (2015), Milano (2017), and Cassino (2019).

Following a blind peer-review process, six papers presented at the conference and submitted to this special issue have been selected for publication. The articles cover a broad range of data analysis topics: gender gap analysis, income clustering, structural equation modeling, multivariate nonparametric methods, and classifier selection. Their content is briefly described below.

In studying the gender gap, a relevant topic for promoting equality and social justice, Greselin et al. propose a new parametric approach that combines the relative distribution method with Dagum parametric inference. They also address how to select the covariates that affect the gender gap. The proposed approach is applied to measure and compare the gender gap in Poland and Italy, using 2018 data from the European Union Statistics on Income and Living Conditions (EU-SILC).

In a related field, Condino proposes a procedure for clustering income data using a share density-based dynamic clustering algorithm. The paper compares income inequality across subgroups using a dissimilarity measure based on information theory. This measure is then used for clustering, yielding a prototype descriptor of income inequality for each cluster of earners. The proposal is applied to data from the Bank of Italy's Survey on Household Income and Wealth.

The paper by Yu et al. introduces a refinement of the so-called Henseler–Ogasawara specification, which integrates composites (linear combinations of variables) into structural equation models. The refined version addresses some limitations of the original Henseler–Ogasawara specification: it is less complex and less prone to specification errors. The paper also provides a strategy for computing standard errors.

Statistical depth functions are a valuable tool for multivariate nonparametric data analysis, extending the concepts of ranks, orderings, and quantiles to the multivariate setting. The paper by Laketa and Nagy investigates two fundamental open problems of contemporary depth research, the so-called characterization and reconstruction questions, focusing on simplicial depth. Their results are illustrated through several insightful examples.

On the same topic, Nagy revisits the classical definition of simplicial depth and explores its theoretical properties, with particular attention to the simplicial median. The author derives the exact simplicial depth in several scenarios, highlighting undesirable behaviors of this depth function.

Carpita and Golia tackle the problem of choosing the rule for assigning a unit to a category given its estimated class probabilities. In particular, the paper compares the classical Bayesian Classifier, which minimizes the expected classification error rate, with the Max Difference Classifier and the Max Ratio Classifier, showing when each of these classifiers should be preferred. The findings are illustrated through an extensive simulation study and an application to benchmark data sets.

To conclude, we believe that this special issue accurately portrays the current scientific interests of the CLADAG community and supports the CLADAG mission of facilitating the exchange of ideas in Classification and Data Analysis. We warmly encourage all readers to attend the next CLADAG conference, which will be held in Salerno from 11 to 13 September 2023.
