Layman’s abstract for paper on partial order relations for classification comparisons

Every few days, we will be publishing layman’s abstracts of new articles from our prestigious portfolio of journals in statistics. The aim is to highlight the latest research to a broader audience in an accessible format.

The article featured today is from the Canadian Journal of Statistics, with the full article now available to read here.

Chang, L.‐B. (2020), Partial order relations for classification comparisons. Can J Statistics, 48: 152-166. doi:10.1002/cjs.11524

Classification is a fundamental component of many statistical and machine learning applications. As a large number of new classification algorithms are developed every year, comparing the performance of classification algorithms has become increasingly important. The most common notions of optimal classification are based on the Bayes classification rule or the Neyman–Pearson lemma. The Bayes classification rule gives the optimal classifier, the one that minimizes the classification error rate, i.e., the probability of misclassification. For two-class classification problems (e.g., cancer versus normal), the Neyman–Pearson lemma gives the optimal family of classifiers, the one that maximizes the detection rate, the probability of correctly classifying an observation from a particular class (e.g., cancer), for any given false alarm rate, the probability of incorrectly classifying an observation from the other class (e.g., normal).

These results motivate comparing classifiers on the basis of their similarity to the optimal one. In this paper, we define a partial order relation on classifiers and a partial order relation on families of classifiers. Each partial order relation provides a sufficient condition that yields better classification error rates, or better detection rates for any given false alarm rate. Various examples and applications of the partial order theorems are discussed to compare classifiers and families of classifiers, including comparisons of cross-validation methods, training data containing outliers, and training data containing labeling errors.
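To make the detection rate and false alarm rate concrete, here is a minimal sketch of a family of threshold classifiers, in the spirit of the Neyman–Pearson setting described above. The synthetic Gaussian data and the thresholding rule are illustrative assumptions, not the paper's method: sweeping the threshold traces out the family's trade-off between the two rates.

```python
import random

random.seed(0)

# Hypothetical synthetic two-class data (an assumption for illustration):
# "normal" observations ~ N(0, 1), "cancer" observations ~ N(2, 1).
normal = [random.gauss(0.0, 1.0) for _ in range(10000)]
cancer = [random.gauss(2.0, 1.0) for _ in range(10000)]

def rates(threshold):
    """Classify x as 'cancer' when x > threshold; return the pair
    (false alarm rate, detection rate) estimated on the samples."""
    false_alarm = sum(x > threshold for x in normal) / len(normal)
    detection = sum(x > threshold for x in cancer) / len(cancer)
    return false_alarm, detection

# Sweep the threshold to trace the family's detection/false-alarm curve.
for t in [0.0, 1.0, 2.0]:
    fa, det = rates(t)
    print(f"threshold={t:.1f}  false alarm={fa:.3f}  detection={det:.3f}")
```

Comparing two such families, as the paper's partial order does, amounts to asking whether one family achieves a higher detection rate at every false alarm rate.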