Each week, we select a recently published Open Access article to feature. This week’s article comes from the Canadian Journal of Statistics and presents an objective approach to model selection with parallel genetic algorithms, illustrated on real data and assessed through simulation studies.
The article’s abstract is given below, and the full article is available to read at the DOI in the citation that follows.
Plante, J.-F., Larocque, M. and Adès, M. (2023), Objective model selection with parallel genetic algorithms using an eradication strategy. Can J Statistics. https://doi.org/10.1002/cjs.11775
In supervised learning, feature selection methods identify the most relevant predictors to include in a model. For linear models, the inclusion or exclusion of each variable may be represented as a vector of bits playing the role of the genetic material that defines the model. Genetic algorithms reproduce the strategies of natural selection on a population of models to identify the best. We derive the distribution of the importance scores for parallel genetic algorithms under the null hypothesis that none of the features has predictive power. They, hence, provide an objective threshold for feature selection that does not require the visual inspection of a bubble plot. We also introduce the eradication strategy, akin to forward stepwise selection, where the genes of useful variables are sequentially forced into the models. The method is illustrated on real data, and simulation studies are run to describe its performance.
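To make the bit-vector idea in the abstract concrete, here is a minimal sketch of a genetic algorithm for variable selection in a linear model. It is not the authors' implementation. The abstract does not spell out the fitness criterion, the genetic operators, or the exact definition of the importance scores, so everything here is an assumption made for illustration: BIC as fitness, truncation selection, uniform crossover, bit-flip mutation, and an importance score defined as the share of final-generation models that include each variable, averaged over independent populations. The function names `bic`, `evolve`, and `importance_scores` are hypothetical. The null-distribution threshold and the eradication strategy described in the article are not reproduced.

```python
import numpy as np


def bic(X, y, mask):
    """BIC of an OLS fit on an intercept plus the columns flagged in `mask`."""
    n = len(y)
    design = np.column_stack([np.ones(n), X[:, mask]])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    return n * np.log(rss / n) + design.shape[1] * np.log(n)


def evolve(X, y, rng, pop_size=30, generations=25, mutation_rate=0.05):
    """Run one genetic algorithm and return its final population.

    Each row of the population is a vector of bits: True means the
    corresponding predictor is included in the model.
    """
    p = X.shape[1]
    pop = rng.random((pop_size, p)) < 0.5  # random initial models
    for _ in range(generations):
        scores = np.array([bic(X, y, m) for m in pop])
        # Truncation selection (assumed): the better half become parents.
        parents = pop[np.argsort(scores)[: pop_size // 2]]
        # Uniform crossover: each bit is copied from one of two random parents.
        idx_a = rng.integers(0, len(parents), pop_size)
        idx_b = rng.integers(0, len(parents), pop_size)
        take_a = rng.random((pop_size, p)) < 0.5
        children = np.where(take_a, parents[idx_a], parents[idx_b])
        # Mutation: flip each bit independently with a small probability.
        flips = rng.random((pop_size, p)) < mutation_rate
        pop = children ^ flips
    return pop


def importance_scores(X, y, n_parallel=8, seed=0):
    """Average inclusion frequency of each variable over independent runs.

    This definition of "importance" is an assumption for illustration only.
    """
    rng = np.random.default_rng(seed)
    finals = [evolve(X, y, rng) for _ in range(n_parallel)]
    return np.mean([pop.mean(axis=0) for pop in finals], axis=0)
```

Calling `importance_scores(X, y)` on a design matrix `X` and response `y` returns one value between 0 and 1 per predictor. In this sketch the populations are run one after another; because they do not interact, they could equally be evolved in parallel.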