Each week, we publish layman’s abstracts of new articles from our prestigious portfolio of journals in statistics. The aim is to highlight the latest research to a broader audience in an accessible format.
The article featured today is from the Canadian Journal of Statistics, with the full article now available to read here.
Tay, J.K., Friedman, J. and Tibshirani, R. (2021), Principal component-guided sparse regression. Can J Statistics. https://doi.org/10.1002/cjs.11617
This paper proposes a new method for fitting predictive models, named the “principal components lasso” (“pcLasso”). The fitted model is the solution to a penalized regression problem, with a novel penalty term chosen so that the fitted model has two properties. First, it is “sparse”, meaning that the model’s predictions are based on a small fraction of the features available to it. This makes the model easier to interpret and understand. Second, the model’s predictions are more closely aligned with the more important directions of variation in the feature matrix. These directions of variation, also known as the leading principal components of the feature matrix, are often correlated with variation in the response of interest, so aligning the model’s predictions toward them can improve prediction performance. pcLasso can be especially powerful when the features are pre-assigned to groups. In that case, pcLasso aligns the model’s predictions toward the leading principal components of each feature group’s matrix. In the process, it also carries out selection of feature groups, resulting in more interpretable models. The method can be used whether or not the groups overlap. The paper also presents some theoretical results for pcLasso and illustrates its performance on simulated and real data examples.
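To make the idea above concrete, here is a minimal sketch (not the authors' implementation) of the single-group case: a lasso penalty induces sparsity, while a quadratic penalty that is zero along the leading principal component of the feature matrix and grows for lower-variance directions pulls the coefficients toward the top principal components. The function name, parameter choices, and the plain proximal-gradient solver are all illustrative assumptions.

```python
import numpy as np

def pclasso_sketch(X, y, lam=1.0, theta=0.5, n_iter=1000):
    """Illustrative single-group pcLasso-style fit (not the authors' code).

    Minimizes  0.5*||y - X b||^2 + (theta/2) * b' Q b + lam * ||b||_1,
    where Q penalizes coefficient directions aligned with the
    lower-variance principal components of X.
    """
    n, p = X.shape
    # SVD of the feature matrix: X = U diag(d) V'
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt.T
    # Q = V diag(d1^2 - d_j^2) V' is zero along the leading principal
    # component and largest along the trailing ones, so the penalty
    # pulls b toward the top principal components of X.
    Q = V @ np.diag(d[0] ** 2 - d ** 2) @ V.T
    # Proximal gradient (ISTA): gradient step on the smooth part,
    # soft-thresholding for the lasso part.
    L = (1.0 + theta) * d[0] ** 2  # conservative Lipschitz bound
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) + theta * (Q @ beta)
        z = beta - grad / L
        beta = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return beta
```

Setting `theta = 0` recovers an ordinary lasso fit; larger `theta` shrinks the solution more strongly toward the leading principal components. In the grouped setting the paper describes, one such quadratic penalty would be built per feature group.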