Statistical Analysis and Data Mining

Robust variable screening for regression using factor profiling

Journal Article

Sure independence screening is a fast procedure for variable selection in ultra‐high dimensional regression analysis. Unfortunately, its performance greatly deteriorates with increasing dependence among the predictors. To solve this issue, Factor Profiled Sure Independence Screening (FPSIS) models the correlation structure of the predictor variables, assuming that it can be represented by a few latent factors. The correlations can then be profiled out by projecting the data onto the orthogonal complement of the subspace spanned by these factors. However, neither of these methods can handle the presence of outliers in the data. Therefore, we propose a robust screening method which uses a least trimmed squares method to estimate the latent factors and the factor‐profiled variables. Variable screening is then performed on factor profiled variables by using regression MM‐estimators. Different types of outliers in this model and their roles in variable screening are studied. Both simulation studies and a real data analysis show that the proposed robust procedure has good performance on clean data and outperforms the two nonrobust methods on contaminated data.

Related Topics

Related Publications

Related Content

Site Footer


This website is provided by John Wiley & Sons Limited, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ (Company No: 00641132, VAT No: 376766987)

Published features on are checked for statistical accuracy by a panel from the European Network for Business and Industrial Statistics (ENBIS)   to whom Wiley and express their gratitude. This panel are: Ron Kenett, David Steinberg, Shirley Coleman, Irena Ograjenšek, Fabrizio Ruggeri, Rainer Göb, Philippe Castagliola, Xavier Tort-Martorell, Bart De Ketelaere, Antonio Pievatolo, Martina Vandebroek, Lance Mitchell, Gilbert Saporta, Helmut Waldl and Stelios Psarakis.