Layman’s abstract for paper on variable selection for proportional hazards models with high‐dimensional covariates subject to measurement error

Each week, we will be publishing layman’s abstracts of new articles from our prestigious portfolio of journals in statistics. The aim is to highlight the latest research to a broader audience in an accessible format.

The article featured today is from the Canadian Journal of Statistics, with the full article now available to read in Early View here.

Chen, B., Yuan, A. and Yi, G.Y. (2020), Variable selection for proportional hazards models with high‐dimensional covariates subject to measurement error. Can J Statistics. doi:10.1002/cjs.11568

High-dimensional data are common in many research areas, and in many cases only a small number of covariates are important in explaining the outcome. To handle such data, variable selection in high-dimensional settings has drawn great attention in recent years. One important area concerns high-dimensional survival data, where the Cox proportional hazards model is typically employed. However, methods for analyzing this type of data are often challenged by the presence of measurement error in the variables, a common issue in many applications. A naive analysis that ignores measurement error usually gives biased results, yet relatively little research has focused on this topic. In this paper, the authors consider this important problem and discuss variable selection for proportional hazards models with high-dimensional covariates subject to measurement error. They propose a penalized “corrected” likelihood-based method that simultaneously addresses the measurement error effects and performs variable selection. They establish theoretical results for the proposed estimator, including its consistency, oracle property, and asymptotic distribution. Simulation studies assess the finite-sample performance of the proposed method. To illustrate its use, the authors apply the method to a data set arising from a breast cancer study.
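To make the point about naive analysis concrete, the short simulation below shows how ignoring measurement error biases an estimated effect. It is only a sketch under strong simplifying assumptions: it uses a single covariate and ordinary linear regression rather than the Cox model, it assumes the error variance is known, and it applies a textbook correction rather than the authors' penalized "corrected" likelihood. The function names (`fit_slope`, `simulate_attenuation`) and all parameter values are hypothetical, chosen for illustration.

```python
import numpy as np


def fit_slope(x, y):
    """Least-squares slope of y on x (intercept handled by centering)."""
    x_centered = x - x.mean()
    return float(x_centered @ (y - y.mean()) / (x_centered @ x_centered))


def simulate_attenuation(n=100_000, true_slope=2.0, error_sd=1.0, seed=0):
    """Compare slopes fitted with the true, the mismeasured, and the corrected covariate."""
    rng = np.random.default_rng(seed)

    # True covariate and an outcome that depends on it linearly.
    x_true = rng.normal(0.0, 1.0, n)
    y = true_slope * x_true + rng.normal(0.0, 1.0, n)

    # What the analyst actually observes: the covariate plus independent noise.
    x_observed = x_true + rng.normal(0.0, error_sd, n)

    slope_with_true_x = fit_slope(x_true, y)
    slope_naive = fit_slope(x_observed, y)  # ignores the measurement error

    # Simple correction, assuming the error variance is known:
    # divide by the estimated share of observed variance that is real signal.
    observed_var = x_observed.var()
    reliability = (observed_var - error_sd**2) / observed_var
    slope_corrected = slope_naive / reliability

    return slope_with_true_x, slope_naive, slope_corrected


if __name__ == "__main__":
    with_true_x, naive, corrected = simulate_attenuation()
    print(f"slope using true covariate:     {with_true_x:.3f}")
    print(f"naive slope (error ignored):    {naive:.3f}")
    print(f"slope after simple correction:  {corrected:.3f}")
```

With these settings the covariate and the error have equal variance, so the naive slope should come out at roughly half the true value, while the corrected slope should land close to it. The paper tackles the much harder version of this problem, with many covariates, survival outcomes, and variable selection performed at the same time.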