Every few days, we will be publishing layman’s abstracts of new articles from our prestigious portfolio of journals in statistics. The aim is to highlight the latest research to a broader audience in an accessible format.
The article featured today is from Statistics in Medicine and the full article, published in issue 38.23, is available to read online here.
Gasparini, A, Clements, MS, Abrams, KR, Crowther, MJ. Impact of model misspecification in shared frailty survival models. Statistics in Medicine. 2019; 38: 4477–4502. doi: 10.1002/sim.8309
Survival models incorporating random effects to account for unmeasured heterogeneity are increasingly used in biostatistical and applied research. Specifically, unmeasured covariates, whose omission from the model would lead to biased and inefficient results, are commonly handled by including a subject-specific (or cluster-specific) frailty term that follows a given distribution (e.g. Gamma or log-Normal); such models can nowadays be fitted using readily available statistical software. However, shared frailty models require modelling assumptions regarding the shape of the baseline hazard and the distribution of the frailty. Little is known about the impact of wrongly specifying the baseline hazard, the frailty distribution, or both on measures of relative risk, absolute risk, and heterogeneity. This study therefore aims to quantify the impact of such misspecification in a wide variety of clinically plausible scenarios via Monte Carlo simulation. Clustered survival data are simulated under 90 distinct scenarios, varying the sample size, the frailty distribution, the frailty variance, and the shape of the baseline hazard function. A variety of shared frailty models commonly used in the literature are then fitted: semi-parametric Cox models, models that assume a fully parametric baseline hazard function, and models with flexible, spline-based formulations of the baseline hazard. Each model is fitted assuming either a Gamma or a log-Normal frailty, arguably the two most common distributions used in practice. This adds up to a total of 22 different models. The results of the study show that the resulting bias can be clinically relevant: misspecification of the baseline hazard leads to biased relative and absolute risk estimates, while misspecification of the frailty distribution affects absolute risk estimates and measures of heterogeneity.
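To make the data-generating step more concrete, here is a minimal sketch in Python of how one clustered survival dataset with a shared frailty could be simulated. It is an illustration only and not the code used in the study: the function name, the Weibull baseline hazard, the single binary covariate, the administrative censoring time, and all parameter values are assumptions chosen for simplicity. Each cluster draws one frailty value, which multiplies the hazard of every subject in that cluster.

```python
import numpy as np

def simulate_shared_frailty(n_clusters=50, cluster_size=10, frailty="gamma",
                            frailty_var=0.5, lam=0.5, shape=1.2, beta=-0.5,
                            max_time=5.0, seed=0):
    """Simulate clustered survival times with a shared frailty.

    Assumed model (illustrative, not taken from the article):
        h_ij(t) = u_i * lam * shape * t**(shape - 1) * exp(beta * x_ij)
    where u_i is the frailty shared by all subjects in cluster i.
    """
    rng = np.random.default_rng(seed)

    # One frailty per cluster.
    if frailty == "gamma":
        # Gamma with mean 1 and variance frailty_var.
        u = rng.gamma(shape=1.0 / frailty_var, scale=frailty_var, size=n_clusters)
    elif frailty == "lognormal":
        # exp(b) with b ~ Normal(0, frailty_var); variance is on the log scale.
        u = np.exp(rng.normal(0.0, np.sqrt(frailty_var), size=n_clusters))
    else:
        raise ValueError("frailty must be 'gamma' or 'lognormal'")

    n = n_clusters * cluster_size
    cluster = np.repeat(np.arange(n_clusters), cluster_size)
    x = rng.binomial(1, 0.5, size=n)  # hypothetical binary treatment indicator

    # Inverse-transform sampling from the Weibull cumulative hazard:
    # H(t) = rate * t**shape, so T = (-log(S) / rate)**(1 / shape).
    rate = lam * u[cluster] * np.exp(beta * x)
    v = rng.uniform(size=n)
    t = (-np.log(1.0 - v) / rate) ** (1.0 / shape)

    # Administrative censoring at max_time.
    event = (t <= max_time).astype(int)
    time = np.minimum(t, max_time)
    return {"cluster": cluster, "x": x, "time": time, "event": event, "frailty": u}

data = simulate_shared_frailty(frailty="lognormal", seed=42)
print(data["time"][:5], data["event"][:5])
```

In a simulation study of this kind, a function like this would be called repeatedly under each scenario, and each candidate model would then be fitted to every simulated dataset so that bias in the estimates can be summarised across repetitions.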
In conclusion, the results of this Monte Carlo simulation study highlight (1) the importance of fitting models that are flexible enough to capture the complexities often encountered in applied settings, and (2) the importance of assessing model fit. The conclusions of the study are illustrated in practice with two applied examples using data on diabetic retinopathy and bladder cancer.