The lay abstract featured today (for Accommodating the Analysis Model in Multiple Imputation for the Weibull Mixture Cure Model: Performance Under Penalized Likelihood by Changchang Xu, Laurent Briollais, Irene L. Andrulis, Shelley B. Bull) is from Statistics in Medicine with the full Open Access article now available to read here.
How to cite
C. Xu, L. Briollais, I.L. Andrulis, and S.B. Bull, Accommodating the Analysis Model in Multiple Imputation for the Weibull Mixture Cure Model: Performance Under Penalized Likelihood, Statistics in Medicine 45, no. 6-7 (2026): e70437, https://doi.org/10.1002/sim.70437.
Lay Abstract
Disease prognostic studies, particularly in cancer, often examine biomarkers (measurable features such as levels of proteins in a tumour) to understand why some patients do better than others after treatment. To evaluate the association of molecular biomarkers with differential prognosis, survival analyses that jointly examine the effects of these biomarkers together with other traditional prognostic factors are commonly used. The objective of these studies is to identify patients who are at greater risk of disease recurrence after initial therapy and who may be more likely to benefit from targeted treatment. Equally important, patients who are very unlikely to recur could avoid aggressive therapy and its side effects. Especially in studies of early-stage disease, a substantial proportion of patients will never experience disease recurrence in their lifetime and can be considered long-term survivors or (statistically) cured. Therefore, in the analysis of such time-to-recurrence outcomes, a class of statistical methods known as mixture cure (MC) models is preferred over the conventional Cox proportional hazards (Cox-PH) model. The Cox-PH assumption that all subjects would eventually experience the event is violated, whereas the MC model can describe both the probability that a patient is cured and the time to recurrence if not cured. Further practical obstacles with this type of data include low event counts (due to heavy censoring) and imbalanced covariate data, both of which can bias inference in finite samples. Moreover, missing values of prognostic factors such as tumour protein expression measures can reduce statistical efficiency and power.
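To make the idea of a cured fraction concrete, the short Python sketch below evaluates the survival curve implied by one common form of the Weibull mixture cure model: a logistic model for the probability of being uncured (whose coefficient is a log odds ratio) and a Weibull proportional hazards model for time to recurrence among the uncured (whose coefficient is a log hazard ratio). This parameterization, the function names, and all numerical values are assumptions chosen for illustration; they are not taken from the article.

```python
import math

def uncured_probability(x, intercept, log_odds_ratio):
    """Logistic model for the probability that a patient is NOT cured."""
    return 1.0 / (1.0 + math.exp(-(intercept + log_odds_ratio * x)))

def uncured_survival(t, x, shape, scale, log_hazard_ratio):
    """Weibull proportional hazards survival for uncured patients."""
    cumulative_hazard = (t / scale) ** shape * math.exp(log_hazard_ratio * x)
    return math.exp(-cumulative_hazard)

def population_survival(t, x, params):
    """Mixture: cured patients never recur; uncured follow the Weibull curve."""
    p = uncured_probability(x, params["intercept"], params["log_or"])
    s_u = uncured_survival(t, x, params["shape"], params["scale"], params["log_hr"])
    return (1.0 - p) + p * s_u

# Hypothetical parameter values, for illustration only.
params = {
    "intercept": -0.5,        # baseline log odds of being uncured
    "log_or": math.log(2.0),  # biomarker doubles the odds of being uncured
    "shape": 1.5,             # Weibull shape
    "scale": 5.0,             # Weibull scale (years)
    "log_hr": math.log(1.5),  # biomarker raises the hazard among the uncured
}

for t in (0, 2, 5, 10, 50):
    negative = population_survival(t, 0, params)
    positive = population_survival(t, 1, params)
    print(f"t={t:>2}  biomarker-negative: {negative:.3f}  biomarker-positive: {positive:.3f}")
```

Unlike a standard survival curve, which falls towards zero, each curve here levels off at the cured fraction for that patient group. That plateau is the feature of the data that a mixture cure model is designed to capture.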
Prognostic study datasets often contain missing prognostic factor values (e.g., three biomarkers measured but a fourth missing in some patients); restricting analysis to only complete cases can throw away a large portion of the cohort and weaken or bias the conclusions. A well-established missing data approach is multiple imputation (MI) with fully conditional specification. However, an imputation model that is incompatible with the analysis model can impair the accuracy of point estimates of the odds ratio (OR) and/or hazard ratio (HR), as well as of interval estimates. By engaging the MC analysis model likelihood in the MI procedure, this work specifies imputation models that are compatible with MC model analysis to identify prognostic factors associated with disease recurrence. In the presence of low event numbers and imbalanced covariates, the large-sample Wald test statistic loses its validity for parameter inference. With the incorporation of penalized likelihood point estimation and combined likelihood profile inference, simulations show that the proposed MI procedure produces less biased OR and HR estimates and intervals with higher coverage of the underlying true values than commonly applied MI methods.
Applying the methods to a prospective cohort of women with long-term follow-up for breast cancer recurrence addresses limitations of existing MI methods for the MC model and improves the ability to identify new biomarkers that can inform physician and patient decisions about treatment. These methods are also readily transferable to the analysis of other time-to-event outcomes in disease progression, as well as to similar problems in genetic association analysis of disease susceptibility, which aims to identify genetic variants related to age at onset of complex disease. Furthermore, the proposed methodology for MC modelling can be generalized to the analysis of time-to-event cohorts with an event-free fraction in other fields. Overall, this work strengthens the statistical tools available for analyzing incomplete survival data and helps produce evidence that is more informative for researchers, clinicians, and patients.
