Each week, we will be publishing lay abstracts of new articles from our prestigious portfolio of journals in statistics. The aim is to highlight the latest research to a broader audience in an accessible format.
Medical researchers develop statistical models to predict patients’ risks of early death or other unfavorable outcomes. By identifying patients at higher risk, clinicians can design additional treatment interventions to prevent adverse events in this population. In practice, the predictive ability of these models must be assessed to ensure that they are useful for informing treatment decisions. One statistic, known as the concordance index or “C-Index,” is a widely used measure of a model’s ability to correctly assign higher risk levels to the patients who are observed to have adverse events. Reviewers for medical journals use the C-Index in editorial decisions and often require a high C-Index before they will publish a predictive risk model. Furthermore, influential guidelines for clinicians and analysts have arbitrarily suggested that predictive models must have a C-Index greater than 0.7 to be used in clinical settings. However, in this paper, the authors show that the C-Index has important pitfalls that apply to models with binary outcomes (e.g., sick vs. healthy) but are accentuated for models with continuous time-to-event outcomes (e.g., patient survival times). For continuous time-to-event outcomes, model assessments based on the C-Index involve many comparisons between patients whose differences in risk are very minor and not clinically meaningful. This is because patients can experience outcomes at nearly identical but distinct timepoints, and every such pair still counts as a comparison. The authors demonstrate through several examples that it is difficult to achieve a high C-Index value even for very useful time-to-event models, and that many of these models would be discarded under the current C-Index guidelines. Finally, they present recommendations on how to select and apply more appropriate model evaluation methods from the existing literature, including calibration statistics, net benefit measures, decision curves, and discriminant analyses.
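To make the pairwise comparison behind the C-Index concrete, here is a minimal Python sketch. It is an illustration only: the function `concordance_index` is a simplified, Harrell-style calculation written for this post (it skips pairs with tied times, which is an assumption), not the estimator or code used by the authors, and the six-patient dataset is invented. In the toy data, the hypothetical model perfectly separates patients who have the event early (around 1 year) from those who have it late (around 9 years), but it misorders patients whose event times differ by only a fraction of a year.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Simplified Harrell-style C-Index (illustrative sketch only).

    times  : observed follow-up time for each patient
    events : 1 if the adverse event was observed, 0 if censored
    risks  : predicted risk scores (higher = event expected sooner)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the patient with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the times differ and the
        # earlier patient actually had the event (assumption: tied
        # times are skipped rather than handled specially).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0   # model ranked the earlier event as higher risk
        elif risks[a] == risks[b]:
            concordant += 0.5   # tied predictions get half credit
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Invented toy data: three early events and three late events.
times = [1.0, 1.1, 1.2, 9.0, 9.1, 9.2]
events = [1, 1, 1, 1, 1, 1]
# Hypothetical model: separates early from late perfectly, but is
# wrong about the ordering within each near-simultaneous group.
risks = [0.7, 0.8, 0.9, 0.1, 0.2, 0.3]

c_index = concordance_index(times, events, risks)
print(round(c_index, 3))  # 0.6: 9 of 15 comparable pairs are concordant
```

All 9 early-versus-late pairs are ranked correctly, yet the 6 pairs of patients with nearly identical event times are all ranked incorrectly, so this toy model scores 0.6 and would fall below the 0.7 threshold mentioned above.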