Instability of the AUROC of Clinical Prediction Models – lay abstract

The lay abstract featured today (for Instability of the AUROC of Clinical Prediction Models by Florian D. van Leeuwen, Ewout W. Steyerberg, David van Klaveren, Ben Wessler, David M. Kent and Erik W. van Zwet) is from Statistics in Medicine, with the full Open Access article now available to read here.

How to cite

van Leeuwen, F.D., Steyerberg, E.W., van Klaveren, D., Wessler, B., Kent, D.M. and van Zwet, E.W. (2025), Instability of the AUROC of Clinical Prediction Models. Statistics in Medicine, 44: e70011. https://doi.org/10.1002/sim.70011

Lay Abstract

Clinical prediction models (CPMs) can help clinicians estimate the risk that a patient has a medical condition (diagnosis) or will develop one (prognosis). After a new CPM has been developed, it should be validated in independent data that may arise from other settings, such as other hospitals in other countries. Such “external validation” allows us to see the variation in the CPM’s performance due to differences in patient populations, standards of care, and how medical data are recorded.

To quantify the variation in the performance of CPMs across different settings, we analyzed the AUROC estimates of 469 CPMs in the Tufts-PACE CPM Registry that had at least one external validation. The AUROC is a number between 0 and 1 that measures how well a CPM can differentiate between patients with and without the (binary) outcome of interest. Loosely speaking, an AUROC below 0.6 is considered poor, while an AUROC above 0.8 is quite good. We found that the standard deviation of the AUROCs of the same CPM across different settings is typically around 0.05. This means that a CPM with an average AUROC of 0.7 will typically range from 0.7 − 2×0.05 = 0.6 (poor) to 0.7 + 2×0.05 = 0.8 (good) across settings. This large uncertainty cannot be reduced by doing more external validations.
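The ±2×0.05 arithmetic can be made concrete with a small simulation. The sketch below uses illustrative numbers only (not the Registry data) and draws the true AUROC of one CPM in ten hypothetical settings:

```python
import numpy as np

# Illustrative numbers only (not the Registry data): suppose a CPM's
# AUROC averages 0.70 across settings, with a between-setting
# standard deviation of 0.05.
mean_auroc, between_sd = 0.70, 0.05

# Simulate the "true" AUROC of the same CPM in 10 new settings.
rng = np.random.default_rng(seed=1)
setting_aurocs = rng.normal(mean_auroc, between_sd, size=10)

print(np.round(setting_aurocs, 3))
# Roughly 95% of settings fall within mean +/- 2*SD:
print(f"approx. range: {mean_auroc - 2 * between_sd:.2f} "
      f"to {mean_auroc + 2 * between_sd:.2f}")  # 0.60 to 0.80
```

Because the spread reflects genuine differences between settings rather than sampling noise, averaging over more validations narrows the estimate of the mean AUROC but not the setting-to-setting range itself.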

Since most CPMs have only one or two external validations, the variation in performance between settings cannot be observed reliably. We propose an empirical Bayes method that uses the information from the Tufts-PACE CPM Registry to provide a realistic uncertainty assessment. Most importantly, we stress the need to robustly validate (and, if necessary, update) a CPM in the setting where it is to be used.
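As a loose illustration of the empirical Bayes idea (a simplified normal-normal shrinkage with made-up numbers, not the authors' actual model), a single noisy validation estimate can be combined with a registry-informed prior:

```python
# Hypothetical normal-normal empirical Bayes shrinkage; the prior and
# the data values below are made up for illustration.
prior_mean, prior_sd = 0.72, 0.05  # registry-informed prior (assumed)
observed_auroc, se = 0.80, 0.04    # one external validation (assumed)

# Posterior mean is a precision-weighted average of prior and estimate.
w_prior = prior_sd**-2 / (prior_sd**-2 + se**-2)
posterior_mean = w_prior * prior_mean + (1 - w_prior) * observed_auroc
posterior_sd = (prior_sd**-2 + se**-2) ** -0.5

print(f"shrunken AUROC: {posterior_mean:.3f} (sd {posterior_sd:.3f})")
```

In this toy version, an extreme validation result is pulled toward the registry average; the uncertainty assessment in the paper additionally accounts for the between-setting variation described above, so its intervals are wider than a single validation would suggest.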
