Layman’s abstract for the paper on the impact of predictor measurement heterogeneity across settings on the performance of prediction models: a measurement error perspective

Every few days, we will be publishing layman’s abstracts of new articles from our prestigious portfolio of journals in statistics. The aim is to highlight the latest research to a broader audience in an accessible format.

The article featured today is from Statistics in Medicine and the full article, published in issue 38.18, is available to read online here.

Luijken, K, Groenwold, RHH, Van Calster, B, Steyerberg, EW, van Smeden, M. Impact of predictor measurement heterogeneity across settings on the performance of prediction models: A measurement error perspective. Statistics in Medicine. 2019; 38: 3444–3459. doi: 10.1002/sim.8183

Clinical prediction models are mathematical tools that assist physicians and inform patients by providing individualized estimates of the probability (risk) that a disease is present or that a health outcome will occur in the future. The performance of clinical prediction models in estimating these risks should, in principle, be studied in patients who were not part of the data from which the model was derived, a process known as validation. The quality of the risk predictions at validation can be hampered when predictive factors (known as predictors) are measured differently at validation than in the setting where the model was derived. For instance, derivation and validation settings may differ in measurement protocols or in the diagnostic tests that are used (e.g. diagnostic tests produced by different manufacturers). Take, for example, the predictor bodyweight, which can be measured using a well-calibrated weighing scale operated by a nurse at derivation and using a self-reported measure at validation. Such differences in measurement strategies between derivation and validation settings are referred to as measurement heterogeneity.
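To make the bodyweight example concrete, here is a minimal sketch expressing the two measurement strategies as simple additive measurement error models. The linear error structure and all magnitudes are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(42)

# True bodyweight (kg) in a hypothetical cohort
weight_true = rng.normal(80, 15, size=1_000)

# Derivation setting: calibrated scale, small random error only
weight_derivation = weight_true + rng.normal(0, 1, size=1_000)

# Validation setting: self-report, with systematic under-reporting
# and larger random error (illustrative magnitudes)
weight_validation = 0.95 * weight_true - 1.0 + rng.normal(0, 4, size=1_000)
```

Although both versions measure the same underlying quantity, a model derived on the first measurement and applied to the second faces a shifted, noisier predictor.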

Although heterogeneity in predictor measurement across derivation and validation data is common, its impact on the performance of risk estimation had not been well studied. Therefore, several scenarios of predictor measurement heterogeneity were studied using simulated data. The results indicate that predictor measurement heterogeneity can induce miscalibration of risk predictions and can affect the discrimination and overall predictive accuracy of a prediction model, to the extent that the model may no longer be considered clinically useful.
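The flavour of such a simulation can be sketched as follows. This is a minimal illustration, not the paper's actual simulation design: the risk model, its coefficients, and the error magnitudes are assumptions chosen for demonstration. A model is derived on precisely measured data and validated on data measured with more error; discrimination (AUC) drops and the calibration slope falls below 1, meaning the predicted risks are too extreme.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 20_000

def simulate(measurement_sd):
    """Generate a binary outcome from a true predictor, observed with noise."""
    x_true = rng.normal(0, 1, n)
    p = 1 / (1 + np.exp(-(-1 + 1.5 * x_true)))          # assumed true risk model
    y = rng.binomial(1, p)
    x_obs = x_true + rng.normal(0, measurement_sd, n)    # error-prone measurement
    return x_obs.reshape(-1, 1), y

# Derivation: precise measurement; validation: noisier measurement strategy
X_dev, y_dev = simulate(measurement_sd=0.2)
X_val, y_val = simulate(measurement_sd=1.0)

model = LogisticRegression().fit(X_dev, y_dev)
p_val = model.predict_proba(X_val)[:, 1]

# Discrimination deteriorates when validation measurements are noisier
print("AUC at validation:", roc_auc_score(y_val, p_val))

# Calibration slope < 1 signals predicted risks that are too extreme
lp = model.decision_function(X_val).reshape(-1, 1)       # linear predictor
slope = LogisticRegression().fit(lp, y_val).coef_[0, 0]
print("calibration slope:", slope)
```

Running this shows both effects at once: the extra measurement error at validation attenuates the predictor-outcome association, so the derived model's predictions spread too widely for the validation patients.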

To understand and anticipate the impact of predictor measurement heterogeneity, it is helpful to consider it from a measurement error perspective. This approach sheds a different light on the relevance of predictor measurements in the context of prediction research. Measurement error is commonly thought not to affect the validity of prediction models, based on the general idea that unbiased associations between predictor and outcome are not a prerequisite in prediction studies. The measurement heterogeneity perspective reveals that what matters in prediction research is not so much the amount of measurement error within a single study, but rather the variation in measurement procedures across the settings of derivation and validation. Heterogeneity in predictor measurement procedures across settings can be considered an important driver of unanticipated predictive performance at external validation.

Preventing measurement heterogeneity at the design phase of a prediction study, in both derivation and validation, facilitates the interpretation of predictive performance and benefits the transportability of the prediction model to new patient groups. Ideally, prediction models are derived from and validated on datasets collected with measurement strategies that are widely used in the intended clinical setting. Data collection protocols that reduce measurement error to a minimum do not necessarily benefit the performance of the model, as the same precision of measurement will most likely not be attained in validation (or application) settings. Finally, it is important to report clearly which measurement procedures were used in the derivation and validation of a prediction model.