Today's featured lay abstract, for Bayesian Federated Inference for estimating statistical models based on non-shared multicenter data sets by Marianne A Jonker, Hassan Pazira and Anthony CC Coolen, is from Statistics in Medicine; the full Open Access article is now available to read here.
Jonker MA, Pazira H, Coolen ACC. Bayesian Federated Inference for estimating statistical models based on non-shared multicenter data sets. Statistics in Medicine. 2024; 1–18. doi: 10.1002/sim.10072
Abstract
To reliably identify predictive factors for an outcome of interest via a multivariable regression analysis, the data set must be large enough compared to the number of possible factors. In practice, sufficient data are often lacking. Using small data sets can lead to overfitting of the statistical model and, as a consequence, to inaccurate estimates of the model parameters and unreliable predictions of the outcome for new patients. Combining data from different centers or data sets into a single (larger) database would alleviate this problem, but in practice this is challenging because of regulatory and logistical problems. In the paper, the authors describe a Bayesian Federated Inference (BFI) framework for multicenter data. It aims to construct, from local inferences in separate centers, what would have been inferred had the data sets been merged. In this way it seeks to harvest the statistical power of larger data sets without actually creating them. The BFI framework is designed to cope with small data sets by inferring locally not only the optimal parameter values, but also additional features of the posterior parameter distribution. Importantly, a single inference cycle across the centers is sufficient for the BFI method, whereas most Federated Learning strategies need multiple cycles across the centers. The performance of the proposed methodology is shown to be excellent. An R package that performs all the calculations has been developed, and a user-friendly manual is available.
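To make the idea of rebuilding the merged-data inference from local inferences concrete, here is a minimal sketch in Python. It is not the authors' R package and does not reproduce the paper's general method. It assumes a toy model chosen for illustration: a one-coefficient linear regression with known noise variance and a zero-mean Gaussian prior. All function and variable names are hypothetical. Each center shares only its local estimate and its posterior precision (one "additional feature" of the posterior), and a single pass over the centers is enough to combine them. In this Gaussian toy case the combined result matches the merged-data fit up to rounding; for other models one would expect an approximation at best.

```python
import random


def local_fit(x, y, sigma2, lam):
    """Local inference for y = theta * x + noise (toy model, assumed here).

    Returns the MAP estimate of theta and the posterior precision,
    given known noise variance sigma2 and prior precision lam.
    """
    precision = sum(xi * xi for xi in x) / sigma2 + lam
    theta_map = sum(xi * yi for xi, yi in zip(x, y)) / sigma2 / precision
    return theta_map, precision


def combine(local_results, local_lams, lam):
    """Combine local (estimate, precision) pairs in a single pass.

    Each local prior precision is subtracted so the prior is not counted
    once per center, then the prior for the combined model is added once.
    """
    precision = sum(a - l for (_, a), l in zip(local_results, local_lams)) + lam
    theta = sum(a * t for t, a in local_results) / precision
    return theta, precision


# Simulated data for three small centers (illustrative only).
random.seed(0)
sigma2, lam = 1.0, 0.5
centers = []
for n in (8, 12, 15):
    x = [random.gauss(0, 1) for _ in range(n)]
    y = [2.0 * xi + random.gauss(0, sigma2 ** 0.5) for xi in x]
    centers.append((x, y))

# Step 1: each center fits locally and shares only two numbers.
local_results = [local_fit(x, y, sigma2, lam) for x, y in centers]

# Step 2: one combination step, with no raw data exchanged.
theta_bfi, prec_bfi = combine(local_results, [lam] * len(centers), lam)

# Reference: the fit that would be obtained if the data could be merged.
all_x = [xi for x, _ in centers for xi in x]
all_y = [yi for _, y in centers for yi in y]
theta_merged, prec_merged = local_fit(all_x, all_y, sigma2, lam)

print(theta_bfi, theta_merged)
```

The two printed values should agree to within floating-point error, which illustrates the claim that a single cycle across centers can suffice when the local summaries carry enough information about the posterior.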