Had Enough of Experts? Quantitative Knowledge Retrieval From Large Language Models – lay abstract

The lay abstract featured today (for Had Enough of Experts? Quantitative Knowledge Retrieval From Large Language Models by David Selby, Yuichiro Iwashita, Kai Spriestersbach, Mohammad Saad, Dennis Bappert, Archana Warrier, Sumantrak Mukherjee, Koichi Kise and Sebastian Vollmer) is from Stat, with the full article available via the DOI in the citation below.

How to Cite

Selby, D., Iwashita, Y., Spriestersbach, K., Saad, M., Bappert, D., Warrier, A., Mukherjee, S., Kise, K. and Vollmer, S. (2025), Had Enough of Experts? Quantitative Knowledge Retrieval From Large Language Models. Stat, 14: e70054. https://doi.org/10.1002/sta4.70054

Lay Abstract

Large language models (LLMs), such as ChatGPT, are widely used to generate text, summarise information and even assist with coding. But can they also provide useful numbers? This study explores whether LLMs can act as “expert” sources of quantitative knowledge—helping statisticians fill in missing data and inform Bayesian models.

Bayesian statistics is a powerful approach to data analysis that combines existing knowledge with new evidence. A key component of this approach is the prior distribution, which represents what is believed about an unknown quantity before observing any data. For example, a doctor estimating the effectiveness of a new drug might use prior knowledge from previous studies before analysing new trial results. Typically, these priors are obtained by consulting domain experts, but this process can be slow, expensive and subject to personal bias—different experts may give different answers based on their own experiences. Our research examines whether LLMs, trained on vast amounts of scientific literature, can provide reasonable priors when prompted with structured questions. We compare their outputs to priors obtained from human experts.
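To make the role of a prior concrete, here is a minimal sketch, not taken from the paper and using invented numbers, of a conjugate Beta-Binomial update: a prior belief about a drug's effectiveness, whether elicited from a human expert or from an LLM, is combined with new trial data to give a posterior distribution.

```python
# Illustrative only: a Beta(4, 6) prior roughly encodes a belief that the
# drug works in about 40% of patients; hypothetical trial results then
# update that belief via the standard conjugate Beta-Binomial rule.
from scipy import stats

prior_a, prior_b = 4, 6          # elicited prior: Beta(4, 6)
successes, failures = 18, 12     # hypothetical new trial results

# Posterior is Beta(prior_a + successes, prior_b + failures)
posterior = stats.beta(prior_a + successes, prior_b + failures)
print(f"Posterior mean effectiveness: {posterior.mean():.2f}")
print("95% credible interval:", posterior.interval(0.95))
```

An informative prior pulls the posterior towards the expert's (or the LLM's) belief, which is exactly why the quality of the elicited prior matters.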

Another common challenge in data analysis is missing information—gaps in datasets that can arise from errors, privacy restrictions or incomplete records. We investigate whether LLMs can help by intelligently estimating these missing values, using contextual clues from the dataset itself. Testing on a diverse range of real-world datasets, we evaluate how well LLM-based imputation compares to traditional statistical methods.
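The sketch below is not the paper's code but illustrates the general idea behind LLM-assisted imputation: the observed values of an incomplete record are described in a prompt, the model is asked for a plausible value for the missing entry, and the result can then be compared against traditional methods such as mean imputation or multiple imputation. The function `query_llm` is a hypothetical stand-in for whatever LLM interface is actually used.

```python
# A hedged sketch of LLM-based imputation; prompt wording, parsing and the
# LLM call itself (`query_llm`) are assumptions, not the study's implementation.
import pandas as pd

def build_imputation_prompt(row: pd.Series, target: str) -> str:
    """Turn the observed values of a row into a natural-language question."""
    context = ", ".join(f"{col} = {val}" for col, val in row.items()
                        if col != target and pd.notna(val))
    return (f"A record has the following attributes: {context}. "
            f"What is a plausible value for '{target}'? "
            f"Answer with a single number only.")

def impute_with_llm(df: pd.DataFrame, target: str, query_llm) -> pd.DataFrame:
    """Fill missing values in `target` using answers from `query_llm`."""
    df = df.copy()
    for idx in df.index[df[target].isna()]:
        answer = query_llm(build_imputation_prompt(df.loc[idx], target))
        df.loc[idx, target] = float(answer)
    return df
```

In practice such imputations are judged by how well they recover held-out values relative to established statistical baselines.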

Our findings suggest that while LLMs can sometimes provide insightful priors and reasonable imputations, they also introduce biases and inconsistencies. Their knowledge is shaped by the data they were trained on, which may not always align with reality. This raises important questions about their reliability as quantitative experts and highlights the need for careful validation when using them in statistical workflows.

By exploring the potential and limitations of LLMs in these tasks, our study contributes to ongoing discussions about AI’s role in data science. While LLMs offer exciting possibilities, they are not yet a substitute for human expertise—at least when it comes to numbers.
