The lay abstract featured today (for Evaluating Biases in Large Language Models Over Time: A Framework With a GPT Case Study on Political Bias by Meltem Aksoy, Erik Weber, Jérôme Rutinowski, Niklas Jost and Markus Pauly) is from Applied Stochastic Models in Business and Industry, with the full Open Access article now available to read here.
How to cite
Meltem Aksoy, Erik Weber, Jérôme Rutinowski, Niklas Jost, and Markus Pauly, Evaluating Biases in Large Language Models Over Time: A Framework With a GPT Case Study on Political Bias, Applied Stochastic Models in Business and Industry 42, no. 2 (2026): e70078, https://doi.org/10.1002/asmb.70078.
Lay Abstract
For the last few years, Large Language Models such as ChatGPT have been widely used to search for information, write text, and support decision-making in everyday life. Since these systems are trained on vast amounts of human-generated data, they can reflect social and political biases that exist in society. A key challenge, however, is that commercial language models are updated frequently and often without clear public documentation. This means that conclusions about a model's behavior today may no longer hold tomorrow. Despite this, most existing studies still examine only a single version of a model at one point in time.
This article addresses that gap by proposing a systematic way to track how biases in language models change over time. Rather than focusing on technical model details, the authors introduce a practical and transparent framework that allows researchers and practitioners to compare different model versions fairly and consistently. Political bias is used as an illustrative example because it is socially relevant, measurable, and frequently debated in public discussions about artificial intelligence.
Using several versions of ChatGPT, the study analyzes thousands of generated responses to political statements and viewpoints. The results show that political tendencies do shift across model updates: newer versions appear less strongly aligned with left-leaning positions, yet they still reproduce recognizable ideological and personality patterns. Importantly, the study also demonstrates that models can convincingly imitate specific political perspectives when instructed to do so, raising questions about neutrality, influence, and trust.
Overall, this work highlights why one-time evaluations of AI systems are no longer sufficient. As language models increasingly shape public discourse, education, and policy-related information, ongoing and transparent monitoring of their biases becomes essential. The proposed framework offers a foundation for such long-term oversight and can be applied well beyond political bias to other socially important dimensions.
