Scandinavian Journal of Statistics

A smoothed Q‐learning algorithm for estimating optimal dynamic treatment regimes

Journal Article

Abstract In this paper, we propose a smoothed Q‐learning algorithm for estimating optimal dynamic treatment regimes. In contrast to the Q‐learning algorithm in which nonregular inference is involved, we show that, under assumptions adopted in this paper, the proposed smoothed Q‐learning estimator is asymptotically normally distributed even when the Q‐learning estimator is not and its asymptotic variance can be consistently estimated. As a result, inference based on the smoothed Q‐learning estimator is standard. We derive the optimal smoothing parameter and propose a data‐driven method for estimating it. The finite sample properties of the smoothed Q‐learning estimator are studied and compared with several existing estimators including the Q‐learning estimator via an extensive simulation study. We illustrate the new method by analyzing data from the Clinical Antipsychotic Trials of Intervention Effectiveness–Alzheimer's Disease (CATIE‐AD) study.

Related Topics

Related Publications

Related Content

Site Footer


This website is provided by John Wiley & Sons Limited, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ (Company No: 00641132, VAT No: 376766987)

Published features on are checked for statistical accuracy by a panel from the European Network for Business and Industrial Statistics (ENBIS)   to whom Wiley and express their gratitude. This panel are: Ron Kenett, David Steinberg, Shirley Coleman, Irena Ograjenšek, Fabrizio Ruggeri, Rainer Göb, Philippe Castagliola, Xavier Tort-Martorell, Bart De Ketelaere, Antonio Pievatolo, Martina Vandebroek, Lance Mitchell, Gilbert Saporta, Helmut Waldl and Stelios Psarakis.