Layman’s abstract for Statistics in Medicine tutorial on formulating causal questions and principled statistical answers

Each week, we will be publishing layman’s abstracts of new articles from our prestigious portfolio of journals in statistics. The aim is to highlight the latest research to a broader audience in an accessible format.

The Tutorial in Biostatistics article featured today is from Statistics in Medicine, with the full Open Access article now available to read here.

Goetghebeur, E, le Cessie, S, De Stavola, B, Moodie, EE, Waernbaum, I. Formulating causal questions and principled statistical answers. Statistics in Medicine. 2020; 4922-4948. https://doi.org/10.1002/sim.8741

Causal inference has become a popular and important framework for generating data-driven evidence to inform treatment decisions and more general health interventions. This has brought an explosion of statistical methods of growing complexity, targeting a range of different estimands. Keeping up with the literature and understanding exactly what has been estimated, and under which assumptions, is a challenge for practicing statisticians and researchers alike. Most methods, however, evolved from a shared set of basic principles, and insight into these principles makes the derived methods easier to understand.

While several review papers on causal inference methods are now available, there is little discussion at a fundamental or introductory level on what causal inferences can provide, or guidance in choosing one particular approach. This tutorial demonstrates how relevant causal questions can be posed, showing the need to be specific about the possible exposure levels (“treatments”) and populations about which the question is asked. Using a framework which considers potential outcomes under different, competing exposure levels, the tutorial describes principled definitions of causal effects, and explains different estimation approaches.

The tutorial considers situations where an exposure of interest is set at a chosen baseline (‘point exposure’) and the target outcome arises at a later time point, focusing mainly on continuous outcomes and causal average treatment effects. The estimation methods presented rely on one of two assumptions. The first is that there is no unmeasured confounding, meaning observed exposure can be treated as randomized after conditioning on measured variables predictive of outcome. The second is that an instrumental variable exists, which acts like a pseudo-random assignment of exposure and allows estimation even when there are unmeasured confounders. In particular, outcome regression, stratification and matching on the propensity score, inverse probability weighting, and the doubly-robust hybrid approach that uses both regression and weighting together are considered as methods that rely on no unmeasured confounding. A standard instrumental variables approach and its necessary assumptions are also presented. The tutorial illustrates how a variable that serves as an instrument for one particular exposure may not serve as an instrument for another, ‘downstream’ exposure, even if those two exposures are correlated.
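
The estimators that rely on no unmeasured confounding can be illustrated on simulated data where the true effect is known. The following is a minimal Python sketch, not the tutorial's own code (which is provided in R, SAS, and Stata). The data-generating model, the variable names (`L`, `A`, `Y0`, `Y1`), and all numeric values are assumptions chosen purely for illustration. It compares a naive contrast with outcome regression, inverse probability weighting, and a doubly-robust combination of the two.

```python
import numpy as np

rng = np.random.default_rng(2020)
n = 200_000

# Hypothetical data-generating model (illustrative values, not from the tutorial).
L = rng.binomial(1, 0.4, size=n)                 # measured binary confounder
A = rng.binomial(1, np.where(L == 1, 0.7, 0.3))  # exposure is more likely when L = 1
Y0 = 1.0 + 2.0 * L + rng.normal(size=n)          # potential outcome without exposure
Y1 = Y0 + 1.5                                    # potential outcome with exposure
Y = np.where(A == 1, Y1, Y0)                     # only one potential outcome is observed

# True average treatment effect, known only because both outcomes were simulated.
true_ate = np.mean(Y1 - Y0)

# Naive contrast: ignores L, so it mixes the effect of A with the effect of L.
naive = Y[A == 1].mean() - Y[A == 0].mean()

# Outcome regression: linear model Y ~ 1 + A + L, coefficient of A estimates the effect.
X = np.column_stack([np.ones(n), A, L])
beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
ate_reg = beta[1]

# Propensity score P(A = 1 | L), estimated by the exposed fraction within each level of L.
ps = np.where(L == 1, A[L == 1].mean(), A[L == 0].mean())

# Inverse probability weighting: reweight each arm to represent the whole population.
ate_ipw = np.mean(A * Y / ps) - np.mean((1 - A) * Y / (1 - ps))

# Doubly robust (augmented IPW): regression predictions plus weighted residuals.
m1 = beta[0] + beta[1] + beta[2] * L             # predicted outcome if exposed
m0 = beta[0] + beta[2] * L                       # predicted outcome if unexposed
ate_dr = np.mean(m1 - m0 + A * (Y - m1) / ps - (1 - A) * (Y - m0) / (1 - ps))

for name, value in [("true ATE", true_ate), ("naive", naive),
                    ("outcome regression", ate_reg), ("IPW", ate_ipw),
                    ("doubly robust", ate_dr)]:
    print(f"{name:>20}: {value:.3f}")
```

In this assumed setup the confounder raises both the chance of exposure and the outcome, so the naive contrast overstates the effect, while the three adjusted estimators should land close to the simulated truth.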

This tutorial provides in-depth consideration of the interpretation of causal estimands and of the challenges and potential pitfalls of different analytic approaches. To support this, it introduces the so-called ‘simulation learner’, which has generated ‘observed’ exposures, covariates, and outcome data, as well as a set of possible alternative exposures for each subject along with their corresponding potential outcomes. Having each person’s response to each set of exposures allows the true values of several causal effects to be calculated, here in a setting inspired by a real breastfeeding encouragement trial. The simulation learner is thus designed to help evaluate the effect of various breastfeeding interventions, each capturing a different scientific question, on a child’s later development. With the gold standard estimands available (i.e. the true values derived from the various potential outcomes for all individuals), one can calculate several estimators from the observable data and compare their performance.
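
The idea behind such a simulation can be sketched in a few lines of Python. This is not the tutorial's simulation learner: the encouragement variable `Z`, the unmeasured factor `U`, the coefficients, and the sample size are all hypothetical, loosely mimicking a randomized encouragement design. Because both potential outcomes are generated for every subject, the true effects are known and can be set against a naive comparison and a standard instrumental variable (ratio) estimator.

```python
import numpy as np

rng = np.random.default_rng(8741)
n = 400_000

# All quantities below are assumptions for illustration only.
U = rng.normal(size=n)                  # unmeasured confounder
Z = rng.binomial(1, 0.5, size=n)        # randomized encouragement (the instrument)
eps = rng.normal(size=n)                # individual tendency toward exposure

# Potential exposures without (A_z0) and with (A_z1) encouragement.
# Sharing eps guarantees A_z1 >= A_z0: encouragement never reduces exposure.
A_z0 = (0.8 * U + eps > 0.5).astype(int)
A_z1 = (0.8 * U + eps > -0.5).astype(int)

# Potential outcomes without (Y_a0) and with (Y_a1) exposure.
# Z does not appear here: it can affect the outcome only through exposure.
Y_a0 = 10.0 + 2.0 * U + rng.normal(size=n)
Y_a1 = Y_a0 + 1.0 + 0.5 * U             # effect varies with U

# Observed data: one exposure and one outcome per subject.
A = np.where(Z == 1, A_z1, A_z0)
Y = np.where(A == 1, Y_a1, Y_a0)

# "Gold standard" estimands, computable only because all potential outcomes exist.
effect = Y_a1 - Y_a0
true_ate = effect.mean()                # average effect in the whole population
true_att = effect[A == 1].mean()        # average effect among the exposed
compliers = A_z1 > A_z0                 # exposed only if encouraged
true_complier = effect[compliers].mean()

# Estimators that use observable data only.
naive = Y[A == 1].mean() - Y[A == 0].mean()
itt_outcome = Y[Z == 1].mean() - Y[Z == 0].mean()   # effect of encouragement on outcome
itt_exposure = A[Z == 1].mean() - A[Z == 0].mean()  # effect of encouragement on exposure
iv_ratio = itt_outcome / itt_exposure

print(f"true ATE {true_ate:.3f} | true ATT {true_att:.3f} | "
      f"true complier effect {true_complier:.3f}")
print(f"naive {naive:.3f} | IV ratio {iv_ratio:.3f}")
```

In this hypothetical setup the naive contrast is far from any of the true values because `U` drives both exposure and outcome. The ratio estimator targets the average effect among subjects whose exposure responds to encouragement, which need not equal the average effect among the exposed.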

Code is provided in R, SAS, and Stata so that new learners can use this tutorial to work from formulating the question through to the analysis and appropriate interpretation, with all relevant steps along the way. Additional supportive material such as slides and practicals with code and answers for teaching purposes can also be found on the website ofcaus.org.

 
