Layman’s abstract for Canadian Journal of Statistics article on Dummy endogenous treatment effect estimation using high-dimensional instrumental variables

Each week, we publish layman’s abstracts of new articles from our prestigious portfolio of journals in statistics. The aim is to highlight the latest research to a broader audience in an accessible format.
 
The article featured today is from the Canadian Journal of Statistics, and the full article is now available to read at the DOI link below.
 
Zhong, W., Zhou, W., Fan, Q. and Gao, Y. (2022), Dummy endogenous treatment effect estimation using high-dimensional instrumental variables. Can J Statistics. https://doi.org/10.1002/cjs.11648
 
Computational social science and policy evaluation often have to rely on non-experimental data. Treatment effect estimates from such data can be biased by endogeneity, which arises when individual units select into a program they believe to be favorable, or when unobserved confounders affect both the outcome and the treatment. The instrumental variable method, developed in economics, is the signature solution to this problem and has seen applications in business administration, genetical genomics, and general causal inference. A good instrumental variable should explain the variation in the treatment decision and affect the outcome only through the treatment.

In this paper, taking advantage of the availability of large datasets with rich individual-level features (such as large surveys, administrative data, and data crawled from Internet platforms), the authors develop a two-stage approach to estimating the treatment effects of dummy (binary) endogenous variables using high-dimensional instrumental variables (IVs). In the first stage, instead of using a conventional linear reduced-form regression to approximate the optimal instrument, they propose a penalized logistic reduced-form model that accommodates both the binary nature of the endogenous treatment variable and the high dimensionality of the instruments. In the second stage, they replace the original treatment variable with its estimated propensity score and run a least-squares regression to obtain a penalized Logistic-regression Instrumental Variables Estimator (LIVE). They show theoretically that the proposed LIVE is root-n consistent for the true treatment effect and asymptotically normal. Monte Carlo simulations demonstrate that the LIVE is more efficient than existing high-dimensional IV estimators for endogenous treatment effects.
In applications, they use the LIVE to investigate whether the Olympic Games facilitate the host nation’s economic growth and whether home visits from teachers enhance students’ academic performance. In addition, R functions implementing the proposed algorithms are available in the R package naivereg.
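
For readers who want to see the two-stage idea in action, here is a minimal Python sketch on simulated data. It is an illustration under stated assumptions, not the authors' implementation and not the interface of naivereg: the simulated data, the plain L1 (lasso) penalty, the fixed penalty level lam, and the helper names l1_logistic and ols_slope are all hypothetical choices made for this example, since the abstract does not specify the penalty or how it is tuned.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def l1_logistic(Z, d, lam, n_iter=500):
    """L1-penalized logistic regression fitted by proximal gradient descent."""
    n = Z.shape[0]
    X = np.column_stack([np.ones(n), Z])           # intercept + instruments
    step = 4.0 * n / np.linalg.norm(X, 2) ** 2     # 1 / Lipschitz constant of the gradient
    coef = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (sigmoid(X @ coef) - d) / n   # gradient of the average logistic loss
        coef = coef - step * grad
        # Soft-threshold the slopes only; the intercept is left unpenalized.
        coef[1:] = np.sign(coef[1:]) * np.maximum(np.abs(coef[1:]) - step * lam, 0.0)
    return coef

def ols_slope(x, y):
    """Slope from a least-squares regression of y on an intercept and x."""
    X = np.column_stack([np.ones(len(x)), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

# --- Simulated data (assumption: every number below is illustrative) ---
rng = np.random.default_rng(0)
n, p, beta_true = 5000, 50, 1.0
Z = rng.standard_normal((n, p))                    # many candidate instruments
gamma = np.zeros(p)
gamma[:3] = [1.5, -1.5, 1.0]                       # only three instruments are relevant
v = rng.logistic(size=n)                           # unobserved driver of selection
d = (Z @ gamma + v > 0).astype(float)              # binary treatment, P(d=1|Z) is logistic
y = 0.5 + beta_true * d + 0.8 * v + rng.standard_normal(n)  # error shares v: endogeneity

# Stage 1: penalized logistic reduced form, then the fitted propensity score.
coef = l1_logistic(Z, d, lam=0.01)
p_hat = sigmoid(coef[0] + Z @ coef[1:])

# Stage 2: least squares with the treatment replaced by its propensity score.
beta_live = ols_slope(p_hat, y)

# Benchmark: naive least squares on the observed treatment.
beta_naive = ols_slope(d, y)

print("instruments kept by the penalty:", np.count_nonzero(coef[1:]), "of", p)
print("naive OLS estimate:", round(beta_naive, 3))
print("two-stage estimate:", round(beta_live, 3))
```

In this setup the naive regression is pulled away from the true effect because the outcome error shares the unobserved term v with the treatment decision, while the two-stage estimate relies only on the variation in treatment that the instruments explain. The penalty level matters in practice: a larger lam drops more irrelevant instruments but also shrinks the fitted propensity scores, so a data-driven choice would normally replace the fixed value used here.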
 