Every few days, we will be publishing layman’s abstracts of new articles from our prestigious portfolio of journals in statistics. The aim is to highlight the latest research to a broader audience in an accessible format.
The article featured today is from the Canadian Journal of Statistics, with the full article now available to read in issue 48:2.
Shu, H. and Tan, Z. (2020), Improved methods for moment restriction models with data combination and an application to two‐sample instrumental variable estimation. Can J Statistics, 48: 259-284. doi:10.1002/cjs.11530
Imagine a situation where an economist would like to draw inferences about a population of interest by analyzing a dataset or, in statistical jargon, a sample from that population. Some of the variables she needs are not present in the dataset, but she finds a second dataset that contains them, along with other relevant variables that also appear in the first dataset. The two datasets may have been collected from different populations and cannot, in general, be linked by individual identifiers.
A recent article by Heng Shu, a former PhD student, and Zhiqiang Tan, Professor of Statistics, both from Rutgers University, develops statistical methods that the economist can use to conduct the inference by combining the two datasets. One of the main ideas is to leverage the common variables present in both datasets while taking into account that these variables may be distributed differently in the two datasets, since they are drawn from distinct populations. By contrast, previous methods in statistics and econometrics often make a homogeneity assumption, namely that the two datasets are random samples from the same population. Moreover, the new methods are designed to be robust to possible misspecification of the statistical models used in performing the data combination, and also to reduce the standard errors of the resulting estimates.
As an empirical application, Shu and Tan reanalyzed an econometric study, originally by Currie & Yelowitz, on the effects of participation in public housing projects, combining Census data with the Current Population Survey (CPS), a widely used survey dataset in the United States. They reached a similar conclusion to the original study, but one that is more robust because it does not rely on the homogeneity assumption.
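For readers who like to see the idea in code, here is a minimal Python sketch of the classical two-sample instrumental variable estimator with a single instrument. It is not the method of Shu and Tan: it relies on the homogeneity assumption (both samples drawn from the same population) that their article relaxes. The data are synthetic, and the function name `two_sample_iv`, the helper `simulate`, and all parameter values are assumptions made purely for illustration.

```python
import numpy as np

def two_sample_iv(z1, y1, z2, x2):
    """Classical two-sample IV estimate with one instrument.

    Sample 1 holds the instrument z and the outcome y.
    Sample 2 holds the instrument z and the treatment x.
    Assumes both samples come from the same population (homogeneity).
    """
    # Slope of y on z, estimated from sample 1 (reduced form).
    reduced_form = np.cov(z1, y1)[0, 1] / np.var(z1, ddof=1)
    # Slope of x on z, estimated from sample 2 (first stage).
    first_stage = np.cov(z2, x2)[0, 1] / np.var(z2, ddof=1)
    return reduced_form / first_stage

rng = np.random.default_rng(0)
beta = 2.0   # true effect of x on y in the synthetic data
n = 20000    # size of each synthetic sample

def simulate(size):
    """Draw synthetic (z, x, y) with an unobserved confounder u."""
    z = rng.normal(size=size)
    u = rng.normal(size=size)
    x = 0.8 * z + u + rng.normal(size=size)
    y = beta * x + u + rng.normal(size=size)
    return z, x, y

# The two samples are drawn independently and cannot be linked row by row.
z1, _, y1 = simulate(n)   # sample 1: treatment x is not observed
z2, x2, _ = simulate(n)   # sample 2: outcome y is not observed

estimate = two_sample_iv(z1, y1, z2, x2)
print(f"true effect: {beta}, two-sample IV estimate: {estimate:.3f}")
```

Neither sample alone contains both the outcome and the treatment, yet the shared instrument allows the effect to be estimated. If the instrument were distributed differently in the two samples, this simple estimator could be misleading, which is the situation the article's methods are designed to handle.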