Layman’s abstract: Empirical likelihood confidence intervals under imputation for missing survey data from stratified simple random sampling

Every few days, we will be publishing layman’s abstracts of new articles from our prestigious portfolio of journals in statistics. The aim is to highlight the latest research to a broader audience in an accessible format.

The article featured today is from the Canadian Journal of Statistics: ‘Empirical likelihood confidence intervals under imputation for missing survey data from stratified simple random sampling’ by Song Cai, Yongsong Qin, J.N.K. Rao and Malgorzata Winiszewska, published in issue 47:2.

Missing observations are common in data collected from sample surveys because of nonresponse. Both unit nonresponse and item nonresponse are encountered in practice. A unit nonresponse element is one for which information is missing on all questionnaire variables; an item nonresponse element is one for which information is missing on at least one, but not all, of the questionnaire variables. The focus of this paper is on item nonresponse, which is often handled by filling in the missing values, a process called imputation. An advantage of imputation is that it produces complete data sets for public use, so that standard complete-data programs for survey data can be used to estimate population parameters such as means. In particular, we consider random hot-deck single or fractional imputation, which fills in each recipient's missing value with randomly selected respondent (donor) values. In single imputation, one value is imputed for each missing item value; in fractional imputation, more than one value is imputed and a fraction of the survey weight is attached to each imputed value. For valid and accurate inference based on imputed data, the sample is grouped into homogeneous classes, called imputation classes, according to auxiliary variables that are observed for all sample units.

The paper aims to construct confidence intervals for a population parameter defined as the solution to a smooth estimating equation (for example, the population mean is the value θ that makes the sum of y − θ over all population units equal to zero), using missing item data collected from stratified simple random sampling. To handle missing values, random imputation (single or fractional) is used within homogeneous imputation classes that are formed across strata. A nonparametric empirical likelihood inference method based on the imputed data is developed and its large-sample properties are derived.
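
To make the imputation step concrete, here is a minimal Python sketch of random hot-deck imputation within imputation classes. It assumes a hypothetical data layout in which each sampled unit is a tuple (imputation class, survey weight, value), with None marking a missing item. With m = 1 it performs single imputation; with m > 1 it performs fractional imputation, attaching weight/m to each of the m donor values. The function names hot_deck_impute and weighted_mean are illustrative and do not come from the paper, and donors are assumed to be drawn with equal probability and with replacement from the respondents in the recipient's class; the paper's exact donor-selection scheme may differ.

```python
import random
from collections import defaultdict


def hot_deck_impute(units, m=1, seed=None):
    """Random hot-deck imputation within imputation classes (illustrative sketch).

    units: list of (imputation_class, survey_weight, value) tuples, where
           value is None for a missing item (hypothetical layout).
    m:     donors per missing item; m=1 gives single imputation, m>1 gives
           fractional imputation with weight/m attached to each donor value.
    Returns a list of (weight, value) pairs forming a completed data set.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    rng = random.Random(seed)

    # Respondents in each imputation class form that class's donor pool.
    donors = defaultdict(list)
    for cls, _, y in units:
        if y is not None:
            donors[cls].append(y)

    completed = []
    for cls, w, y in units:
        if y is not None:
            completed.append((w, y))  # respondent keeps its own value and weight
            continue
        pool = donors.get(cls)
        if not pool:
            raise ValueError(f"imputation class {cls!r} has no respondents")
        # Assumption: equal-probability draws with replacement from the pool.
        for donor_value in rng.choices(pool, k=m):
            completed.append((w / m, donor_value))
    return completed


def weighted_mean(pairs):
    """Survey-weighted mean of a completed data set of (weight, value) pairs."""
    total_weight = sum(w for w, _ in pairs)
    return sum(w * y for w, y in pairs) / total_weight


units = [("A", 10.0, 4.0), ("A", 10.0, None), ("A", 10.0, 6.0),
         ("B", 5.0, 1.0), ("B", 5.0, None)]
print(weighted_mean(hot_deck_impute(units, m=1, seed=1)))  # single imputation
print(weighted_mean(hot_deck_impute(units, m=5, seed=1)))  # fractional imputation
```

Splitting the weight evenly across the m donor values keeps each recipient's total weight unchanged, so the completed data set can be fed to ordinary weighted estimators. Other donor-selection rules, such as drawing donors with probability proportional to their survey weights, are also used in practice.
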
To construct confidence intervals for the population parameter of interest, two adjusted bootstrap resampling methods based on empirical likelihood are proposed. We show that the traditional bootstrap methods widely used in the literature fail to give valid confidence intervals, even in large samples, when single imputation is used or when fractional imputation uses only a small number of imputed values for each missing item; the proposed bootstrap methods, in contrast, are always valid in large samples. A simulation study shows that the proposed bootstrap methods outperform the traditional bootstrap methods and some non-bootstrap competitors under various simulation settings.
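
For readers curious about the empirical likelihood idea, the sketch below computes Owen's standard empirical likelihood confidence interval for a mean from independent observations, calibrated with the chi-square distribution with one degree of freedom. It is a simplified illustration under that assumption only: the paper's method works with estimating equations, stratified sampling and imputed data, and uses the proposed adjusted bootstrap methods to construct the intervals, none of which is implemented here. The names el_log_ratio, el_confidence_interval and CHI2_1DF_95 are hypothetical.

```python
import math

# 95% quantile of the chi-square distribution with 1 degree of freedom.
# Assumption: plain chi-square calibration, not the paper's bootstrap calibration.
CHI2_1DF_95 = 3.841458820694124


def el_log_ratio(x, mu, iters=100):
    """Return -2 log R(mu), Owen's empirical likelihood ratio statistic for a mean."""
    z = [xi - mu for xi in x]
    zmin, zmax = min(z), max(z)
    if not (zmin < 0 < zmax):
        return math.inf  # mu lies outside the range of the data
    # The Lagrange multiplier lam solves sum z_i / (1 + lam * z_i) = 0 on the open
    # interval (-1/zmax, -1/zmin), where every 1 + lam * z_i > 0. The left side is
    # strictly decreasing in lam there, so bisection on midpoints finds the root.
    a, b = -1.0 / zmax, -1.0 / zmin
    for _ in range(iters):
        lam = 0.5 * (a + b)
        g = sum(zi / (1.0 + lam * zi) for zi in z)
        if g > 0:
            a = lam  # root lies to the right
        else:
            b = lam  # root lies to the left (or here)
    lam = 0.5 * (a + b)
    return 2.0 * sum(math.log1p(lam * zi) for zi in z)


def el_confidence_interval(x, crit=CHI2_1DF_95, iters=100):
    """Set of mu with -2 log R(mu) <= crit; it is an interval around the sample mean."""
    xbar = sum(x) / len(x)

    def endpoint(extreme):
        # Bisect between the sample mean (statistic 0) and a data extreme
        # (statistic grows without bound), keeping the last accepted value.
        inside, outside = xbar, extreme
        for _ in range(iters):
            mid = 0.5 * (inside + outside)
            if el_log_ratio(x, mid) <= crit:
                inside = mid
            else:
                outside = mid
        return inside

    return endpoint(min(x)), endpoint(max(x))


print(el_confidence_interval([2.1, 3.4, 1.9, 4.2, 3.3, 2.8, 3.9, 2.5]))
```

Because the empirical likelihood confidence region for a mean is an interval containing the sample mean, each endpoint can be located by bisection between the sample mean and the corresponding smallest or largest observation.
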

The full article is available to read online here.