Each week, we publish layman’s abstracts of new articles from our prestigious portfolio of journals in statistics. The aim is to bring the latest research to a broader audience in an accessible format.
The article featured today is from the Canadian Journal of Statistics, and the full article is now available to read in Early View.
Che, M., Lawless, J.F. and Han, P. (2020), Empirical and conditional likelihoods for two‐phase studies. Can J Statistics. doi:10.1002/cjs.11566
Studies in medicine, economics, public health and other areas often face cost constraints that limit the collection of data on variables that are expensive or difficult to measure. Examples include certain types of genomic measurements, expensive imaging or diagnostic tests, and data that require extensive face-to-face interviews with study subjects. Two-phase studies address this problem: less expensive variables are first measured on a large group of individuals (phase 1), and the expensive variables are then measured on a smaller sample of those individuals (phase 2). The information obtained about the expensive variables can be increased by choosing who enters phase 2 on the basis of their phase 1 values. For example, suppose researchers are studying risk factors for heart disease, the phase 1 data include age, sex, body mass index and whether a person has heart disease, and expensive genetic factors are to be measured in phase 2. It is then advantageous to over-sample older persons, those with heart disease and those with higher body mass index.
Because the phase 2 sample is deliberately not a random sample, statistical methods designed for random samples from a population must be modified in this setting. The authors study and compare methods for estimating the effects of potential risk factors in two-phase studies; in the heart disease example, these would include the genetic variables as well as age, sex and body mass index. The methods considered are based on estimating functions and on empirical likelihood. The paper’s main contribution is to show that an estimating function technique and a theoretically more informative empirical likelihood technique are in fact equivalent in this setting. This has important practical implications, because the estimating function methods are computationally simpler and much easier to implement.
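To make the design concrete, here is a minimal sketch in Python using simulated, entirely hypothetical data; it does not reproduce the authors’ analysis. Phase 2 over-samples people with heart disease and older people, and the risk-factor effects are then estimated by solving an inverse-probability-weighted logistic regression score equation, which is one of the simplest estimating function approaches for two-phase data. The variable names, selection probabilities and effect sizes are illustrative assumptions, and the empirical likelihood method studied in the paper is not shown.

```python
import numpy as np

rng = np.random.default_rng(2020)

# Phase 1 (hypothetical simulated cohort): cheap variables on everyone.
n = 20_000
age_std = rng.normal(0.0, 1.0, n)        # standardized age
gene = rng.binomial(1, 0.3, n)           # expensive variable; only used for phase 2 rows below
X = np.column_stack([np.ones(n), age_std, gene])
true_beta = np.array([-2.0, 0.8, 0.7])   # assumed intercept, age and gene effects
disease = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ true_beta)))

# Phase 2 selection depends only on phase 1 data:
# over-sample people with heart disease and older people without it.
pi = np.where(disease == 1, 0.9, np.where(age_std > 0.5, 0.3, 0.1))
selected = rng.random(n) < pi


def fit_logistic(X, y, w, max_iter=50, tol=1e-10):
    """Solve the weighted score equation sum_i w_i x_i (y_i - p_i) = 0 by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        score = X.T @ (w * (y - p))
        info = X.T @ (X * (w * p * (1.0 - p))[:, None])
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


X2, y2 = X[selected], disease[selected]
naive = fit_logistic(X2, y2, np.ones(len(y2)))   # ignores the sampling design
ipw = fit_logistic(X2, y2, 1.0 / pi[selected])   # weights by 1 / selection probability

print("true :", true_beta)
print("naive:", naive.round(2))
print("ipw  :", ipw.round(2))
```

Comparing `naive` with `ipw` illustrates why modification is needed: the unweighted fit treats the over-sampled phase 2 data as a random sample and distorts the estimates, most visibly by overstating the baseline risk (the intercept), while weighting each person by the inverse of their selection probability corrects for the design.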