Estimating the prevalences of rare traits from complex survey data


  • Authors: Noorie Hyun, Joseph J. Gastwirth and Barry I. Graubard
  • Date: 15 March 2019
  • Copyright: Image copyright of Patrick Rhodes

In a new paper written for Statistics in Medicine, the authors describe how 2‐stage group testing was originally developed to screen individuals for a disease efficiently. In response to the HIV/AIDS epidemic, 1‐stage group testing was adopted for estimating the prevalence of one or more traits by testing groups of size q, so that no individual was tested. This paper extends the methodology of 1‐stage group testing to surveys with sample‐weighted, complex multistage‐cluster designs. Sample‐weighted generalized estimating equations are used to estimate the prevalences of categorical traits while accounting for the error rates inherent in the tests.

The paper is available via the link here and the authors explain their findings in further detail below:

Grouping methods for estimating the prevalences of rare traits from complex survey data that preserve confidentiality of respondents

Noorie Hyun, Joseph J. Gastwirth and Barry I. Graubard

Statistics in Medicine, Volume 37, Issue 13, 15 June 2018, pages 2174-2186


In order to reduce the cost of screening army recruits in World War II, Dorfman proposed a two-stage process. First, the pooled samples from a number of individuals, e.g. 10, were tested together. If the test did not indicate the presence of syphilis, then the entire group passed. Otherwise, the members of the group were tested individually. Many refinements of the method were developed that reduced the total number of tests needed to identify individuals with a trait or disease. When HIV/AIDS became a major health problem, an estimate of its prevalence in the population was needed.
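The saving from the two-stage scheme can be illustrated with a short calculation. The sketch below is not taken from the paper: it assumes a perfect test and independent individuals who each carry the trait with probability p, and the function name is hypothetical. Under those assumptions a group of size q needs one pooled test, plus q individual retests whenever the pool is positive.

```python
def dorfman_expected_tests_per_person(p: float, q: int) -> float:
    """Expected tests per individual under two-stage (Dorfman) pooling.

    Assumptions (for illustration only): the test is error-free and
    individuals are independent, each positive with probability p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if q < 2:
        raise ValueError("q must be at least 2")
    # A pool is positive when at least one of its q members is positive.
    prob_pool_positive = 1.0 - (1.0 - p) ** q
    # One pooled test shared by q people, plus q retests if the pool is positive.
    expected_tests_per_group = 1.0 + q * prob_pool_positive
    return expected_tests_per_group / q


if __name__ == "__main__":
    # Individual testing costs exactly 1 test per person, so values
    # below 1 indicate a saving from pooling.
    for prevalence in (0.01, 0.05, 0.20):
        cost = dorfman_expected_tests_per_person(prevalence, q=10)
        print(f"p={prevalence:.2f}, q=10: {cost:.3f} tests per person")
```

The calculation shows why pooling pays off mainly for rare traits: as p grows, most pools test positive and the retests outweigh the saving.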

Because of the stigma attached to the disease and the risk of losing health insurance, statistical estimates of prevalence were developed that rely on testing grouped or pooled samples without retesting, thereby preserving the privacy of individuals. Subsequently, similar group testing methodology was developed for estimating the prevalence of one or more cancer genes, e.g. BRCA, without testing any individual. This article explores procedures for extending group testing of blood samples collected in major health surveys (e.g. NHANES) in order to maintain the privacy and confidentiality of respondents. Because of the complex design of these surveys, several problems arise. (1) How should the fact that individual respondents no longer have the same weight, due to differential sampling, be incorporated into the pooling procedure and the resulting estimate? (2) How should the survey design, especially the geographic clustering of the individuals sampled, be incorporated in order to estimate the prevalence of a trait and its standard error accurately?
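For orientation, the simplest version of pooled prevalence estimation without retesting can be sketched as follows. This is the textbook estimator for equal-sized pools from a simple random sample, not the sample-weighted estimating-equation approach of the paper: it ignores sample weights and clustering, which are exactly the two problems raised above. The function name and the example numbers are hypothetical. With sensitivity Se and specificity Sp, a pool of size q tests positive with probability Se·θ + (1 − Sp)·(1 − θ), where θ = 1 − (1 − p)^q, and the sketch inverts that relation.

```python
def prevalence_from_pool_rate(
    observed_rate: float,
    q: int,
    sensitivity: float = 1.0,
    specificity: float = 1.0,
) -> float:
    """Estimate individual-level prevalence from the share of positive pools.

    Illustrative assumptions: pools all have size q, individuals are
    independent (no weights, no clustering), and the test's sensitivity
    and specificity are known and do not depend on pool composition.
    """
    if not 0.0 <= observed_rate <= 1.0:
        raise ValueError("observed_rate must lie in [0, 1]")
    if q < 1:
        raise ValueError("q must be at least 1")
    if sensitivity + specificity <= 1.0:
        raise ValueError("test must be better than chance (Se + Sp > 1)")
    # Undo misclassification: observed = Se*theta + (1 - Sp)*(1 - theta).
    theta = (observed_rate - (1.0 - specificity)) / (sensitivity + specificity - 1.0)
    # Sampling noise can push the corrected rate outside [0, 1]; clip it.
    theta = min(max(theta, 0.0), 1.0)
    # A pool is truly negative only if all q members are negative.
    return 1.0 - (1.0 - theta) ** (1.0 / q)


if __name__ == "__main__":
    # Hypothetical data: 4 of 20 pools of size 10 test positive.
    rate = 4 / 20
    print(prevalence_from_pool_rate(rate, q=10))
    print(prevalence_from_pool_rate(rate, q=10, sensitivity=0.95, specificity=0.98))
```

Note that the estimate depends directly on the assumed sensitivity and specificity, which is one reason the article's recommendations turn on how well those error rates are known.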

Five grouping methods that account for these issues are examined in this article. Recommendations for choosing which one to use are provided. The choice depends on whether one is estimating the prevalence of one or two traits, how correlated the individual-level sample weights are with having the trait, and, for tests that are not 100% accurate, how well the sensitivity and specificity of the tests are known. The procedures are illustrated by reanalyzing the oral samples collected in NHANES that were assayed for human papillomavirus (HPV) positivity to estimate the prevalence of HPV-positive individuals in the U.S. population.


