
Analysis of Large Health Surveys: Accounting for the Sampling Design

Large-scale health surveys offer an opportunity to study associations between risk factors and outcomes in a population-based setting. Their complicated multistage sampling designs, with differential probabilities of sampling individuals, can make their analysis far from straightforward. Classical 'design-based' methods that yield approximately unbiased estimators of associations and standard errors can be highly inefficient. Model-based methods require assumptions which, if wrong, can lead to biased estimators of associations and standard errors. This paper examines the implications of using the sample clustering and sample weights in the analysis of survey data. The approach is to estimate the inefficiency of using these aspects of the sampling design in a design-based analysis when it was actually unnecessary to do so. If the inefficiency is small, then that aspect of the design is used in a design-based fashion. Otherwise, additional modelling assumptions are incorporated into the analysis. By focusing attention on risk factor-outcome associations in large health surveys, specific recommendations for practitioners are given. The issues are demonstrated with real survey data, including two controversial analyses previously published in the medical literature.
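As a rough illustration of what "inefficiency" from unnecessary weighting can mean, the sketch below uses Kish's approximate design effect for unequal weights, n·Σw² / (Σw)². This is a standard approximation for a weighted mean, assumed here for illustration only; it is not necessarily the estimator used in the paper, and the function name and example weights are hypothetical.

```python
def kish_weighting_inefficiency(weights):
    """Approximate inefficiency of a weighted mean when weighting was unnecessary.

    Uses Kish's approximate design effect for unequal weights:
        deff = n * sum(w^2) / (sum(w))^2
    and reports inefficiency as 1 - 1/deff, i.e. the fraction of the
    effective sample size lost by weighting. This is an illustrative
    approximation, not the paper's own estimator.
    """
    n = len(weights)
    if n == 0:
        raise ValueError("weights must be non-empty")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    total_sq = sum(w * w for w in weights)
    deff = n * total_sq / (total * total)
    return 1.0 - 1.0 / deff


# Hypothetical weights: equal weights lose nothing; one large weight costs efficiency.
equal = kish_weighting_inefficiency([1.0, 1.0, 1.0, 1.0])
unequal = kish_weighting_inefficiency([1.0, 1.0, 1.0, 3.0])
```

Under this approximation, a small value suggests that keeping the weights in a design-based analysis costs little, while a large value suggests that a model-based alternative may be worth considering.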


Korn, E. L., & Graubard, B. I. (1995). Analysis of Large Health Surveys: Accounting for the Sampling Design. Journal of the Royal Statistical Society. Series A (Statistics in Society), 158(2), 263-295.