Clarifying some issues in the regression analysis of survey data
The literature offers two distinct reasons for incorporating sample weights into the estimation of linear regression coefficients from a model-based point of view. Either the sample selection is nonignorable or the model is incomplete. The traditional sample-weighted least-squares estimator can be improved upon when the sample selection is nonignorable, but not when the standard linear model fails and needs to be extended.
Conceptually, it can be helpful to view the realized sample as the result of a two-phase process. In the first phase, the finite population is drawn from a hypothetical superpopulation via simple random (cluster) sampling. In the second phase, the actual sample is drawn from the finite population. In the extended model, the parameters of this superpopulation are vague. Mean-squared-error estimation can become problematic when the primary sampling units are drawn within strata using unequal probability sampling without replacement. This remains true even under the standard model when certain aspects of the sample design are nonignorable.
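The two-phase view can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the paper's own procedure: the superpopulation model (intercept 1, slope 2, unit error variance), the population size, the Poisson sampling scheme, and the logistic inclusion probabilities are all hypothetical choices made for illustration, and the helper name weighted_least_squares is invented here. It shows only the traditional sample-weighted least-squares estimator under a nonignorable selection mechanism; it does not attempt the improved estimators the abstract alludes to, nor clustering or stratification.

```python
import numpy as np


def weighted_least_squares(X, y, w):
    """Solve the weighted normal equations (X' W X) b = X' W y."""
    Xw = X * w[:, None]  # scale each row of X by its weight
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)


rng = np.random.default_rng(0)

# Phase 1: draw a finite population from a hypothetical superpopulation
# in which the standard linear model holds (assumed beta = (1, 2)).
N = 20_000
x = rng.normal(size=N)
y = 1.0 + 2.0 * x + rng.normal(size=N)
X = np.column_stack([np.ones(N), x])

# Phase 2: draw the sample from the finite population. Inclusion
# probabilities depend on y itself, so selection is nonignorable.
# (Poisson sampling with a logistic form is an illustrative assumption.)
pi = 0.02 + 0.28 / (1.0 + np.exp(-y))
sampled = rng.random(N) < pi
Xs, ys = X[sampled], y[sampled]
ws = 1.0 / pi[sampled]  # sample weights = inverse inclusion probabilities

beta_census = weighted_least_squares(X, y, np.ones(N))
beta_unweighted = weighted_least_squares(Xs, ys, np.ones(ys.size))
beta_weighted = weighted_least_squares(Xs, ys, ws)

print("census fit:     ", beta_census)
print("unweighted fit: ", beta_unweighted)
print("weighted fit:   ", beta_weighted)
```

Because units with large y are oversampled, the unweighted fit is pulled away from the finite-population (census) fit, while the sample-weighted fit should track it more closely in this setup.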