Better Coverage Intervals for Estimators from a Complex Sample Survey

ii


Introduction
Statisticians are interested in estimating intervals likely to contain a parameter. Wald intervals are most commonly used for this purpose. A hypothesis test for the location of the parameter can be conducted using Wald intervals.
Suppose θ is a nearly (i.e., asymptotically) unbiased estimator for a parameter θ estimated from a complex probability sample. The one-sided Wald intervals at the α (e.g., 95%) level for θ are where v is a good estimator for V, the variance of θ, under either probability-sampling theory or a reasonable model, and Φ(.) is the cumulative distribution function of a standard normal distribution. It is well-known that when the sample size is large enough, both inequalities hold for roughly α percent of samples drawn using the same sampling design as the probability survey. That is because the random variable θ is asymptotically normal under mild conditions, and those conditions are assumed to hold.
A symmetric two-sided α percent Wald interval easily derivable from Equation 1 is In this paper, we will focus on one-sided intervals because creating a symmetric two-sided interval from two one-sided intervals is easily done, as just demonstrated. For similar reasons, we will not discuss hypothesis tests derived from coverage intervals.
Often, the sample size in an application will not be nearly large enough for a one-sided Wald interval to contain ("cover") θ with the frequency suggested by the asymptotic theory. We will use the term "coverage interval" here rather than "confidence interval" because one rarely has confidence that the true value of θ falls within the designated interval at least α percent of the time across repeated realizations of the random variable θ (as it would were θ normally distributed and Φ(.) replaced with the appropriate Student's t-distribution). Kott and Liu (2010) proposed using skewnessadjusted one-sided coverage intervals in place of the Wald intervals to "speed up the asymptotics" (i.e., be roughly correct for samples that are not very large). The next section describes those intervals. The section after that looks at intervals based on a stratified simple random sample and a stratified multistage sample with attention to intervals for a ratio and for the difference between two domain means. The next section shows how skewness adjustments can affect estimates produced from a stratified cluster sample. It is followed by a section providing additional comments on a range of topics, from coverage intervals for regression coefficients, to potentially useful approximations when constructing skewnessadjusted intervals is impractical, to intervals based on estimates computed with calibrated weights. Finally, the paper offers some concluding remarks.
Much of the research into the impact of slow asymptotic normality on coverage has concentrated on proportions either estimated from an independent and identically distributed (iid) sample (e.g., Clopper & Pearson, 1934;Hall, 1982;Newcombe, 1998;Brown, Cai, & Dasgupta, 2001;Cai, 2005) or a complex probability sample (e.g., Korn & Graubard, 1998;Franco, Little, Lewis, & Slud, 2014). Kott, Andersson, & Nerman (2001) demonstrated the close relationship between the two-sided version of the Anderson-Nerman interval and the Wilson (score) coverage interval for a proportion (Andersson & Nerman, 2000). Kott (2017) showed the relationship between the Wilson interval and the logistic-transformation approach to creating coverage intervals for proportions (described in , and elsewhere).
Here, we will look at more-general estimators computed from complex probability samples; in particular, we focus on estimators for ratios, differences between domain means, and regression coefficients. Critical to this endeavor will be estimating the third central moment of θ. We will use probability sampling (design-based) theory in the investigation. Analogous conclusions using modelsbased assumptions are straightforward.
from an Edgeworth expansion of θ, which is Kott and Liu's contribution to the intervals in Equation 2. A similar result for estimators based on iid samples can be found in Abramovitch and Singh (1985). Note that although a nonzero δ in Equation 2 expands the size of Kott and Liu's skewness-adjusted coverage interval slightly, its primary impact is to move the interval, which can be in either a positive or negative direction.
If ≈ 3 / b m v , which is true in many contexts (as we shall see in the next section), then, Kott and Liu noted, When Equation 4 holds, the coverage intervals in Equation 2 can be expressed as where τ = Kott and Liu conjecture that |τ| should be less than 1 for their skewness-adjusted intervals to be effective. For Wald coverage intervals, τ needs to be close to 0. Cochran (1977, p. 42) suggests |τ| should be less than 0.2. (Cochran's original suggestion is for simple random sampling. We expand it here.) The skewness, τ, of an estimator tends to decrease in absolute value as the sample size increases (the central limits theorem tells us that it converges to zero as the sample size grows arbitrarily large). For an estimated proportion, p, under either simple random sampling with replacement or an iid model, When v is not too close to zero, the following modification on Equation 5 drops terms of a smaller asymptotic order (rendering τ ≈ 2 0 ): These are the one-sided Wald intervals shifted by Next, suppose we are interested in constructing onesided coverage intervals for a finite-population total or mean based on a stratified simple random sample.

Stratified Simple Random Sampling
The former can be expressed as An unbiased estimator for the finite-population total T y using probability sampling theory is = Kott and Liu (2010) showed that when every stratum has at least three sampled members (i.e., n h ≥ 3), one can construct one-sided coverage intervals for Y based on probability-sampling theory by setting in Equations 2 and 3. The first two equalities in Equation 7 provide unbiased estimators for the second and third central moments of θ =y , and b is a consistent estimator for B under mild conditions we assume to hold (e.g., that those central moments exist). When all the 2n h / N h are small enough to be ignored, ≈ 3 / b m v , and Equation 3 can be approximated by Equation 4. Surprisingly, when 2n h / N h = 1, stratum h has no impact on m 3 . When 2n h < N h , the impact of stratum h on m 3 is in the opposite direction of It is important to realize that when one n h is less than 3, Equation 7 becomes useless. Useful approximations for m 3 and b become necessary. We discuss some in a later section ("Some Simple Approximations"), although there are other possibilities beyond those discussed.
The ratio of two totals, / x z T T , can be estimated in a consistent manner using data from a withoutreplacement stratified simple random sample by = In other words, the difference between A ratio estimator of special interest is the estimator of a domain mean. If d k = 1 for an element in the domain and 0 otherwise, then the estimated mean of y-values in the domain has the form =

A Stratified Multistage Sample
Consider now constructing a coverage interval for a population mean based on stratified multistage sample when a nearly unbiased estimator for that parameter can be put in the form where there are n h primary sampling units (PSUs) in stratum h, and each ˆh i t for a PSU i in stratum h is a nearly unbiased estimator for the same value. We will make the common (but often inaccurate) assumption that the PSUs were selected randomly but with replacement, while any subsampling was done using probability sampling principles.
A univariate component of an estimated linear regression coefficient can also be put into the form of Equation 8. We focus now on the difference between two domain means estimated using data from the same sample, S. Each element in S had a value y k and a sampling weight w k attached to it, so the estimated different in domains means can be expressed as follows: (1) (1) where ( ) 1 a k d = when k is in Domain a and 0 otherwise. Here: are independent under probability sampling theory for PSUs in the same stratum (recall we are assuming with-replacement sampling in the first stage of sample selection).
When all 3 h n ≥ , the following equalities can be used in Equations 2 and 3: where each e hi has the following linearized expression: (1) We ignore finite-population correction when comparing domain means because an analyst is usually interested in whether there is an underlying process causing the domain means to be different, not the actual difference between the means in the finite population. Strictly speaking, this assumes models are generating the domain models, but, following a probability-sampling framework, those domains are vaguely specified.

An Example
In this section, we will look at computing one-sided coverage intervals for two sets of parameters for the MU284 population from Särndal, Swennsson, & Wretman (1992), available at http://lib.stat.cmu. edu/datasets/mu284. The population consists of 284 Swedish administrative municipalities separated into 50 clusters with eight strata. We collapse the final two strata, creating seven strata total. We divide the population into two domains. The 26 municipalities with a 1985 population of more than 64,000 are in Domain 1, and the remaining 258 municipalities are in Domain 2. We are interested in constructing coverage intervals for (1) the arithmetic average across municipalities in 1985 of the municipal taxation per person within each domain and (2) the fraction of municipalities within each domain with more tax receipts than 9 million kronor per 1,000 persons in 1985. We are also interested in constructing coverage intervals for the differences between the domains.
We suppose a cluster sample of three clusters per stratum (n h = 3) are selected from the MU284 population via simple random sampling with replacement. Letting y k be either the tax revenue per person in municipality k or a (0/1) indicator of whether that ratio is greater than 9 million kronor per 1,000 persons, we define (1) In practice, the intervals are computed from the sample rather than the population and are added to estimates, which are likewise computed from the sample.
The symmetric Wald and asymmetric skewnessadjusted intervals in Table 1 tend to be closer to each other in Domain 2 than in Domain 1. The larger sample size in Domain 2 reduces the impact of skewness adjustment. The sizes of the coverage intervals for the differences tend to be dominated by the smaller Domain 1 samples.

Coverage Intervals for a Regression Coefficient
Suppose β is a vector of regression coefficients of y k on =  1, , where θ =θ We assume matrix A to exist while the components of NA have finite limits as the population of size N grows arbitrarily large. Strictly speaking, the estimator for a logistic regression coefficient is determined by solving the weighted estimating equation which cannot be expressed in the form of Equation (8). Nevertheless, the variance and third central moment of the estimator can be measured by implementing Equations 9 and 11.
Although the e hi in Equation 11 are not independent within strata, each is almost equal to a u hi , in which the β on the right-hand size of Equation 11 is replaced by β and the components of Ng j by their asymptotic limits. The u hi are independent in a probability-sampling sense under the assumption that the first-stage sample is drawn with replacement.
Observe that we can only create coverage intervals for one regression coefficient at a time. A coverage interval for a univariate linear combination or regression coefficients can be created using the same principles. To test a vector of r regression coefficients, one may need to use a conservative r-dimensional Bonferroni box rather than a Wald-based coverage ellipsoid.

Some Simple Approximations
There is a practical problem in computing m 3 , and consequently τˆ using either Equation 7 or 9: there is no available software routine to do so. Even if there were or a statistician wanted to program the equations, there may not be three PSUs in every stratum. Unlike collapsing strata for variance estimation, the direction of the potential bias of τˆ can be positive or negative when the population means of the strata being combined are different (the population means are either the expected value of the y k in each stratum in Equation 7 or the expected values of the u hi corresponding to the e hi for a particular h in Equation 9). Consequently, strata collapsed together should have equal (or nearly equal) expected population means.
A key to skewness-adjusted coverage intervals, especially when finite-population correction can be ignored, is the estimated value = 3 / b m v. From the last section, assuming a large sample, the value of this term for the difference between proportions estimated for two distinct domains from a simple random sample is approximately where p a is the estimated proportion in Domain a based on n a sampled elements being in Domain a.
When p 1 = p 2 , this collapses to These results are similar (for large n a ) to what we would get looking at the estimate of the two proportions as coming from independent simple random samples (which they are conditionally given their respective realized domain sample sizes): the estimated variance of the difference equals the sum of the individual estimated variances, whereas the estimated third central moment of the difference is the difference between the individual estimated third central moments.
To many, these results might suggest that when assessing the difference between proportions in two distinct domains estimated using a complex probability sample, one simply multiplies the domain sample sizes above by their respective design effects. Such a practice is not recommended, however, for two reasons. One, the design effect captures the effect of clustering and stratification on the variance of an estimator, not on its third central moment. Two, the unequal weighting effect, another component of the design effect, is not the same for the third central moment of an estimator and its variance.
A wiser procedure might be to estimate = 3 / B M V for an estimated proportion ∈ ∈ = ∑ ∑ k k k k S k S p w y w , where p estimates the fraction of the population with y k = 1 rather than 0, with and then inserting τ

Calibration Weighting and the Jackknife
So far, we have implicitly assumed either that w k is the inverse of the probability of selected element k into the sample or that the n h elements in stratum h were selected with equal probability in each stratum. We have ignored the effect of coverage error and unit nonresponse on the respondent sample ultimately used in estimation.
Under simple random sampling without replacement, it is common to assume that the list from which the sample has been drawn is complete and without duplication and that elements in the same stratum are either equally likely to respond (treating response as a phase of probability sampling) or have a common means whether or not they respond. This allows one to use the formulae in Equations 2 through 6, replacing the sample and stratum samples with the analogous respondent samples; n h is redefined accordingly.
For multistage sampling calibration, weighting can be used either to adjust for nonresponse or undercoverage (Kott, 2006) or simply to reduce the standard error of the estimates (Deville & Särndal, 1992). When w k is a calibrated weight, the e hi in Equation (8)  w d z q is the weightadjustment function connecting the weight before calibration to the calibrated weights (e.g., ξ(θ) can be + θ, θ) 1 exp( , or + θ)) 1 exp( ), and q is chosen so that ∈ ξ ∑ ( T k z k S w T z z q) = is the vector of population totals for the components of z k or an estimate for that vector calculated using information outside the respondent sample (e.g., from the frame; a larger, more-efficient sample than S ; or the full sample before nonresponse).
Calibration weighting often removes much of the impact of stratification and clustering from an estimated mean. For example, calibration by region can reduce the impact of stratification by geographical units, whereas calibration by race and ethnicity can reduce the impact of clustering within neighborhoods. As a result, estimating the skewness of an estimated proportion or mean using Equations 10 or 11 may not be unreasonable, although it would often be better to replace the y k with a calibrated residual. Moreover, when estimating domain means, the impact of calibration weighting, like that of stratification and clustering, is diminished, except for any impact caused by increased variability of the weights themselves. Calibration weighting makes the use of Equations 12, 13, or 14 within the intervals in Equation 5 or 6 more viable.
If calibrated jackknife weights have been constructed to compute v for an estimator t, then these weights Estimating the variance and third central moment of an estimator whose weights incorporate more than one calibration step can be difficult using the linearization methods applied previously. Computing jackknife measures is much simpler.

Some Concluding Remarks
Statisticians who base their inferences on probability samples often claim that those inferences are modelfree. One deviation from that claim is the assumption that the sample under study is large enough that an estimator based on the sample is approximately normally distributed. This assumption, which is often only made implicitly, justifies constructing Wald intervals for the parameter being estimated and conducting hypothesis tests based on those intervals.
This asymptotic assumption may not be justified in practice. Many alternatives to Wald intervals have been suggested in the literature for when the estimand is a proportion based on a simple random sample, Kott and Liu (2010) provide the means for constructing skewness-adjusted coverage intervals for estimators other than proportions. Moreover, these estimators can be based on more complex designs than simple random sampling. . We follow up on those intervals, describing interval estimates for ratios, differences between domain means, and regression coefficients based on either a stratified simple random sample or a stratified multistage probability sample employing with-replacement sampling at the first stage.
The practical stumbling block to using skewnessadjusted intervals is that the statisticians must estimate the third central moment of their estimator. This cannot be done if any stratum contains less than three PSUs. Several approximations have been offered here instead.
Simulations assessing the viability of those approximations are still needed, as are simulations of the skewness-adjusted intervals themselves with realistic data (a few simulations are presented in Kott and Liu [2009], for stratified simple random samples).
Here, the estimator's population-based third central moment was used in place of its estimate in the examples. In practice, a statistician will usually have to estimate a third central moment. The impact of this estimation, which may not be very stable, needs to be evaluated.
It may be that using skewness-adjusted intervals or their approximations is roughly equivalent to using Wald intervals for a particular application. If that is the case, then assuming asymptotic normality may be justified. The skewness-adjusted intervals discussed here can be used to make that justification.