Small Area Estimation
Small area estimation (SAE) uses statistical models to link survey outcome variables, such as disease or substance use indicators, to local area predictors, such as county demographic and socioeconomic status (SES) variables, in order to predict local area disease or substance use prevalence rates. The 'areas' in small area estimation may be defined by geographical domains such as a state or county, as well as by socio-demographic characteristics such as income, race, age, or gender subgroups.
Social indicator variables such as age, race/ethnicity, gender, education, income, family structure, and employment status are commonly used to define high-risk subpopulations for targeting health promotion and disease prevention efforts. Relating health status, behavior, and disease prevalence statistics for small areas to these demographic and SES predictors provides a direct calibration of the indicators to the outcomes of interest. SAE methods can be applied when the number of area-specific sample observations is too small to produce reliable direct estimates.
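To illustrate why a small area-specific sample yields an unreliable direct estimate, the following minimal sketch computes the approximate standard error of a prevalence estimate. It assumes the textbook binomial variance formula scaled by a design effect; the function names and the `deff` parameter are hypothetical and are not taken from any project described here.

```python
import math


def direct_estimate_se(p_hat, n, deff=1.0):
    """Approximate standard error of a direct prevalence estimate.

    p_hat : estimated prevalence (a proportion between 0 and 1)
    n     : number of sample observations in the area
    deff  : assumed design effect (1.0 means simple random sampling)
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(deff * p_hat * (1.0 - p_hat) / n)


def relative_se(p_hat, n, deff=1.0):
    """Standard error expressed as a fraction of the estimate itself."""
    return direct_estimate_se(p_hat, n, deff) / p_hat


# Compare a small county sample with a large state sample
# for the same assumed prevalence of 10 percent.
for n in (30, 3000):
    print(n, round(relative_se(0.10, n), 3))
```

Under these assumptions, a prevalence of 10 percent estimated from 30 respondents has a relative standard error of roughly 55 percent, compared with about 5.5 percent for 3,000 respondents. This gap is what model-based SAE methods are meant to narrow.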
Methodologies/Capabilities
We have developed an innovative survey-weighted hierarchical Bayes (SWHB) solution for fitting mixed logistic models. The SWHB method offers the following benefits over other hierarchical Bayes solutions:
- SAEs for areas with large samples are close to their design-based analogs; hence, they are robust against model misspecification.
- Aggregates (national) of lower-level (state, county) estimates are design-consistent and approximately benchmarked to the robust design-based estimates.
- Person- or unit-level predictors as well as aggregate-level predictors can be used in the model, making SWHB SAEs internally consistent and more precise.
We also offer the following related capabilities:
- We have geographic information mapping capabilities to graphically display disease incidence, substance use rates, or other such small area estimates.
- We have used the Fay-Herriot SAE methodology for projects that call for a simpler model and rapid turnaround.
- Estimating function Gaussian likelihood (EFGL) methodology is being developed to support more general mixed models for SAE, including continuous outcomes.
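As a rough illustration of the simpler area-level approach mentioned above, the sketch below fits a basic Fay-Herriot model. It assumes known sampling variances `D` for the direct estimates and uses a simple moment estimator for the between-area variance `A`, which is only one of several estimators in use. The function name, variable names, and simulated data are all hypothetical; this is not the SWHB or EFGL implementation.

```python
import numpy as np


def fay_herriot_eblup(y, X, D):
    """Area-level Fay-Herriot estimates (illustrative sketch).

    y : direct survey estimates, one per area, shape (m,)
    X : area-level predictors including an intercept column, shape (m, p)
    D : known sampling variances of the direct estimates, shape (m,)
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    D = np.asarray(D, dtype=float)
    m, p = X.shape

    # Moment estimator of the between-area variance A from OLS residuals,
    # truncated at zero.
    XtX_inv = np.linalg.inv(X.T @ X)
    resid = y - X @ (XtX_inv @ X.T @ y)
    leverage = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    A = max(0.0, (resid @ resid - np.sum(D * (1.0 - leverage))) / (m - p))

    # Weighted least squares for the regression coefficients.
    w = 1.0 / (A + D)
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))

    # Shrinkage: areas with precise direct estimates (small D) keep them;
    # areas with noisy direct estimates lean on the regression prediction.
    gamma = A / (A + D)
    eblup = gamma * y + (1.0 - gamma) * (X @ beta)
    return eblup, gamma, A


# Simulated data for 40 hypothetical areas (not real survey results).
rng = np.random.default_rng(0)
m = 40
X = np.column_stack([np.ones(m), rng.normal(size=m)])
D = rng.uniform(0.5, 2.0, size=m)
theta = X @ np.array([1.0, 2.0]) + rng.normal(scale=1.0, size=m)
y = theta + rng.normal(scale=np.sqrt(D))

eblup, gamma, A = fay_herriot_eblup(y, X, D)
```

The weight `gamma` shows the same behavior described for SWHB in the first benefit above: where the direct estimate is precise, the small area estimate stays close to it.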
Projects
- National Survey on Drug Use and Health (NSDUH). The SWHB solution has been implemented on data from the U.S. Substance Abuse and Mental Health Services Administration's (SAMHSA) National Survey on Drug Use and Health (NSDUH) to produce state-level and substate-level (groups of counties or census tracts) SAEs for more than 20 binary outcomes related to substance use, treatment, and mental health. These estimates are being used for treatment planning purposes by the states.
- Diabetes and smoking study. RTI statisticians have worked on a research grant awarded to the UNC Center for Health Statistics Research by the Centers for Disease Control and Prevention to produce county-level prevalence estimates of diabetes and smoking for all counties in North Carolina using Behavioral Risk Factor Surveillance System (BRFSS) data. These estimates, based on 1996 to 2002 BRFSS data, were used to identify high-risk areas (counties or groups of counties) where prevention and other health intervention programs can be directed.
- Travel study. SAE methodology was used to produce state-level prevalence rates for high daily person miles of travel and associated prediction intervals (PIs) for all 50 states and the District of Columbia, using the 2001 National Household Travel Survey (NHTS). High daily person miles of travel was defined as "more than 87.5 miles traveled in a day" (which is the 90th percentile for daily person miles traveled). This project was funded by the Bureau of Transportation Statistics (BTS).