Report

Constructing strata of primary sampling units for the Residential Energy Consumption Survey

Citation

Harter, R., Chen, P., McMichael, J., Cureg, E., Adeshiyan, S., & Morton, K. (2017). Constructing strata of primary sampling units for the Residential Energy Consumption Survey. (RTI Press Publication No. OP-0041-1705). Research Triangle Park, NC: RTI Press. DOI: 10.3768/rtipress.2017.op.0041.1705

Abstract

The 2015 Residential Energy Consumption Survey design called for stratification of primary sampling units to improve estimation. Two methods of defining strata from multiple stratification variables were proposed, leading to this investigation. All stratification methods use stratification variables available for the entire frame. We reviewed textbook guidance on the general principles and desirable properties of stratification variables and the assumptions on which the two methods were based. Defining strata by applying principal components and then cluster analysis to the stratification variables focuses on relationships among those variables. Decision trees, regressions, and correlation approaches focus more on relationships between the stratification variables and prior outcome data, which may be available for only a sample of units. Using both principal components/cluster analysis and decision trees, we stratified primary sampling units for the 2009 Residential Energy Consumption Survey and compared the resulting strata.
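The contrast between the two approaches can be illustrated with a minimal sketch. It is not the report's implementation: the frame values, the prior outcome values, the variable choices, and all function names are hypothetical. For brevity it assumes only the first principal component is kept, uses a simple one-dimensional k-means as the cluster analysis, and stands in for a decision tree with a single regression-tree split. The first method looks only at the stratification variables; the second also uses the prior outcome.

```python
import statistics

# Hypothetical PSU frame (made-up values, not RECS data):
# (heating degree days, cooling degree days, housing units in thousands)
frame = [
    (7000, 500, 120), (6500, 700, 80), (6800, 600, 300),
    (3000, 2500, 150), (2500, 3000, 90), (2800, 2700, 400),
    (4500, 1500, 200), (5000, 1200, 60),
]
# Hypothetical prior outcome per PSU (e.g., average household consumption).
prior_outcome = [110, 105, 108, 70, 65, 68, 85, 90]


def standardize(rows):
    """Center and scale each column to mean 0, standard deviation 1."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]


def first_pc_scores(z, iters=200):
    """Score each row on the first principal component (power iteration)."""
    n, p = len(z), len(z[0])
    # Correlation matrix of the standardized stratification variables.
    corr = [[sum(r[i] * r[j] for r in z) / n for j in range(p)] for i in range(p)]
    v = [1.0] + [0.0] * (p - 1)  # arbitrary starting direction
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return [sum(a * b for a, b in zip(r, v)) for r in z]


def kmeans_1d(scores, k, iters=50):
    """Cluster one-dimensional scores with Lloyd's algorithm."""
    ordered = sorted(scores)
    # Deterministic starting centers taken at evenly spaced quantiles.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(s - centers[c])) for s in scores]
        for c in range(k):
            members = [s for s, lab in zip(scores, labels) if lab == c]
            if members:
                centers[c] = statistics.fmean(members)
    return labels


def sse(values):
    """Sum of squared deviations from the mean."""
    m = statistics.fmean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, y):
    """Find the (variable, threshold) split minimizing within-node SSE of y."""
    best = None
    for j in range(len(rows[0])):
        # Drop the largest value so the right-hand node is never empty.
        for t in sorted({r[j] for r in rows})[:-1]:
            left = [yi for r, yi in zip(rows, y) if r[j] <= t]
            right = [yi for r, yi in zip(rows, y) if r[j] > t]
            total = sse(left) + sse(right)
            if best is None or total < best[0]:
                best = (total, j, t)
    return best


# Method 1: strata from the stratification variables alone.
pca_strata = kmeans_1d(first_pc_scores(standardize(frame)), k=2)

# Method 2: strata from the relationship to the prior outcome.
_, var, cut = best_split(frame, prior_outcome)
tree_strata = [0 if r[var] <= cut else 1 for r in frame]

print("PCA + cluster strata:", pca_strata)
print("Tree strata:", tree_strata, "split on variable", var, "at", cut)
```

In this sketch the tree step requires an outcome value for every PSU. In practice, as the abstract notes, prior outcome data may exist for only a sample of units, so a tree would be fitted on that sample and its splitting rules then applied to the whole frame.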

Author Details

Rachel Harter

Rachel M. Harter, PhD, is a senior research statistician at RTI International, Division for Statistical & Data Sciences.

Joseph McMichael

Joseph P. McMichael, BS, is a research statistician at RTI International, Division for Statistical & Data Sciences.

Edgardo Cureg

Edgardo S. Cureg, PhD, is a lead mathematical statistician at the US Energy Information Administration, Washington, DC.

Samson Adeshiyan

Samson A. Adeshiyan, PhD, is chief statistician at the National Science Foundation, Arlington, VA. This work was completed when he was a lead mathematical statistician at the US Energy Information Administration, Washington, DC.