The 2015 Residential Energy Consumption Survey design called for stratification of primary sampling units to improve estimation. Two methods of defining strata from multiple stratification variables were proposed, which led to this investigation. All stratification methods use stratification variables that are available for the entire frame. We reviewed textbook guidance on the general principles and desirable properties of stratification variables, as well as the assumptions on which the two methods were based. Defining strata by applying principal components combined with cluster analysis to the stratification variables focuses on relationships among those variables. Decision trees, regressions, and correlation approaches focus more on relationships between the stratification variables and prior outcome data, which may be available for only a sample of units. Using both principal components/cluster analysis and decision trees, we stratified primary sampling units for the 2009 Residential Energy Consumption Survey and compared the resulting strata.
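To make the decision-tree idea concrete, the following is a minimal sketch of a single tree-style split: it searches for the cut point on one stratification variable that minimizes the within-stratum sum of squares of a prior outcome. It is an illustration only, not the procedure used in the study. The function names (`within_ss`, `best_split`), the choice of heating degree days as the stratification variable, and all data values are hypothetical assumptions.

```python
from statistics import mean


def within_ss(values):
    """Sum of squared deviations from the group mean (0 for an empty group)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, total within-group SS) for the best binary split on x.

    Units with x <= threshold form one stratum; the rest form the other.
    Returns None if every x value is identical (no split is possible).
    """
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between tied values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [outcome for _, outcome in pairs[:i]]
        right = [outcome for _, outcome in pairs[i:]]
        ss = within_ss(left) + within_ss(right)
        if best is None or ss < best[1]:
            best = (threshold, ss)
    return best


# Hypothetical data for six primary sampling units: a frame-level
# stratification variable (heating degree days) and a prior-survey
# outcome (average energy use). These values are made up for illustration.
hdd = [1000, 1200, 1500, 5000, 5500, 6000]
energy = [40.0, 42.0, 44.0, 90.0, 95.0, 100.0]

threshold, ss = best_split(hdd, energy)
print(f"split at {threshold}, within-stratum SS = {ss}")
```

A full decision tree would apply this search recursively and across several stratification variables. In contrast, the principal components/cluster analysis approach would group units using the stratification variables alone, without reference to the outcome.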
By Rachel Harter, Patrick Chen, Joseph McMichael, Edgardo Cureg, Samson Adeshiyan, Katherine Morton.
May 2017. Open Access. Peer Reviewed.
© 2023 RTI International. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.