Double Protection for Imputation Using a Tree-based Methodology
Creel, D. (2012, July). Double Protection for Imputation Using a Tree-based Methodology. Presented at JSM 2012, San Diego, CA.
Double protection means that if either the prediction model or the response propensity model is correct, we will get approximately unbiased results. Typically, the models proposed for prediction and response propensity are some type of regression model. In this paper, we use the double protection concept for imputation where the models are based on classification and regression trees. We use one tree to create the imputation classes from the prediction model and another tree to create the imputation classes from the response propensity model. Once the two sets of imputation classes are created, we group all the observations, both respondents and nonrespondents, into the cells formed by cross-classifying the two sets of classes. Within these cross-classified imputation classes, we use a hot deck methodology to impute. We employ a series of Monte Carlo simulations with various patterns of missingness to compare the empirical biases resulting from using (1) the cross-classified imputation classes, (2) imputation classes derived from a prediction model only, (3) imputation classes derived from a response model only, and (4) imputation based on a linear model.
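The cross-classification and hot deck steps described above can be illustrated with a minimal Python sketch. It is not the implementation used in the paper. It assumes that each observation has already been assigned a class label by the prediction tree and by the response propensity tree (for example, the leaf each observation falls into), and it uses a simple random hot deck with replacement. The function name `cross_class_hot_deck`, the argument names, the toy data, and the choice to leave a value missing when a cell contains no donor are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def cross_class_hot_deck(y, pred_class, resp_class, seed=0):
    """Impute missing values of y (marked as None) with a random hot deck.

    pred_class and resp_class hold, for every observation, the imputation
    class assumed to come from the prediction tree and from the response
    propensity tree. Donors are respondents in the same cross-classified
    cell (pred_class, resp_class).
    """
    rng = random.Random(seed)

    # Collect the observed values (donors) in each cross-classified cell.
    donors = defaultdict(list)
    for value, p, r in zip(y, pred_class, resp_class):
        if value is not None:
            donors[(p, r)].append(value)

    imputed = []
    for value, p, r in zip(y, pred_class, resp_class):
        if value is not None:
            imputed.append(value)  # respondents keep their own value
        else:
            cell_donors = donors.get((p, r), [])
            if cell_donors:
                # Draw one donor at random, with replacement.
                imputed.append(rng.choice(cell_donors))
            else:
                # Assumption of this sketch: no donor means no imputation.
                imputed.append(None)
    return imputed


# Toy data: None marks a nonrespondent.
y = [10.0, None, 12.0, None, 30.0, None]
pred_class = [0, 0, 0, 1, 1, 1]  # hypothetical prediction-tree classes
resp_class = [0, 0, 1, 1, 1, 0]  # hypothetical propensity-tree classes

result = cross_class_hot_deck(y, pred_class, resp_class)
print(result)
```

In this toy example every nonrespondent's cell contains at most one donor, so the draw is deterministic; the last observation falls in a cell with no respondents and stays missing. In practice, sparse or empty cells would need a rule such as collapsing adjacent classes, which the sketch does not attempt.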