Creating imputation classes using classification tree methodology
Virtually all surveys encounter some level of item nonresponse. To address this potential source of bias, practitioners often use imputation to replace missing values with valid values through some form of
stochastic modeling. To improve the reliability of such models, imputation classes are formed to produce homogeneous groups of respondents, where homogeneity is measured with respect to the
item that will be imputed. A common method for forming imputation classes is Chi-squared Automatic Interaction Detection (CHAID), in which the splitting rule is based on chi-squared tests. This paper examines an
alternative methodology for forming imputation classes: nonparametric classification trees in which the splitting rule is based on the Gini index of impurity, one of the splitting rules available in
Classification and Regression Trees (CART). In addition to briefly describing the two classification tree methodologies, we provide some comparative
examples using simple generated data and real data. Finally, we use the imputation classes with three imputation procedures: mode value imputation, proportional random imputation, and weighted
sequential hot-deck. To provide an additional comparison, we model the item nonresponse using logistic regression or polychotomous regression.
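
As a rough illustration of the Gini-based splitting rule described above, the following Python sketch computes the Gini index of impurity for a categorical item and picks the binary split on one predictor that gives the lowest weighted impurity. It is not the authors' implementation: the function names, the `region` and `item` fields, and the toy records are all hypothetical, and real CART software also handles ordered predictors, recursive splitting, and pruning.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini index of impurity: 1 - sum of squared category proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_impurity(left, right):
    """Size-weighted average impurity of the two child nodes."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


def best_binary_split(rows, target, predictor):
    """Try each 'predictor == value' versus 'rest' split and return the
    (value, weighted impurity) pair with the lowest impurity.
    Ties keep the first value in sorted order."""
    best = None
    for value in sorted({r[predictor] for r in rows}):
        left = [r[target] for r in rows if r[predictor] == value]
        right = [r[target] for r in rows if r[predictor] != value]
        if not left or not right:
            continue  # a split must leave respondents on both sides
        score = split_impurity(left, right)
        if best is None or score < best[1]:
            best = (value, score)
    return best


# Invented toy respondents: 'item' is the variable to be imputed.
rows = [
    {"region": "N", "item": "yes"},
    {"region": "N", "item": "yes"},
    {"region": "S", "item": "no"},
    {"region": "S", "item": "no"},
    {"region": "E", "item": "yes"},
    {"region": "E", "item": "no"},
]

if __name__ == "__main__":
    print("root impurity:", gini_impurity([r["item"] for r in rows]))
    print("best split:", best_binary_split(rows, "item", "region"))
```

Each child node produced by such a split is more homogeneous with respect to the item than its parent, which is the property imputation classes are meant to have; the terminal nodes of a fully grown tree would then serve as the classes.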
Creel, D., & Krotki, K. (2006). Creating imputation classes using classification tree methodology. Proceedings of the Survey Research Methods Section (ASA), 2884-2887.