Krotki, K. (2005, August). Mass Imputation. Paper presented at the Joint Statistical Meetings, Minneapolis, MN.
Most of the very large literature on imputation for missing data focuses on imputing single variables. Imputing many variables simultaneously is less often discussed and less well understood. As data producers face growing pressure to fill in "gaps" in the data themselves rather than leave this to users, imputation can mean imputing literally hundreds of variables. Several ad hoc methods are known and used, but this methodology needs a more formal treatment. Whether the basic imputation method is deterministic or stochastic (such as hot-deck), certain principles can make the process efficient and effective. In this paper we outline the problems faced in mass imputation, suggest a series of solutions and guidelines, and discuss how some of these strategies were applied in a large-scale national education survey, the 2004 National Postsecondary Student Aid Study.