• Conference Proceeding

The weighted sequential hot deck imputation procedure


Cox, B. G. (1980). The weighted sequential hot deck imputation procedure. In Proceedings of the Survey Research Methods Section, American Statistical Association, pp. 721–726.


One of the most commonly used item nonresponse adjustment procedures is hot deck imputation, which uses current survey responses to substitute for missing data. In using the hot deck procedure, the sample individuals are usually partitioned into categories or imputation classes. Having formed these classes, the data records are then ordered according to important variables which influence response. Based upon the present data set or previous data, the hot deck procedure stores initial values by class for each variable to be imputed. As the survey data are processed, the imputation class to which an individual belongs is determined. If the record being processed is complete with respect to the variable or set of variables to be imputed, then that individual's responses replace the responses stored for the relevant imputation class of the hot deck. When a record is encountered with a missing response for an item, the last response stored in the hot deck for the same class is imputed for the missing response. When all records have been processed and the missing data imputed, estimates are usually computed without accounting for the effect of the imputation procedure.
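
The processing loop described above can be sketched roughly as follows. This is a minimal illustration rather than the paper's own implementation: the function name `sequential_hot_deck`, the record layout (a list of dictionaries already sorted in processing order), the field names, and the sample values are all assumptions made for the example.

```python
from typing import Any, Dict, Hashable, List


def sequential_hot_deck(
    records: List[Dict[str, Any]],
    class_key: str,
    item: str,
    initial: Dict[Hashable, Any],
) -> List[Dict[str, Any]]:
    """Impute `item` with the last reported value in the same imputation class.

    `records` must already be sorted in processing order, and `initial`
    must hold a starting value for every imputation class (hypothetical
    inputs; the abstract does not specify how they are chosen).
    """
    deck = dict(initial)  # the "hot deck": one stored value per class
    out = []
    for rec in records:
        cls = rec[class_key]
        new = dict(rec)  # copy so the input records are left unchanged
        if rec.get(item) is not None:
            # Complete record: its response replaces the stored value.
            deck[cls] = rec[item]
        else:
            # Missing response: impute the last value stored for the class.
            new[item] = deck[cls]
            new["imputed"] = True
        out.append(new)
    return out


# Illustrative data only (not from the paper).
records = [
    {"id": 1, "cls": "A", "income": 30},
    {"id": 2, "cls": "A", "income": None},
    {"id": 3, "cls": "B", "income": None},
    {"id": 4, "cls": "A", "income": 50},
    {"id": 5, "cls": "A", "income": None},
]
initial = {"A": 20, "B": 40}
completed = sequential_hot_deck(records, "cls", "income", initial)
print([r["income"] for r in completed])
```

In this sketch record 3 receives the initial value for class B because no class B respondent precedes it, while records 2 and 5 receive the value of the nearest preceding class A respondent.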

The procedure just described could be referred to as a sequential hot deck procedure since the data are first ordered and then the last reported value in the sequence is substituted for each missing value as the data are processed. The procedure is also unweighted in that the selection of a response for imputation purposes is independent of the sampling weight associated with the data record from which the response is taken and the data record to which a response is being imputed. However, ignoring sample weights implies that the distribution of responses within each imputation class of the imputation-revised data set may be distorted from that of the original distribution of responses.
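
A small numerical sketch may help show how ignoring weights can distort the weighted distribution within a class. The values, weights, starting value, and helper names below are invented for illustration and do not come from the paper; the example assumes a single imputation class and a binary item.

```python
from typing import List, Optional


def impute_last_reported(values: List[Optional[int]], start: int) -> List[int]:
    """Unweighted sequential hot deck within one class: fill each gap
    with the most recent reported value (or `start` if none yet)."""
    last = start
    out = []
    for v in values:
        if v is None:
            out.append(last)
        else:
            last = v
            out.append(v)
    return out


def weighted_share(values: List[Optional[int]], weights: List[float], target: int) -> float:
    """Weighted proportion of non-missing records whose value equals `target`."""
    pairs = [(v, w) for v, w in zip(values, weights) if v is not None]
    total = sum(w for _, w in pairs)
    return sum(w for v, w in pairs if v == target) / total


# Hypothetical class: a light-weight respondent reporting 1 is followed
# by a heavy-weight nonrespondent, then two respondents reporting 0.
values = [1, None, 0, 0]
weights = [1.0, 9.0, 5.0, 5.0]

before = weighted_share(values, weights, target=1)  # respondents only
after = weighted_share(impute_last_reported(values, start=0), weights, target=1)
print(before, after)
```

Here the weighted share of respondents reporting 1 is 1/11, but after imputation it becomes 10/20, because the donor with weight 1 supplies a value to a record with weight 9. This is the kind of distortion that a weight-aware donor selection is intended to limit.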