Advantages of imputation vs. data swapping for statistical disclosure control
Kinney, S. K., Looby, C. B., & Yu, F. (2020). Advantages of imputation vs. data swapping for statistical disclosure control. Lecture Notes in Computer Science, 12276, 281-296. https://doi.org/10.1007/978-3-030-57521-2_20
Data swapping is an approach long used by public agencies to protect respondent confidentiality, in which the values of some variables are swapped between similar records for a small portion of respondents. Synthetic data is a newer method in which many, if not all, values are replaced with multiple imputations. Synthetic data can be difficult to implement for complex data; however, when the portion of data replaced is similar to that in data swapping, it becomes simple to implement using publicly available software. This paper describes how this simplification of synthetic data can be used to provide a better balance of data quality and disclosure protection than data swapping. This is illustrated via an empirical comparison using data from the Survey of Earned Doctorates.
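To make the contrast in the abstract concrete, the following is a minimal sketch of the two ideas on toy records. It is not the authors' implementation: the function names, the `field` and `salary` variables, and the selection rate are all hypothetical, and a simple draw from the observed values within a matching group stands in for whatever imputation model the paper actually uses.

```python
import random

def swap_values(records, var, match_var, rate, rng):
    """Data swapping (toy version): for a sampled share of records, exchange
    `var` with a partner record that has the same `match_var` value."""
    out = [dict(r) for r in records]              # copy so the input is untouched
    n_select = int(round(rate * len(out)))
    selected = rng.sample(range(len(out)), n_select)
    used = set()                                  # each record takes part in at most one swap
    for i in selected:
        if i in used:
            continue
        partners = [j for j in range(len(out))
                    if j != i and j not in used
                    and out[j][match_var] == out[i][match_var]]
        if not partners:
            continue                              # no similar record available; leave as is
        j = rng.choice(partners)
        out[i][var], out[j][var] = out[j][var], out[i][var]
        used.update((i, j))
    return out

def impute_values(records, var, match_var, rate, rng):
    """Partial synthesis (toy version): for a sampled share of records, replace
    `var` with a draw from the observed values of `var` among records sharing
    `match_var`. ASSUMPTION: this empirical draw is a stand-in for a fitted
    imputation model."""
    out = [dict(r) for r in records]
    n_select = int(round(rate * len(out)))
    for i in rng.sample(range(len(out)), n_select):
        donors = [r[var] for r in records
                  if r[match_var] == records[i][match_var]]
        out[i][var] = rng.choice(donors)
    return out

# Hypothetical toy data; not drawn from the Survey of Earned Doctorates.
toy = [
    {"field": "physics", "salary": 70}, {"field": "physics", "salary": 82},
    {"field": "physics", "salary": 91}, {"field": "biology", "salary": 55},
    {"field": "biology", "salary": 63}, {"field": "biology", "salary": 77},
    {"field": "history", "salary": 48}, {"field": "history", "salary": 52},
]
rng = random.Random(0)
swapped = swap_values(toy, "salary", "field", rate=0.25, rng=rng)
# Multiple imputation: release several independently synthesized copies.
synthetic_copies = [impute_values(toy, "salary", "field", rate=0.25, rng=rng)
                    for _ in range(3)]
```

In this sketch, swapping only rearranges existing values, so the overall set of `salary` values is unchanged, whereas the imputation step generates replacement values and can be repeated to produce several released copies. How the selection rate and the imputation model are actually chosen is described in the paper, not here.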