Verifying data migration correctness: The checksum principle


Data migration, the process of reading data from a source and inserting them into a target database, is an important element of extract, transform, and load (ETL) systems. Errors can occur while data are transmitted during a migration, and these errors directly affect the quality of the data in the target database. Verifying the correctness of the outcome is therefore a critical component of any data migration operation. Current methods for verifying data migration correctness have significant limitations, including incompleteness and inaccuracy. This paper describes an innovative method that applies the well-proven checksum methodology to verify the correctness of the data migration outcome. The method verifies the migrated data thoroughly and accurately, mitigating most of the weaknesses of current verification methods. It is also easy to implement and can greatly enhance the quality of data migration operations.

Bin Wei, MS, is a senior researcher at the Pacific Islands Fisheries Science Center, a National Oceanic and Atmospheric Administration (NOAA) research branch within the Joint Institute for Marine and Atmospheric Research, a NOAA Cooperative Institute at the University of Hawaii at Manoa.* His areas of expertise include designing systems for remote data collection and data transformation.

Tennyson X. Chen, MS, is a senior research analyst and software system architect in RTI International’s Research Computing Division. His main focus is the National Survey on Drug Use and Health (NSDUH) project, for which he is a key system designer and database manager.