According to the 2018 Kaggle Machine Learning & Data Science Survey, a scientist conducting a big data study typically spends 50%–65% of their time finding, gathering, harmonizing, and cleaning data, all before any analysis can be done. High-quality data determine the usability, robustness, and scalability of analyses and artificial intelligence (AI) models; however, data integration and harmonization are often operationally intensive, time-consuming, and expensive.
RTI International’s MetaMatchMaker (M3) is a suite of cutting-edge AI approaches and tools that saves time and reduces the cost of finding and integrating genomic, molecular, and clinical data for analysis. During the third webinar in the RTI Tech Talk series, attendees learned how to improve data discovery and data harmonization processes by using the M3 suite, which includes:
- The M3 natural language processing neural network (M3:NN), which was trained using data from ~30,000 unique, manually mapped study concepts. M3:NN has been used to create two pieces of software to support data discovery and linking.
- M3:Find, a free data discovery tool that allows users to find and share publicly available data across numerous databases.
- M3:Link, a data linkage tool that allows users to link study terms between multiple studies, enabling faster data integration across cohorts; this solution is also useful for building common data elements.
The webinar also highlighted examples of how the M3 suite has reduced the time and budget for harmonizing complex datasets by more than 50%.
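To make the idea of linking study terms between studies concrete, here is a minimal sketch in Python. It is not M3:Link or M3:NN: it swaps in plain string similarity from the standard library's `difflib` in place of a trained neural network, and the function names (`normalize`, `link_terms`), the 0.6 threshold, and the example variable labels are all hypothetical, chosen only for illustration.

```python
from difflib import SequenceMatcher


def normalize(label):
    """Lowercase, split on underscores/whitespace, and sort the tokens
    so that word order does not affect the comparison."""
    tokens = label.lower().replace("_", " ").split()
    return " ".join(sorted(tokens))


def link_terms(study_a, study_b, threshold=0.6):
    """Pair each term in study_a with its most similar term in study_b.

    Illustrative stand-in only: uses character-level string similarity,
    not a neural network. Pairs scoring below `threshold` are dropped.
    """
    links = []
    for a in study_a:
        best, best_score = None, 0.0
        for b in study_b:
            score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
            if score > best_score:
                best, best_score = b, score
        if best is not None and best_score >= threshold:
            links.append((a, best, round(best_score, 2)))
    return links


# Hypothetical variable labels from two cohorts.
study_a = ["systolic_blood_pressure", "age_at_enrollment", "smoking_status"]
study_b = ["Blood pressure systolic", "Age at enrollment", "Body mass index"]

links = link_terms(study_a, study_b)
for a, b, score in links:
    print(f"{a} <-> {b} (similarity {score})")
```

In this toy example the two blood pressure labels and the two age labels are linked, while `smoking_status` is left unlinked because no label in the second study is similar enough. A learned model would be expected to also catch synonyms that share few characters, which simple string matching cannot.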