Big data, big results: Knowledge discovery in output from large-scale analytics
McCormick, T. H., Ferrell, R., Karr, A., & Ryan, P. B. (2014). Big data, big results: Knowledge discovery in output from large-scale analytics: Special Issue. Statistical Analysis and Data Mining, 7(5), 404-412. DOI: 10.1002/sam.11237
Observational healthcare data, such as electronic health records and administrative claims databases, provide longitudinal clinical information at the individual level. These data cover tens of millions of patients and present unprecedented opportunities to address issues such as the post-market safety of medical products. Analyzing patient-level databases yields population-level inferences, or 'results', such as the strength of association between medical product exposure and subsequent outcomes, often across thousands of drugs and outcomes. In this article, by contrast, we study 'big results', which are the product of applying thousands of alternative analysis strategies to five large patient databases. These results were produced by the Observational Medical Outcomes Partnership. Altogether, there are more than 6 million results, comprising risk assessments for 399 medical product-outcome pairs analyzed across five observational databases using seven statistical methods, each of which has between a few dozen and a few hundred variants representing parameters or 'tuning variables'. We focus on the value of knowledge discovery methods and the challenges in extracting clinically relevant knowledge from big results. We believe our analyses are both scientifically and methodologically valuable: they reveal how methods and algorithms perform under various circumstances and provide a basis for comparing these methods.
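As a rough check on the scale described above, the following sketch works out how many analysis variants the stated totals would imply for each product-outcome pair in each database. It assumes, hypothetically, that every variant was run on every pair in every database (a full grid), which the abstract does not state. The constant and function names are illustrative only.

```python
# Figures taken from the abstract.
N_PAIRS = 399                  # medical product-outcome pairs
N_DATABASES = 5                # observational databases
MIN_TOTAL_RESULTS = 6_000_000  # "more than 6 million results" (a lower bound)


def implied_variants_per_cell(total_results: int, n_pairs: int, n_databases: int) -> float:
    """Analysis variants (summed over all methods) per pair-database cell.

    Assumes a full grid: every variant applied to every pair in every
    database. This is an illustrative assumption, not a stated fact.
    """
    cells = n_pairs * n_databases
    return total_results / cells


cells = N_PAIRS * N_DATABASES
variants = implied_variants_per_cell(MIN_TOTAL_RESULTS, N_PAIRS, N_DATABASES)

print(f"pair-database cells: {cells}")
print(f"implied variants per cell (lower bound): {variants:.0f}")
```

Under the full-grid assumption, the stated figures imply roughly 3,000 analysis variants per pair and database, which is in line with the 'thousands of alternative analysis strategies' mentioned above.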