Project Data Sphere (PDS) is a research platform that gives the research community broad access to de-identified patient-level data from oncology clinical trials, together with related analytic tools. Although these data are rich in measures that characterize the trials under study, data providers must de-identify patient-level records by removing key demographic data. To address these analytic constraints, selected PDS patient-level datasets from phase III cancer clinical trials have been augmented with the social, economic, and health-related characteristics of similar cancer survivors drawn from nationally representative surveys of health and health care. Using statistical linkage and model-based techniques, patient-level records in these PDS datasets have been linked to those of comparable cancer survivors and thereby augmented with survey content on social, economic, and health-related characteristics. These analytically enhanced PDS resources support more targeted analyses. For example, they can address how disparities in income and in access to health care affect patient outcomes in specific phase III trials, and which variations in outcomes are associated with particular demographic, socioeconomic, and health-related factors. This study outlines the methods used to connect patient-level clinical trial data with nationally representative health-related data on cancer survivors from the Medical Expenditure Panel Survey (MEPS). MEPS was designed to provide national population-based estimates of health care use, expenditures, and sources of payment. It also provides measures of health status, demographic characteristics, employment, health insurance coverage, and access to health care.
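The abstract does not specify the linkage algorithm. As one illustration of what statistical linkage between a de-identified trial file and a survey file can look like, the sketch below performs a simple nearest-neighbour statistical match (hot-deck style). Each trial record receives the survey-only content of the closest survey respondent within the same matching class. This is an assumed, simplified technique and not the PDS/MEPS procedure. The record fields (`age`, `sex`, `income`, `insured`) and the matching rule (exact on sex, nearest on age) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Donor:
    # Hypothetical survey (MEPS-like) respondent: shared matching variables
    # plus survey-only content to be donated. Not actual MEPS field names.
    age: int
    sex: str
    income: float
    insured: bool


@dataclass
class Recipient:
    # Hypothetical de-identified trial record carrying only shared variables.
    patient_id: str
    age: int
    sex: str


def link_records(recipients, donors):
    """Illustrative nearest-neighbour statistical match (an assumption,
    not the published method): exact on sex, closest on age.

    Returns (recipient, donor) pairs; a donor may be reused.
    """
    linked = []
    for r in recipients:
        # Restrict the donor pool to the same matching class.
        pool = [d for d in donors if d.sex == r.sex]
        if not pool:
            continue  # no compatible donor: record stays unaugmented
        # Within the class, take the donor with the smallest age distance
        # (ties go to the first such donor in the list).
        best = min(pool, key=lambda d: abs(d.age - r.age))
        linked.append((r, best))
    return linked


# Toy, made-up records for demonstration only.
donors = [
    Donor(62, "F", 48000.0, True),
    Donor(70, "F", 31000.0, True),
    Donor(58, "M", 75000.0, False),
]
recipients = [Recipient("P1", 68, "F"), Recipient("P2", 55, "M")]

for r, d in link_records(recipients, donors):
    print(r.patient_id, d.income, d.insured)
```

Here P1 (age 68, F) is matched to the 70-year-old female donor, and P2 to the only male donor. A production linkage would typically use many more shared variables and a model-based distance or propensity score, and would account for the survey's sampling weights. This sketch omits all of those.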
Study findings include probabilistic assessments of how well the patients in each clinical trial represent cancer survivors in the general population. The study also demonstrates how the augmented datasets enable researchers to assess the impact of socioeconomic factors, added through data integration, on cancer survival and related outcomes of interest.
Data integration innovations to enhance analytic utility of clinical trial content to inform health disparities research