Background: Linking datasets is necessary to make effective use of the heterogeneous health data now available. Before linking multiple discrete data sources to create novel linked datasets, researchers must assess feasibility in both its scientific aspects (data quality and linkage methods) and its operational aspects (access, data use and transfer, governance, and cost).
Objectives: To provide guidance on key aspects of data linkage and methods necessary to plan useful and sustainable linkages that advance pharmacoepidemiology and patient safety.
Methods: An ISPE‐supported working group included members from the academic, industry and contract research, government, and regulatory sectors. The group was surveyed online to identify priority content areas and set the scope of the activity; planning and feasibility of data linkage was one of the areas chosen. Within this topic, scientific and operational considerations were identified, reviewed, and assembled by consensus into key recommendations.
Results: Guidance for feasibility assessment was categorized into five key areas: (1) research objectives and justification, (2) data harmonization and quality, (3) the linkage process, (4) data ownership and governance, and (5) overall value added by linkage. Within these key areas, 10 questions were developed as criteria to consider before initiating a linkage. The questions help determine whether the research objectives are appropriate for the proposed linkage, assess source data completeness and population coverage, and ensure that data governance standards and protections are well defined. Planning a linkage project requires careful evaluation that weighs the resources needed to initiate a new linkage against the overall potential benefits: answering innovative research questions, expanding the scope of a project, improving classification of existing variables, and increasing the internal validity of population estimates.
Conclusions: These recommendations can help researchers conscientiously assess and design sustainable linked data resources that can be leveraged to fill gaps in data and generate novel, actionable evidence in pharmacoepidemiology.