Propensity score matching versus propensity score fine stratification and coarsened exact matching in claims data

John E Ripollone; Krista F. Huybrechts; KJ Rothman; Ryan E Ferguson; Jessica M Franklin

Ripollone, J. E., Huybrechts, K. F., Rothman, K. J., Ferguson, R. E., & Franklin, J. M. (2018). Propensity score matching versus propensity score fine stratification and coarsened exact matching in claims data. Pharmacoepidemiology and Drug Safety, 27(S2), Article 47. https://doi.org/10.1002/pds.4629

Abstract

Background: Criticisms of propensity score matching (PSM) have accrued in the literature. For example, it has been suggested that continuous deletion of matched sets in decreasing order of propensity score distance may lead to increased bias in the effect estimate. We present the results of a comparison of PSM with propensity score fine stratification (FS) and with coarsened exact matching (CEM). We chose these alternative techniques because of the suggestion of their superiority over PSM in the literature.

Objectives: Compare PSM with FS and CEM with respect to validity and precision of effect estimates using claims data.

Methods: We used data from the Pharmaceutical Assistance Contract for the Elderly database (PACE, n = 49 653, 19 pre‐specified confounders) to assess the association between NSAIDs vs COX‐2 inhibitors and gastrointestinal complications and data from the Medicaid Analytic eXtract database (MAX, n = 886 996, 20 pre‐specified confounders) to assess the association between statins vs no statins and congenital malformations. PACE was analyzed with 50 and 100 additional empirical confounders selected from a high‐dimensional propensity score algorithm. Three techniques were applied to each dataset: (1) 1:1 PSM using a nearest neighbor matching algorithm, (2) FS using 10, 50, and 100 strata ranked by the propensity score distribution of the exposed after deleting observations from non‐overlapping propensity score regions, and (3) CEM using an auto‐coarsening technique.
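The fine stratification step described above (trim non-overlapping propensity score regions, then cut strata at quantiles of the exposed group's propensity score distribution) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `fine_stratify`, the quantile rule, and the weighting scheme (exposed units weighted 1, unexposed units reweighted so their stratum distribution matches that of the exposed) are all assumptions, since the abstract does not specify them.

```python
from bisect import bisect_right


def fine_stratify(ps, exposed, n_strata=10):
    """Hypothetical sketch: return (stratum, weight) lists, one entry per unit.

    Trimmed units get stratum None and weight 0.0.
    """
    ps_exp = [p for p, e in zip(ps, exposed) if e]
    ps_ref = [p for p, e in zip(ps, exposed) if not e]

    # Keep only the region where both groups have propensity scores.
    lo = max(min(ps_exp), min(ps_ref))
    hi = min(max(ps_exp), max(ps_ref))
    keep = [lo <= p <= hi for p in ps]

    # Interior cut points at quantiles of the retained exposed scores.
    kept_exp = sorted(p for p, e, k in zip(ps, exposed, keep) if e and k)
    cuts = [kept_exp[(len(kept_exp) * j) // n_strata] for j in range(1, n_strata)]
    stratum = [bisect_right(cuts, p) if k else None for p, k in zip(ps, keep)]

    # Count exposed and unexposed units in each stratum.
    n_exp = [0] * n_strata
    n_ref = [0] * n_strata
    for s, e in zip(stratum, exposed):
        if s is not None:
            if e:
                n_exp[s] += 1
            else:
                n_ref[s] += 1

    # Only strata containing both groups contribute to the analysis.
    usable = [n_exp[s] > 0 and n_ref[s] > 0 for s in range(n_strata)]
    tot_exp = sum(n for n, u in zip(n_exp, usable) if u)
    tot_ref = sum(n for n, u in zip(n_ref, usable) if u)

    weights = []
    for s, e in zip(stratum, exposed):
        if s is None or not usable[s]:
            weights.append(0.0)
        elif e:
            weights.append(1.0)
        else:
            # Assumed weighting: match the exposed stratum distribution.
            weights.append((n_exp[s] / tot_exp) / (n_ref[s] / tot_ref))
    return stratum, weights
```

Under this assumed weighting, the weighted share of unexposed units in each stratum equals the share of exposed units in that stratum, which is what lets a weighted outcome model compare like with like while retaining most of the cohort.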

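Our strategy generated 20 analytic datasets: four source datasets (PACE with the 19 pre‐specified confounders, PACE with 50 and with 100 additional empirical confounders, and MAX), each analyzed with five technique configurations (PSM, FS with 10, 50, and 100 strata, and CEM). For each analytic dataset, we compared the resulting relative risks (RR) and standard errors (SE) from weighted log binomial models, as well as the numbers of units remaining.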
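As a rough illustration of how a weighted relative risk arises, the sketch below computes the ratio of weighted risks in the exposed and unexposed groups. It assumes a model with exposure as its only covariate (the abstract does not state the model specification); in that special case the weighted maximum-likelihood RR from a log binomial model equals this ratio. The function name `weighted_rr` is hypothetical, and the standard error, which would normally need a robust variance estimator when weights are used, is deliberately left out.

```python
def weighted_rr(outcome, exposed, weights):
    """Hypothetical sketch: ratio of weighted risks, exposed vs unexposed."""
    w_exp = sum(w for e, w in zip(exposed, weights) if e)
    w_ref = sum(w for e, w in zip(exposed, weights) if not e)
    if w_exp == 0 or w_ref == 0:
        raise ValueError("both groups need positive total weight")

    # Weighted risk = weighted number of events / total weight in the group.
    risk_exp = sum(w * y for y, e, w in zip(outcome, exposed, weights) if e) / w_exp
    risk_ref = sum(w * y for y, e, w in zip(outcome, exposed, weights) if not e) / w_ref
    if risk_ref == 0:
        raise ValueError("no weighted events among the unexposed")
    return risk_exp / risk_ref
```

For 1:1 matched data every retained unit would carry weight 1, so the same function reduces to the crude risk ratio in the matched sample.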

Results: For each PACE dataset, FS resulted in a larger analytic dataset (>90% of the original dataset) and a lower SE than PSM and CEM. The RRs from PSM and FS were similar (indicating a ~10% increase in risk with NSAIDs) and consistent with prior evidence from experimental studies, while CEM resulted in larger effect estimates in all cases (indicating up to a 150% increase in risk). The MAX analyses led to similar findings.

Conclusions: FS was optimal in our analyses due to the high retention of study size and low SEs. CEM appears sub‐optimal for claims data, likely due to the high volume of binary confounders. As next steps, we will explore different coarsening strategies for CEM, and we will generate plasmode‐simulated datasets, which allow for clearer comparisons based on known effect sizes.
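One way to see why many binary confounders could strain CEM: a binary covariate has only two levels, so coarsening cannot merge them without dropping the variable, and matching on it is effectively exact. The sketch below is a hypothetical illustration (the function name `cem_binary_keep` and the all-binary setup are assumptions, not the authors' auto-coarsening procedure): it keeps only units whose covariate pattern occurs in both exposure groups.

```python
from collections import defaultdict


def cem_binary_keep(covariates, exposed):
    """Hypothetical sketch: flag units whose exact covariate pattern
    appears among both exposed and unexposed units."""
    counts = defaultdict(lambda: [0, 0])  # pattern -> [n_unexposed, n_exposed]
    for x, e in zip(covariates, exposed):
        counts[tuple(x)][1 if e else 0] += 1
    return [all(n > 0 for n in counts[tuple(x)]) for x in covariates]


def n_possible_cells(n_binary):
    """Number of distinct patterns for n_binary binary covariates."""
    return 2 ** n_binary
```

If, hypothetically, all 19 pre‐specified PACE confounders were binary, there would be 2^19 = 524 288 possible patterns, roughly ten times the 49 653 patients, so many patterns would contain units from only one exposure group and be pruned.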
