This piece was first published by The Medical Care Blog.
To help determine which health policy changes to the Medicare and Medicaid programs are desirable, the CMS Center for Medicare and Medicaid Innovation (CMMI) relies on formal evaluations, performed by outside contractors, of how smaller-scale, typically voluntary, demonstrations and other initiatives affect outcomes of interest. Determining causal impacts, however, often rests on a key assumption: parallel trends. We can isolate the treatment effect only if outcomes in the treatment and comparison groups would have evolved in parallel over time absent the intervention. In this post we report how CMMI evaluations are assessing and addressing parallel (or non-parallel) trends.
Parallel Trends Matter for Drawing Causal Conclusions
The 2021 Nobel Prize in Economic Sciences recognized the important contribution of econometric methods, developed in the 1990s, that allow researchers to draw causal conclusions from natural experiments. Over the first ten years of CMMI's operation, these quasi-experimental methods have been used repeatedly; CMMI models have only rarely used a randomized controlled trial design. Among quasi-experimental methods, the most frequently used is difference-in-differences (DiD) analysis.
Applying DiD, however, presents a challenge. After identifying a comparison group, one must be able to argue that it serves as a suitable counterfactual. Namely, one must assume that the change measured in the comparison group between the pre-intervention and post-intervention periods approximates what would have occurred in the intervention group, had the intervention not occurred. This is commonly referred to as the "parallel trends assumption": changes in the intervention and comparison groups would parallel each other except for the impact of the intervention itself.
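To make the mechanics concrete, here is a minimal sketch of a DiD regression on simulated data (the variable names, sample size, and data-generating parameters are illustrative assumptions, not drawn from any CMMI evaluation). When parallel trends hold by construction, the coefficient on the treated-by-post interaction recovers the built-in treatment effect:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000

# Two groups (treated vs. comparison) observed before and after the intervention
treated = rng.integers(0, 2, n)
post = rng.integers(0, 2, n)

# Outcome with a level gap of 3 between groups, a common time shift of 2,
# and a true treatment effect of -1.5 (only for treated units, post-period)
y = 10 + 3 * treated + 2 * post - 1.5 * treated * post + rng.normal(0, 1, n)

df = pd.DataFrame({"y": y, "treated": treated, "post": post})

# The interaction term treated:post is the DiD estimate of the treatment effect
model = smf.ols("y ~ treated * post", data=df).fit()
print(model.params["treated:post"])  # should be close to -1.5
```

Because the simulated groups share the same time shift absent treatment, the parallel trends assumption is satisfied here by design; the whole difficulty in practice is that real comparison groups offer no such guarantee.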
There is a developing literature (see here and here, for example) on the best approaches to assessing whether the parallel trends assumption holds when applying DiD, and what to do if it does not. Since best practices continue to evolve, evaluators may not always follow the ideal approach. But the ways in which evaluators deal with, or do not deal with, non-parallel trends can affect the results of the evaluation. This is clearly important for evaluations with policy implications.
How Do CMMI Evaluations Assess Parallel Trends?
Assessing parallel trends relies on the observable pre-intervention trends in the baseline period. We can inspect pre-intervention trends visually, but statistical tests are also available. To the extent that the pre-intervention trends for the intervention and comparison groups are non-parallel, this serves as evidence against the parallel trends assumption.
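One common statistical check regresses the baseline-period outcome on a group-by-time interaction; a differential slope between groups is evidence against parallel trends. The sketch below illustrates this on simulated baseline data (the names, seed, and parameters are again illustrative assumptions, not a specific evaluator's procedure):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
rows = []
for t in range(-4, 0):          # four pre-intervention periods
    for g in (0, 1):            # comparison and intervention groups
        for _ in range(250):
            # Parallel baseline trends by construction:
            # common slope of 0.5 per period, level gap of 2 between groups
            rows.append({
                "y": 5 + 2 * g + 0.5 * t + rng.normal(0, 1),
                "treated": g,
                "time": t,
            })
df = pd.DataFrame(rows)

# treated:time captures any differential pre-trend between the groups;
# an estimate near zero is consistent with the parallel trends assumption
pretrend = smf.ols("y ~ treated * time", data=df).fit()
print(pretrend.params["treated:time"])
```

Note the caveats in the bullets that follow: with a small sample this test may miss meaningful non-parallel trends, while with a very large sample it can flag trivially small ones.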
In some cases, data limitations may affect whether a test for parallel trends is feasible or informative. For example:
- Sample size may not be large enough to detect non-parallel trends (not applicable for most CMMI evaluations).
- Sample size may be so large that tests are always statistically significant, though not practically significant.
- There may not be enough baseline periods of data to assess pre-intervention trends.
To understand how CMMI evaluations assess parallel trends, we reviewed the evaluation reports or Reports to Congress for 51 CMMI models posted between September 2012 and May 2021. We consulted several resources (here, here, and here) to develop the list of CMMI models with formal evaluations. We reviewed the reports to determine if researchers used DiD, whether and how the researchers tested for parallel trends, and the results of the test. The complete list of evaluations reviewed is available in PDF format.
Of the 51 reports, 40 used DiD. Among those 40, 23 tested for parallel trends: 19 used statistical tests and 4 relied on visual inspection only. As shown in Figure 1, testing for parallel trends was uncommon in CMMI evaluations prior to 2017. Between 2018 and 2019, 4 of 6 reports described tests for parallel trends. Since 2020, 15 of 17 reports have included tests for parallel trends, and just 1 of those 15 relied on visual inspection alone.