Assessing test-retest reliability of patient-reported outcome measures using intraclass correlation coefficients
Recommendations for selecting and documenting the analytical formula
Qin, S., Nelson, L., McLeod, L., Eremenco, S., & Coons, S. J. (2018). Assessing test-retest reliability of patient-reported outcome measures using intraclass correlation coefficients: recommendations for selecting and documenting the analytical formula. Quality of Life Research. Advance online publication. https://doi.org/10.1007/s11136-018-2076-0
PURPOSE: The US Food and Drug Administration (FDA) 2009 guidance for industry on patient-reported outcome (PRO) measures describes how the Agency evaluates the psychometric properties of measures intended to support medical product labeling claims. An important psychometric property is test-retest reliability. The guidance lists intraclass correlation coefficients (ICCs) and the assessment time period as key considerations for test-retest reliability evaluations. However, the guidance does not provide recommendations regarding ICC computation, nor is there consensus within the measurement literature regarding the most appropriate ICC formula for test-retest reliability assessment. This absence of consensus emerged as an issue within Critical Path Institute's PRO Consortium. The purpose of this project was to generate thoughtful and informed recommendations regarding the most appropriate ICC formula for assessing a PRO measure's test-retest reliability.
METHODS: Literature was reviewed and a preferred ICC formula was proposed. Feedback on the chosen formula was solicited from psychometricians, biostatisticians, regulators, and other scientists who have collaborated on PRO Consortium initiatives.
RESULTS AND CONCLUSIONS: Feedback was carefully considered and, after further deliberation, the proposed ICC formula was confirmed. In conclusion, to assess test-retest reliability for PRO measures, the two-way mixed-effect analysis of variance model with interaction for the absolute agreement between single scores is recommended.
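The recommended coefficient, often written ICC(A,1), can be computed from the mean squares of a two-way ANOVA on an n × k matrix: n subjects (rows) by k administrations (columns, e.g. test and retest). The sketch below assumes the standard single-score absolute-agreement formula, ICC(A,1) = (MSR − MSE) / (MSR + (k − 1)·MSE + (k/n)·(MSC − MSE)). Under the two-way mixed model with interaction, the point estimate is computed the same way as in the two-way random-effects model; only the interpretation differs. The function name `icc_a1` and its requirement of complete data with no missing scores are illustrative assumptions, not something the abstract specifies.

```python
from typing import Sequence


def icc_a1(scores: Sequence[Sequence[float]]) -> float:
    """Hypothetical helper: ICC(A,1), absolute agreement between single scores.

    scores[i][j] is subject i's score at administration j (e.g. j=0 test,
    j=1 retest). Assumes a complete n x k matrix with no missing values.
    """
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete n x k matrix with n >= 2 and k >= 2")

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(scores[i][j] for i in range(n)) / n for j in range(k)]

    # Two-way ANOVA sums of squares (one observation per cell, so the
    # residual term absorbs the subject-by-administration interaction).
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                 # MSR: between subjects
    ms_cols = ss_cols / (k - 1)                 # MSC: between administrations
    ms_error = ss_error / ((n - 1) * (k - 1))   # MSE: residual

    # The (k/n)(MSC - MSE) term penalizes systematic shifts between
    # administrations, which is what makes this an absolute-agreement ICC.
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + (k / n) * (ms_cols - ms_error)
    )
```

For example, `icc_a1([[1, 2], [2, 3], [3, 4]])` returns about 0.667: every retest score is exactly one point higher than the test score, so a consistency-type ICC would equal 1, whereas the absolute-agreement version counts the systematic shift between administrations as disagreement.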