Evaluating patient-reported outcome measurement comparability between paper and alternate versions, using the lung function questionnaire as an example
Dalal, A., Nelson, L., Coles, T., McLeod, L., Lewis, S., & DeMuro-Mercon, C. (2011). Evaluating patient-reported outcome measurement comparability between paper and alternate versions, using the lung function questionnaire as an example. Value in Health, 14(5), 712-720. DOI: 10.1016/j.jval.2010.12.007
Objectives - The goal of this study was to recommend steps for assessing measurement comparability under a crossover study design and to demonstrate these steps using a short patient-reported outcome (PRO) instrument as an example.
Methods - The example PRO instrument was administered via paper, Web, interactive voice response system, and interview; a randomized crossover design was used to gather data across the administration types. Participants completed the PRO instrument, demographic and health questions, and a short preference questionnaire. Evaluation included comparison of item-level responses and agreement, comparison of mean scale scores and score classifications, and questions designed to collect usability and administration preference. The authors provide a four-step guide for evaluating measurement comparability and illustrate the steps using a case-finding tool.
Results - In the example, item-level kappa statistics between the paper and the alternate versions ranged from good to excellent, intraclass correlation coefficients for mean scores were above 0.70, and the rate of disagreement ranged from 2% to 14%. In addition, although participants had an administration preference, they reported few difficulties with the versions they were assigned.
Conclusions - The steps described in this article provide a guide for deciding whether to combine scores across administration versions to simplify analyses and interpretation under a crossover design. The guide recommends investigating item-level responses, summary scores, and participant usability/preference when comparing versions; each step provides unique information that supports a comprehensive evaluation and an informed decision about whether to combine data.
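The agreement statistics named in the Results can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names `cohen_kappa` and `disagreement_rate` and the response data are hypothetical, and unweighted Cohen's kappa is assumed here because the abstract does not say which kappa variant was used. The sketch computes item-level agreement between one participant group's paper and Web responses, plus the share of participants whose answers differ between versions.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two paired lists of categorical responses."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed proportion of pairs with identical responses.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from the marginal category frequencies.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    categories = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        # Both versions used a single identical category; agreement is perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def disagreement_rate(ratings_a, ratings_b):
    """Proportion of paired responses that differ between two versions."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(a != b for a, b in zip(ratings_a, ratings_b)) / len(ratings_a)


# Hypothetical responses to one 5-point item from ten participants who
# completed both the paper and the Web version (illustrative data only).
paper = [1, 2, 3, 3, 2, 1, 4, 5, 5, 4]
web = [1, 2, 3, 3, 2, 1, 4, 5, 4, 4]

kappa = cohen_kappa(paper, web)
rate = disagreement_rate(paper, web)
print(f"kappa = {kappa:.3f}, disagreement = {rate:.0%}")
```

The same `disagreement_rate` function could be applied to score classifications (for example, at-risk versus not at-risk) rather than item responses. For ordinal items a weighted kappa may be more appropriate, and the intraclass correlation for mean scores would need a separate calculation.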