State of the psychometric methods: Patient-reported outcome measure development and refinement using item response theory

Angela M Stover; Lori D McLeod; Michelle M Langer; Wen-Hung Chen; Bryce B Reeve

Stover, A. M., McLeod, L. D., Langer, M. M., Chen, W.-H., & Reeve, B. B. (2019). State of the psychometric methods: Patient-reported outcome measure development and refinement using item response theory. Journal of Patient-Reported Outcomes, 3(1), Article 50. https://doi.org/10.1186/s41687-019-0130-5

Abstract

Background: This paper is part of a series comparing different psychometric approaches to evaluate patient-reported outcome (PRO) measures using the same items and dataset. We provide an overview and example application to demonstrate 1) using item response theory (IRT) to identify poor and well performing items; 2) testing if items perform differently based on demographic characteristics (differential item functioning, DIF); and 3) balancing IRT and content validity considerations to select items for short forms.

Methods: Model fit, local dependence, and DIF were examined for 51 items initially considered for the Patient-Reported Outcomes Measurement Information System® (PROMIS®) Depression item bank. Samejima's graded response model was used to examine how well each item measured severity levels of depression and how well it distinguished between individuals with high and low levels of depression. Two short forms were constructed based on psychometric properties and consensus discussions with instrument developers, including psychometricians and content experts. Calibrations presented here are for didactic purposes and are not intended to replace official PROMIS parameters or to be used for research.

Results: Of the 51 depression items, 14 exhibited local dependence, 3 exhibited DIF for gender, and 9 exhibited misfit, and these items were removed from consideration for short forms. Short form 1 prioritized content, and thus items were chosen to meet DSM-V criteria rather than being discarded for lower discrimination parameters. Short form 2 prioritized well performing items, and thus fewer DSM-V criteria were satisfied. Short forms 1-2 performed similarly for model fit statistics, but short form 2 provided greater item precision.

Conclusions: IRT is a family of flexible models providing item- and scale-level information, making it a powerful tool for scale construction and refinement. Strengths of IRT models include placing respondents and items on the same metric, testing DIF across demographic or clinical subgroups, and facilitating creation of targeted short forms. Limitations include the large sample sizes needed to obtain stable item parameters and the familiarity with measurement methods needed to interpret results. Combining psychometric data with stakeholder input (including people with lived experiences of the health condition and clinicians) is highly recommended for scale development and evaluation.
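For readers unfamiliar with the graded response model named in the Methods, the following is a minimal sketch of how it turns a discrimination parameter and a set of ordered thresholds into response-category probabilities. It uses the standard logistic form of the model; the function name and all parameter values are hypothetical and chosen for illustration only, and they are not the calibrations reported in the paper or official PROMIS parameters.

```python
import numpy as np

def grm_category_probs(theta, a, b):
    """Category probabilities for one item under a graded response model.

    theta: latent trait level (here, depression severity), a scalar
    a:     discrimination (slope) parameter
    b:     increasing threshold parameters, one per boundary between
           adjacent response categories
    """
    b = np.asarray(b, dtype=float)
    # Cumulative probability of responding at or above each boundary
    cum = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    # Pad so that P(lowest category or higher) = 1 and P(above highest) = 0
    cum = np.concatenate(([1.0], cum, [0.0]))
    # Each category probability is the difference of adjacent cumulatives
    return cum[:-1] - cum[1:]

# Hypothetical five-category item (illustrative values only)
probs = grm_category_probs(theta=0.5, a=2.0, b=[-1.0, 0.0, 1.0, 2.0])
print(probs)
```

In this form, a larger discrimination value makes the category probabilities change more sharply as severity increases, which is why items with low discrimination contribute less precision to a short form.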
