
Classification Errors in Regression Models: A Bayesian Semi-Parametric Approach to Inference


van Hasselt, M. N. (2011, October). Classification Errors in Regression Models: A Bayesian Semi-Parametric Approach to Inference. Presented at APHA 2011, Washington, DC.


We consider the problem of classification errors in categorical data. For example, an intent-to-treat indicator may be a poor proxy for actual treatment received, due to noncompliance or sample attrition. A substance use indicator in self-reported survey data may be inaccurate due to poor recall or an unwillingness to tell the truth.

We propose a semi-parametric Bayesian model that accounts for classification errors in a regression context. The model has two important advantages over existing approaches. First, parametric models (Bayesian and non-Bayesian) often rest on strong assumptions. Such assumptions sometimes lack a substantive foundation and, if incorrectly imposed, can severely bias statistical inference. Our modeling approach allows a researcher to relax these assumptions through both the prior distribution and the likelihood. For example, rather than fixing the classification error probability at a single value, we can specify a distribution over reasonable values.
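As a rough illustration of placing a distribution over the classification error probability instead of fixing it, the sketch below computes a grid posterior for the true prevalence of a binary indicator. It is not the semi-parametric model proposed here. It assumes, purely for illustration, a symmetric error probability eps (the same in both directions), a Beta(2, 18) prior on eps restricted to values below 0.5, a flat prior on the true prevalence, and a binomial likelihood. The function name and the counts are hypothetical.

```python
import numpy as np

def posterior_prevalence(n_pos, n, a=2.0, b=18.0, grid=201):
    """Grid posterior for the true prevalence pi of a misclassified indicator.

    Assumptions (illustrative only): symmetric error probability eps,
    eps ~ Beta(a, b) restricted to (0, 0.5), flat prior on pi.
    """
    pi = np.linspace(0.0, 1.0, grid)          # candidate true prevalences
    eps = np.linspace(0.001, 0.499, grid)     # candidate error probabilities
    P, E = np.meshgrid(pi, eps, indexing="ij")

    # Probability that the OBSERVED indicator equals 1
    q = P * (1 - E) + (1 - P) * E

    # Binomial log-likelihood (constant term dropped) plus Beta log-prior on eps
    log_lik = n_pos * np.log(q) + (n - n_pos) * np.log(1 - q)
    log_prior = (a - 1) * np.log(E) + (b - 1) * np.log(1 - E)

    log_post = log_lik + log_prior
    post = np.exp(log_post - log_post.max())  # subtract max for numerical stability
    post /= post.sum()

    # Sum over eps to obtain the marginal posterior over pi
    return pi, post.sum(axis=1)

# Hypothetical data: 30 of 100 respondents report substance use
pi_grid, pi_post = posterior_prevalence(n_pos=30, n=100)
print("posterior mean of true prevalence:", float((pi_grid * pi_post).sum()))
```

Changing `a` and `b` shows how sensitive the conclusion about the true prevalence is to what is assumed about the error rate, which a single fixed value would hide.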

Second, in the absence of strong assumptions, certain model parameters are typically no longer identified. It is often possible, however, to derive upper and lower bounds on these parameters that can be estimated from the data. The advantage of a Bayesian model over a non-Bayesian one is that we can also make statements about the likely values of the parameters within the bounds. This has considerable practical value, since in some applications the bounds are far apart.
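A toy calculation can make the idea of bounds, and of likely values within them, concrete. The sketch below is not the method of this presentation. It assumes a binary indicator whose observed share is p_obs, a symmetric classification error probability eps known only to lie in [0, eps_max] with eps_max < 0.5, and sampling error small enough to ignore. Under those assumptions the true prevalence equals (p_obs - eps) / (1 - 2 eps), which is monotone in eps, so its range is pinned down by the two endpoints. A hypothetical Beta(2, 18) prior on eps then shows where the prevalence is likely to fall inside that range. All function names are made up for the example.

```python
import numpy as np

def prevalence_bounds(p_obs, eps_max):
    """Bounds on the true prevalence when 0 <= eps <= eps_max < 0.5."""
    corrected = (p_obs - eps_max) / (1 - 2 * eps_max)
    corrected = min(max(corrected, 0.0), 1.0)  # prevalence must lie in [0, 1]
    return min(p_obs, corrected), max(p_obs, corrected)

def implied_prevalence_draws(p_obs, eps_max, a=2.0, b=18.0, size=10_000, seed=0):
    """Draw eps from a Beta(a, b) prior truncated to [0, eps_max] and map each
    draw to the prevalence it implies (illustrative prior, not from the source)."""
    rng = np.random.default_rng(seed)
    eps = rng.beta(a, b, size=size)
    eps = eps[eps <= eps_max]                  # enforce the truncation
    pi = (p_obs - eps) / (1 - 2 * eps)
    return pi[(pi >= 0.0) & (pi <= 1.0)]       # drop eps values the data rule out

lo, hi = prevalence_bounds(p_obs=0.30, eps_max=0.10)
draws = implied_prevalence_draws(p_obs=0.30, eps_max=0.10)
print("bounds:", (lo, hi))                     # approximately (0.25, 0.30)
print("median within bounds:", float(np.median(draws)))
```

The bounds alone say only that the prevalence lies somewhere between the two endpoints, whereas the draws indicate which part of that interval carries most of the probability under the stated prior.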

We apply our methods to data from the National Survey on Drug Use and Health and re-evaluate the relationship between socioeconomic characteristics, substance use behavior, and treatment outcomes.