Addressing the problem of switched class labels in latent variable mixture model simulation studies
The discrimination between alternative models and the detection of latent classes in latent variable mixture modeling depend on sample size, class separation, and other factors related to power. Prior to a mixture analysis, it is useful to investigate model performance in a simulation study that reflects the research setting. Multiple data sets are generated under 1 or more models, and alternative models are fitted to the data. The aggregation of results over multiple data sets is complicated by the fact that mixture models are identified only up to a permutation of the class labels. Estimated class labels are arbitrary, so the estimated parameters for Class 1 could be labeled as Class 2, Class 3, and so forth, relative to their data-generating labels. In a simulation study, the detection of switched labels needs to be automated, yet switched class labels are not necessarily simple to detect. This article describes different possible scenarios of switched class labels and develops an algorithm, implemented in R, that (a) detects switched labels and (b) provides information that can be used either to correct class labels or to discard a particular data set from a simulation if class labels are ambiguous. The algorithm is useful in Monte Carlo simulations involving latent variable mixture models.
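To make the label-switching problem concrete, the following is a minimal sketch of one way switched labels could be detected in a single simulated data set. It is not the article's R algorithm. It assumes that each observation has a known data-generating class and a hard estimated class assignment, and it searches all permutations of the labels for the one that maximizes agreement. The function names `detect_switch` and `relabel` and the threshold `min_agreement` are hypothetical and chosen only for illustration.

```python
from itertools import permutations


def detect_switch(true_labels, est_labels, k, min_agreement=0.6):
    """Find the label permutation that best matches estimated to true classes.

    Labels are integers 0..k-1. The threshold min_agreement is an
    illustrative assumption, not a value taken from the article.
    """
    # Cross-tabulate true class (rows) against estimated class (columns).
    table = [[0] * k for _ in range(k)]
    for t, e in zip(true_labels, est_labels):
        table[t][e] += 1

    # perm[t] is the estimated label matched to true class t.
    scores = []
    for perm in permutations(range(k)):
        hits = sum(table[t][perm[t]] for t in range(k))
        scores.append((hits, perm))
    scores.sort(reverse=True)

    best_hits, best_perm = scores[0]
    runner_up = scores[1][0] if len(scores) > 1 else -1
    agreement = best_hits / len(true_labels)

    return {
        "mapping": best_perm,
        "agreement": agreement,
        # Labels are switched if the best mapping is not the identity.
        "switched": best_perm != tuple(range(k)),
        # Flag the data set if agreement is low or two mappings tie.
        "ambiguous": agreement < min_agreement or best_hits == runner_up,
    }


def relabel(est_labels, mapping):
    """Rewrite estimated labels so that they follow the true labeling."""
    inverse = {e: t for t, e in enumerate(mapping)}
    return [inverse[e] for e in est_labels]


true = [0, 0, 0, 1, 1, 1, 2, 2, 2]
est = [1, 1, 1, 2, 2, 2, 0, 0, 0]
result = detect_switch(true, est, k=3)
corrected = relabel(est, result["mapping"])
```

In a simulation loop, a data set flagged as ambiguous would be discarded, and otherwise its parameter estimates would be reordered according to the mapping before aggregation. Because the search enumerates all k! permutations, this sketch is practical only for a small number of classes.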