• Presentation

Simulation Comparison of Variable Selection and Classification Methods

Citation

Liu, J., Wu, S., Raymer, J., Hu, Y., & Michael, L. C. (2005, August). Simulation Comparison of Variable Selection and Classification Methods. Presented at Joint Statistical Meetings, Minneapolis, MN.

Abstract

Finding the best method to partition subjects into homogeneous classes is a problem frequently encountered in chemometrics and bioinformatics. Several methods are compared for this purpose: Stepwise Linear Discriminant Analysis (SLDA), Canonical Discriminant Analysis (CDA), Stepwise Multinomial Logistic Regression (SMLR), Support Vector Machines (SVM), Generalized Discriminant Analysis (GDA), and Kernel Partial Least Squares (KPLS). Datasets are simulated by varying the number of classes, the distance between the classes, the number and distribution of the potential classifiers, the level of correlation among the potential classifiers, and the inherent nonlinearity. The results suggest that when the variables are normally distributed and the classes are linearly separable, SLDA and SVM are the best performers; when the data are nonlinear and the variables are not normally distributed, SVM is the best performer; when the number of variables is large compared to the sample size, linear methods can achieve a satisfactory rate of correct classification in most cases; and when the curvature of the separating planes increases, linear methods are less satisfactory.
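
The kind of simulation the abstract describes can be illustrated with a minimal sketch. It is not the authors' code and implements none of the six methods compared. As a stand-in it uses a nearest-centroid rule, which is what a linear discriminant reduces to when the variables are independent with equal variances and the classes have equal priors. The function names, the sample sizes, and the choice to separate the classes along the first variable only are all assumptions made for illustration.

```python
import random


def simulate(n_classes, distance, n_vars, n_per_class, rng):
    """Draw independent unit-variance normal variables.

    Assumption for illustration: class k is shifted by k * distance
    on the first variable only; the remaining variables are pure noise.
    """
    X, y = [], []
    for k in range(n_classes):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_vars)]
            row[0] += k * distance
            X.append(row)
            y.append(k)
    return X, y


def fit_centroids(X, y, n_classes):
    """Return the per-class mean vector estimated from the training data."""
    n_vars = len(X[0])
    sums = [[0.0] * n_vars for _ in range(n_classes)]
    counts = [0] * n_classes
    for row, k in zip(X, y):
        counts[k] += 1
        for j, value in enumerate(row):
            sums[k][j] += value
    return [[s / counts[k] for s in sums[k]] for k in range(n_classes)]


def predict(centroids, row):
    """Assign the row to the class whose centroid is closest (squared Euclidean)."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(row, c))
    return min(range(len(centroids)), key=lambda k: sqdist(centroids[k]))


def correct_rate(n_classes, distance, n_vars, seed=0, n_per_class=100):
    """Train on one simulated dataset, then score on an independent one."""
    rng = random.Random(seed)
    X_train, y_train = simulate(n_classes, distance, n_vars, n_per_class, rng)
    X_test, y_test = simulate(n_classes, distance, n_vars, n_per_class, rng)
    centroids = fit_centroids(X_train, y_train, n_classes)
    hits = sum(predict(centroids, r) == k for r, k in zip(X_test, y_test))
    return hits / len(y_test)


if __name__ == "__main__":
    # Vary one design factor (distance between classes), holding the others fixed.
    for d in (0.5, 1.0, 2.0, 4.0):
        print(f"distance={d}: correct rate={correct_rate(3, d, 5):.3f}")
```

Looping over n_classes or n_vars in the same way would vary the other design factors. Correlated or non-normal variables and nonlinear class boundaries would need a different data generator, which this sketch does not attempt.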