Partial Least Squares (PLS) Applied to Medical Bioinformatics
Land, W. H., Ford, W., Park, J-W., Mathur, R., Hotchkiss, N., Heine, J. J., Eschrich, S., Qiao, X., & Yeatman, T. (2011). Partial Least Squares (PLS) Applied to Medical Bioinformatics. In CH. Dagli (Ed.), COMPLEX ADAPTIVE SYSTEMS. Elsevier Science B.V. https://doi.org/10.1016/j.procs.2011.08.051
PLS first creates uncorrelated latent variables that are linear combinations of the original input vectors Xi; the weights that define each combination are proportional to the covariance between the inputs and the response. Second, a least squares regression is performed on the subset of extracted latent variables, which yields a biased but lower-variance fit on the transformed data. This process leads to a lower-variance estimate of the regression coefficients than the Ordinary Least Squares (OLS) regression approach. Classical Principal Component Analysis (PCA), linear PLS and kernel ridge regression (KRR) techniques are well-known shrinkage estimators designed to deal with multi-collinearity, which can be a serious problem. Specifically, multi-collinearity can dramatically reduce the effectiveness of a regression model by changing the values and signs of the estimated regression coefficients across different but similar data samples, producing a model that represents the training data reasonably well but generalizes poorly to validation and test data. We explain how to address these problems and then perform a hypothesis-driven preliminary PLS research study and sensitivity analysis on a microarray colon cancer data set. No combinatorial analysis is performed, because PLS eliminates the unnecessary variables. The research studies and preliminary results are described in the results section. (C) 2010 Published by Elsevier B. V.
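The two steps described above (extracting uncorrelated latent variables with covariance-based weights, then regressing the response on them) can be illustrated with a minimal sketch. It assumes the standard single-response PLS1 algorithm with NIPALS-style deflation, and it is not the authors' implementation. The function names (`make_collinear_data`, `pls1_fit`, `pls1_predict`) are hypothetical, and the data are synthetic with two nearly collinear inputs, not the colon cancer microarray set.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def make_collinear_data(n, seed=0):
    """Synthetic data (an assumption for illustration): columns 0 and 1 are nearly identical."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)
        x3 = rng.gauss(0, 1)
        X.append([z, z + rng.gauss(0, 0.01), x3])
        y.append(2.0 * z + x3 + rng.gauss(0, 0.1))
    return X, y


def pls1_fit(X, y, n_components):
    """Fit PLS1: extract latent variables one at a time, deflating X and y."""
    n, p_dim = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p_dim)]
    y_mean = sum(y) / n
    E = [[v - m for v, m in zip(row, x_mean)] for row in X]  # centered inputs
    f = [v - y_mean for v in y]                              # centered response
    components, scores = [], []
    for _ in range(n_components):
        # Weights proportional to the covariance of each input column with the response.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p_dim)]
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]  # latent variable (score vector)
        tt = dot(t, t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p_dim)]
        q = dot(f, t) / tt              # least squares regression of the response on t
        # Deflate so that the next latent variable is uncorrelated with this one.
        E = [[E[i][j] - t[i] * p[j] for j in range(p_dim)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
        scores.append(t)
    return {"x_mean": x_mean, "y_mean": y_mean,
            "components": components, "scores": scores}


def pls1_predict(model, x):
    """Predict one sample by replaying the same projection and deflation steps."""
    e = [v - m for v, m in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, p, q in model["components"]:
        t = dot(e, w)
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return y_hat


X, y = make_collinear_data(40)
model = pls1_fit(X, y, n_components=2)
print(pls1_predict(model, X[0]), y[0])
```

Because the regression is run on a few orthogonal score vectors rather than on the nearly collinear original columns, it avoids inverting an ill-conditioned matrix. In this sketch that is the mechanism behind the lower-variance coefficient estimates, which come at the cost of some bias.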