Exploring the Performance of Methods to Deal Multicollinearity: Simulation and Real Data in Radiation Epidemiology Area

Authors

  • Mickaël Dubocq, Radiation Epidemiology Group, INSERM U1018, Villejuif, F-94805, France
  • Nadia Haddy, Radiation Epidemiology Group, INSERM U1018, Villejuif, F-94805, France
  • Boris Schwartz, Radiation Epidemiology Group, INSERM U1018, Villejuif, F-94805, France
  • Carole Rubino, Radiation Epidemiology Group, INSERM U1018, Villejuif, F-94805, France
  • Florent Dayet, Radiation Epidemiology Group, INSERM U1018, Villejuif, F-94805, France
  • Florent de Vathaire, Radiation Epidemiology Group, INSERM U1018, Villejuif, F-94805, France
  • Ibrahima Diallo, Radiation Epidemiology Group, INSERM U1018, Villejuif, F-94805, France
  • Rodrigue S. Allodji, Radiation Epidemiology Group, INSERM U1018, Villejuif, F-94805, France

DOI:

https://doi.org/10.6000/1929-6029.2018.07.02.2

Keywords:

Lasso Regression, Multicollinearity, Organs volume modelling, Partial Least Squares Regression, Principal Components Regression, Ridge Regression.

Abstract

The issue of multicollinearity has long been acknowledged in statistical modelling; however, it is often left untreated in published papers. Indeed, methods for correcting multicollinearity are still rarely used. One important reason is that, despite the many methods proposed, little is known about their strengths or performance. We compare the statistical properties and performance of four main techniques for correcting multicollinearity, namely Ridge Regression (R-R), Principal Components Regression (PC-R), Partial Least Squares Regression (PLS-R), and Lasso Regression (L-R), in both a simulation study and two real data examples in which heart and thyroid volumes are modelled as a function of clinical and anthropometric parameters. Across the different levels of collinearity considered, R-R, PC-R and PLS-R behaved somewhat similarly, with a slight advantage for PLS-R: in all implemented cases, PLS-R provided the smallest root mean square error (RMSE). When the degree of collinearity was moderate, low or very low, L-R also performed similarly to the other methods. Furthermore, the correction methods provided stable and trustworthy parameter estimates for the predictors in the models of heart and thyroid volumes. This work therefore helps to highlight the performance of these methods, although only for situations ranging from low to very high multicollinearity.

Published

2018-05-08

How to Cite

Dubocq, M., Haddy, N., Schwartz, B., Rubino, C., Dayet, F., Vathaire, F. de, Diallo, I., & Allodji, R. S. (2018). Exploring the Performance of Methods to Deal Multicollinearity: Simulation and Real Data in Radiation Epidemiology Area. International Journal of Statistics in Medical Research, 7(2), 33–44. https://doi.org/10.6000/1929-6029.2018.07.02.2

Section

General Articles