Application of Generalized Additive Models to the Evaluation of Continuous Markers for Classification Purposes

Mónica López-Ratón; Mar Rodríguez-Girondo; María Xosé Rodríguez-Álvarez; Carmen Cadarso-Suárez; Francisco Gude

doi:10.6000/1929-6029.2015.04.03.8

Authors

Mónica López-Ratón, Biostatistics Unit, Department of Statistics and Operational Research, University of Santiago de Compostela, Santiago de Compostela, Spain
Mar Rodríguez-Girondo, Statistical Inference, Decision and Operations Research Group, Department of Statistics and Operational Research, University of Vigo, Vigo, Spain
María Xosé Rodríguez-Álvarez, Statistical Inference, Decision and Operations Research Group, Department of Statistics and Operational Research, University of Vigo, Vigo, Spain
Carmen Cadarso-Suárez, Biostatistics Unit, Department of Statistics and Operational Research, University of Santiago de Compostela, Santiago de Compostela, Spain
Francisco Gude, Clinical Epidemiology Unit, Complexo Hospitalario Universitario de Santiago de Compostela (CHUS), Galician Health Service (SERGAS), Santiago de Compostela, Spain

DOI:

https://doi.org/10.6000/1929-6029.2015.04.03.8

Keywords:

Discriminatory capability, ROC, AUC, optimal cutpoint, biomarker, plasma glucose

Abstract

Background: The receiver operating characteristic (ROC) curve and derived measures such as the area under the curve (AUC) are often used to evaluate how well a continuous biomarker discriminates between alternative states of health. However, if the marker has an irregular distribution, with diseased subjects predominating in noncontiguous regions, classification based on a single cutpoint is inappropriate and can lead to erroneous conclusions. This study describes a procedure for improving the discriminatory capacity of a continuous biomarker by using generalized additive models (GAMs) for binary data.

Methods: A new classification rule is obtained by using logistic GAM regression models to transform the original biomarker; the predicted probabilities serve as the new, transformed continuous biomarker. We propose using this transformed biomarker to establish the optimal cut-offs or intervals on which to base the classification. The methodology is applied to several controlled scenarios and to real data from a prospective study of patients undergoing surgery at a university teaching hospital, in which plasma glucose is examined as a biomarker of postoperative infection.
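The transformation step can be illustrated with a minimal sketch. It is not the authors' implementation: the data are simulated under a hypothetical U-shaped risk, and a Gaussian kernel smoother (`smoothed_risk`, with an arbitrarily chosen bandwidth) stands in for the logistic GAM fit so that the example needs only the Python standard library. All function names, the sample size, and the risk function are assumptions made for illustration. The smoothed probabilities play the role of the transformed biomarker, a cut-off on them is chosen by the Youden index (one common criterion, assumed here), and that cut-off is mapped back to an interval of the original marker.

```python
import math
import random


def simulate(n, seed=1):
    """Simulate a marker with a U-shaped disease risk (hypothetical scenario)."""
    rng = random.Random(seed)
    x = [rng.uniform(0.0, 10.0) for _ in range(n)]
    # Risk is lowest at x = 5 and rises toward both extremes of the marker.
    risk = [0.05 + 0.9 * ((xi - 5.0) / 5.0) ** 2 for xi in x]
    d = [1 if rng.random() < r else 0 for r in risk]
    return x, d


def smoothed_risk(x, d, bandwidth=0.7):
    """Gaussian-kernel estimate of P(D = 1 | X = x_i); a stand-in for a GAM fit."""
    p_hat = []
    for xi in x:
        w = [math.exp(-0.5 * ((xi - xj) / bandwidth) ** 2) for xj in x]
        p_hat.append(sum(wj * dj for wj, dj in zip(w, d)) / sum(w))
    return p_hat


def auc(score, d):
    """Empirical AUC (Mann-Whitney statistic), counting ties as one half."""
    pos = [s for s, di in zip(score, d) if di == 1]
    neg = [s for s, di in zip(score, d) if di == 0]
    wins = sum((sp > sn) + 0.5 * (sp == sn) for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(score, d):
    """Threshold c maximising sensitivity + specificity - 1 for the rule score >= c."""
    n_pos = sum(d)
    n_neg = len(d) - n_pos
    best_c, best_j = None, -1.0
    for c in sorted(set(score)):
        tp = sum(1 for s, di in zip(score, d) if di == 1 and s >= c)
        tn = sum(1 for s, di in zip(score, d) if di == 0 and s < c)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_c, best_j = c, j
    return best_c, best_j


x, d = simulate(400)
p_hat = smoothed_risk(x, d)          # transformed biomarker
auc_raw = auc(x, d)                  # original marker, single-cutpoint logic
auc_transformed = auc(p_hat, d)      # transformed marker
cutoff, j = youden_cutoff(p_hat, d)

# Subjects below the probability cut-off form the lower-risk group; in the
# original marker scale this group is summarised here by its range.
low_risk = [xi for xi, pi in zip(x, p_hat) if pi < cutoff]
interval = (min(low_risk), max(low_risk))

print(f"AUC original marker:    {auc_raw:.3f}")
print(f"AUC transformed marker: {auc_transformed:.3f}")
print(f"Lower-risk interval:    [{interval[0]:.2f}, {interval[1]:.2f}]")
```

Because the probabilities are estimated and evaluated on the same simulated sample, the AUC of the transformed marker in this sketch is somewhat optimistic; a real analysis would fit a penalized logistic GAM (for example with a dedicated GAM library) and correct for this by resampling or validation on separate data.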

Results: Both the theoretical scenarios and the real data show that, when the relationship between the risk marker and the disease is not monotone, using the transformed biomarker improves discriminatory capacity. Moreover, in these situations an optimal interval seems more reasonable than a single cutpoint for defining lower and higher disease-risk categories.

Conclusions: Statistical tools that allow greater flexibility (e.g., GAMs) can optimize the classificatory capacity of a potential marker in ROC analysis. It is therefore important to question the assumption of linearity in marker-outcome relationships in order to avoid erroneous conclusions.
