Assessment of the Performance of Imputation Techniques in Observational Studies with Two Measurements

Authors

  • Urko Aguirre, Research Unit, REDISSEC: Health Services Research on Chronic Patients Network, Hospital Galdakao-Usansolo, Galdakao, Bizkaia, Spain
  • Inmaculada Arostegui, Department of Applied Mathematics, Statistics and Operational Research; Red de Investigación en Servicios Sanitarios y Enfermedades Crónicas. Faculty of Science and Technology, Leioa, Spain
  • Cristóbal Esteban, Pneumology Department, REDISSEC: Health Services Research on Chronic Patients Network, Hospital Galdakao-Usansolo, Galdakao, Bizkaia, Spain
  • Jose María Quintana, Research Unit, REDISSEC: Health Services Research on Chronic Patients Network, Hospital Galdakao-Usansolo, Galdakao, Bizkaia, Spain

DOI:

https://doi.org/10.6000/1929-6029.2015.04.03.1

Keywords:

HRQoL, Imputation, Missing data, Pre-post design

Abstract

In observational studies with two measurements in which the outcome is a health-related quality of life (HRQoL) variable, one aim of the research may be to identify potential predictors of the mean change in that outcome. Missing data are very common in such studies and can bias the results. Different imputation techniques have been proposed to cope with missing data in outcome variables. We compared five analysis approaches (Complete Case, Available Case, K-Nearest Neighbour, Propensity Score, and a Markov Chain Monte Carlo (MCMC) algorithm) to assess their performance in handling missing data under different missingness rates and mechanisms: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). These strategies were applied to a pre-post study of patients with Chronic Obstructive Pulmonary Disease. We analyzed the relationship between changes in subjects' HRQoL over one year and clinical and socio-demographic characteristics. A simulation study was also performed to illustrate the performance of the imputation methods. Relative and standardized bias were assessed in each scenario. For all missingness mechanisms, not imputing and using the MCMC method, both combined with mixed-model analysis, showed the lowest standardized bias. Conversely, Propensity Score showed the worst bias values. When the missingness mechanism is MCAR or MAR and the missingness rate is small, we recommend using mixed models. Nevertheless, when the missingness percentage is high, MCMC is preferred in order to gain sample size and statistical power, although there are no differences in bias compared with the mixed models without imputation. For an MNAR scenario, a further sensitivity analysis should be performed.
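
As a rough illustration of how such a simulation can be set up, the Python sketch below generates a pre-post outcome, deletes follow-up values under MCAR, MAR, and MNAR, and computes the relative and standardized bias of a complete-case estimate of the mean change. It is not the authors' simulation design: the true mean change, the data-generating model, the missingness probabilities, and all function names are hypothetical. The bias definitions used (relative bias as a percentage of the true value, standardized bias as a percentage of the empirical standard deviation of the estimates) are one common convention and are assumed here. The imputation methods and mixed models compared in the study are not implemented.

```python
import math
import random
import statistics

TRUE_MEAN_CHANGE = -5.0  # hypothetical true mean change in the HRQoL score


def p_missing(mechanism, baseline, change):
    """Probability that the follow-up value is missing (illustrative values)."""
    if mechanism == "MCAR":      # unrelated to any data
        return 0.3
    if mechanism == "MAR":       # depends only on the observed baseline
        x = baseline - 50.0
    elif mechanism == "MNAR":    # depends on the possibly unobserved change
        x = change - TRUE_MEAN_CHANGE
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    return 1.0 / (1.0 + math.exp(-(-0.85 + 0.1 * x)))


def complete_case_estimate(n, mechanism, rng):
    """Mean change among subjects whose follow-up value was observed."""
    observed = []
    for _ in range(n):
        baseline = rng.gauss(50.0, 10.0)
        # Assumed model: the change depends on baseline plus random noise.
        change = TRUE_MEAN_CHANGE - 0.3 * (baseline - 50.0) + rng.gauss(0.0, 8.0)
        if rng.random() >= p_missing(mechanism, baseline, change):
            observed.append(change)
    return statistics.mean(observed)


def run_simulation(mechanism, n_sims=300, n=200, seed=2015):
    """Return (relative bias %, standardized bias %) over n_sims replicates."""
    rng = random.Random(seed)
    estimates = [complete_case_estimate(n, mechanism, rng) for _ in range(n_sims)]
    bias = statistics.mean(estimates) - TRUE_MEAN_CHANGE
    relative = 100.0 * bias / TRUE_MEAN_CHANGE
    standardized = 100.0 * bias / statistics.stdev(estimates)
    return relative, standardized


for mech in ("MCAR", "MAR", "MNAR"):
    rel, std = run_simulation(mech)
    print(f"{mech}: relative bias {rel:.1f}%, standardized bias {std:.1f}%")
```

Under these assumptions the complete-case estimate should be close to unbiased under MCAR and clearly biased under MNAR, which is the kind of contrast the bias measures are meant to expose.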

Published

2015-08-19

How to Cite

Aguirre, U., Arostegui, I., Esteban, C., & Quintana, J. M. (2015). Assessment of the Performance of Imputation Techniques in Observational Studies with Two Measurements. International Journal of Statistics in Medical Research, 4(3), 240–251. https://doi.org/10.6000/1929-6029.2015.04.03.1

Section

General Articles