A Comparison of Parametric and Semi-Parametric Models for Microarray Data Analysis
Keywords: Copula, Goodness-of-fit, Melanoma, Microarray, Power, Type I error
Microarray technology has revolutionized genomic studies by enabling the study of differential expression of thousands of genes simultaneously. Parametric, nonparametric and semi-parametric statistical methods for gene selection have been proposed over the last sixteen years. In an effort to find the “gold standard”, the performance of some common parametric and nonparametric methods has been compared in terms of power to select differentially expressed genes and other desirable properties. However, no such comparisons have been conducted between parametric and semi-parametric models. In this study, we compared a semi-parametric model based on copulas with a parametric model (the quantitative trait analysis, or QTA, model) in terms of power and the ability to control the Type I error rate. In addition, we proposed a simple algorithm for choosing an optimal copula. For validation, the two approaches were applied to a publicly available melanoma cell line dataset. Both methods performed well in terms of power, but the copula approach performed notably better. In terms of Type I error rate control, the two methods were comparable. More methods for selecting an optimal copula for gene expression data need to be developed, as the proposed procedure applies only to copulas that permit both negative and positive dependence.
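The abstract does not spell out the copula-selection algorithm. As an illustration only, the Python sketch below shows one common way to choose among copula families that permit both negative and positive dependence between two genes. It converts each gene's expression values to rank-based pseudo-observations, fits each candidate family by maximum pseudo-likelihood (the semiparametric estimator of Genest, Ghoudi and Rivest), and keeps the family with the largest maximized log-likelihood. The candidate families (Gaussian, Frank and Farlie-Gumbel-Morgenstern), the parameter grids and all function names are assumptions made for this example, not the procedure proposed in the paper.

```python
import math
import random
from statistics import NormalDist

# Hypothetical sketch, NOT the authors' published algorithm: choose a copula
# among one-parameter families that allow both negative and positive
# dependence (Gaussian, Frank, FGM) by maximum pseudo-likelihood.

_STD_NORMAL = NormalDist()

# Parameter grids (assumed ranges); Frank excludes 0, where it reduces to independence.
GRIDS = {
    "gaussian": [k / 100 for k in range(-99, 100)],
    "frank": [k / 10 for k in range(-200, 201) if k != 0],
    "fgm": [k / 100 for k in range(-100, 101)],
}


def pseudo_observations(values):
    """Rescaled ranks r_i / (n + 1), strictly inside (0, 1); assumes no ties."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u


def gaussian_loglik(x, y, rho):
    """Gaussian copula log-density summed over normal scores x, y."""
    s = 1.0 - rho * rho
    return sum(-0.5 * math.log(s)
               - (rho * rho * (a * a + b * b) - 2.0 * rho * a * b) / (2.0 * s)
               for a, b in zip(x, y))


def frank_loglik(u, v, theta):
    """Frank copula log-density; expm1 keeps 1 - exp(-t) accurate."""
    log_const = math.log(theta * -math.expm1(-theta))  # positive for any theta != 0
    total = 0.0
    for a, b in zip(u, v):
        den = -math.expm1(-theta) - math.expm1(-theta * a) * math.expm1(-theta * b)
        total += log_const - theta * (a + b) - 2.0 * math.log(abs(den))
    return total


def fgm_loglik(u, v, theta):
    """Farlie-Gumbel-Morgenstern copula: c(u, v) = 1 + theta(1 - 2u)(1 - 2v)."""
    return sum(math.log(1.0 + theta * (1.0 - 2.0 * a) * (1.0 - 2.0 * b))
               for a, b in zip(u, v))


def select_copula(xs, ys):
    """Return (best family, {family: (parameter, max log-likelihood)})."""
    u, v = pseudo_observations(xs), pseudo_observations(ys)
    x = [_STD_NORMAL.inv_cdf(a) for a in u]  # normal scores for the Gaussian copula
    y = [_STD_NORMAL.inv_cdf(b) for b in v]
    loglik = {
        "gaussian": lambda t: gaussian_loglik(x, y, t),
        "frank": lambda t: frank_loglik(u, v, t),
        "fgm": lambda t: fgm_loglik(u, v, t),
    }
    fits = {}
    for family, grid in GRIDS.items():
        best = max(grid, key=loglik[family])  # coarse grid search, not a true optimizer
        fits[family] = (best, loglik[family](best))
    # Each family has one parameter, so the largest log-likelihood is also the smallest AIC.
    chosen = max(fits, key=lambda f: fits[f][1])
    return chosen, fits


if __name__ == "__main__":
    rng = random.Random(1)
    z1 = [rng.gauss(0, 1) for _ in range(300)]
    z2 = [rng.gauss(0, 1) for _ in range(300)]
    gene_a = z1
    gene_b = [-0.6 * a + 0.8 * b for a, b in zip(z1, z2)]  # simulated negative dependence
    print(select_copula(gene_a, gene_b))
```

The grid search keeps the sketch dependency-free; a real analysis would use a numerical optimizer and a formal goodness-of-fit test (for example, a parametric-bootstrap Cramér-von Mises test), since a higher likelihood alone does not show that a family fits well. Note also that the FGM copula can only represent weak dependence (Spearman's rho between -1/3 and 1/3), so it tends to lose to the other two families when genes are strongly co-expressed.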
Copyright (c) 2017 Linda Chaba, John Odhiambo, Bernard Omolo
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.