Using Copulas to Select Prognostic Genes in Melanoma Patients


  • Linda Chaba, Strathmore Institute of Mathematical Sciences, Strathmore University, Ole Sangale Road, Nairobi, Kenya
  • John Odhiambo, Strathmore Institute of Mathematical Sciences, Strathmore University, Ole Sangale Road, Nairobi, Kenya
  • Bernard Omolo, Division of Mathematics and Computer Science, University of South Carolina-Upstate, 800 University Way, Spartanburg, South Carolina, USA



Copula, False discovery rate, Melanoma, Microarray, Power


Melanoma of the skin is the fifth and seventh most commonly diagnosed cancer in men and women, respectively, in the USA. Gene signatures prognostic for outcomes such as overall survival and distant metastasis-free survival have shown promise in identifying therapeutic targets for primary and metastatic melanoma. However, most of these gene signatures have been selected using statistics that depend entirely on parametric distributional assumptions about the data (e.g. t-statistics). In this study, we assessed the impact of relaxing the parametric assumptions on the power of the models used for gene selection. We developed a semi-parametric model for feature selection that does not depend on the distributions of the covariates. This copula-based model assumes only that the marginal distributions of the covariates are continuous. Simulations indicated that the copula-based model had reasonable power at various levels of the false discovery rate (FDR). These results were validated in a publicly available melanoma dataset. Relaxing parametric assumptions on microarray data may yield procedures that have good power for differential gene expression analysis.
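The idea of relying only on continuous margins can be illustrated with a minimal sketch. It is not the model developed in the paper. It assumes a fully observed continuous outcome (no censoring), replaces each variable with rank-based pseudo-observations, scores each gene with Kendall's tau under a large-sample normal approximation, and applies the Benjamini-Hochberg adjustment to control the FDR. All function names, the simulated data, and the parameter values are hypothetical choices made for illustration.

```python
import math
import random


def pseudo_observations(values):
    """Rescaled ranks in (0, 1): rank / (n + 1). Assumes no ties (continuous margins)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u


def kendall_tau_test(x, y):
    """Kendall's tau with a two-sided p-value from the large-sample normal approximation."""
    n = len(x)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            d = (x[i] - x[j]) * (y[i] - y[j])
            s += (d > 0) - (d < 0)  # +1 for a concordant pair, -1 for a discordant pair
    tau = s / (n * (n - 1) / 2)
    var_tau = 2 * (2 * n + 5) / (9 * n * (n - 1))  # variance of tau under independence
    z = tau / math.sqrt(var_tau)
    return tau, math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for k in range(m - 1, -1, -1):  # walk from the largest p-value down
        i = order[k]
        running_min = min(running_min, pvalues[i] * m / (k + 1))
        adjusted[i] = running_min
    return adjusted


def demo(n_samples=40, n_genes=20, n_signal=5, fdr=0.05, seed=1):
    """Simulate skewed expression data and return indices of genes selected at the given FDR."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_samples)]
    u_outcome = pseudo_observations(outcome)
    pvalues = []
    for g in range(n_genes):
        if g < n_signal:
            # Skewed, non-normal expression that still tracks the outcome
            expr = [math.exp(o + rng.gauss(0, 0.5)) for o in outcome]
        else:
            # Skewed expression unrelated to the outcome
            expr = [math.exp(rng.gauss(0, 1)) for _ in range(n_samples)]
        _, p = kendall_tau_test(pseudo_observations(expr), u_outcome)
        pvalues.append(p)
    adjusted = benjamini_hochberg(pvalues)
    return [g for g, q in enumerate(adjusted) if q <= fdr]


if __name__ == "__main__":
    print("Genes selected at FDR 0.05:", demo())
```

Because the pseudo-observations depend only on ranks, any strictly increasing transformation of the expression values (a log transform, for instance) leaves the test statistic unchanged, which is the sense in which no parametric form for the margins is needed in this sketch.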





How to Cite

Chaba, L., Odhiambo, J., & Omolo, B. (2017). Using Copulas to Select Prognostic Genes in Melanoma Patients. International Journal of Statistics in Medical Research, 6(3), 114–122.


