Evaluation of Methods for Gene Selection in Melanoma Cell Lines

Linda Chaba; John Odhiambo; Bernard Omolo

doi:10.6000/1929-6029.2017.06.01.1

Authors

Linda Chaba, Strathmore Institute of Mathematical Sciences, Strathmore University, Ole Sangale Road, Nairobi, Kenya
John Odhiambo, Strathmore Institute of Mathematical Sciences, Strathmore University, Ole Sangale Road, Nairobi, Kenya
Bernard Omolo, Division of Mathematics & Computer Science, University of South Carolina-Upstate, 800 University Way, Spartanburg, South Carolina, USA

Keywords:

Differential gene expression, Melanoma cell lines, Prediction, Power, Quantitative trait.

Abstract

A major objective in microarray experiments is to identify a panel of genes that are associated with a disease outcome or trait. Many statistical methods for gene selection have been proposed within the last fifteen years. Although some of these methods have been compared, most comparisons concentrated on finding gene signatures that distinguish two groups. This study evaluates four gene selection methods when the outcome of interest is continuous: the Significance Analysis of Microarrays (SAM), the Linear Models for Microarray Analysis (LIMMA), the Lassoed Principal Components (LPC), and the Quantitative Trait Analysis (QTA). The comparison is based on the power to identify differentially expressed genes, the predictive ability of the genelists for a continuous outcome (G2 checkpoint function), and the prognostic properties of the genelists for distant metastasis-free survival. A simulated dataset and a publicly available melanoma cell lines dataset are used for simulations and validation, respectively. A primary melanoma dataset is used for assessment of prognosis. No genes were common to the genelists from all four methods. While SAM was generally the best in terms of power, the QTA genelist performed best in predicting the G2 checkpoint function. The genelist identified therefore depends on the choice of gene selection method. The QTA method would be preferred over the other approaches for predicting a quantitative outcome in melanoma research. We recommend the development of more robust statistical methods for differential gene expression analysis.
