Study of Population Structure and Genetic Prediction of Buffalo from Different Provinces of Iran using Machine Learning Method

Authors

  • Zahra Azizi, Department of Animal Science, Faculty of Agriculture, University of Tabriz, Tabriz, Iran
  • Hossein Moradi Shahrbabak, Department of Animal Science, Faculty of Agricultural Science and Engineering, College of Agriculture and Natural Resources, University of Tehran, Iran
  • Seyed Abbas Rafat, Department of Animal Science, Faculty of Agriculture, University of Tabriz, Tabriz, Iran
  • Mohammad Moradi Shahrbabak, Department of Animal Science, Faculty of Agricultural Science and Engineering, College of Agriculture and Natural Resources, University of Tehran, Iran
  • Jalil Shodja, Department of Animal Science, Faculty of Agriculture, University of Tabriz, Tabriz, Iran

DOI:

https://doi.org/10.6000/1927-520X.2020.09.07

Keywords:

Classification, Buffalo, Machine learning, SNP Chip data.

Abstract

Because breeding programs for milk production and type traits in Iranian buffalo must take into account the existence of two ecotypes, this study investigated the population structure of Iranian buffalo and validated how accurately animals can be classified into the two ecotypes (Azerbaijan and North). Genotypes from a 90K SNP chip were analysed with Support Vector Machine (SVM), Random Forest (RF) and Discriminant Analysis of Principal Components (DAPC) methods. A total of 258 buffaloes were sampled and genotyped. Admixture, multidimensional scaling (MDS) and DAPC analyses showed a close relationship among animals from different provinces. SVM classified the two ecotypes with the highest accuracy (96%), a result confirmed by the Area Under the ROC Curve (AUC), whereas DAPC and RF achieved lower accuracies of 88% and 80%, respectively. SVM therefore assigned animals to their herds more accurately than DAPC and RF. According to these results, the buffaloes distributed across the two ecotypes constitute one breed, and the same breeding program should therefore be applied to both in the future. The water buffalo ecotypes of the northern provinces of Iran and of Azerbaijan appear to belong to the same population.
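
The abstract compares the methods by classification accuracy and uses the AUC to confirm the SVM result. The two metrics answer different questions: accuracy scores hard ecotype assignments made at one threshold, while AUC measures how well a classifier's scores rank animals of one ecotype above those of the other across all thresholds. The following is a minimal Python sketch of both metrics for illustration only; it is not the authors' pipeline, the 0.5 threshold is an assumption, and the labels and scores in the usage example are invented rather than taken from this study.

```python
from itertools import product


def accuracy(y_true, y_pred):
    """Fraction of animals whose predicted ecotype matches the true one."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, scores):
    """ROC AUC computed as the Mann-Whitney probability P(score_pos > score_neg).

    y_true uses 1 for the ecotype treated as 'positive' and 0 for the other;
    higher scores mean 'more likely positive'. Tied pairs count as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one animal of each ecotype")
    # Compare every positive-negative pair of animals.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical labels (1 = North, 0 = Azerbaijan) and classifier scores.
    y_true = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
    # Hard assignment at an assumed threshold of 0.5.
    y_pred = [1 if s >= 0.5 else 0 for s in scores]
    print(f"accuracy = {accuracy(y_true, y_pred):.3f}")  # 4 of 6 correct
    print(f"AUC      = {auc(y_true, scores):.3f}")       # 8 of 9 pairs ranked correctly
```

On these made-up scores two animals are misassigned at the 0.5 threshold (accuracy 0.667), yet 8 of the 9 cross-ecotype pairs are still ranked correctly (AUC 0.889), which shows how the two numbers can diverge on the same classifier output.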

Published

2020-04-18

How to Cite

Azizi, Z., Shahrbabak, H. M., Rafat, S. A., Shahrbabak, M. M., & Shodja, J. (2020). Study of Population Structure and Genetic Prediction of Buffalo from Different Provinces of Iran using Machine Learning Method. Journal of Buffalo Science, 9, 48–59. https://doi.org/10.6000/1927-520X.2020.09.07

Section

Articles