2011, Number 1
Rev Mex Ing Biomed 2011; 32 (1)
A multiple-filter-GA-SVM method for dimension reduction and classification of DNA-microarray data
Hernández MLA, Bonilla HE, Morales CR
Language: English
References: 23
Pages: 32-39
ABSTRACT
This article proposes a Multiple-Filter method that combines a genetic algorithm (GA) with a support vector machine (SVM) for gene selection and classification of DNA-microarray data. The method is designed to select a subset of relevant genes that classifies DNA-microarray data more accurately. First, three traditional statistical methods are used for gene selection. Then, different relevant gene subsets are selected with a GA/SVM framework that uses leave-one-out cross-validation (LOOCV) to avoid overfitting. For each statistical method, a subset of relevant genes (a niche) is obtained by analyzing how frequently each gene appears in the different gene subsets. Finally, the most frequent genes contained in the niche are evaluated by the GA/SVM to obtain a final relevant gene subset. The proposed method is tested on two DNA-microarray datasets: leukemia and colon. The experimental results show that the Multiple-Filter-GA-SVM (MF-GA-SVM) performs well, achieving lower classification error rates with fewer selected genes than other methods reported in the literature.
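The pipeline described in the abstract (statistical filters, repeated GA runs scored by LOOCV, a frequency-based niche per filter, and a final GA pass over the niche genes) can be sketched in plain Python as below. This is a minimal, hypothetical illustration, not the authors' implementation: the abstract does not name its three statistical methods, so the two filters used here (a signal-to-noise ratio and Welch's t-statistic) are assumptions, as are the GA parameters, the 50% frequency threshold that defines a niche, and every function name. To keep the sketch standard-library only, a nearest-centroid classifier is swapped in for the SVM.

```python
import math
import random
import statistics
from collections import Counter


def _mean(values):
    values = list(values)
    return sum(values) / len(values)


def _split(X, y, g):
    """Values of gene g split into (class-1 samples, class-0 samples)."""
    a = [row[g] for row, lab in zip(X, y) if lab == 1]
    b = [row[g] for row, lab in zip(X, y) if lab == 0]
    return a, b


def snr(X, y):
    """Filter 1 (assumed): signal-to-noise ratio |m1 - m0| / (s1 + s0) per gene."""
    out = []
    for g in range(len(X[0])):
        a, b = _split(X, y, g)
        denom = statistics.pstdev(a) + statistics.pstdev(b)
        out.append(abs(_mean(a) - _mean(b)) / (denom or 1e-12))
    return out


def welch_t(X, y):
    """Filter 2 (assumed): absolute Welch t-statistic per gene (>= 2 samples per class)."""
    out = []
    for g in range(len(X[0])):
        a, b = _split(X, y, g)
        se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
        out.append(abs(_mean(a) - _mean(b)) / (se or 1e-12))
    return out


def loocv_error(X, y, genes):
    """LOOCV error rate on `genes`; a nearest-centroid classifier stands in for the SVM."""
    if not genes:
        return 1.0
    errors = 0
    for i in range(len(X)):
        dist = {}
        for lab in (0, 1):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == lab]
            centroid = [_mean(r[g] for r in rows) for g in genes]
            dist[lab] = sum((X[i][g] - c) ** 2 for g, c in zip(genes, centroid))
        errors += min(dist, key=dist.get) != y[i]  # held-out sample misclassified?
    return errors / len(X)


def ga_select(X, y, candidates, pop=20, gens=10, seed=0, penalty=0.01):
    """Small elitist GA over bit masks on `candidates`; parameters are hypothetical."""
    rng = random.Random(seed)
    n = len(candidates)

    def subset(mask):
        return [g for g, bit in zip(candidates, mask) if bit]

    def fitness(mask):
        # LOOCV error plus a small size penalty: among equally accurate subsets, fewer genes win.
        return loocv_error(X, y, subset(mask)) + penalty * sum(mask) / n

    population = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=fitness)
        elite = population[: pop // 2]  # keep the better half unchanged
        children = []
        while len(elite) + len(children) < pop:
            p1, p2 = rng.sample(elite, 2)
            cut = rng.randrange(1, n) if n > 1 else 0  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [(not b) if rng.random() < 1.0 / n else b for b in child]  # bit-flip mutation
            children.append(child)
        population = elite + children
    return subset(min(population, key=fitness))


def niche(subsets, min_fraction=0.5):
    """Genes that appear in at least `min_fraction` of the GA runs (threshold assumed)."""
    counts = Counter(g for s in subsets for g in s)
    return sorted(g for g, c in counts.items() if c / len(subsets) >= min_fraction)


def mf_ga_sketch(X, y, filters=(snr, welch_t), top_k=10, runs=3):
    """Filter -> repeated GA runs -> niche per filter -> final GA on the pooled niches."""
    niches = []
    for score_fn in filters:
        scores = score_fn(X, y)
        top = sorted(range(len(scores)), key=lambda g: -scores[g])[:top_k]
        subsets = [ga_select(X, y, top, seed=s) for s in range(runs)]
        niches.append(niche(subsets))
    pooled = sorted(set().union(*niches))
    return ga_select(X, y, pooled, seed=runs) if pooled else []
```

With `X` as a list of samples (each a list of expression values) and `y` as 0/1 class labels, `mf_ga_sketch(X, y)` returns the indices of the selected genes. Replacing `loocv_error` with an SVM-based LOOCV would bring the sketch closer to the GA/SVM framework the abstract describes; pooling the genes that recur across several GA runs is intended to make the final subset less dependent on any single stochastic run.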
REFERENCES
Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D et al. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proceedings of the National Academy of Sciences of the USA 1999: 6745-6750.
Ben-Dor A, Bruhn L, Friedman N, Nachman I, Schummer M, Yakhini Z. Tissue classification with gene expression profiles. In: RECOMB. Journal of Computational Biology 2000: 54-64.
Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP et al. Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science 1999; 286: 531-537.
Alizadeh A, Eisen M, Davis R, Ma C, Lossos I, Rosenwald A et al. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 2000; 403(6769): 503-511.
Shipp M, Ross K, Tamayo P, Weng A, Kutok J, Aguiar R et al. Diffuse large B-cell lymphoma outcome prediction by gene expression profiling and supervised machine learning. Nat Med 2002: 68-74.
Schena M, Shalon D, Davis R, Brown P. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 1995; 270: 467-470.
Bobashev GV, Das S, Das A. Experimental design for gene microarray experiment and differential expression analyses. Methods of Microarray Data Analysis II 2001: 23-41.
McLachlan GJ, Do KA, Ambroise C. Analyzing microarray gene expression data. Wiley. 2004.
Russell S, Meadows LA, Russell RR. Microarray technology in practice. Academic Press. First edition. 2009.
Dudoit S, Fridlyand J, Speed TP. Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association 2002; 97(457): 77-87.
Deng L, Pei J, Ma J, Lee DL. Rank sum test method for informative gene discovery. Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’04), 2004: 410-419.
Liu H, Li J, Wong L. A comparative study on feature selection and classification methods using gene expression profiles and proteomic patterns. Genome Informatics 2002; 13: 51-60.
Mitchell M. An introduction to genetic algorithms. MIT Press, Cambridge, MA, 1999.
Burges CJC. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery 1998; 2(2): 121-167.
Joachims T. Making large-scale SVM learning practical. In: Schölkopf B et al. (editors). Advances in kernel methods: support vector learning. MIT Press, 1999.
Cho SB, Won HH. Cancer classification using ensemble of neural networks with multiple significant gene subsets. Applied Intelligence 2007; 26(3): 243-250.
Li S, Wu X, Hu X. Gene selection using genetic algorithm and support vectors machines. Soft Comput 2008; 12(7): 693-698.
Alba E, García-Nieto J, Jourdan L, Talbi EG. Gene selection in cancer classification using PSO/SVM and GA/SVM hybrid algorithms. Congress on Evolutionary Computation 2007: 284-290.
Krishnapuram B, Carin L, Hartemink AJ. Joint classifier and feature optimization for comprehensive cancer diagnosis using gene expression data. Journal of Computational Biology 2004; 11(2-3): 227-242.
Xu R, Anagnostopoulos GC, Wunsch DC. Tissue classification through analysis of gene expression data using a new family of ART architectures. Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks, 2002: 300-304.
Li X, Rao S, Zhang T, Guo Z, Moser KL, Topol EJ et al. An ensemble method for gene discovery based on DNA microarray data. Science in China Series C: Life Sciences 2004: 396-405.
Zhang H, Song X, Wang H, Zhang X. MIClique: an algorithm to identify differentially co-expressed disease gene subsets from microarray data. Journal of Biomedicine and Biotechnology 2009: 9.
Cho SB. Exploring features and classifiers to classify gene expression profiles of acute leukemia. International Journal of Pattern Recognition and Artificial Intelligence 2002: 831-844.