Differential evolution in the decision tree learning algorithm


Decision trees (DT) are among the most effective classification methods. Their main advantage is a simple and user-friendly interpretation of the results obtained. Despite these well-known advantages, the method also has drawbacks, one of which is that DT training on high-dimensional data is very time-consuming. The paper considers a way to reduce the duration of the DT learning process without loss of classification accuracy. There are different DT training algorithms, the main ones being ID3 and CART. The paper proposes a modification of DT learning algorithms that optimizes the information criterion over a single selected attribute. This modification avoids optimization by enumeration search over the entire data set. The Separation Measure method is used to select the attribute: it chooses the attribute whose class-conditional averages are most distant from each other. Optimization over the selected attribute is carried out using differential evolution, an evolutionary modeling method designed for multidimensional optimization problems. Self-configuration at the population level, based on the probabilities of applying variants of the mutation operator, was used for the differential evolution.
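The idea described above can be sketched as follows. This is an illustrative implementation under assumptions, not the authors' exact formulation: the attribute is picked by the spread of its class-conditional means, and the split threshold is then tuned with standard differential evolution (via SciPy) minimizing the weighted child entropy, rather than enumerating every candidate threshold. Function names and the entropy criterion are the sketch's own choices; the paper's self-configuring mutation scheme is not reproduced here.

```python
import numpy as np
from scipy.optimize import differential_evolution

def entropy(y):
    """Shannon entropy of a label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def select_attribute(X, y):
    """Separation-Measure-style choice: the attribute whose
    class-conditional means are most distant from each other."""
    spreads = []
    for j in range(X.shape[1]):
        means = [X[y == c, j].mean() for c in np.unique(y)]
        spreads.append(np.ptp(means))  # range between extreme class means
    return int(np.argmax(spreads))

def best_threshold(X, y, j):
    """Tune the split threshold on attribute j with differential
    evolution instead of scanning all candidate split points."""
    def cost(t):
        left, right = y[X[:, j] <= t[0]], y[X[:, j] > t[0]]
        if len(left) == 0 or len(right) == 0:
            return entropy(y)  # degenerate split: no information gain
        n = len(y)
        return (len(left) / n * entropy(left)
                + len(right) / n * entropy(right))
    bounds = [(X[:, j].min(), X[:, j].max())]
    return differential_evolution(cost, bounds, seed=0).x[0]
```

For a single numeric attribute the search space is one-dimensional, so the gain over enumeration comes from not having to sort and evaluate every distinct value of a large training sample.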

Classification problems were solved to compare the standard DT learning algorithms with the modified ones. Algorithm efficiency refers to the percentage of correctly classified test-sample objects. A statistical analysis based on Student's t-test was carried out to compare the efficiency of the algorithms.
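Such a comparison can be sketched as below. The accuracy values are made up for illustration, not the paper's results; the two-sample Student's t-test checks whether the mean accuracies of the two algorithms differ significantly across repeated runs.

```python
from scipy.stats import ttest_ind

# Hypothetical per-run test accuracies of the two algorithms
acc_standard = [0.81, 0.79, 0.83, 0.80, 0.82, 0.78, 0.81, 0.80]
acc_modified = [0.82, 0.80, 0.83, 0.81, 0.81, 0.79, 0.82, 0.80]

# equal_var=True gives the classic Student's t-test (pooled variance)
stat, p_value = ttest_ind(acc_standard, acc_modified, equal_var=True)

# If p_value exceeds the chosen significance level (e.g. 0.05),
# the accuracy difference is not statistically significant.
```

A non-significant result is the desired outcome here: it supports the claim that the speed-up does not come at the cost of classification accuracy.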

The analysis showed that the proposed modification of the DT learning algorithm makes it possible to significantly speed up the training process without loss of classification effectiveness.

About the authors

Sergei Mitrofanov

Reshetnev Siberian State University of Science and Technology

Author for correspondence.
Email: sergeimitrofanov95@gmail.com

Master student of the Department of System Analysis and Operations Research

Russian Federation, 31, Krasnoyarsky Rabochy Av., Krasnoyarsk, 660037

Evgeny Semenkin

Reshetnev Siberian State University of Science and Technology

Email: eugenesemenkin@yandex.ru

Dr. Sc., Professor, Professor of the Department of System Analysis and Operations Research

Russian Federation, 31, Krasnoyarsky Rabochy Av., Krasnoyarsk, 660037



Copyright © Mitrofanov S.A., Semenkin E.S., 2019

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.