Differential evolution in the decision tree learning algorithm

Sergei A. Mitrofanov; Evgeny S. Semenkin

doi:10.31772/2587-6066-2019-20-3-312-319

Authors: Mitrofanov S.A.¹, Semenkin E.S.¹
Affiliations:
1. Reshetnev Siberian State University of Science and Technology
Issue: Vol 20, No 3 (2019)
Pages: 312-319
Section: Section 1. Computer Science, Computer Engineering and Management
Published: 15.09.2019
URL: https://journals.eco-vector.com/2712-8970/article/view/567828
DOI: https://doi.org/10.31772/2587-6066-2019-20-3-312-319
ID: 567828

Abstract

Decision trees (DT) are among the most effective classification methods. Their main advantage is that the results obtained are simple and easy to interpret. Despite these well-known advantages, the method also has disadvantages, one of which is that DT training on high-dimensional data is very time-consuming. The paper considers a way to shorten the DT learning process without loss of classification accuracy. There are different DT training algorithms, the main ones being ID3 and CART. The paper proposes a modification of DT learning algorithms in which the information criterion is optimized only for a selected attribute. This modification avoids optimization by exhaustive enumeration over the entire data set. The Separation Measure method is used to select the attribute: it selects the attribute whose class-based averages are most distant from each other. Optimization over the selected attribute is carried out using differential evolution, an evolutionary modeling method designed for multidimensional optimization problems. Differential evolution was self-configured at the population level, based on the probabilities of applying the variants of the mutation operator.
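The two steps described above, choosing one attribute by the distance between its class means and then searching for a split threshold with differential evolution, can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the abstract does not give the exact Separation Measure formula, so the score used here (the largest gap between class means on a min-max scaled attribute) is an assumption, as are the Gini criterion, the DE/rand/1 variant, the parameter values, and the function names `select_attribute`, `gini_of_split`, and `de_threshold`. The self-configuration of mutation variants is not shown.

```python
import random
from collections import defaultdict


def select_attribute(X, y):
    """Return the index of the attribute whose class means are farthest apart.

    Assumption: attributes are min-max scaled and the score is the largest
    gap between any two class means; the paper's formula may differ.
    """
    best_j, best_score = 0, -1.0
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        lo, hi = min(col), max(col)
        if hi == lo:
            continue  # a constant attribute cannot separate the classes
        sums, counts = defaultdict(float), defaultdict(int)
        for v, label in zip(col, y):
            sums[label] += (v - lo) / (hi - lo)
            counts[label] += 1
        means = [sums[c] / counts[c] for c in sums]
        score = max(means) - min(means)
        if score > best_score:
            best_j, best_score = j, score
    return best_j


def gini_of_split(values, y, threshold):
    """Weighted Gini impurity of the split value <= threshold vs. value > threshold."""
    total = len(values)
    impurity = 0.0
    for left_side in (True, False):
        labels = [lab for v, lab in zip(values, y) if (v <= threshold) == left_side]
        if not labels:
            continue
        counts = defaultdict(int)
        for lab in labels:
            counts[lab] += 1
        gini = 1.0 - sum((c / len(labels)) ** 2 for c in counts.values())
        impurity += len(labels) / total * gini
    return impurity


def de_threshold(values, y, pop_size=20, generations=40, F=0.5, seed=0):
    """Minimise the split impurity over a single threshold with DE/rand/1.

    With one decision variable, binomial crossover always takes the mutant
    component, so the trial point equals the (clipped) mutant.
    """
    rng = random.Random(seed)
    lo, hi = min(values), max(values)
    pop = [rng.uniform(lo, hi) for _ in range(pop_size)]
    fit = [gini_of_split(values, y, t) for t in pop]
    for _ in range(generations):
        for i in range(pop_size):
            others = [k for k in range(pop_size) if k != i]
            a, b, c = rng.sample(others, 3)
            mutant = pop[a] + F * (pop[b] - pop[c])
            trial = min(max(mutant, lo), hi)  # keep the threshold in range
            f_trial = gini_of_split(values, y, trial)
            if f_trial <= fit[i]:  # greedy one-to-one selection
                pop[i], fit[i] = trial, f_trial
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]
```

A possible use on a node's data would be `j = select_attribute(X, y)` followed by `de_threshold([row[j] for row in X], y)`. The point of the design is that the criterion is evaluated only at the thresholds the population visits on one attribute, rather than at every candidate split of every attribute.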

Classification problems were solved to compare the standard DT learning algorithms with the modified ones. Algorithm efficiency is measured as the percentage of correctly classified objects in the test sample. A statistical analysis based on Student's t-test was carried out to compare the efficiency of the algorithms.
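As a rough illustration of this comparison, the efficiency measure and a two-sample Student's t statistic could be computed as below. The abstract does not state which form of the t-test was used, so the pooled-variance (equal variances) form is an assumption, and the function names `accuracy` and `student_t` are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def accuracy(y_true, y_pred):
    """Percentage of correctly classified test objects."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


def student_t(sample_a, sample_b):
    """Two-sample Student's t statistic with pooled variance.

    Assumption: both samples of accuracies have equal variances.
    Returns the statistic and the degrees of freedom.
    """
    na, nb = len(sample_a), len(sample_b)
    dof = na + nb - 2
    pooled = ((na - 1) * variance(sample_a) + (nb - 1) * variance(sample_b)) / dof
    t = (mean(sample_a) - mean(sample_b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, dof
```

Here `sample_a` and `sample_b` would hold the accuracies of the standard and modified algorithms over repeated runs. The absolute value of the statistic is then compared with the critical value of the t distribution for the returned degrees of freedom at a chosen significance level, which the abstract does not specify.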

The analysis showed that the proposed modification of the DT learning algorithm significantly speeds up the training process without loss of classification effectiveness.

Keywords

decision tree, classification, optimization, Separation Measure, differential evolution, Population-Level Dynamic Probabilities, Success History Adaptation

About the authors

Sergei A. Mitrofanov

Reshetnev Siberian State University of Science and Technology

Author for correspondence.
Email: sergeimitrofanov95@gmail.com

Master's student of the Department of System Analysis and Operations Research

Russian Federation, 31, Krasnoyarsky Rabochy Av., Krasnoyarsk, 660037

Evgeny S. Semenkin

Reshetnev Siberian State University of Science and Technology

Email: eugenesemenkin@yandex.ru

Dr. Sc., Professor, Professor of the Department of System Analysis and Operations Research

Russian Federation, 31, Krasnoyarsky Rabochy Av., Krasnoyarsk, 660037
