Differential evolution in the decision tree learning algorithm

Cover Page

Cite item

Full Text

Abstract

Decision trees (DT) belong to the most effective classification methods. The main advantage of decision trees is a simple and user-friendly interpretation of the results obtained. But despite its well-known advantages the method has some disadvantages as well. One of them is that DT training on high-dimensional data is very time-consuming. The paper considers the way to reduce the DT learning process duration without losses of classification accuracy. There are different algorithms of DT training; the main of them being ID3 and CART algorithms. The paper proposesa modification of DT learning algorithms by means of the information criterion optimization for some selected attribute. The use of this modification allows avoiding optimization by means of enumeration search over the entire data set. The Separation Measure method is used to select the attribute. The method selects the attribute whose class-based averages are most distant from each other. Optimization of the selected attribute is carried out using the method of differential evolution, which is one of the evolutionary modeling methods designed to solve problems of multidimensional optimization. Self-configuring at the population level based on the probabilities of using mutation operator’s variants was applied for differential evolution.

The classification problems were solved to compare standard DT learning algorithms with the modified ones. Algorithm efficiency refers to the percentage of correctly classified test sample objects. Statistical analysis based on Student's t-test was carried out to compare the efficiency of the algorithms.

The analysis showed that the use of the proposed modification of the DT learning algorithm makes it possible to significantly speed up the training process without losses in the classification effectiveness.

About the authors

Sergei A. Mitrofanov

Reshetnev Siberian State University of Science and Technology

Author for correspondence.
Email: sergeimitrofanov95@gmail.com

Master student of the Department of System Analysis and Operations Research

Russian Federation, 31, Krasnoyarsky Rabochy Av., Krasnoyarsk, 660037

Evgeny S. Semenkin

Reshetnev Siberian State University of Science and Technology

Email: eugenesemenkin@yandex.ru

Dr. Sc., Professor, Professor of the Department of System Analysis and Operations Research

Russian Federation, 31, Krasnoyarsky Rabochy Av., Krasnoyarsk, 660037

References

  1. Breiman L., Friedman J. H., Olshen R. A., Stone C. T. Classification and Regression Trees. Wadsworth. Belmont. California. 1984, 128 p.
  2. Hastie T., Tibshirani R., Friedman J. The Elements of Statistical Learning. Springer, 2009, 189 p.
  3. Ross Quinlan J. C4.5: Programs for Machine learning. Morgan Kaufmann Publishers. 1993, 302 p.
  4. Quinlan J. R. Induction of decision trees. Machine learning. 1986, No. 1(1), P. 81–106.
  5. David L. Davies, Donald W. Bouldin. A Cluster Separation Measure. IEEE Transactions on Pattern Analysis and Machine Intelligence. 1979, Vol. PAMI-1, Iss. 2, P. 224–227.
  6. Storn R. On the usage of differential evolution for function optimization. Biennial Conference of the North American Fuzzy Information Processing Society (NAFIPS). 2009, P. 519–523.
  7. Goldberg D. E. Genetic Algorithms in Search, Optimization and Machine Learning. Reading, MA: Addison-Wesley. 1989, 432 p.
  8. Qin A. K., Suganthan P. N. Self-adaptive differential evolution algorithm for numerical optimization. Proceedings of the IEEE congress on evolutionary computation (CEC). 2005, P. 1785–1791.
  9. Semenkin E. S., Semenkina M. E. Self-configuring Genetic Algorithm with Modified Uniform Crossover Operator. Advances in Swarm Intelligence. Lecture Notes in Computer Science 7331. Springer-Verlag, Berlin Heidelberg. 2012, P. 414–421.
  10. Semenkin E., Semenkina M. Spacecrafts' control systems effective variants choice with self-configuring genetic algorithm. ICINCO 2012 – Proceedings of the 9th International Conference on Informatics in Control, Automation and Robotics. 2012, P. 84–93.
  11. Tanabe R., Fukunaga A. Success-history based parameter adaptation for Differential Evolution. IEEE Congress on Evolutionary Computation. Cancun. 2013, P. 71–78.
  12. Machine Learning Repository. Available at: https://archive.ics.uci.edu/ml/index.php (accessed 19.08.2018).
  13. Gmurman V. E. Teoriya veroyatnostey i matematicheskaya statistika [Probability theory and mathematical statistics]. Moscow, Vysshaya shkola Publ., 2003, P. 303–304 (In Russ.)
  14. Ayvazyan S. A., Enyukov I. S., Meshalkin L. D. Prikladnaya statistika: Osnovy modelirovaniya i pervichnaya obrabotka dannykh [Applied Statistics: Basics of modeling and primary data processing]. Moscow, Finansy i statistika Publ., 1983, 471 p. (In Russ.).
  15. Rumshiskiy L. Z. Matematicheskaya obrabotka rezul’tatov eksperimenta [The mathematical processing of the experimental results]. Moscow, Nauka Publ., 1971, 192 p. (In Russ.).

Supplementary files

Supplementary Files
Action
1. JATS XML

Copyright (c) 2019 Mitrofanov S.A., Semenkin E.S.

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.

This website uses cookies

You consent to our cookies if you continue to use our website.

About Cookies