Applied classification problems using ridge regression

Abstract

The rapid development of measurement devices and technology makes it possible to monitor the properties of objects of different physical nature with very fine data discreteness. As a result, large amounts of data can be accumulated and used to advantage in managing an object, a multiply connected system, or a technological enterprise. However, regardless of the field of activity, problems involving small amounts of data remain; in such cases the dynamics of data accumulation is limited by objective constraints of the external world and the environment. The present research concerns high-dimensional data with small sample sizes. This raises the task of selecting informative features, which both improves the quality of the solution by eliminating “junk” features and increases the speed of decision making, since algorithms typically depend on the dimension of the feature space; it also simplifies data collection, as uninformative data need not be gathered. Since the number of features can be large, an exhaustive search over all feature subsets is infeasible. Instead, we propose a two-stage random search algorithm based on a genetic algorithm: at the first stage, the search is run with a limit on the number of features per subset to reduce the feature space by eliminating “junk” features; at the second stage, the search is run without this limit, but over the reduced feature set. The original problem is a supervised classification task in which the object class is assigned by an expert. The values of an object's attributes vary with its state, assigning the object to one class or another; that is, the statistics are shifted between classes. Without loss of generality, a two-alternative formulation of the supervised classification task was used for the simulation experiments.
Training samples were generated from medical data on the diagnosis of disease severity.
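The approach described above can be sketched in code. The following is a minimal illustration on synthetic data (the medical dataset is not reproduced here), with plain random search standing in for the genetic algorithm; all names and parameter values are assumptions for the sketch, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-class data: a few informative features plus "junk" ones
# (assumed setup; the paper's medical data is not public).
n, n_informative, n_junk = 60, 3, 17
X = rng.normal(size=(n, n_informative + n_junk))
y = (X[:, :n_informative].sum(axis=1) > 0).astype(float)  # labels 0/1

def ridge_fit(X, y, alpha=1.0):
    """Ridge regression: w = (X'X + alpha*I)^-1 X'y, with a bias column."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    A = Xb.T @ Xb + alpha * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)

def ridge_predict(X, w):
    """Classify by thresholding the ridge-regression output at 0.5."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return (Xb @ w > 0.5).astype(float)

def cv_error(X, y, features, folds=5, alpha=1.0):
    """Cross-validated misclassification rate on a feature subset."""
    idx = np.arange(len(y))
    errors = []
    for f in range(folds):
        test = idx % folds == f
        w = ridge_fit(X[~test][:, features], y[~test], alpha)
        errors.append(np.mean(ridge_predict(X[test][:, features], w) != y[test]))
    return float(np.mean(errors))

# Stage 1: random search over size-limited subsets; score each feature by the
# average error of the subsets it appeared in, then drop the worst scorers.
scores = np.zeros(X.shape[1])
counts = np.zeros(X.shape[1])
for _ in range(300):
    subset = rng.choice(X.shape[1], size=4, replace=False)
    e = cv_error(X, y, subset)
    scores[subset] += e
    counts[subset] += 1
keep = np.argsort(scores / counts)[:8]  # reduced feature set, "junk" removed

# Stage 2: unconstrained search, but only over the reduced feature set
# (the paper uses a genetic algorithm here; random search is a stand-in).
best = min((tuple(rng.choice(keep, size=k, replace=False))
            for k in range(1, len(keep) + 1) for _ in range(50)),
           key=lambda s: cv_error(X, y, list(s)))
print("selected features:", sorted(best), "cv error:", cv_error(X, y, list(best)))
```

The size limit in stage 1 keeps each evaluation cheap and forces informative features to surface individually, after which the unconstrained stage can explore combinations within the reduced space.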

About the authors

Nadezhda Kononova

Siberian Federal University

Email: koplyarovanv@mail.ru

Cand. Sc., associate professor; Informational systems department

Russian Federation, 79, Svobodny Av., 660041, Krasnoyarsk

Ekaterina Mangalova

“RD Scienc”

Primary contact for editorial correspondence.
Email: e.s.mangalova@hotmail.com

software developer

Russian Federation, 19, Kirova St., Krasnoyarsk, 660017

Anton Stroev

Krasnoyarsk State Medical University named after Prof. V. F. Voino-Yasenetsky

Email: antoxa134@mail.ru

Postgraduate Student

Russian Federation, 1, Partizana Zheleznyaka St., Krasnoyarsk, 660022

Dmitry Cherdantsev

Krasnoyarsk State Medical University named after Prof. V. F. Voino-Yasenetsky

Email: gs7@mail.ru

Dr. Sc., Professor

Russian Federation, 1, Partizana Zheleznyaka St., Krasnoyarsk, 660022

Olesya Chubarova

Reshetnev Siberian State University of Science and Technology

Email: kuznetcova_o@mail.ru

Cand. Sc., associate professor; System analysis and operations research department

Russian Federation, 31, Krasnoyarsky Rabochy Av., Krasnoyarsk, 660037



版权所有 © Kononova N.V., Mangalova E.S., Stroev A.V., Cherdantsev D.V., Chubarova O.V., 2019

This work is licensed under a Creative Commons Attribution 4.0 International License.