Methodology for constructing benchmarks for assessing the efficiency of feature selection methods when constructing regression

Abstract

This paper considers the problem of constructing benchmarks for assessing the efficiency of feature selection algorithms used in regression problems. The developed benchmark includes synthetic and real datasets that differ in data parameters such as dimensionality, correlation between features, noise level, and the ratio of numerical to categorical variables. For synthetic data, a generation method based on various distributions and control parameters is proposed. A computational experiment with a feature selection method based on partial least squares was conducted; it demonstrates that the proposed benchmark supports objective comparison of algorithms. Directions for further development of the benchmark are outlined, including expanding the set of parameters and using new data types.
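To illustrate the kind of parameterized synthetic data the abstract describes, the following is a minimal sketch and not the authors' generator. It assumes a single shared latent factor to set the pairwise correlation between numerical features, a linear target, Gaussian noise, and three arbitrary categorical levels. The function names `make_synthetic` and `selection_scores` and all parameter names are hypothetical. Precision and recall against the known informative features are one plausible way to score a selector; the abstract does not state which metric the paper uses.

```python
import math
import random


def make_synthetic(n_rows, n_numeric, n_categorical, n_informative,
                   rho, noise_sd, seed=0):
    """Return (rows, target, informative_idx) for a toy regression task.

    Assumption: numeric columns share one latent factor, so each pair
    has population correlation `rho` (0 <= rho < 1).
    """
    if not 0 <= rho < 1 or n_informative > n_numeric:
        raise ValueError("need 0 <= rho < 1 and n_informative <= n_numeric")
    rng = random.Random(seed)
    informative = list(range(n_informative))  # first numeric columns drive y
    weights = [rng.uniform(1.0, 2.0) for _ in informative]
    rows, target = [], []
    for _ in range(n_rows):
        z = rng.gauss(0, 1)  # shared latent factor for this row
        numeric = [math.sqrt(rho) * z + math.sqrt(1 - rho) * rng.gauss(0, 1)
                   for _ in range(n_numeric)]
        # Categorical columns are pure noise here (hypothetical levels a/b/c).
        categorical = [rng.choice("abc") for _ in range(n_categorical)]
        y = sum(w * numeric[j] for w, j in zip(weights, informative))
        y += rng.gauss(0, noise_sd)  # noise level is a control parameter
        rows.append(numeric + categorical)
        target.append(y)
    return rows, target, informative


def selection_scores(selected, informative):
    """Precision and recall of a selected feature set against ground truth."""
    selected, informative = set(selected), set(informative)
    hits = len(selected & informative)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(informative) if informative else 0.0
    return precision, recall


if __name__ == "__main__":
    rows, y, truth = make_synthetic(200, 6, 2, 3, rho=0.3, noise_sd=0.5)
    # Score a hypothetical selector that returned columns 0, 1 and 5.
    print(selection_scores([0, 1, 5], truth))
```

Because the informative columns are known by construction, any feature selection method (the partial least squares approach mentioned above, for instance) can be scored on such data without relying on downstream model accuracy alone.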

About the authors

A. D. Cheremuhin

Nizhny Novgorod State Engineering and Economics University

Author for correspondence.
Email: ngieu.cheremuhin@yandex.ru

Cand. Econ. Sc., Associate Professor

Russian Federation, Nizhny Novgorod

A. S. Lyamin

Nizhny Novgorod State Engineering and Economics University

Email: a.s.lyamin@gmail.com

Cand. Econ. Sc., Associate Professor

Russian Federation, Nizhny Novgorod

Copyright (c) 2026 Informacionnye Tehnologii



This mass media outlet is registered with the Federal Service for Supervision of Communications, Information Technology and Mass Media (Roskomnadzor).
Registration number and date of the registration decision: series PI No. 77 - 15565 of 2 June 2003.