Methodology for constructing benchmarks for assessing the efficiency of feature selection methods when constructing regression
- Authors: Cheremuhin A.D., Lyamin A.S.
- Affiliations: Nizhny Novgorod State Engineering and Economics University
- Issue: Vol 32, No 1 (2026)
- Pages: 20-27
- Section: Modeling and optimization
- Published: 15.01.2026
- URL: https://journals.eco-vector.com/1684-6400/article/view/702340
- DOI: https://doi.org/10.17587/it.32.20-27
- ID: 702340
Abstract
This paper considers the problem of constructing benchmarks for assessing the efficiency of feature selection algorithms used in regression problems. The developed benchmark includes synthetic and real datasets that differ in data parameters such as dimensionality, correlation between features, noise level, and the ratio of numerical to categorical variables. For synthetic data, a generation method based on various distributions and control parameters is proposed. A computational experiment with a feature selection method based on partial least squares was conducted, demonstrating the effectiveness of the proposed benchmark for objective comparison of algorithms. Directions for further development of the benchmark are outlined, including expanding the set of parameters and using new data types.
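To make the idea of parameter-controlled synthetic data concrete, the following is a minimal sketch and not the authors' generator. The function names, the equicorrelated Gaussian design, the three-level binning of categorical columns, and the precision/recall scoring are all assumptions introduced for illustration. The abstract names only the controlled parameters (dimensionality, feature correlation, noise level, numerical-to-categorical ratio).

```python
import numpy as np


def make_benchmark_dataset(n_samples=200, n_features=20, n_informative=5,
                           correlation=0.3, noise_std=1.0,
                           categorical_fraction=0.25, seed=0):
    """Hypothetical generator: one synthetic regression dataset with a
    known set of relevant features (illustrative assumptions only)."""
    rng = np.random.default_rng(seed)

    # Equicorrelated Gaussian features: every pair shares `correlation`.
    cov = np.full((n_features, n_features), correlation)
    np.fill_diagonal(cov, 1.0)
    X = rng.multivariate_normal(np.zeros(n_features), cov, size=n_samples)

    # Convert the last columns into 3-level categorical codes (0, 1, 2)
    # by binning each column at its empirical terciles.
    n_categorical = int(round(categorical_fraction * n_features))
    for j in range(n_features - n_categorical, n_features):
        edges = np.quantile(X[:, j], [1 / 3, 2 / 3])
        X[:, j] = np.digitize(X[:, j], edges)

    # Only a random subset of features drives the target.
    relevant = np.zeros(n_features, dtype=bool)
    relevant[rng.choice(n_features, size=n_informative, replace=False)] = True
    coef = np.where(relevant, rng.uniform(1.0, 3.0, size=n_features), 0.0)

    # Linear signal plus Gaussian noise with the requested noise level.
    y = X @ coef + rng.normal(0.0, noise_std, size=n_samples)
    return X, y, relevant


def selection_scores(selected, relevant):
    """Precision and recall of a boolean selection mask against the truth."""
    selected = np.asarray(selected, dtype=bool)
    true_positives = np.sum(selected & relevant)
    precision = true_positives / max(selected.sum(), 1)
    recall = true_positives / max(relevant.sum(), 1)
    return precision, recall


X, y, relevant = make_benchmark_dataset()
```

Because the relevant features are known by construction, any selector (for example, one based on partial least squares) can be scored by passing its boolean mask to `selection_scores`. Varying `correlation`, `noise_std`, and `categorical_fraction` over a grid would give a family of datasets along the lines the abstract describes.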
About the authors
A. D. Cheremuhin
Nizhny Novgorod State Engineering and Economics University
Author for correspondence.
Email: ngieu.cheremuhin@yandex.ru
Cand. Econ. Sc., Associate Professor
Russian Federation, Nizhny Novgorod
A. S. Lyamin
Nizhny Novgorod State Engineering and Economics University
Email: a.s.lyamin@gmail.com
Cand. Econ. Sc., Associate Professor
Russian Federation, Nizhny Novgorod