<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article>
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ali="http://www.niso.org/schemas/ali/1.0/" article-type="research-article" dtd-version="1.2" xml:lang="en"><front><journal-meta><journal-id journal-id-type="publisher-id">Informacionnye Tehnologii</journal-id><journal-title-group><journal-title xml:lang="en">Informacionnye Tehnologii</journal-title><trans-title-group xml:lang="ru"><trans-title>Информационные технологии</trans-title></trans-title-group></journal-title-group><issn publication-format="print">1684-6400</issn><publisher><publisher-name xml:lang="en">New Technologies Publishing House</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="publisher-id">702340</article-id><article-id pub-id-type="doi">10.17587/it.32.20-27</article-id><article-categories><subj-group subj-group-type="toc-heading" xml:lang="en"><subject>Modeling and optimization</subject></subj-group><subj-group subj-group-type="toc-heading" xml:lang="ru"><subject>Моделирование и оптимизация</subject></subj-group><subj-group subj-group-type="article-type"><subject>Research Article</subject></subj-group></article-categories><title-group><article-title xml:lang="en">Methodology for constructing benchmarks for assessing the efficiency of feature selection methods when constructing regression</article-title><trans-title-group xml:lang="ru"><trans-title>Методика построения бенчмарков для оценки эффективности методов отбора признаков при решении задач регрессии</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author"><name-alternatives><name xml:lang="en"><surname>Cheremuhin</surname><given-names>A. D.</given-names></name><name xml:lang="ru"><surname>Черемухин</surname><given-names>А. Д.</given-names></name></name-alternatives><address><country country="RU">Russian Federation</country></address><bio xml:lang="en"><p>Cand. Econ. 
Sc., Associate Professor</p>
</bio><bio xml:lang="ru"><p>канд. экон. наук, доц.</p></bio><email>ngieu.cheremuhin@yandex.ru</email><xref ref-type="aff" rid="aff1"/></contrib><contrib contrib-type="author"><name-alternatives><name xml:lang="en"><surname>Lyamin</surname><given-names>A. S.</given-names></name><name xml:lang="ru"><surname>Лямин</surname><given-names>А. С.</given-names></name></name-alternatives><address><country country="RU">Russian Federation</country></address><bio xml:lang="en"><p>Cand. Econ. Sc., Associate Professor</p></bio><bio xml:lang="ru"><p>канд. экон. наук, доц.</p></bio><email>a.s.lyamin@gmail.com</email><xref ref-type="aff" rid="aff1"/></contrib></contrib-group><aff-alternatives id="aff1"><aff><institution xml:lang="en">Nizhny Novgorod State Engineering and Economics University</institution></aff><aff><institution xml:lang="ru">Нижегородский государственный инженерно-экономический университет</institution></aff></aff-alternatives><pub-date date-type="pub" iso-8601-date="2026-01-15" publication-format="electronic"><day>15</day><month>01</month><year>2026</year></pub-date><volume>32</volume><issue>1</issue><issue-title xml:lang="en"/><issue-title xml:lang="ru"/><fpage>20</fpage><lpage>27</lpage><history><date date-type="received" iso-8601-date="2026-02-08"><day>08</day><month>02</month><year>2026</year></date><date date-type="accepted" iso-8601-date="2026-02-08"><day>08</day><month>02</month><year>2026</year></date></history><permissions><copyright-statement xml:lang="en">Copyright © 2026, Informacionnye Tehnologii</copyright-statement><copyright-statement xml:lang="ru">Copyright © 2026, Информационные технологии</copyright-statement><copyright-year>2026</copyright-year><copyright-holder xml:lang="en">Informacionnye Tehnologii</copyright-holder><copyright-holder xml:lang="ru">Информационные технологии</copyright-holder></permissions><self-uri
xlink:href="https://journals.eco-vector.com/1684-6400/article/view/702340">https://journals.eco-vector.com/1684-6400/article/view/702340</self-uri><abstract xml:lang="en"><p>The problem of constructing benchmarks for assessing the efficiency of feature selection algorithms used in regression problems is considered. The developed benchmark includes synthetic and real datasets that differ in data parameters such as dimensionality, correlation between features, noise level, and the ratio of numerical to categorical variables. A generation method using various distributions and control parameters is proposed for synthetic data. Unlike traditional benchmarks, the developed methodology does not rely on established libraries but on an original approach to data generation and selection, oriented toward rare and hard-to-formalize properties of real problems. A computational experiment using a feature selection method based on partial least squares was conducted, demonstrating the efficiency of the proposed benchmark for objective comparison of algorithms. Directions for further development of the benchmark are outlined, including expanding the set of parameters and using new data types.</p></abstract><trans-abstract xml:lang="ru"><p>Рассматривается проблема построения бенчмарков для оценки эффективности алгоритмов отбора признаков, применяемых в задачах регрессии. Разработанный бенчмарк включает синтетические и реальные датафреймы, отличающиеся разнообразными параметрами данных, такими как размерность, корреляция между признаками, уровень шума и соотношение между числовыми и категориальными переменными. Для синтетических данных предложен метод генерации с использованием различных распределений и контрольных параметров. В отличие от традиционных бенчмарков разработанная методика строится не на использовании устоявшихся библиотек, а на оригинальном подходе к генерации и отбору данных, ориентированном на редкие и трудноформализуемые свойства реальных задач.
Проведен вычислительный эксперимент с использованием метода отбора признаков, основанного на методе частичных наименьших квадратов, продемонстрировавший эффективность предложенного бенчмарка для объективного сравнения алгоритмов. Намечены пути дальнейшего развития бенчмарка, включая расширение набора параметров и использование новых типов данных.</p></trans-abstract><kwd-group xml:lang="en"><kwd>feature selection</kwd><kwd>benchmark</kwd><kwd>regression problems</kwd><kwd>synthetic data</kwd><kwd>feature correlation</kwd><kwd>filter methods</kwd><kwd>wrapper methods</kwd><kwd>embedded methods</kwd></kwd-group><kwd-group xml:lang="ru"><kwd>отбор признаков</kwd><kwd>бенчмарк</kwd><kwd>задачи регрессии</kwd><kwd>синтетические данные</kwd><kwd>корреляция признаков</kwd><kwd>методы фильтрации</kwd><kwd>методы обертки</kwd><kwd>встроенные методы</kwd></kwd-group><funding-group/></article-meta></front><body></body><back><ref-list><ref id="B1"><label>1.</label><citation-alternatives><mixed-citation xml:lang="en">Zhou Y., Cheng G., Jiang S., Da M. Building an efficient intrusion detection system based on feature selection and ensemble classifier, Computer networks, 2020, vol. 174, p. 107247, DOI: 10.1016/j.comnet.2020.107247</mixed-citation><mixed-citation xml:lang="ru">Zhou Y. et al. Building an efficient intrusion detection system based on feature selection and ensemble classifier // Computer networks. 2020. Vol. 174. P. 107247. DOI: 10.1016/j.comnet.2020.107247</mixed-citation></citation-alternatives></ref><ref id="B2"><label>2.</label><citation-alternatives><mixed-citation xml:lang="en">Zargari S., Voorhis D. Feature Selection in the Corrected KDD-dataset, 2012 Third International Conference on Emerging Intelligent Data and Web Technologies, IEEE, 2012, pp. 174—180, DOI: 10.1109/EIDWT.2012.10</mixed-citation><mixed-citation xml:lang="ru">Zargari S., Voorhis D.
Feature Selection in the Corrected KDD-dataset // 2012 Third International Conference on Emerging Intelligent Data and Web Technologies. IEEE, 2012. P. 174—180. DOI: 10.1109/EIDWT.2012.10</mixed-citation></citation-alternatives></ref><ref id="B3"><label>3.</label><citation-alternatives><mixed-citation xml:lang="en">Urbanowicz R. J., Olson R. S., Schmitt P., Meeker M., Moore J. H. Benchmarking relief-based feature selection methods for bioinformatics data mining, Journal of biomedical informatics, 2018, vol. 85, pp. 168—188, DOI: 10.1016/j.jbi.2018.07.015</mixed-citation><mixed-citation xml:lang="ru">Urbanowicz R. J. et al. Benchmarking relief-based feature selection methods for bioinformatics data mining // Journal of biomedical informatics. 2018. Vol. 85. P. 168—188. DOI: 10.1016/j.jbi.2018.07.015</mixed-citation></citation-alternatives></ref><ref id="B4"><label>4.</label><citation-alternatives><mixed-citation xml:lang="en">Zhang Y., Mistry K., Lim C. P., Neoh S. C. Binary differential evolution with self-learning for multi-objective feature selection, Information Sciences, 2020, vol. 507, pp. 67—85, DOI: 10.1016/j.ins.2019.08.040</mixed-citation><mixed-citation xml:lang="ru">Zhang Y. et al. Binary differential evolution with self-learning for multi-objective feature selection // Information Sciences. 2020. Vol. 507. P. 67—85. DOI: 10.1016/j.ins.2019.08.040</mixed-citation></citation-alternatives></ref><ref id="B5"><label>5.</label><citation-alternatives><mixed-citation xml:lang="en">Visalakshi S., Radha V. A literature review of feature selection techniques and applications: Review of feature selection in data mining, 2014 IEEE international conference on computational intelligence and computing research, IEEE, 2014, pp. 1—6, DOI: 10.1109/ICCIC.2014.7238499</mixed-citation><mixed-citation xml:lang="ru">Visalakshi S., Radha V.
A literature review of feature selection techniques and applications: Review of feature selection in data mining // 2014 IEEE international conference on computational intelligence and computing research. IEEE, 2014. P. 1—6. DOI: 10.1109/ICCIC.2014.7238499</mixed-citation></citation-alternatives></ref><ref id="B6"><label>6.</label><citation-alternatives><mixed-citation xml:lang="en">Singh R., Kumar H., Singla R. K. Analysis of feature selection techniques for network traffic dataset, 2013 international conference on machine intelligence and research advancement, IEEE, 2013, pp. 42—46, DOI: 10.1109/ICMIRA.2013.15</mixed-citation><mixed-citation xml:lang="ru">Singh R., Kumar H., Singla R. K. Analysis of feature selection techniques for network traffic dataset // 2013 international conference on machine intelligence and research advancement. IEEE, 2013. P. 42—46. DOI: 10.1109/ICMIRA.2013.15</mixed-citation></citation-alternatives></ref><ref id="B7"><label>7.</label><citation-alternatives><mixed-citation xml:lang="en">Hoffmann F., Bertram T., Mikut R., Reischl M., Nelles O. Benchmarking in classification and regression, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2019, vol. 9, no. 5, p. e1318, DOI: 10.1002/widm.1318</mixed-citation><mixed-citation xml:lang="ru">Hoffmann F. et al. Benchmarking in classification and regression // Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery. 2019. Vol. 9, N. 5. P. e1318. DOI: 10.1002/widm.1318</mixed-citation></citation-alternatives></ref><ref id="B8"><label>8.</label><citation-alternatives><mixed-citation xml:lang="en">Bommert A., Sun X., Bischl B., Rahnenführer J., Lang M. Benchmark for filter methods for feature selection in high-dimensional classification data, Computational Statistics &amp; Data Analysis, 2020, vol. 143, p. 106839, DOI: 10.1016/j.csda.2019.106839</mixed-citation><mixed-citation xml:lang="ru">Bommert A. et al.
Benchmark for filter methods for feature selection in high-dimensional classification data // Computational Statistics &amp; Data Analysis. 2020. Vol. 143. P. 106839. DOI: 10.1016/j.csda.2019.106839</mixed-citation></citation-alternatives></ref><ref id="B9"><label>9.</label><citation-alternatives><mixed-citation xml:lang="en">Hopf K., Reifenrath S. Filter Methods for Feature Selection in Supervised Machine Learning Applications--Review and Benchmark, arXiv preprint arXiv:2111.12140, 2021, DOI: 10.48550/arXiv.2111.12140</mixed-citation><mixed-citation xml:lang="ru">Hopf K., Reifenrath S. Filter Methods for Feature Selection in Supervised Machine Learning Applications--Review and Benchmark // arXiv preprint arXiv:2111.12140. 2021. DOI: 10.48550/arXiv.2111.12140</mixed-citation></citation-alternatives></ref><ref id="B10"><label>10.</label><citation-alternatives><mixed-citation xml:lang="en">Overschie J. G. S., Alsahaf A., Azzopardi G. fseval: a benchmarking framework for feature selection and feature ranking algorithms, Journal of Open Source Software, 2022, vol. 7, no. 79, p. 4611, DOI: 10.21105/joss.04611</mixed-citation><mixed-citation xml:lang="ru">Overschie J. G. S., Alsahaf A., Azzopardi G. fseval: a benchmarking framework for feature selection and feature ranking algorithms // Journal of Open Source Software. 2022. Vol. 7, N. 79. P. 4611. DOI: 10.21105/joss.04611</mixed-citation></citation-alternatives></ref><ref id="B11"><label>11.</label><citation-alternatives><mixed-citation xml:lang="en">El-Kenawy E. S. M., Eid M. M., Saber M., Ibrahim A. MbGWO-SFS: Modified binary grey wolf optimizer based on stochastic fractal search for feature selection, IEEE Access, 2020, vol. 8, pp. 107635—107649, DOI: 10.1109/ACCESS.2020.3001151</mixed-citation><mixed-citation xml:lang="ru">El-Kenawy E. S. M. et al. MbGWO-SFS: Modified binary grey wolf optimizer based on stochastic fractal search for feature selection // IEEE Access. 2020. Vol. 8. P. 107635—107649.
DOI: 10.1109/ACCESS.2020.3001151</mixed-citation></citation-alternatives></ref><ref id="B12"><label>12.</label><citation-alternatives><mixed-citation xml:lang="en">Alhussan A. A. et al. A binary waterwheel plant optimization algorithm for feature selection, IEEE Access, 2023, vol. 11, pp. 94227—94251, DOI: 10.1109/ACCESS.2023.3312022</mixed-citation><mixed-citation xml:lang="ru">Alhussan A. A. et al. A binary waterwheel plant optimization algorithm for feature selection // IEEE Access. 2023. Vol. 11. P. 94227—94251. DOI: 10.1109/ACCESS.2023.3312022</mixed-citation></citation-alternatives></ref><ref id="B13"><label>13.</label><citation-alternatives><mixed-citation xml:lang="en">Li Y., Mansmann U., Du S., Hornung R. Benchmark study of feature selection strategies for multi-omics data, BMC bioinformatics, 2022, vol. 23, no. 1, p. 412, DOI: 10.1186/s12859-022-04962-x</mixed-citation><mixed-citation xml:lang="ru">Li Y. et al. Benchmark study of feature selection strategies for multi-omics data // BMC bioinformatics. 2022. Vol. 23, N. 1. P. 412. DOI: 10.1186/s12859-022-04962-x</mixed-citation></citation-alternatives></ref><ref id="B14"><label>14.</label><citation-alternatives><mixed-citation xml:lang="en">Oreski D., Oreski S., Klicek B. Effects of dataset characteristics on the performance of feature selection techniques, Applied Soft Computing, 2017, vol. 52, pp. 109—119, DOI: 10.1016/j.asoc.2016.12.023</mixed-citation><mixed-citation xml:lang="ru">Oreski D., Oreski S., Klicek B. Effects of dataset characteristics on the performance of feature selection techniques // Applied Soft Computing. 2017. Vol. 52. P. 109—119. DOI: 10.1016/j.asoc.2016.12.023</mixed-citation></citation-alternatives></ref><ref id="B15"><label>15.</label><citation-alternatives><mixed-citation xml:lang="en">Parmezan A. R. S., Lee H. D., Spolaôr N., Wu F. C. Automatic recommendation of feature selection algorithms based on dataset characteristics, Expert Systems with Applications, 2021, vol. 185, p.
115589, DOI: 10.1016/j.eswa.2021.115589</mixed-citation><mixed-citation xml:lang="ru">Parmezan A. R. S. et al. Automatic recommendation of feature selection algorithms based on dataset characteristics // Expert Systems with Applications. 2021. Vol. 185. P. 115589. DOI: 10.1016/j.eswa.2021.115589</mixed-citation></citation-alternatives></ref><ref id="B16"><label>16.</label><citation-alternatives><mixed-citation xml:lang="en">Oliveira A. L. I., Braga P. L., Lima R. M. F., Cornélio M. L. GA-based method for feature selection and parameters optimization for machine learning regression applied to software effort estimation, Information and Software Technology, 2010, vol. 52, no. 11, pp. 1155—1166, DOI: 10.1016/j.infsof.2010.05.009</mixed-citation><mixed-citation xml:lang="ru">Oliveira A. L. I. et al. GA-based method for feature selection and parameters optimization for machine learning regression applied to software effort estimation // Information and Software Technology. 2010. Vol. 52, N. 11. P. 1155—1166. DOI: 10.1016/j.infsof.2010.05.009</mixed-citation></citation-alternatives></ref><ref id="B17"><label>17.</label><citation-alternatives><mixed-citation xml:lang="en">Agrawal P., Abutarboush H. F., Ganesh T., Mohamed A. W. Metaheuristic algorithms on feature selection: A survey of one decade of research (2009-2019), IEEE Access, 2021, vol. 9, pp. 26766—26791, DOI: 10.1109/ACCESS.2021.3056407</mixed-citation><mixed-citation xml:lang="ru">Agrawal P. et al. Metaheuristic algorithms on feature selection: A survey of one decade of research (2009-2019) // IEEE Access. 2021. Vol. 9. P. 26766—26791. DOI: 10.1109/ACCESS.2021.3056407</mixed-citation></citation-alternatives></ref><ref id="B18"><label>18.</label><citation-alternatives><mixed-citation xml:lang="en">Pearson K. Mathematical contributions to the theory of evolution, X: Supplement to a memoir on skew variation, Philosophical Transactions of the Royal Society A., 1901, no. 197, pp.
287—299, DOI: 10.1098/rsta.1901.0023</mixed-citation><mixed-citation xml:lang="ru">Pearson K. Mathematical contributions to the theory of evolution, X: Supplement to a memoir on skew variation // Philosophical Transactions of the Royal Society A. 1901. N. 197. P. 287—299. DOI: 10.1098/rsta.1901.0023</mixed-citation></citation-alternatives></ref><ref id="B19"><label>19.</label><citation-alternatives><mixed-citation xml:lang="en">Endres D. M., Schindelin J. E. A new metric for probability distributions, IEEE Trans. Inf. Theory, 2003, vol. 49, no. 7, pp. 1858—1860, DOI: 10.1109/TIT.2003.813506</mixed-citation><mixed-citation xml:lang="ru">Endres D. M., Schindelin J. E. A new metric for probability distributions // IEEE Trans. Inf. Theory. 2003. Vol. 49, N. 7. P. 1858—1860. DOI: 10.1109/TIT.2003.813506</mixed-citation></citation-alternatives></ref><ref id="B20"><label>20.</label><citation-alternatives><mixed-citation xml:lang="en">Ye K. Q. Orthogonal column Latin hypercubes and their application in computer experiments, Journal of the American Statistical Association, 1998, vol. 93, no. 444, pp. 1430—1439, DOI: 10.1080/01621459.1998.10473803</mixed-citation><mixed-citation xml:lang="ru">Ye K. Q. Orthogonal column Latin hypercubes and their application in computer experiments // Journal of the American Statistical Association. 1998. Vol. 93, N. 444. P. 1430—1439. DOI: 10.1080/01621459.1998.10473803</mixed-citation></citation-alternatives></ref><ref id="B21"><label>21.</label><citation-alternatives><mixed-citation xml:lang="en">Mehmood T., Sæbø S., Liland K. H. Comparison of variable selection methods in partial least squares regression, Journal of Chemometrics, 2020, vol. 34, no. 6, p. e3226, DOI: 10.1002/cem.3226</mixed-citation><mixed-citation xml:lang="ru">Mehmood T., Sæbø S., Liland K. H. Comparison of variable selection methods in partial least squares regression // Journal of Chemometrics. 2020. Vol. 34, N. 6. P. e3226.
DOI: 10.1002/cem.3226</mixed-citation></citation-alternatives></ref><ref id="B22"><label>22.</label><citation-alternatives><mixed-citation xml:lang="en">Jia P., Wang Y., Du Z., Zhao X., Wang Y., Chen B., Wang W., Guo H., Tang R. Erase: Benchmarking feature selection methods for deep recommender systems, Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024, pp. 5194—5205, DOI: 10.1145/3637528.3671571</mixed-citation><mixed-citation xml:lang="ru">Jia P. et al. Erase: Benchmarking feature selection methods for deep recommender systems // Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2024. P. 5194—5205. DOI: 10.1145/3637528.3671571</mixed-citation></citation-alternatives></ref><ref id="B23"><label>23.</label><citation-alternatives><mixed-citation xml:lang="en">Karagiannaki K., Panousopoulou A., Tsakalides P. A benchmark study on feature selection for human activity recognition, Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct, 2016, pp. 105—108, DOI: 10.1145/2968219.2971421</mixed-citation><mixed-citation xml:lang="ru">Karagiannaki K., Panousopoulou A., Tsakalides P. A benchmark study on feature selection for human activity recognition // Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct. 2016. P. 105—108. DOI: 10.1145/2968219.2971421</mixed-citation></citation-alternatives></ref><ref id="B24"><label>24.</label><citation-alternatives><mixed-citation xml:lang="en">Generalov I. G. Cyclical influence of factors on the strategic development of grain production, Vestnik NGIEI, 2024, no. 8 (159), pp. 74—83 (in Russian), DOI: 10.24412/2227-9407-2024-8-74-83</mixed-citation><mixed-citation xml:lang="ru">Генералов И. Г. Циклическое влияние факторов на стратегическое развитие производства зерна // Вестник НГИЭИ. 2024. № 8(159). С. 74—83.
DOI: 10.24412/2227-9407-2024-8-74-83</mixed-citation></citation-alternatives></ref><ref id="B25"><label>25.</label><citation-alternatives><mixed-citation xml:lang="en">Kafiev I. R., Romanov P. S., Romanov I. P. On the issue of automation of the process of sorting porcini mushrooms using neural networks, Vestnik NGIEI, 2024, no. 4 (155), pp. 34—49 (in Russian), DOI: 10.24412/2227-9407-2024-4-34-49</mixed-citation><mixed-citation xml:lang="ru">Кафиев И. Р., Романов П. С., Романов И. П. К вопросу автоматизации процесса сортировки белых грибов с использованием нейронных сетей // Вестник НГИЭИ. 2024. № 4 (155). С. 34—49. DOI: 10.24412/2227-9407-2024-4-34-49</mixed-citation></citation-alternatives></ref></ref-list></back></article>
