<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article>
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ali="http://www.niso.org/schemas/ali/1.0/" article-type="research-article" dtd-version="1.2" xml:lang="en"><front><journal-meta><journal-id journal-id-type="publisher-id">Economics and Mathematical Methods</journal-id><journal-title-group><journal-title xml:lang="en">Economics and Mathematical Methods</journal-title><trans-title-group xml:lang="ru"><trans-title>Экономика и математические методы</trans-title></trans-title-group></journal-title-group><issn publication-format="print">0424-7388</issn><issn publication-format="electronic">3034-6177</issn><publisher><publisher-name xml:lang="en">The Russian Academy of Sciences</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="publisher-id">682162</article-id><article-id pub-id-type="doi">10.31857/S0424738825010072</article-id><article-categories><subj-group subj-group-type="toc-heading"><subject>Проблемы предприятий</subject></subj-group><subj-group subj-group-type="article-type"><subject>Research Article</subject></subj-group></article-categories><title-group><article-title xml:lang="en">Model for human capital management of an enterprise based on reinforcement learning methods</article-title><trans-title-group xml:lang="ru"><trans-title>Модель управления человеческим капиталом предприятия на основе методов «машинного обучения с подкреплением»</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author"><name-alternatives><name xml:lang="en"><surname>Orlova</surname><given-names>E. V.</given-names></name><name xml:lang="ru"><surname>Орлова</surname><given-names>Е. 
В.</given-names></name></name-alternatives><address><country country="RU">Russian Federation</country></address><email>ekorl@mail.ru</email><xref ref-type="aff" rid="aff1"/></contrib></contrib-group><aff-alternatives id="aff1"><aff><institution xml:lang="en">Ufa University of Science and Technology</institution></aff><aff><institution xml:lang="ru">Уфимский университет науки и технологий</institution></aff></aff-alternatives><pub-date date-type="pub" iso-8601-date="2025-04-16" publication-format="electronic"><day>16</day><month>04</month><year>2025</year></pub-date><volume>61</volume><issue>1</issue><fpage>70</fpage><lpage>83</lpage><history><date date-type="received" iso-8601-date="2025-06-03"><day>03</day><month>06</month><year>2025</year></date></history><permissions><copyright-statement xml:lang="en">Copyright © 2025, Russian Academy of Sciences</copyright-statement><copyright-statement xml:lang="ru">Copyright © 2025, Российская академия наук</copyright-statement><copyright-year>2025</copyright-year><copyright-holder xml:lang="en">Russian Academy of Sciences</copyright-holder><copyright-holder xml:lang="ru">Российская академия наук</copyright-holder></permissions><self-uri xlink:href="https://journals.eco-vector.com/0424-7388/article/view/682162">https://journals.eco-vector.com/0424-7388/article/view/682162</self-uri><abstract xml:lang="en"><p>Human capital is one of the most important drivers of an enterprise’s sustainable economic growth, and its significance increases as the nature of work changes under the digital transformation of the economy. The employee profile is becoming increasingly multifaceted as the range of the employee’s activities expands. Therefore, the problem of human capital management based on designing individual trajectories of employees’ professional development is relevant, timely, and socially and economically significant. The paper proposes a human capital management model for designing individual trajectories of employees’ professional development, which is based on reinforcement learning methods. 
The model generates an optimal management regime, understood as a sequential set of program activities aimed at the employee’s professional development. It accounts for the employee’s individual characteristics as they change over time: health, professional and supra-professional competencies, motivation, and social capital. The architecture of the control system can be viewed as a digital twin of an employee that combines the environment (a model of the employee as a Markov decision process) with the control model (the agent, which is the enterprise’s decision-making center). The reinforcement learning algorithms DQN, DDQN, SARSA, and PRO are used to maximize the agent’s utility function. The experiments show that the DDQN algorithm provides the best results in terms of the agent’s maximum utility. The results generated by the proposed model are of practical importance: implementing them would support the growth of the enterprise’s innovativeness and competitiveness by improving the quality of human capital and increasing the resource efficiency of labor.</p></abstract><trans-abstract xml:lang="ru"><p>Человеческий капитал является одним из важнейших движущих сил устойчивого экономического роста предприятия, что приобретает еще большую значимость в условиях изменений характера труда в период цифровой трансформации экономики. Портрет работника становится все более многогранным вследствие расширения сфер его активности. Поэтому проблема управления человеческим капиталом на основе формирования индивидуальных траекторий профессионального развития работников представляется актуальной, своевременной, социально и экономически значимой. В работе предлагается модель управления человеческим капиталом, предназначенная для разработки индивидуальных траекторий профессионального развития работников предприятия, формирование которых основано на методах машинного обучения — «машинного обучения с подкреплением». 
Модель формирует оптимальный режим управления и рассматривается как последовательный набор программных мероприятий, направленных на развитие работника в профессиональной сфере с учетом его изменяющихся в динамике индивидуальных характеристик состояния здоровья, уровня профессиональных и надпрофессиональных компетенций, мотивации, социального капитала. Архитектуру системы управления можно рассматривать как цифровой двойник работника предприятия, который объединяет среду — модель работника как марковского процесса принятия решений и модель управления — агента — центра принятия решений предприятия. Для максимизации функции полезности агента используются алгоритмы «машинного обучения с подкреплением» DQN, DDQN, SARSA, PRO. На основе проведенных экспериментов показано, что наилучшие результаты в смысле достижения максимальной полезности агента обеспечивает алгоритм DDQN. Практическую значимость имеют результаты, сгенерированные предлагаемой моделью, реализация которых позволит в кратчайшие сроки обеспечить рост инновационности и конкурентоспособности предприятия за счет улучшения качества человеческого капитала и роста ресурсной эффективности труда.</p></trans-abstract><kwd-group xml:lang="en"><kwd>human capital</kwd><kwd>employee’s digital twin</kwd><kwd>individual trajectories</kwd><kwd>machine learning</kwd><kwd>reinforcement learning</kwd><kwd>Q-learning</kwd><kwd>optimal control</kwd><kwd>Markov decision process</kwd></kwd-group><kwd-group xml:lang="ru"><kwd>человеческий капитал</kwd><kwd>цифровой двойник работника</kwd><kwd>индивидуальные траектории</kwd><kwd>машинное обучение</kwd><kwd>«машинное обучение с подкреплением»</kwd><kwd>Q-обучение</kwd><kwd>оптимальное управление</kwd><kwd>марковский процесс принятия решений</kwd></kwd-group><funding-group/></article-meta></front><body></body><back><ref-list><ref id="B1"><label>1.</label><mixed-citation>Акопов А. С. (2023). 
Моделирование и оптимизация стратегий принятия индивидуальных решений в многоагентных социально-экономических системах с использованием машинного обучения // Бизнес-информатика. Т. 17. № 2. С. 7–19. DOI: 10.17323/2587-814X.2023.2.7.19 [Akopov A. S. (2023). Modeling and optimization of strategies for making individual decisions in multi-agent socio-economic systems with the use of machine learning. Business Informatics, 17, 2, 7–19. DOI: 10.17323/2587-814X.2023.2.7.19 (in Russian).]</mixed-citation></ref><ref id="B2"><label>2.</label><mixed-citation>Боровков А. И. (2021). Цифровые двойники в условиях четвертой промышленной революции // CONNECT. Мир информационных технологий. № 1–2. С. 50–53. [Borovkov A. I. (2021). Digital twins in the fourth industrial revolution. CONNECT. The World of Information Technologies, 1–2, 50–53 (in Russian).]</mixed-citation></ref><ref id="B3"><label>3.</label><mixed-citation>Макаров В. Л., Бахтизин А. Р., Бекларян Г. Л. (2019). Разработка цифровых двойников для производственных предприятий // Бизнес-информатика. Т. 13. № 4. С. 7–16. DOI: 10.17323/1998-0663.2019.4.7.16 [Makarov V. L., Bakhtizin A. R., Beklaryan G. L. (2019). Developing digital twins for production enterprises. Business Informatics, 13, 4, 7–16. DOI: 10.17323/1998-0663.2019.4.7.16 (in Russian).]</mixed-citation></ref><ref id="B4"><label>4.</label><mixed-citation>Макаров В. Л., Бахтизин А. Р., Бекларян Г. Л., Акопов А. С., Ровенская Е. А., Стрелковский Н. В. (2022). Агентное моделирование социально-экономических последствий миграции при государственном регулировании занятости // Экономика и математические методы. Т. 58. № 1. С. 113–130. DOI: 10.31857/S042473880018960-5 [Makarov V. L., Bakhtizin A. R., Beklaryan G. L., Akopov A. S., Rovenskaya E. A., Strelkovsky N. V. (2022). Agent-based modeling of the socio-economic consequences of migration under state regulation of employment. Economics and Mathematical Methods, 58, 1, 113–130. 
DOI: 10.31857/S042473880018960-5 (in Russian).]</mixed-citation></ref><ref id="B5"><label>5.</label><mixed-citation>Макаров В. Л., Бахтизин А. Р., Бекларян Г. Л., Акопов А. С., Стрелковский Н. В., Ровенская Е. А. (2020). Агентное моделирование популяционной динамики двух взаимодействующих сообществ: мигрантов и коренных жителей // Экономика и математические методы. Т. 56. № 2. С. 5–19. DOI: 10.31857/S042473880009217-7 [Makarov V. L., Bakhtizin A. R., Beklaryan G. L., Akopov A. S., Strelkovsky N. V., Rovenskaya E. A. (2020). Agent-based modeling of population dynamics of two interacting communities: Migrants and indigenous residents. Economics and Mathematical Methods, 56, 2, 5–19. DOI: 10.31857/S042473880009217-7 (in Russian).]</mixed-citation></ref><ref id="B6"><label>6.</label><mixed-citation>Макаров В. Л., Клейнер Г. Б. (2007). Микроэкономика знаний. М.: Экономика. 300 с. [Makarov V. L., Kleiner G. B. (2007). Microeconomics of knowledge. Moscow: Economics. 300 p. (in Russian).]</mixed-citation></ref><ref id="B7"><label>7.</label><mixed-citation>Орлова Е. В. (2020а). Методы и модели анализа данных и машинного обучения в задаче управления производительностью труда // Программная инженерия. № 4. С. 219–229. DOI: 10.17587/prin.11.219-229 [Orlova E. V. (2020a). Methods and models of data analysis and machine learning in the problem of labor productivity management. Programmnaya Ingeneria (Software Engineering), 11, 4, 219–229. DOI: 10.17587/prin.11.219-229 (in Russian).]</mixed-citation></ref><ref id="B8"><label>8.</label><mixed-citation>Орлова Е. В. (2020б). Управление производительностью труда с учетом факторов здоровья: технология и модели // Управленец. № 6. С. 57–69. DOI: 10.29141/2218-5003-2020-11-6-5 [Orlova E. V. (2020b). Labour productivity management using health factors: Technique and models. The Manager (Upravlenets), 11, 6, 57–69. 
DOI: 10.29141/2218-5003-2020-11-6-5 (in Russian).]</mixed-citation></ref><ref id="B9"><label>9.</label><mixed-citation>Орлова Е. В. (2021). Оценка человеческого капитала предприятия и управление им в условиях цифровой трансформации экономики // Journal of Applied Economic Research. Т. 20. № 4. С. 666–700. DOI: 10.15826/vestnik.2021.20.4.026 [Orlova E. V. (2021). Assessment of the human capital of an enterprise and its management in the context of the digital transformation of the economy. Journal of Applied Economic Research, 20, 4, 666–700. DOI: 10.15826/vestnik.2021.20.4.026 (in Russian).]</mixed-citation></ref><ref id="B10"><label>10.</label><mixed-citation>Пономарев Е. С., Оселедец И. В., Чихоцкий А. С. (2019). Использование обучения с подкреплением в задаче алгоритмической торговли // Информационные процессы. Т. 19. № 2. C. 122–131. [Ponomarev E. S., Oseledets I. V., Chihotsky A. S. (2019). Using reinforcement learning in algorithmic trading. Information Processes, 19, 2, 122–131 (in Russian).]</mixed-citation></ref><ref id="B11"><label>11.</label><mixed-citation>Abideen A. Z., Sundram V. P.K., Pyeman J., Othman A. K., Sorooshian S. (2021). Digital twin integrated reinforced learning in supply chain and logistics. Logistics, 5, 84. DOI: 10.3390/logistics5040084</mixed-citation></ref><ref id="B12"><label>12.</label><mixed-citation>Alzyoud A. (2018). The influence of human resource management practices on employee work engagement. Foundations of Management, 10, 251–256. DOI: 10.2478/fman-2018-0019</mixed-citation></ref><ref id="B13"><label>13.</label><mixed-citation>Azhikodan A. R., Bhat A. G., Jadhav M. V. (2019). Stock trading bot using deep reinforcement learning. In: Innovations in computer science and engineering. Springer: Berlin/Heidelberg, Germany, 41–49.</mixed-citation></ref><ref id="B14"><label>14.</label><mixed-citation>Chi M., VanLehn K., Litman D. et al. (2011). 
Empirically evaluating the application of reinforcement learning to the induction of effective and adaptive pedagogical strategies. User Modeling and User-Adapted Interaction, 21, 137–180. DOI: 10.1007/s11257-010-9093-1</mixed-citation></ref><ref id="B15"><label>15.</label><mixed-citation>Church A. H., Bracken D. W., Fleenor J. W., Rose D. S. (2019). Handbook of strategic 360 feedback. New York: Oxford University Press. 637 p.</mixed-citation></ref><ref id="B16"><label>16.</label><mixed-citation>Ding Q., Jahanshahi H., Wang Y., Bekiros S., Alassafi M. O. (2022). Optimal reinforcement learning-based control algorithm for a class of nonlinear macroeconomic systems. Mathematics, 10, 499. DOI: 10.3390/math10030499</mixed-citation></ref><ref id="B17"><label>17.</label><mixed-citation>Granovetter M. S. (1973). The strength of weak ties. American Journal of Sociology, 78 (6), 1360–1380.</mixed-citation></ref><ref id="B18"><label>18.</label><mixed-citation>Hernaus T., Pavlovic D., Klindzic M. (2019). Organizational career management practices: The role of the relationship between HRM and trade unions. Employee Relations, 41, 84–100. DOI: 10.1108/ER-02-2018-0035</mixed-citation></ref><ref id="B19"><label>19.</label><mixed-citation>Hitka M., Kucharčíková A., Štarchoň P., Balážová Ž., Lukáč M., Stacho Z. (2019). Knowledge and human capital as sustainable competitive advantage in human resource management. Sustainability, 11, 4985. DOI: 10.3390/su11184985</mixed-citation></ref><ref id="B20"><label>20.</label><mixed-citation>Jung Y., Takeuchi N. (2018). A lifespan perspective for understanding career self-management and satisfaction: The role of developmental human resource practices and organizational support. Human Relations, 71, 73–102.</mixed-citation></ref><ref id="B21"><label>21.</label><mixed-citation>Li Q., Lin T., Yu Q., Du H., Li J., Fu X. (2023). Review of deep reinforcement learning and its application in modern renewable power system control. Energies, 16, 4143. 
DOI: 10.3390/en16104143</mixed-citation></ref><ref id="B22"><label>22.</label><mixed-citation>Liu J., Zhang Y., Wang X., Deng Y., Wu X. (2019). Dynamic pricing on e-commerce platform with deep reinforcement learning. arXiv:1912.02572.</mixed-citation></ref><ref id="B23"><label>23.</label><mixed-citation>Mohammadi M., Al-Fuqaha A., Guizani M., Oh J. (2018). Semisupervised deep reinforcement learning in support of IoT and smart city services. IEEE Internet of Things Journal, 5, 2, 624–635.</mixed-citation></ref><ref id="B24"><label>24.</label><mixed-citation>Orlova E. V. (2021a). Innovation in company labor productivity management: Data science methods application. Applied System Innovation, 4, 3, 68. DOI: 10.3390/asi4030068</mixed-citation></ref><ref id="B25"><label>25.</label><mixed-citation>Orlova E. V. (2021b). Design of personal trajectories for employees’ professional development in the knowledge society under industry 5.0. Social Sciences, 10, 11, 427. DOI: 10.3390/socsci10110427</mixed-citation></ref><ref id="B26"><label>26.</label><mixed-citation>Orlova E. V. (2022). Design technology and AI-based decision making model for digital twin engineering. Future Internet, 14, 9, 248. DOI: 10.3390/fi14090248</mixed-citation></ref><ref id="B27"><label>27.</label><mixed-citation>Orlova E. V. (2023). Inference of factors for labor productivity growth used randomized experiment and statistical causality. Mathematics, 11, 4, 863. DOI: 10.3390/math11040863</mixed-citation></ref><ref id="B28"><label>28.</label><mixed-citation>Orr J., Dutta A. (2023). Multi-agent deep reinforcement learning for multi-robot applications: A survey. Sensors, 23, 3625. DOI: 10.3390/s23073625</mixed-citation></ref><ref id="B29"><label>29.</label><mixed-citation>Osranek R., Zink K. J. (2014). Corporate human capital and social sustainability of human resources. In: I. Ehnert, W. Harry, K. Zink. Sustainability and human resource management. CSR, Sustainability, Ethics &amp; Governance. 
Springer, Berlin, Heidelberg. DOI: 10.1007/978-3-642-37524-8_5</mixed-citation></ref><ref id="B30"><label>30.</label><mixed-citation>Rachid B., Mohamed T., Khouaja M. A. (2018). An agent based modeling approach in the strategic human resource management, including endogenous and exogenous factors. Simulation Modelling Practice and Theory, 88, 32–47.</mixed-citation></ref><ref id="B31"><label>31.</label><mixed-citation>Schelling T. C. (1971). Dynamic models of segregation. The Journal of Mathematical Sociology, 1 (2), 143–186. DOI: 10.1080/0022250x.1971.9989794</mixed-citation></ref><ref id="B32"><label>32.</label><mixed-citation>Steelman L. A., Williams J. R. (2019). Feedback at work. Switzerland AG: Springer Nature. 280 p.</mixed-citation></ref><ref id="B33"><label>33.</label><mixed-citation>Stokowski S., Li B., Goss B. D., Hutchens S., Turk M. (2018). Work motivation and job satisfaction of sport management faculty members. Sport Management Education Journal, 12, 80–89. DOI: 10.1123/smej.2017-0011</mixed-citation></ref><ref id="B34"><label>34.</label><mixed-citation>Wang R., Chen Z., Xing Q., Zhang Z., Zhang T. (2022). A modified rainbow-based deep reinforcement learning method for optimal scheduling of charging station. Sustainability, 14, 1884. DOI: 10.3390/su14031884</mixed-citation></ref><ref id="B35"><label>35.</label><mixed-citation>Yan Y., Chow A. H., Ho C. P., Kuo Y. H., Wu Q., Ying C. (2022). Reinforcement learning for logistics and supply chain management: Methodologies, state of the art, and future opportunities. Transportation Research Part E: Logistics and Transportation Review, 162, 102712.</mixed-citation></ref><ref id="B36"><label>36.</label><mixed-citation>Yu C., Liu J., Nemati S. (2019a). Reinforcement learning in healthcare: A survey. arXiv:1908.08796.</mixed-citation></ref><ref id="B37"><label>37.</label><mixed-citation>Yu P., Lee J. S., Kulyatin I., Shi Z., Dasgupta S. (2019b). 
Model-based deep reinforcement learning for dynamic portfolio optimization. arXiv:1901.08740.</mixed-citation></ref><ref id="B38"><label>38.</label><mixed-citation>Zhang L., Guo X., Lei Z., Lim M. K. (2019). Social network analysis of sustainable human resource management from the employee training’s perspective. Sustainability, 11, 380. DOI: 10.3390/su11020380</mixed-citation></ref><ref id="B39"><label>39.</label><mixed-citation>Zheng G., Zhang F., Zheng Z., Xiang Y., Yuan N. J., Xie X., Li Z. (2018). DRN: A deep reinforcement learning framework for news recommendation. In: Proceedings of the 2018 World Wide Web Conference. Lyon, France, 167–176.</mixed-citation></ref></ref-list></back></article>
