The Effectiveness of the A2C Algorithm in Relation to Classical Models of the Theory of Economic Growth


Abstract

The relevance of the study lies in establishing the accuracy of the estimates produced by the A2C algorithm, and in the need to verify reinforcement learning methods when they are used to optimize economic processes. The purpose of the study was to analyze the effectiveness of the A2C algorithm, together with the specifics of its implementation, in solving economic optimization problems. The tasks considered were maximizing consumption in the Solow, Romer, and Schumpeterian models of economic growth, and maximizing per capita income in the latter two, with respect to the consumption rate (in the latter two, the saving rate) and the share of scientists in the economy, respectively. The results showed that for the deterministic models (the Solow and Romer models) the variance of the parameter estimate is minimal, and its mean differs from the analytically obtained value by no more than a thousandth, provided the number of time periods in the model is sufficiently large. In the stochastic model (the Schumpeterian model), however, a large number of time periods is required for the estimate to match the analytically obtained value, and the resulting estimate, although biased by no more than a thousandth, has a high variance.
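A minimal sketch of the kind of setup described above is given below: a single-worker advantage actor-critic (A2C-style) agent chooses the saving rate in a simplified Solow economy with Cobb-Douglas production y = k^α and receives consumption per worker as the reward, so the analytical benchmark is the golden-rule saving rate s* = α. All parameter values, network sizes, and the simplifications (no population or technology growth, discounted episodic training) are illustrative assumptions rather than the authors' implementation, and PyTorch is assumed to be available.

# Illustrative A2C-style sketch for the Solow consumption problem (assumed parameters, not the authors' code).
import torch
import torch.nn as nn

ALPHA, DELTA, K0, T = 0.33, 0.1, 1.0, 100  # Cobb-Douglas exponent, depreciation, initial capital, horizon

class ActorCritic(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(1, hidden), nn.Tanh())
        self.mu = nn.Linear(hidden, 1)          # actor: mean of the pre-sigmoid action
        self.log_std = nn.Parameter(torch.zeros(1))
        self.value = nn.Linear(hidden, 1)       # critic: state-value estimate

    def forward(self, k):
        h = self.body(k)
        return self.mu(h), self.log_std.exp(), self.value(h)

def rollout(model):
    """Simulate one episode of the Solow economy and record log-probs, values and rewards."""
    k = torch.tensor([[K0]])
    log_probs, values, rewards, saving_rates = [], [], [], []
    for _ in range(T):
        mu, std, v = model(k)
        dist = torch.distributions.Normal(mu, std)
        a = dist.sample()
        s = torch.sigmoid(a)                    # saving rate in (0, 1)
        y = k ** ALPHA                          # output per worker
        reward = (1 - s) * y                    # consumption per worker
        k = (s * y + (1 - DELTA) * k).detach()  # capital accumulation
        log_probs.append(dist.log_prob(a))
        values.append(v)
        rewards.append(reward.detach())
        saving_rates.append(s.item())
    return log_probs, values, rewards, saving_rates

def train(epochs=30_000, gamma=0.99, lr=3e-4):
    model = ActorCritic()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(epochs):
        log_probs, values, rewards, srates = rollout(model)
        returns, G = [], torch.zeros(1, 1)
        for r in reversed(rewards):             # discounted returns, computed backwards
            G = r + gamma * G
            returns.insert(0, G)
        returns, values = torch.cat(returns), torch.cat(values)
        adv = (returns - values).detach()       # advantage = return minus critic baseline
        actor_loss = -(torch.cat(log_probs) * adv).mean()
        critic_loss = (returns - values).pow(2).mean()
        opt.zero_grad()
        (actor_loss + 0.5 * critic_loss).backward()
        opt.step()
        if epoch % 1000 == 0:
            print(f"epoch {epoch}: mean saving rate {sum(srates) / len(srates):.3f} (golden rule: {ALPHA})")

if __name__ == "__main__":
    train()

With these assumed settings, the mean saving rate should drift toward the golden-rule value α ≈ 0.33, mirroring the convergence behaviour the abstract reports for the deterministic models.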

About the authors

Alexander M. Moiseenko

Russian Academy of National Economy and Public Administration under the President of the Russian Federation

Author for correspondence.
Email: alex7and7er@gmail.com
ORCID iD: 0009-0001-0380-1693

First-year graduate student, Department of System Analysis

Russian Federation, Moscow

Natalia V. Grineva

Financial University under the Government of the Russian Federation

Email: ngrineva@fa.ru
ORCID iD: 0000-0001-7647-5967

Cand. Sci. (Econ.), Associate Professor, Department of Data Analysis and Machine Learning

Russian Federation, Moscow

References

  1. Aghion P., Howitt P. A model of growth through creative destruction. 1990.
  2. Atashbar T., Aruhan Shi R. AI and macroeconomic modeling: Deep reinforcement learning in an RBC model. 2023.
  3. Kakade S.M. A natural policy gradient. In: Advances in neural information processing systems. 2001. Vol. 14.
  4. Mnih V. et al. Asynchronous methods for deep reinforcement learning. In: International Conference on Machine Learning. PMLR, 2016. Pp. 1928–1937.
  5. Peters J., Schaal S. Reinforcement learning of motor skills with policy gradients. Neural Networks. 2008. Vol. 21. No. 4. Pp. 682–697.
  6. Romer P.M. Endogenous technological change. Journal of Political Economy. 1990. Vol. 98. No. 5. Part 2. Pp. S71–S102.
  7. Solow R.M. A contribution to the theory of economic growth. The Quarterly Journal of Economics. 1956. Vol. 70. No. 1. Pp. 65–94.
  8. Zheng S. et al. The AI Economist: Improving equality and productivity with AI-driven tax policies. arXiv preprint arXiv:2004.13332. 2020.
  9. Didenko D.V., Grineva N.V. Factors of economic growth in the late USSR in a spatial perspective. Economic Policy. 2022. Vol. 17. No. 2. (In Rus.) Pp. 88–119. EDN: MBEJDX. doi: 10.18288/1994-5124-2022-2-88-119.
  10. Grineva N.V. Assessment of intellectual capital during the transition to a digital economy. Problems of Economics and Legal Practice. 2022. Vol. 18. No. 2. Pp. 219–227. (In Rus.) EDN: CGWWNJ.
  11. Krinichansky K., Grineva N. Dynamic approach to the analysis of financial structure: Overcoming the bank-based vs market-based dichotomy. In: 16th International Conference Management of Large-Scale System Development (MLSD). 2023. No. 16. EDN: RSHSND. doi: 10.1109/MLSD58227.2023.10303933.

Supplementary files

Fig. 1. A vanilla policy gradient (a) treats a change in all parameters as equally distant and therefore searches for the maximum on a circle, while the natural gradient (b) uses scales determined by the Fisher information, which results in a slower reduction in exploration; this slower reduction in exploration leads to faster convergence to the optimal policy [5] (see the note after this list)

Fig. 2. The dynamics of the consumption rate values as the model is trained over 30 000 epochs

Fig. 3. The optimal value of the consumption rate: a – the dependence of the consumption rate c on the total number of time periods T (logarithmic scale); b – the dependence of the total reward on the consumption rate for the total number of time periods T = 100

Fig. 4. Dynamics of the optimized parameters during neural network training over 60 000 epochs: a – the spread of the saving rate values as the model is trained for T = 100; b – the spread of the share of scientists in the economy as the model is trained for T = 100

Fig. 5. Dynamics of the optimized parameters during neural network training over 60 000 epochs: a – the spread of the saving rate values as the model is trained for T = 1000; b – the spread of the share of scientists in the economy as the model is trained for T = 1000

Fig. 6. The dependence of the total reward on the share of scientists in the economy at the current initialization of the model

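Note. As a brief reference for the natural-gradient idea illustrated in Fig. 1 (cf. [3; 5]), a standard statement of the two updates is written below in LaTeX notation, where \alpha is the step size, \pi_\theta the policy, J the expected return, and F(\theta) the Fisher information matrix:

\theta_{t+1} = \theta_t + \alpha\,\nabla_\theta J(\theta_t) \quad \text{(vanilla gradient)}
\theta_{t+1} = \theta_t + \alpha\,F(\theta_t)^{-1}\,\nabla_\theta J(\theta_t) \quad \text{(natural gradient)}
F(\theta) = \mathbb{E}_{\pi_\theta}\!\left[\nabla_\theta \log \pi_\theta(a \mid s)\,\nabla_\theta \log \pi_\theta(a \mid s)^{\top}\right]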

