The Effectiveness of the A2C Algorithm in Relation to Classical Models of the Theory of Economic Growth

Abstract

The relevance of the study lies in assessing the accuracy of the estimates produced by the A2C algorithm, and in the need to verify reinforcement learning methods when they are applied to the optimization of economic processes. The purpose of the study was to analyze the effectiveness of the A2C algorithm, together with the specifics of its implementation, in solving economic optimization problems. The tasks considered were maximizing consumption in the Solow, Romer, and Schumpeterian growth models, and maximizing per capita income in the latter two, with respect to the consumption rate (in the latter two, the saving rate) and the share of scientists in the economy, respectively. The results showed that for the deterministic models (the Solow and Romer models) the variance of the parameter estimate is minimal, and, given a sufficiently large number of time periods in the model, its mean differs from the analytically derived value by no more than one thousandth. In the stochastic case (the Schumpeterian model), however, a large number of time periods is required for the estimate to match the analytical value, and the resulting estimate, although biased by no more than one thousandth, has a high variance.
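
To make the setup concrete, below is a minimal, self-contained sketch (not the authors' implementation) of the kind of experiment the abstract describes: a Gaussian policy chooses a constant consumption rate c in a Solow-type accumulation equation k_{t+1} = (1 − c)·k_t^α + (1 − δ)·k_t and is trained with an advantage-style policy-gradient update, in which a running-average baseline stands in for the A2C critic. The parameter values (α = 0.33, δ = 0.05, T = 100) and the 30 000-epoch budget are illustrative assumptions, not the paper's calibration.

```python
import numpy as np

# Minimal sketch, NOT the authors' implementation: a Gaussian policy over a
# constant consumption rate c in a Solow-type environment, trained with an
# advantage-style policy-gradient update where a running-average baseline
# stands in for the A2C critic. All parameter values are illustrative.

ALPHA, DELTA, T = 0.33, 0.05, 100          # capital share, depreciation, horizon

def avg_consumption(c, k0=1.0):
    """Average per-period consumption over T periods for a fixed rate c."""
    k, total = k0, 0.0
    for _ in range(T):
        y = k ** ALPHA                         # output per capita, f(k) = k^alpha
        total += c * y                         # consumed share is the per-step reward
        k = (1.0 - c) * y + (1.0 - DELTA) * k  # capital accumulation
    return total / T

rng = np.random.default_rng(0)
mu, log_sigma = 0.0, np.log(0.3)           # policy parameters (pre-sigmoid mean)
baseline, lr = 0.0, 0.01

for epoch in range(30_000):
    sigma = np.exp(log_sigma)
    eps = rng.standard_normal()
    a = mu + sigma * eps                   # sample a raw action
    c = 1.0 / (1.0 + np.exp(-a))           # squash into (0, 1)
    reward = avg_consumption(c)
    adv = reward - baseline                # advantage w.r.t. the learned baseline
    baseline += 0.05 * adv                 # "critic": running average of returns
    # Gaussian score-function gradients: d log p / d mu and d log p / d log_sigma
    mu += lr * adv * eps / sigma
    log_sigma += lr * adv * (eps ** 2 - 1.0)
    log_sigma = max(log_sigma, np.log(0.05))   # keep a floor on exploration

print("learned consumption rate:", 1.0 / (1.0 + np.exp(-mu)))
```

For long horizons the analytically optimal constant rate in this setup approaches the golden-rule value 1 − α; the sketch only illustrates the training loop against which such an analytical benchmark can be checked, not the paper's exact models.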

About the authors

Alexander Moiseenko

Russian Academy of National Economy and Public Administration under the President of the Russian Federation

Primary contact for editorial correspondence.
Email: alex7and7er@gmail.com
ORCID iD: 0009-0001-0380-1693

1st year graduate student, Department of System Analysis

Russian Federation, Moscow

Natalia Grineva

Financial University under the Government of the Russian Federation

Email: ngrineva@fa.ru
ORCID iD: 0000-0001-7647-5967

Cand. Sci. (Econ.), Associate Professor, Department of Data Analysis and Machine Learning

Russian Federation, Moscow

References

  1. Aghion P., Howitt P. A model of growth through creative destruction. 1990.
  2. Atashbar T., Aruhan Shi R. AI and macroeconomic modeling: Deep reinforcement learning in an RBC model. 2023.
  3. Kakade S.M. A natural policy gradient. In: Advances in neural information processing systems. 2001. Vol. 14.
  4. Mnih V. et al. Asynchronous methods for deep reinforcement learning. In: International Conference on Machine Learning. PMLR, 2016. Pp. 1928–1937.
  5. Peters J., Schaal S. Reinforcement learning of motor skills with policy gradients. Neural Networks. 2008. Vol. 21. No. 4. Pp. 682–697.
  6. Romer P.M. Endogenous technological change. Journal of Political Economy. 1990. Vol. 98. No. 5. Part 2. Pp. S71–S102.
  7. Solow R.M. A contribution to the theory of economic growth. The Quarterly Journal of Economics. 1956. Vol. 70. No. 1. Pp. 65–94.
  8. Zheng S. et al. The AI economist: Improving equality and productivity with AI-driven tax policies. arXiv preprint arXiv:2004.13332. 2020.
  9. Didenko D.V., Grineva N.V. Factors of economic growth in the late USSR in a spatial perspective. Economic Policy. 2022. Vol. 17. No. 2. (In Rus.) Pp. 88–119. EDN: MBEJDX. doi: 10.18288/1994-5124-2022-2-88-119.
  10. Grineva N.V. Assessment of intellectual capital during the transition to a digital economy. Problems of Economics and Legal Practice. 2022. Vol. 18. No. 2. Pp. 219–227. (In Rus.) EDN: CGWWNJ.
  11. Krinichansky K., Grineva N. Dynamic approach to the analysis of financial structure: Overcoming the bank-based vs market-based dichotomy. In: 16th International Conference Management of large-scale system development (MLSD). 2023. No. 16. EDN: RSHSND. doi: 10.1109/MLSD58227.2023.10303933.

Supplementary files

1. Fig. 1. A vanilla policy gradient (a) treats a change in all parameters as equally distant, so the search for a maximum proceeds on a circle, while the natural gradient (b) uses scales determined by the Fisher information, which slows the reduction in exploration. The slower reduction in exploration results in faster convergence to the optimal policy [5]. The corresponding update rules are sketched after this list

Download (101KB)
2. Fig. 2. The dynamics of consumption rate values as the model is trained over 30 000 epochs

Download (142KB)
3. Fig. 3. The optimal value of the consumption rate: a – the dependence of the consumption rate c on the total number of time periods T (logarithmic scale); b – the dependence of the total reward on the consumption rate for a total number of time periods T of 100

Download (24KB)
4. Fig. 4. Dynamics of the optimized parameters during neural network training over 60 000 epochs: a – the spread of the saving rate values as the model is trained for T = 100; b – the spread of the values of the share of scientists in the economy as the model is trained for T = 100

Download (94KB)
5. Fig. 5. Dynamics of the optimized parameters during neural network training over 60 000 epochs: a – the spread of the saving rate values as the model is trained for T = 1000; b – the spread of the values of the share of scientists in the economy as the model is trained for T = 1000

Download (116KB)
6. Fig. 6. The dependence of the total reward on the share of scientists in the economy at the current initialization of the model

Download (13KB)
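
For reference, the two update rules contrasted in Fig. 1 can be written as follows (standard formulations following [3; 5]; J(θ) denotes the expected return of policy π_θ, and F is the Fisher information matrix):

```latex
% Vanilla policy gradient: steepest ascent in the Euclidean parameter metric
\theta_{k+1} = \theta_k + \alpha \,\nabla_{\theta} J(\theta_k)

% Natural policy gradient: steepest ascent in the Fisher information metric
\theta_{k+1} = \theta_k + \alpha \, F(\theta_k)^{-1} \nabla_{\theta} J(\theta_k),
\qquad
F(\theta) = \mathbb{E}_{\pi_\theta}\!\left[
  \nabla_{\theta} \log \pi_{\theta}(a \mid s)\,
  \nabla_{\theta} \log \pi_{\theta}(a \mid s)^{\top}
\right]
```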

