The Effectiveness of the A2C Algorithm in Relation to Classical Models of the Theory of Economic Growth


Abstract

The relevance of the study lies in establishing the accuracy of the estimates produced by the A2C algorithm, and in the need to verify reinforcement learning methods when they are used to optimize economic processes. The purpose of the study was to analyze the effectiveness of the A2C algorithm, together with the specifics of its implementation, in solving economic optimization problems. The tasks considered were maximizing consumption in the Solow, Romer, and Schumpeterian models of economic growth, and maximizing per capita income in the latter two, with respect to the consumption rate (in the latter two, the saving rate) and the share of scientists in the economy, respectively. The results showed that for the deterministic models (the Solow and Romer models) the variance of the parameter estimate is minimal, and its mean differs from the analytically obtained value by no more than a thousandth, provided the number of time periods in the model is sufficiently large. In the stochastic model (the Schumpeterian model), however, a large number of time periods is required for the estimate to match the analytically obtained value, and the resulting estimate, although biased by no more than a thousandth, has a high variance.
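A minimal sketch of the kind of setup described above is given below: a single-worker advantage actor-critic (A2C-style) agent chooses the saving rate in a simplified Solow economy with Cobb-Douglas production y = k^α and receives consumption per worker as the reward, so the analytical benchmark is the golden-rule saving rate s* = α. All parameter values, network sizes, and the simplifications (no population or technology growth, discounted episodic training) are illustrative assumptions rather than the authors' implementation, and PyTorch is assumed to be available.

# Illustrative A2C-style sketch for the Solow consumption problem (assumed parameters, not the authors' code).
import torch
import torch.nn as nn

ALPHA, DELTA, K0, T = 0.33, 0.1, 1.0, 100  # Cobb-Douglas exponent, depreciation, initial capital, horizon

class ActorCritic(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(1, hidden), nn.Tanh())
        self.mu = nn.Linear(hidden, 1)          # actor: mean of the pre-sigmoid action
        self.log_std = nn.Parameter(torch.zeros(1))
        self.value = nn.Linear(hidden, 1)       # critic: state-value estimate

    def forward(self, k):
        h = self.body(k)
        return self.mu(h), self.log_std.exp(), self.value(h)

def rollout(model):
    """Simulate one episode of the Solow economy and record log-probs, values and rewards."""
    k = torch.tensor([[K0]])
    log_probs, values, rewards, saving_rates = [], [], [], []
    for _ in range(T):
        mu, std, v = model(k)
        dist = torch.distributions.Normal(mu, std)
        a = dist.sample()
        s = torch.sigmoid(a)                    # saving rate in (0, 1)
        y = k ** ALPHA                          # output per worker
        reward = (1 - s) * y                    # consumption per worker
        k = (s * y + (1 - DELTA) * k).detach()  # capital accumulation
        log_probs.append(dist.log_prob(a))
        values.append(v)
        rewards.append(reward.detach())
        saving_rates.append(s.item())
    return log_probs, values, rewards, saving_rates

def train(epochs=30_000, gamma=0.99, lr=3e-4):
    model = ActorCritic()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(epochs):
        log_probs, values, rewards, srates = rollout(model)
        returns, G = [], torch.zeros(1, 1)
        for r in reversed(rewards):             # discounted returns, computed backwards
            G = r + gamma * G
            returns.insert(0, G)
        returns, values = torch.cat(returns), torch.cat(values)
        adv = (returns - values).detach()       # advantage = return minus critic baseline
        actor_loss = -(torch.cat(log_probs) * adv).mean()
        critic_loss = (returns - values).pow(2).mean()
        opt.zero_grad()
        (actor_loss + 0.5 * critic_loss).backward()
        opt.step()
        if epoch % 1000 == 0:
            print(f"epoch {epoch}: mean saving rate {sum(srates) / len(srates):.3f} (golden rule: {ALPHA})")

if __name__ == "__main__":
    train()

With these assumed settings, the mean saving rate should drift toward the golden-rule value α ≈ 0.33, mirroring the convergence behaviour the abstract reports for the deterministic models.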

About the authors

Alexander M. Moiseenko

Russian Academy of National Economy and Public Administration under the President of the Russian Federation

Author for correspondence.
Email: alex7and7er@gmail.com
ORCID iD: 0009-0001-0380-1693

First-year graduate student, Department of System Analysis

Russian Federation, Moscow

Natalia V. Grineva

Financial University under the Government of the Russian Federation

Email: ngrineva@fa.ru
ORCID iD: 0000-0001-7647-5967

Cand. Sci. (Econ.), Associate Professor, Department of Data Analysis and Machine Learning

Russian Federation, Moscow

References

  1. Aghion P., Howitt P. A model of growth through creative destruction. 1990.
  2. Atashbar T., Aruhan Shi R. AI and macroeconomic modeling: Deep reinforcement learning in an RBC model. 2023.
  3. Kakade S.M. A natural policy gradient. In: Advances in neural information processing systems. 2001. Vol. 14.
  4. Mnih V. et al. Asynchronous methods for deep reinforcement learning. In: International Conference on Machine Learning. PMLR, 2016. Pp. 1928–1937.
  5. Peters J., Schaal S. Reinforcement learning of motor skills with policy gradients. Neural Networks. 2008. Vol. 21. No. 4. Pp. 682–697.
  6. Romer P.M. Endogenous technological change. Journal of Political Economy. 1990. Vol. 98. No. 5. Part 2. Pp. S71–S102.
  7. Solow R.M. A contribution to the theory of economic growth. The Quarterly Journal of Economics. 1956. Vol. 70. No. 1. Pp. 65–94.
  8. Zheng S. et al. The AI Economist: Improving equality and productivity with AI-driven tax policies. arXiv preprint arXiv:2004.13332. 2020.
  9. Didenko D.V., Grineva N.V. Factors of economic growth in the late USSR in a spatial perspective. Economic Policy. 2022. Vol. 17. No. 2. (In Rus.) Pp. 88–119. EDN: MBEJDX. doi: 10.18288/1994-5124-2022-2-88-119.
  10. Grineva N.V. Assessment of intellectual capital during the transition to a digital economy. Problems of Economics and Legal Practice. 2022. Vol. 18. No. 2. Pp. 219–227. (In Rus.) EDN: CGWWNJ.
  11. Krinichansky K., Grineva N. Dynamic approach to the analysis of financial structure: Overcoming the bank-based vs market-based dichotomy. In: 16th International Conference Management of Large-Scale System Development (MLSD). 2023. No. 16. EDN: RSHSND. doi: 10.1109/MLSD58227.2023.10303933.

Supplementary files

Fig. 1. A vanilla policy gradient (a) treats a change in all parameters as equally distant and therefore searches for the maximum on a circle, while the natural gradient (b) uses scales determined by the Fisher information, which results in a slower reduction in exploration; this slower reduction in exploration leads to faster convergence to the optimal policy [5] (see the note after this list)

Fig. 2. The dynamics of the consumption rate values as the model is trained over 30 000 epochs

Fig. 3. The optimal value of the consumption rate: a – the dependence of the consumption rate c on the total number of time periods T (logarithmic scale); b – the dependence of the total reward on the consumption rate for the total number of time periods T = 100

Fig. 4. Dynamics of the optimized parameters during neural network training over 60 000 epochs: a – the spread of the saving rate values as the model is trained for T = 100; b – the spread of the share of scientists in the economy as the model is trained for T = 100

Fig. 5. Dynamics of the optimized parameters during neural network training over 60 000 epochs: a – the spread of the saving rate values as the model is trained for T = 1000; b – the spread of the share of scientists in the economy as the model is trained for T = 1000

Fig. 6. The dependence of the total reward on the share of scientists in the economy at the current initialization of the model

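Note. As a brief reference for the natural-gradient idea illustrated in Fig. 1 (cf. [3; 5]), a standard statement of the two updates is written below in LaTeX notation, where \alpha is the step size, \pi_\theta the policy, J the expected return, and F(\theta) the Fisher information matrix:

\theta_{t+1} = \theta_t + \alpha\,\nabla_\theta J(\theta_t) \quad \text{(vanilla gradient)}
\theta_{t+1} = \theta_t + \alpha\,F(\theta_t)^{-1}\,\nabla_\theta J(\theta_t) \quad \text{(natural gradient)}
F(\theta) = \mathbb{E}_{\pi_\theta}\!\left[\nabla_\theta \log \pi_\theta(a \mid s)\,\nabla_\theta \log \pi_\theta(a \mid s)^{\top}\right]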

