Monitoring fault tolerance in distributed systems
- Authors: Sukhoplyuev D.I.¹, Nazarov A.N.²
Affiliations:
- ¹ MIREA – Russian Technological University
- ² Federal Research Center Computer Science and Control of Russian Academy of Sciences
- Issue: Vol. 11, No. 4 (2024)
- Pages: 94–106
- Section: Mathematical and software of computers, complexes and computer networks
- URL: https://journals.eco-vector.com/2313-223X/article/view/659213
- DOI: https://doi.org/10.33693/2313-223X-2024-11-4-94-106
- EDN: https://elibrary.ru/GHFEEC
- ID: 659213
Abstract
The goal of this study is to develop and verify a model for monitoring the reliability and availability of distributed systems that is built on probabilistic characteristics of system components and accounts for dependent failures. Modern distributed systems require accurate failure prediction methods that can capture complex dependencies between nodes and sustain reliable performance under high load. Traditional approaches based on the analysis of empirical data often fail to predict system states when the load changes, which limits their applicability. In this research, the proposed probabilistic model was verified by numerical simulation, and its accuracy was assessed using the Kullback–Leibler divergence and the mean squared error (MSE); the results confirm its accuracy and practical value. Experiments demonstrated the model's versatility: it adapts to various types of distributed systems and provides accurate real-time predictions of availability and resilience. Numerical experiments showed that the model can serve as a reliable tool for managing fault tolerance and load balancing. The developed model is therefore an effective and versatile means of improving the reliability of distributed systems across a wide range of applications.
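The abstract does not give the model's equations, so the following is only a minimal sketch of the verification workflow it describes (an analytic prediction compared with a numerical simulation through the Kullback–Leibler divergence and MSE), built on a hypothetical stand-in model rather than the authors' one. The assumptions are: n identical nodes, each down independently with probability `p_fail`, plus a common-cause shock with probability `p_shock` that takes all nodes down at once, as a simple way to introduce dependent failures. All function names and parameter values are illustrative.

```python
import math
import random

# HYPOTHETICAL model, not the authors' one: n identical nodes, each down
# independently with probability p_fail, plus a common-cause shock that,
# with probability p_shock, takes all n nodes down at once (a simple way
# to introduce dependent failures).


def analytic_distribution(n: int, p_fail: float, p_shock: float) -> list[float]:
    """Model prediction of P(k nodes up) for k = 0..n."""
    p_up = 1.0 - p_fail
    dist = [
        (1.0 - p_shock) * math.comb(n, k) * p_up**k * p_fail ** (n - k)
        for k in range(n + 1)
    ]
    dist[0] += p_shock  # after a shock no node is available
    return dist


def simulate_distribution(n: int, p_fail: float, p_shock: float,
                          trials: int, seed: int = 42) -> list[float]:
    """Monte Carlo (numerical simulation) estimate of the same distribution."""
    rng = random.Random(seed)
    counts = [0] * (n + 1)
    for _ in range(trials):
        if rng.random() < p_shock:
            up = 0
        else:
            up = sum(1 for _ in range(n) if rng.random() >= p_fail)
        counts[up] += 1
    return [c / trials for c in counts]


def availability(dist: list[float], m: int) -> float:
    """Probability that at least m nodes are up (m-out-of-n availability)."""
    return sum(dist[m:])


def kl_divergence(p: list[float], q: list[float], eps: float = 1e-12) -> float:
    """D_KL(P || Q); states with P = 0 contribute nothing, eps avoids division by 0."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def mse(p: list[float], q: list[float]) -> float:
    """Mean squared error between two distributions over the same states."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q)) / len(p)


n, p_fail, p_shock = 5, 0.05, 0.01
model = analytic_distribution(n, p_fail, p_shock)
sim = simulate_distribution(n, p_fail, p_shock, trials=100_000)
print(f"P(>=3 of {n} up): model={availability(model, 3):.5f}, "
      f"simulation={availability(sim, 3):.5f}")
print(f"KL(sim || model)={kl_divergence(sim, model):.2e}, MSE={mse(sim, model):.2e}")
```

Computing D_KL(simulation || model) keeps the logarithm finite, because this model assigns positive probability to every state when 0 < p_fail < 1; the reverse direction relies on the `eps` guard for states the simulation never visited. Small values of both metrics indicate that the analytic prediction agrees with the simulated behaviour.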

About the authors
Danil Sukhoplyuev
MIREA – Russian Technological University
Primary contact for editorial correspondence.
Email: sukhoplyuev.d.i@edu.mirea.ru
SPIN code: 3931-0217
Postgraduate student
Russian Federation, Moscow

Alexey Nazarov
Federal Research Center Computer Science and Control of Russian Academy of Sciences
Email: a.nazarov06@bk.ru
ORCID iD: 0000-0002-0497-0296
SPIN code: 6032-5302
Dr. Sci. (Eng.), Professor
Russian Federation, Moscow