Monitoring fault tolerance in distributed systems

Abstract

The goal of this study is to develop and verify a model for monitoring the reliability and availability of distributed systems. The model is built on probabilistic characteristics of system components and accounts for dependent failures. Modern distributed systems require accurate failure-prediction methods that capture complex dependencies between nodes and sustain reliable operation under high load. Traditional approaches based on the analysis of empirical data often fail to predict system states when the load changes, which limits their applicability. In this study, the proposed probabilistic model was verified by numerical simulation, and its accuracy was assessed with the Kullback–Leibler divergence and the mean squared error (MSE); the results confirm its accuracy and practical value. Experiments demonstrated the model's versatility: it adapts to various types of distributed systems and provides accurate real-time predictions of availability and resilience. Numerical experiments showed that the model can serve as a reliable tool for managing fault tolerance and load balancing. The developed model is therefore an effective means of improving the reliability of distributed systems, and its versatility makes it suitable for a wide range of applications.
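The abstract names two accuracy measures, the Kullback–Leibler divergence and the MSE, for checking a probabilistic availability model against numerical simulation. The article's own model is not reproduced on this page, so the following is only a minimal sketch under assumed conditions. It uses `n` identical replicas, each up independently with probability `p`, plus a hypothetical shared fault (for example, a rack or switch outage) that occurs with probability `q` and takes every node down. This shared fault serves as a simple stand-in for dependent failures. The function names, the parameters `n`, `p`, `q`, `k_min`, and the choice of D_KL(empirical || model) as the direction are all illustrative assumptions, not the authors' method.

```python
import math
import random


def model_pmf(n: int, p: float, q: float) -> list[float]:
    """Analytical distribution of K, the number of working nodes.

    Hypothetical dependent-failure model: with probability q a common-cause
    fault takes all nodes down; otherwise each node is up independently
    with probability p.
    """
    pmf = [(1 - q) * math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    pmf[0] += q  # the common-cause event forces K = 0
    return pmf


def simulate_pmf(n: int, p: float, q: float, trials: int, seed: int = 42) -> list[float]:
    """Empirical distribution of K from Monte Carlo sampling of the same model."""
    rng = random.Random(seed)
    counts = [0] * (n + 1)
    for _ in range(trials):
        if rng.random() < q:
            up = 0  # shared fault: every node is down in this trial
        else:
            up = sum(rng.random() < p for _ in range(n))  # independent node states
        counts[up] += 1
    return [c / trials for c in counts]


def kl_divergence(p_emp: list[float], q_model: list[float]) -> float:
    """D_KL(P || Q) in nats; bins with P(k) = 0 contribute nothing."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p_emp, q_model) if pk > 0)


def mse(a: list[float], b: list[float]) -> float:
    """Mean squared error between two equally long probability vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def availability(pmf: list[float], k_min: int) -> float:
    """k-out-of-n availability: probability that at least k_min nodes are up."""
    return sum(pmf[k_min:])


if __name__ == "__main__":
    n, p, q = 5, 0.95, 0.01  # hypothetical cluster parameters
    theory = model_pmf(n, p, q)
    empirical = simulate_pmf(n, p, q, trials=200_000)
    print(f"KL  = {kl_divergence(empirical, theory):.2e}")
    print(f"MSE = {mse(empirical, theory):.2e}")
    print(f"A(3-of-5): model {availability(theory, 3):.4f}, "
          f"simulation {availability(empirical, 3):.4f}")
```

Small KL and MSE values indicate that the simulated state distribution agrees with the analytical one. In this toy setup they mainly reflect sampling noise. A real verification would compare the model against measurements or a richer simulator with correlated node behaviour.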

About the authors

Danil Sukhoplyuev

MIREA – Russian Technological University

Corresponding author.
Email: sukhoplyuev.d.i@edu.mirea.ru
SPIN code: 3931-0217

Postgraduate student

Moscow, Russia

Alexey Nazarov

Federal Research Center Computer Science and Control of Russian Academy of Sciences

Email: a.nazarov06@bk.ru
ORCID iD: 0000-0002-0497-0296
SPIN code: 6032-5302

Dr. Sci. (Eng.), Professor

Moscow, Russia

Supplementary files

Fig. 1. Formal NameNode–DataNode architecture in Apache Hadoop (Source: https://www.analyticsvidhya.com/blog/2022/05/workings-of-hadoop-distributed-file-system-hdfs/)

Fig. 2. Nginx load balancer (Source: https://coderpad.io/blog/development/how-to-configure-different-load-balancing-algorithms-on-nginx/)