Reducing the dimensionality of data for analysis using the principal component method

Abstract

Although modern data mining systems have high computing power, the volume of data to be analyzed is constantly growing and can become a limiting factor. Reducing the dimensionality of the source data without degrading the quality of the analysis is therefore a relevant task. One method for doing so is the principal component method (principal component analysis, PCA). The paper considers the application of this method to the analysis of data from sensor network nodes. An advantage of the method is that it requires no preliminary hypotheses about the state of the object under study. Its implementation consists of linear, cyclic steps, which makes it well suited to algorithmic implementation on a computer. The initial data set is operational data from a wireless sensor network of one thousand nodes; for each node, a sample of measurements of the main quality-of-service parameters is given. The initial data are preprocessed, a covariance matrix is constructed, and its eigenvalues and eigenvectors are found. The method yields the principal components, which are obtained from the eigenvectors and are then used for data analysis. The result of the work is a reduction in the dimensionality of the data.
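
The steps named in the abstract (preprocessing, covariance matrix, eigenvalues and eigenvectors, principal components) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The synthetic data, the choice of six quality-of-service parameters, standardization as the preprocessing step, and the decision to keep two components are all assumptions made for illustration; only the count of one thousand nodes comes from the abstract.

```python
import numpy as np


def pca(data, n_components):
    """Project data (rows = nodes, columns = parameters) onto its
    leading principal components."""
    # Preprocessing (assumed here): center each parameter and scale it
    # to unit variance so that differing units do not dominate.
    centered = data - data.mean(axis=0)
    scaled = centered / centered.std(axis=0, ddof=1)

    # Covariance matrix of the preprocessed parameters.
    cov = np.cov(scaled, rowvar=False)

    # eigh is intended for symmetric matrices and returns eigenvalues
    # in ascending order, so reorder them to descending.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]

    # Principal components: coordinates of each node along the
    # eigenvectors with the largest eigenvalues.
    components = scaled @ eigenvectors[:, :n_components]
    explained = eigenvalues[:n_components] / eigenvalues.sum()
    return components, explained


# Synthetic stand-in for the sensor network data (hypothetical values):
# 1000 nodes, 6 parameters driven by 2 hidden factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 2))
mixing = rng.normal(size=(2, 6))
qos = latent @ mixing + 0.1 * rng.normal(size=(1000, 6))

components, explained = pca(qos, n_components=2)
print(components.shape)   # (1000, 2)
print(explained)          # share of total variance per kept component
```

Each node is then described by two coordinates instead of six, and the `explained` values indicate how much of the total variance those two coordinates retain.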

About the authors

Denis V. Gadasin

Moscow Technical University of Communications and Informatics (MTUCI)

Author for correspondence.
Email: dengadiplom@mail.ru
ORCID iD: 0000-0002-5601-7798
Scopus Author ID: 57203169517
ResearcherId: AAT-5451-2021

Cand. Sci. (Eng.), Associate Professor, Deputy Head, Department of Network Information Technologies and Services

Russian Federation, Moscow

Lilia A. Tremasova

Moscow Technical University of Communications and Informatics (MTUCI)

Email: l.a.tremasova@mtuci.ru
ORCID iD: 0009-0004-6852-4131

Assistant professor, Department of Network Information Technologies and Services

Russian Federation, Moscow

Daniil S. Kalininsky

Moscow Technical University of Communications and Informatics (MTUCI)

Email: daniilblag28@mail.ru
ORCID iD: 0009-0005-3069-9566
ResearcherId: KIB-6020-2024

Postgraduate student, Department of Network Information Technologies and Services

Russian Federation, Moscow

Supplementary files

Fig. 14. Analysis of the results in terms of the coordinates of the two principal components

Copyright (c) 2025 Yur-VAK

License URL: https://www.urvak.ru/contacts/