Extraction of physical and technical information from text documents
- Authors: Korobkin D.M.
- Affiliations: Volgograd State Technical University
- Issue: Vol 32, No 1 (2026)
- Pages: 28-36
- Section: Intelligent systems and technologies
- Published: 15.01.2026
- URL: https://journals.eco-vector.com/1684-6400/article/view/702341
- DOI: https://doi.org/10.17587/it.32.28-36
- ID: 702341
Abstract
The study is motivated by the need to automate the analysis of text documents that describe physical and technical effects. As science and technology develop, the volume of scientific articles, patent documents and grant reports is growing rapidly, which calls for effective methods of extracting and analyzing the key data they contain. The theoretical significance of the work lies in a new method for automatically extracting physical and technical data, in the form of keyphrases, from natural-language text documents; the method combines deep learning with semantic-ontological text analysis. The practical significance lies in the software developed for automatically extracting elements of physical and technical effects from natural-language texts. A corpus of more than 4,300 sentences was built from patent texts containing structured physical and technical information: descriptions of physical effects and of the technical problems solved. The neural network models KeyT5, T5 and BERT were trained to extract physical and technical information. The T5 and KeyT5 models achieved high results in extracting keyphrases that represent elements of descriptions of physical and technical effects (precision over 0.94, recall over 0.95).
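The precision and recall figures quoted above can be illustrated with a minimal sketch. The abstract does not say how extracted keyphrases were matched against reference keyphrases, so exact matching after lowercasing and whitespace normalization is an assumption here, and the function names and sample phrases are hypothetical.

```python
from typing import Iterable, Tuple


def normalize(phrase: str) -> str:
    """Lowercase a phrase and collapse runs of whitespace (assumed matching rule)."""
    return " ".join(phrase.lower().split())


def precision_recall(predicted: Iterable[str], gold: Iterable[str]) -> Tuple[float, float]:
    """Set-based precision and recall of predicted keyphrases against gold ones."""
    pred = {normalize(p) for p in predicted}
    ref = {normalize(g) for g in gold}
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    return precision, recall


# Hypothetical example: three reference keyphrases, four predicted ones.
gold = ["piezoelectric effect", "reduce vibration", "thermal expansion"]
predicted = ["Piezoelectric  effect", "reduce vibration",
             "increase strength", "thermal expansion"]

p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=1.00
```

Three of the four predicted phrases match a reference phrase, which gives a precision of 0.75, and all three reference phrases are recovered, which gives a recall of 1.0. A looser matching rule (for example, token overlap) would produce different numbers.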
About the authors
D. M. Korobkin
Volgograd State Technical University
Author for correspondence.
Email: dkorobkin80@mail.ru
Ph.D., Assistant Professor
Russian Federation, Volgograd