<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article>
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ali="http://www.niso.org/schemas/ali/1.0/" article-type="research-article" dtd-version="1.2" xml:lang="en"><front><journal-meta><journal-id journal-id-type="publisher-id">Informacionnye Tehnologii</journal-id><journal-title-group><journal-title xml:lang="en">Informacionnye Tehnologii</journal-title><trans-title-group xml:lang="ru"><trans-title>Информационные технологии</trans-title></trans-title-group></journal-title-group><issn publication-format="print">1684-6400</issn><publisher><publisher-name xml:lang="en">New Technologies Publishing House</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="publisher-id">702341</article-id><article-id pub-id-type="doi">10.17587/it.32.28-36</article-id><article-categories><subj-group subj-group-type="toc-heading" xml:lang="en"><subject>Intelligent systems and technologies</subject></subj-group><subj-group subj-group-type="toc-heading" xml:lang="ru"><subject>Интеллектуальные системы и технологии</subject></subj-group><subj-group subj-group-type="article-type"><subject>Research Article</subject></subj-group></article-categories><title-group><article-title xml:lang="en">Extraction of physical and technical information from text documents</article-title><trans-title-group xml:lang="ru"><trans-title>Извлечение физико-технической информации из текстовых документов</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author"><name-alternatives><name xml:lang="en"><surname>Korobkin</surname><given-names>D. M.</given-names></name><name xml:lang="ru"><surname>Коробкин</surname><given-names>Д. М.</given-names></name></name-alternatives><address><country country="RU">Russian Federation</country></address><bio xml:lang="en"><p>Ph.D., Assistant Professor</p></bio><bio xml:lang="ru"><p>канд. техн. 
наук, доц.</p></bio><email>dkorobkin80@mail.ru</email><xref ref-type="aff" rid="aff1"/></contrib></contrib-group><aff-alternatives id="aff1"><aff><institution xml:lang="en">Volgograd State Technical University</institution></aff><aff><institution xml:lang="ru">Волгоградский государственный технический университет</institution></aff></aff-alternatives><pub-date date-type="pub" iso-8601-date="2026-01-15" publication-format="electronic"><day>15</day><month>01</month><year>2026</year></pub-date><volume>32</volume><issue>1</issue><issue-title xml:lang="en"/><issue-title xml:lang="ru"/><fpage>28</fpage><lpage>36</lpage><history><date date-type="received" iso-8601-date="2026-02-08"><day>08</day><month>02</month><year>2026</year></date><date date-type="accepted" iso-8601-date="2026-02-08"><day>08</day><month>02</month><year>2026</year></date></history><permissions><copyright-statement xml:lang="en">Copyright © 2026, Informacionnye Tehnologii</copyright-statement><copyright-statement xml:lang="ru">Copyright © 2026, Информационные технологии</copyright-statement><copyright-year>2026</copyright-year><copyright-holder xml:lang="en">Informacionnye Tehnologii</copyright-holder><copyright-holder xml:lang="ru">Информационные технологии</copyright-holder></permissions><self-uri xlink:href="https://journals.eco-vector.com/1684-6400/article/view/702341">https://journals.eco-vector.com/1684-6400/article/view/702341</self-uri><abstract xml:lang="en"><p>The study is motivated by the need to automate the analysis of text documents that describe physical and technical effects. As science and technology advance, the volume of scientific articles, patent documents and grant reports is growing rapidly, which calls for effective methods for extracting and analyzing such key data.
The theoretical significance of the work lies in a new method for automatically extracting physical and technical data, in the form of keyphrases, from natural-language text documents; the method combines deep learning with semantic-ontological text analysis. The practical significance of the work lies in software that automatically extracts the elements of physical and technical effects from natural-language texts. A labeled corpus of more than 4,300 sentences was built from patent texts containing structured physical and technical information: descriptions of physical effects and of the technical problems solved. The keyT5, T5 and BERT neural network models were trained to extract physical and technical information. The T5 and keyT5 models achieved high results in extracting keyphrases that represent elements of descriptions of physical and technical effects (precision above 0.94, recall above 0.95).</p></abstract><trans-abstract xml:lang="ru"><p>Актуальность работы обусловлена необходимостью автоматизации анализа текстовых документов, содержащих описания физико-технических эффектов. В условиях современного развития науки и техники объем научных статей, патентных документов и грантовых отчетов стремительно увеличивается, что требует эффективных методов для извлечения и анализа подобных ключевых данных. Теоретическая значимость работы заключается в разработке нового метода автоматического извлечения физико-технических данных в виде ключевых фраз из естественно-языковых текстовых документов, обеспечивающего кооперацию технологий глубокого обучения и методов семантико-онтологического анализа текста. Практическая значимость работы заключается в создании системы для автоматического извлечения элементов, описывающих физико-технические эффекты, из естественно-языковых текстов. Сформирован размеченный корпус предложений (более 4,3 тыс.) 
из текстов патентов, содержащих физико-техническую структурированную информацию в виде описаний физических эффектов, решаемых технических проблем. Проведено обучение нейросетевых моделей keyT5, T5 и Bert для извлечения физико-технической информации. Модели T5 и KeyT5 продемонстрировали высокие результаты в извлечении ключевых фраз в виде элементов описаний физико-технических эффектов (точность более 0,94, полнота более 0,95).</p></trans-abstract><kwd-group xml:lang="en"><kwd>technical effects</kwd><kwd>physical effects</kwd><kwd>information extraction</kwd><kwd>dataset</kwd><kwd>keyT5</kwd><kwd>T5</kwd><kwd>Bert</kwd></kwd-group><kwd-group xml:lang="ru"><kwd>физико-технические эффекты</kwd><kwd>извлечение информации</kwd><kwd>датасет</kwd><kwd>keyT5</kwd><kwd>T5</kwd><kwd>Bert</kwd></kwd-group><funding-group><award-group><funding-source><institution-wrap><institution xml:lang="ru">Российский научный фонд</institution></institution-wrap><institution-wrap><institution xml:lang="en">Russian Science Foundation</institution></institution-wrap></funding-source><award-id>24-21-20140</award-id></award-group><award-group><funding-source><institution-wrap><institution xml:lang="ru">Администрация Волгоградской области</institution></institution-wrap><institution-wrap><institution xml:lang="en">Administration of Volgograd Region</institution></institution-wrap></funding-source></award-group><funding-statement xml:lang="en">The study was supported by the grant of Russian Science Foundation No. 24-21-20140, https://rscf.ru/project/24-21-20140/, and Administration of Volgograd Region.</funding-statement><funding-statement xml:lang="ru">Исследование выполнено за счет гранта Российского научного фонда № 24-21-20140, https://rscf.ru/project/24-21-20140/, и Администрации Волгоградской области.</funding-statement></funding-group></article-meta></front><body></body><back><ref-list><ref id="B1"><label>1.</label><citation-alternatives><mixed-citation xml:lang="en">Korobkin D. 
M., Fomenkov S. A., Davydova S. V. Search of Physical Effect descriptions in global patent space, Vestnik komp’iuternykh i informatsionnykh tekhnologii, 2016, no. 5, pp. 3–11 (in Russian).</mixed-citation><mixed-citation xml:lang="ru">Коробкин Д. М., Фоменков С. А., Давыдова С. В. Поиск описаний физических эффектов в патентном массиве // Вестник компьютерных и информационных технологий. 2016. № 5. С. 3–11. DOI: 10.14489/vkit.2016.05.pp.003-011.</mixed-citation></citation-alternatives></ref><ref id="B2"><label>2.</label><citation-alternatives><mixed-citation xml:lang="en">Korobkin D. M., Shabanov D. V., Fomenkov S. A., Dvorjankin A. M. The software for formation the matrix of technical functions performed by physical effects based on patent database analysis, Modeling, optimization and information technology, 2020, vol. 8, no. 4 (31), 12 p., DOI: 10.26102/2310-6018/2020.31.4.006 (in Russian).</mixed-citation><mixed-citation xml:lang="ru">Коробкин Д. М., Шабанов Д. В., Фоменков С. А., Дворянкин А. М. Система формирования матрицы выполняемых физическими эффектами технических функций на основе анализа патентного массива // Моделирование, оптимизация и информационные технологии: сетевой научный журнал. 2020. Т. 8, № 4 (31). 12 с. DOI: 10.26102/2310-6018/2020.31.4.006.</mixed-citation></citation-alternatives></ref><ref id="B3"><label>3.</label><citation-alternatives><mixed-citation xml:lang="en">Mamedov V. Y., Kovalevsky D. A., Morozov D. A., Stolyarov S. S., Ospichev S. S. Hierarchical classification of scientific articles using deep learning (using the UDC hierarchy as an example), Modeling and Analysis of Information Systems, 2025, vol. 32, no. 1, pp. 80–94, DOI: 10.18255/1818-1015-2025-1-80-94 (in Russian).</mixed-citation><mixed-citation xml:lang="ru">Мамедов В. Ю., Ковалевский Д. А., Морозов Д. А., Столяров С. С., Оспичев С. С. 
Иерархическая классификация научных статей при помощи глубокого обучения (на примере иерархии УДК) // Моделирование и анализ информационных систем. 2025. Т. 32, № 1. С. 80–94. DOI: 10.18255/1818-1015-2025-1-80-94.</mixed-citation></citation-alternatives></ref><ref id="B4"><label>4.</label><citation-alternatives><mixed-citation xml:lang="en">Kusakin I. K., Fedorets O. V., Romanov A. Y. Classification of Short Scientific Texts, Scientific and Technical Information Processing, 2023, vol. 50, no. 3, pp. 176–183, DOI: 10.36535/0548-0019-2023-07-3.</mixed-citation><mixed-citation xml:lang="ru">Кусакин И. К., Федорец О. В., Романов А. Ю. О классификации коротких научных текстов // Научно-техническая информация. Серия 1: Организация и методика информационной работы. 2023. № 7. С. 22–28. DOI: 10.36535/0548-0019-2023-07-3.</mixed-citation></citation-alternatives></ref><ref id="B5"><label>5.</label><citation-alternatives><mixed-citation xml:lang="en">Fedotova A., Kurtukova A., Romanov A., Shelupanov A. Semantic Clustering and Transfer Learning in Social Media Texts Authorship Attribution, In IEEE Access, 2024, vol. 12, pp. 39783–39803, DOI: 10.1109/ACCESS.2024.3377231.</mixed-citation><mixed-citation xml:lang="ru">Fedotova A., Kurtukova A., Romanov A., Shelupanov A. Semantic Clustering and Transfer Learning in Social Media Texts Authorship Attribution // In IEEE Access. 2024. Vol. 12. P. 39783–39803. DOI: 10.1109/ACCESS.2024.3377231.</mixed-citation></citation-alternatives></ref><ref id="B6"><label>6.</label><citation-alternatives><mixed-citation xml:lang="en">Marshalova A. E., Bruches E. P., Batura T. V. Aspect extraction from scientific paper texts, Software &amp; Systems, 2022, no. 4 (in Russian).</mixed-citation><mixed-citation xml:lang="ru">Маршалова А. Э., Бручес Е. П., Батура Т. В. Извлечение аспектов из текстов научных статей // Программные продукты и системы. 2022. 
№ 4.</mixed-citation></citation-alternatives></ref><ref id="B7"><label>7.</label><citation-alternatives><mixed-citation xml:lang="en">Vasiliev D. D., Pyataeva A. V. T5 language models for text simplification, Software &amp; Systems, 2023, no. 2 (in Russian).</mixed-citation><mixed-citation xml:lang="ru">Васильев Д. Д., Пятаева А. В. Использование языковых моделей T5 для задачи упрощения текста // Программные продукты и системы. 2023. № 2.</mixed-citation></citation-alternatives></ref><ref id="B8"><label>8.</label><citation-alternatives><mixed-citation xml:lang="en">Ermolenko T. V. Classification of Errors in the Text Based on Deep Learning, Problems of artificial intelligence, 2019, no. 3 (14) (in Russian).</mixed-citation><mixed-citation xml:lang="ru">Ермоленко Т. В. Классификация ошибок в тексте на основе глубокого обучения // Проблемы искусственного интеллекта. 2019. № 3 (14).</mixed-citation></citation-alternatives></ref><ref id="B9"><label>9.</label><citation-alternatives><mixed-citation xml:lang="en">Gabín J., Ares M., Parapar J. Enhancing Automatic Keyphrase Labelling with Text-to-Text Transfer Transformer (T5) Architecture: A Framework for Keyphrase Generation and Filtering, Advances in Information Retrieval: 46th European Conference on IR Research (ECIR 2024), Glasgow, UK, 2024, pp. 267–275, DOI: 10.1007/978-3-031-56027-9_18.</mixed-citation><mixed-citation xml:lang="ru">Gabín J., Ares M., Parapar J. Enhancing Automatic Keyphrase Labelling with Text-to-Text Transfer Transformer (T5) Architecture: A Framework for Keyphrase Generation and Filtering // Advances in Information Retrieval: 46th European Conference on IR Research (ECIR 2024). Glasgow, UK, 2024. P. 267–275. DOI: 10.1007/978-3-031-56027-9_18.</mixed-citation></citation-alternatives></ref><ref id="B10"><label>10.</label><citation-alternatives><mixed-citation xml:lang="en">Chopra S., Agarwal P., Ahmed J., Biswas S., Obaid A. 
Roberta and BERT: Revolutionizing Mental Healthcare Through Natural Language, SN Computer Science, 2024, no. 5, DOI: 10.1007/s42979-024-03202-8.</mixed-citation><mixed-citation xml:lang="ru">Chopra S., Agarwal P., Ahmed J., Biswas S., Obaid A. Roberta and BERT: Revolutionizing Mental Healthcare Through Natural Language // SN Computer Science. 2024. N. 5. DOI: 10.1007/s42979-024-03202-8.</mixed-citation></citation-alternatives></ref><ref id="B11"><label>11.</label><citation-alternatives><mixed-citation xml:lang="en">Huang Q. Research on Keywords Extraction of Film Reviews Based on the KeyBERT Model, Transactions on Computer Science and Intelligent Systems Research, 2024, no. 5, pp. 732–738, DOI: 10.62051/1zpndy68.</mixed-citation><mixed-citation xml:lang="ru">Huang Q. Research on Keywords Extraction of Film Reviews Based on the KeyBERT Model // Transactions on Computer Science and Intelligent Systems Research. 2024. N. 5. P. 732–738. DOI: 10.62051/1zpndy68.</mixed-citation></citation-alternatives></ref><ref id="B12"><label>12.</label><citation-alternatives><mixed-citation xml:lang="en">Campos R., Mangaravite V., Pasquali A., Jorge A., Nunes C., Jatowt A. YAKE! Collection-Independent Automatic Keyword Extractor, Proceedings of the 40th European Conference on Information Retrieval (ECIR 2018), Grenoble, France, 2018, pp. 806–810, DOI: 10.1007/978-3-319-76941-7_80.</mixed-citation><mixed-citation xml:lang="ru">Campos R., Mangaravite V., Pasquali A., Jorge A., Nunes C., Jatowt A. YAKE! Collection-Independent Automatic Keyword Extractor // Proceedings of the 40th European Conference on Information Retrieval (ECIR 2018). Grenoble, France, 2018. P. 806–810. DOI: 10.1007/978-3-319-76941-7_80.</mixed-citation></citation-alternatives></ref><ref id="B13"><label>13.</label><citation-alternatives><mixed-citation xml:lang="en">Ivaschenko A., Stolbova A., Krupin D., Krivosheev A., Sitnikov P., Kravets O. 
Semantic analysis implementation in engineering enterprise content management systems, Proceedings of the 2023 IEEE 17th International Conference on Application of Information and Communication Technologies (AICT), Moscow, 2023, pp. 1–5, DOI: 10.1109/AICT59525.2023.10313055.</mixed-citation><mixed-citation xml:lang="ru">Ivaschenko A., Stolbova A., Krupin D., Krivosheev A., Sitnikov P., Kravets O. Semantic analysis implementation in engineering enterprise content management systems // Proceedings of the 2023 IEEE 17th International Conference on Application of Information and Communication Technologies (AICT). Москва, 2023. С. 1–5. DOI: 10.1109/AICT59525.2023.10313055.</mixed-citation></citation-alternatives></ref><ref id="B14"><label>14.</label><citation-alternatives><mixed-citation xml:lang="en">Surdeanu M., Valenzuela-Escárcega M. A. Using Transformers with the Hugging Face Library, Deep Learning for Natural Language Processing: A Gentle Introduction, Ed. by M. Surdeanu, M. A. Valenzuela-Escárcega, Cambridge, Cambridge University Press, 2024, pp. 194–215, DOI: 10.1017/9781009026222.014.</mixed-citation><mixed-citation xml:lang="ru">Surdeanu M., Valenzuela-Escárcega M. A. Using Transformers with the Hugging Face Library // Deep Learning for Natural Language Processing: A Gentle Introduction / Ed. by M. Surdeanu, M. A. Valenzuela-Escárcega. Cambridge: Cambridge University Press, 2024. P. 194–215. DOI: 10.1017/9781009026222.014.</mixed-citation></citation-alternatives></ref><ref id="B15"><label>15.</label><citation-alternatives><mixed-citation xml:lang="en">Goloviznina V. S., Kotelnikov E. V. Automatic Summarization of Russian Texts: Comparison of Extractive and Abstractive Methods, Computational Linguistics and Intellectual Technologies, 2022, pp. 223–235.</mixed-citation><mixed-citation xml:lang="ru">Goloviznina V. S., Kotelnikov E. V. 
Automatic Summarization of Russian Texts: Comparison of Extractive and Abstractive Methods // Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference "Dialogue". Moscow: Russian State University for the Humanities, 2022. P. 223–235.</mixed-citation></citation-alternatives></ref><ref id="B16"><label>16.</label><citation-alternatives><mixed-citation xml:lang="en">Schmitt X., Kubler S., Robert J., Papadakis M., LeTraon Y. A Replicable Comparison Study of NER Software: StanfordNLP, NLTK, OpenNLP, SpaCy, Gate, 2019 Sixth International Conference on Social Networks Analysis, Management and Security (SNAMS), Granada, Spain, 2019, pp. 338–343, DOI: 10.1109/SNAMS.2019.8931850.</mixed-citation><mixed-citation xml:lang="ru">Schmitt X., Kubler S., Robert J., Papadakis M., LeTraon Y. A Replicable Comparison Study of NER Software: StanfordNLP, NLTK, OpenNLP, SpaCy, Gate // 2019 Sixth International Conference on Social Networks Analysis, Management and Security (SNAMS). Granada, Spain, 2019. P. 338–343. DOI: 10.1109/SNAMS.2019.8931850.</mixed-citation></citation-alternatives></ref><ref id="B17"><label>17.</label><citation-alternatives><mixed-citation xml:lang="en">Ovchinnikova K. A., Ivanov A. I., Sidorova E. A. Information extraction from texts based on ontology and large language models, System Informatics, 2023, no. 23, pp. 13–32, DOI: 10.31144/SI.2307-6410 (in Russian).</mixed-citation><mixed-citation xml:lang="ru">Овчинникова К. А., Иванов А. И., Сидорова Е. А. Автоматизация построения терминологического ядра онтологии по компьютерной лингвистике на основе корпуса текстов // Системная информатика. 2023. № 23. С. 13–32. DOI: 10.31144/SI.2307-6410.</mixed-citation></citation-alternatives></ref><ref id="B18"><label>18.</label><citation-alternatives><mixed-citation xml:lang="en">Goutte C., Gaussier E. 
A Probabilistic Interpretation of Precision, Recall and F-Score, with Implication for Evaluation, Advances in Information Retrieval: 27th European Conference on Information Retrieval Research (ECIR 2005), Santiago de Compostela, Spain, 2005, pp. 345–359, DOI: 10.1007/978-3-540-31865-1_25.</mixed-citation><mixed-citation xml:lang="ru">Goutte C., Gaussier E. A Probabilistic Interpretation of Precision, Recall and F-Score, with Implication for Evaluation // Advances in Information Retrieval: 27th European Conference on Information Retrieval Research (ECIR 2005). Santiago de Compostela, Spain, 2005. P. 345–359. DOI: 10.1007/978-3-540-31865-1_25.</mixed-citation></citation-alternatives></ref></ref-list></back></article>
