<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root>
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ali="http://www.niso.org/schemas/ali/1.0/" article-type="research-article" dtd-version="1.2" xml:lang="en"><front><journal-meta><journal-id journal-id-type="publisher-id">Informacionnye Tehnologii</journal-id><journal-title-group><journal-title xml:lang="en">Informacionnye Tehnologii</journal-title><trans-title-group xml:lang="ru"><trans-title>Информационные технологии</trans-title></trans-title-group></journal-title-group><issn publication-format="print">1684-6400</issn><publisher><publisher-name xml:lang="en">New Technologies Publishing House</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="publisher-id">702442</article-id><article-id pub-id-type="doi">10.17587/it.30.622-632</article-id><article-categories><subj-group subj-group-type="toc-heading" xml:lang="en"><subject>Intelligent systems and technologies</subject></subj-group><subj-group subj-group-type="toc-heading" xml:lang="ru"><subject>Интеллектуальные системы и технологии</subject></subj-group><subj-group subj-group-type="article-type"><subject>Research Article</subject></subj-group></article-categories><title-group><article-title xml:lang="en">Tokenization of political texts in BERT models using ICF<sup>+</sup> ontologies</article-title><trans-title-group xml:lang="ru"><trans-title>Токенизация политических текстов в BERT-моделях с использованием ICF<sup>+</sup>-онтологий</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author"><contrib-id contrib-id-type="orcid">https://orcid.org/0000-0003-1694-7410</contrib-id><name-alternatives><name xml:lang="en"><surname>Kashirin</surname><given-names>I. Yu.</given-names></name><name xml:lang="ru"><surname>Каширин</surname><given-names>И. 
Ю.</given-names></name></name-alternatives><address><country country="RU">Russian Federation</country></address><bio xml:lang="en"><p>Dr. Tech. Sc., Professor</p></bio><bio xml:lang="ru"><p>д-р техн. наук, проф.</p></bio><email>igor-kashirin@mail.ru</email><xref ref-type="aff" rid="aff1"/></contrib></contrib-group><aff-alternatives id="aff1"><aff><institution xml:lang="en">Ryazan State Radio Engineering University named after V. F. Utkin</institution></aff><aff><institution xml:lang="ru">Рязанский государственный радиотехнический университет имени В. Ф. Уткина</institution></aff></aff-alternatives><pub-date date-type="pub" iso-8601-date="2024-12-15" publication-format="electronic"><day>15</day><month>12</month><year>2024</year></pub-date><volume>30</volume><issue>12</issue><issue-title xml:lang="en"/><issue-title xml:lang="ru"/><fpage>622</fpage><lpage>632</lpage><history><date date-type="received" iso-8601-date="2026-02-09"><day>09</day><month>02</month><year>2026</year></date><date date-type="accepted" iso-8601-date="2026-02-09"><day>09</day><month>02</month><year>2026</year></date></history><permissions><copyright-statement xml:lang="en">Copyright © 2024, Informacionnye Tehnologii</copyright-statement><copyright-statement xml:lang="ru">Copyright © 2024, Информационные технологии</copyright-statement><copyright-year>2024</copyright-year><copyright-holder xml:lang="en">Informacionnye Tehnologii</copyright-holder><copyright-holder xml:lang="ru">Информационные технологии</copyright-holder></permissions><self-uri xlink:href="https://journals.eco-vector.com/1684-6400/article/view/702442">https://journals.eco-vector.com/1684-6400/article/view/702442</self-uri><abstract xml:lang="en"><p>The article considers the design of machine learning language models and their ensembles for complex analysis of news texts from Russian and Western electronic media. 
An example software implementation of a new neural network language model with problem-oriented ontological tokenization is given. The toolset is Python v.3.10 and Anaconda v.2.1. The effectiveness of the approach relative to the best foreign counterparts is confirmed by a series of experiments on classifying news articles by ideological orientation into Western and English-language Russian ones.</p></abstract><trans-abstract xml:lang="ru"><p>Рассматривается проектирование языковых моделей машинного обучения, а также их ансамблей, применяемых в сложной аналитике новостных текстов отечественных и западных электронных средств массовой информации. Приводится пример программной реализации новой языковой нейросетевой модели с проблемно-ориентированной онтологической токенизацией. В качестве инструментария используется язык Python v.3.10, Anaconda v.2.1. Эффективность подхода в сравнении с лучшими зарубежными аналогами подтверждается серией экспериментов на примере классификации новостных статей по их идеологической направленности на западные и англоязычные российские.</p></trans-abstract><kwd-group xml:lang="en"><kwd>BERT models</kwd><kwd>ontological models</kwd><kwd>ICF<sup>+</sup> relation</kwd><kwd>tokenizer</kwd><kwd>retriever</kwd><kwd>political news</kwd><kwd>ensembles of ML models</kwd><kwd>forecasting</kwd><kwd>semantic similarity</kwd></kwd-group><kwd-group xml:lang="ru"><kwd>BERT-модели</kwd><kwd>онтологические модели</kwd><kwd>ICF<sup>+</sup>-отношение</kwd><kwd>токенайзер</kwd><kwd>ретривер</kwd><kwd>политические новости</kwd><kwd>ансамбли ML-моделей</kwd><kwd>прогнозирование</kwd><kwd>семантическое сходство</kwd></kwd-group><funding-group/></article-meta></front><body></body><back><ref-list><ref id="B1"><label>1.</label><citation-alternatives><mixed-citation xml:lang="en">Anastas’yev A. A., Astashkin M. S., Agafonov P. A., Kashirin I. Yu. 
Determining the reliability of news using knowledge-based ML models, IIASU’23 — Artificial intelligence in management, control, and data processing systems. Proceedings of the II All-Russian scientific conference (Moscow, April 27—28, 2023), 2023, vol. 2, pp. 21—27.</mixed-citation><mixed-citation xml:lang="ru">Анастасьев А. А., Асташкин М. С., Агафонов П. А., Каширин И. Ю. Определение достоверности новостей с использованием ML-моделей, основанных на знаниях // IIASU’23 — Artificial intelligence in management, control, and data processing systems. Proceedings of the II All-Russian scientific conference (Moscow, April 27—28, 2023). 2023. Vol. 2. P. 21—27.</mixed-citation></citation-alternatives></ref><ref id="B2"><label>2.</label><citation-alternatives><mixed-citation xml:lang="en">Platonov Ye. N., Rudenko V. Yu. Identification and classification of toxic statements by machine learning methods, Data modeling and analysis, 2022, vol. 12, no. 1, pp. 27—48.</mixed-citation><mixed-citation xml:lang="ru">Платонов Е. Н., Руденко В. Ю. Выявление и классификация токсичных высказываний методами машинного обучения // Моделирование и анализ данных. 2022. Т. 12, № 1. С. 27—48.</mixed-citation></citation-alternatives></ref><ref id="B3"><label>3.</label><citation-alternatives><mixed-citation xml:lang="en">Badjatiya P., Gupta S., Gupta M., Varma V. Deep learning for hate speech detection in tweets, Proceedings of the 26th International Conference on World Wide Web Companion, 2017, pp. 759—760.</mixed-citation><mixed-citation xml:lang="ru">Badjatiya P., Gupta S., Gupta M., Varma V. Deep learning for hate speech detection in tweets // Proceedings of the 26th International Conference on World Wide Web Companion. 2017. P. 759—760.</mixed-citation></citation-alternatives></ref><ref id="B4"><label>4.</label><citation-alternatives><mixed-citation xml:lang="en">Agrawal A., An A. 
Affective representations for sarcasm detection, 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2018, pp. 1029—1032.</mixed-citation><mixed-citation xml:lang="ru">Agrawal A., An A. Affective representations for sarcasm detection // 41st International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR 2018. P. 1029—1032.</mixed-citation></citation-alternatives></ref><ref id="B5"><label>5.</label><citation-alternatives><mixed-citation xml:lang="en">Xu H., Liu B., Shu L., Yu Ph. S. BERT post-training for review reading comprehension and aspect-based sentiment analysis, arXiv preprint arXiv:1904.02232 (2019).</mixed-citation><mixed-citation xml:lang="ru">Xu H., Liu B., Shu L., Yu Ph. S. BERT post-training for review reading comprehension and aspect-based sentiment analysis // arXiv preprint arXiv:1904.02232 (2019).</mixed-citation></citation-alternatives></ref><ref id="B6"><label>6.</label><citation-alternatives><mixed-citation xml:lang="en">Chiarcos C., Apostol E.-S., Kabashi B., Truică C.-O. Modelling frequency, attestation, and corpus-based information with OntoLex-FrAC, Proceedings of the 29th International Conference on Computational Linguistics, 2022, pp. 4018—4027.</mixed-citation><mixed-citation xml:lang="ru">Chiarcos C., Apostol E.-S., Kabashi B., Truică C.-O. Modelling frequency, attestation, and corpus-based information with OntoLex-FrAC // Proceedings of the 29th International Conference on Computational Linguistics. 2022. P. 4018—4027.</mixed-citation></citation-alternatives></ref><ref id="B7"><label>7.</label><citation-alternatives><mixed-citation xml:lang="en">Roumeliotis K. I., Tselikas N. D. ChatGPT and Open-AI Models: A Preliminary Review, Future Internet, 2023, vol. 15, p. 192, available at: https://doi.org/10.3390/fi15060192.</mixed-citation><mixed-citation xml:lang="ru">Roumeliotis K. I., Tselikas N. D. 
ChatGPT and Open-AI Models: A Preliminary Review // Future Internet. 2023. Vol. 15. P. 192. https://doi.org/10.3390/fi15060192.</mixed-citation></citation-alternatives></ref><ref id="B8"><label>8.</label><citation-alternatives><mixed-citation xml:lang="en">An international repository for data analysis and original technological solutions. [Electronic resource]. 2024. Date of update: 10.04.2024. URL: https://www.kaggle.com/ (date of access: 16.04.2022).</mixed-citation><mixed-citation xml:lang="ru">Международный репозиторий для анализа данных и оригинальных технологических решений. [Электронный ресурс]. 2024. Дата обновления: 10.04.2024. URL: https://www.kaggle.com/ (дата обращения: 16.04.2022).</mixed-citation></citation-alternatives></ref><ref id="B9"><label>9.</label><citation-alternatives><mixed-citation xml:lang="en">International repository of language neural network models. [Electronic resource], 2024, update date: 12.03.2024, available at: https://huggingface.co/models (date of access: 26.09.2023).</mixed-citation><mixed-citation xml:lang="ru">Международный репозиторий языковых нейросетевых моделей. [Электронный ресурс]. 2024. Дата обновления: 12.03.2024. URL: https://huggingface.co/models (дата обращения: 26.09.2023).</mixed-citation></citation-alternatives></ref><ref id="B10"><label>10.</label><citation-alternatives><mixed-citation xml:lang="en">Kashirin I. Yu. Application of hierarchical number theory in the construction of ICF taxonomy for optimization of neural networks, Vestnik RGRTU, 2022, pp. 118—126.</mixed-citation><mixed-citation xml:lang="ru">Каширин И. Ю. Применение иерархической теории чисел при построении таксономии ICF для оптимизации нейронных сетей // Вестник РГРТУ. 2022. С. 118—126.</mixed-citation></citation-alternatives></ref><ref id="B11"><label>11.</label><citation-alternatives><mixed-citation xml:lang="en">Baader F., Calvanese D., McGuinness D., Nardi D., Patel-Schneider P., ed. The Description Logics Handbook. 
Theory, Implementation and Applications, New York, Cambridge University Press, 2003.</mixed-citation><mixed-citation xml:lang="ru">The Description Logics Handbook. Theory, Implementation and Applications. Ed. by F. Baader, D. Calvanese, D. McGuinness, D. Nardi, P. Patel-Schneider. New York: Cambridge University Press, 2003.</mixed-citation></citation-alternatives></ref><ref id="B12"><label>12.</label><citation-alternatives><mixed-citation xml:lang="en">Kashirin I. Yu., Filatov I. Yu. Formalized Description of Intuitive Perception of Spatial Situations, 2019 8th Mediterranean Conference on Embedded Computing (MECO), Budva, Montenegro, 2019, pp. 1—4.</mixed-citation><mixed-citation xml:lang="ru">Kashirin I. Yu., Filatov I. Yu. Formalized Description of Intuitive Perception of Spatial Situations // 2019 8th Mediterranean Conference on Embedded Computing (MECO), Budva, Montenegro. 2019. P. 1—4.</mixed-citation></citation-alternatives></ref><ref id="B13"><label>13.</label><citation-alternatives><mixed-citation xml:lang="en">Duineveld A. J., Stoter R., Weiden M. R., Kenepa B., Benjamins V. R. WonderTools? A comparative study of ontological engineering tools, International Journal of Human-Computer Studies, 2000, vol. 52, no. 6, pp. 1111—1133.</mixed-citation><mixed-citation xml:lang="ru">Duineveld A. J., Stoter R., Weiden M. R., Kenepa B., Benjamins V. R. WonderTools? A comparative study of ontological engineering tools // International Journal of Human-Computer Studies. 2000. Vol. 52, N. 6. P. 1111—1133.</mixed-citation></citation-alternatives></ref><ref id="B14"><label>14.</label><citation-alternatives><mixed-citation xml:lang="en">Kashirin D. I., Kashirin I. Yu., Pyl’kin A. N. Polimorficheskoye predstavleniye znaniy v Semantic Web, Moscow, Goryachaya liniya — Telekom, 2009, 138 p.</mixed-citation><mixed-citation xml:lang="ru">Каширин Д. И., Каширин И. Ю., Пылькин А. Н. Полиморфическое представление знаний в Semantic Web. М.: Горячая линия — Телеком, 2009. 
138 с.</mixed-citation></citation-alternatives></ref><ref id="B15"><label>15.</label><citation-alternatives><mixed-citation xml:lang="en">Kashirin I. Yu. Iyerarkhicheskiye chisla dlya proyektirovaniya taksonomiy iskusstvennogo intellekta ICF, Vestnik RGRTU, 2020, no. 71, pp. 71—82.</mixed-citation><mixed-citation xml:lang="ru">Каширин И. Ю. Иерархические числа для проектирования таксономий искусственного интеллекта ICF // Вестник РГРТУ. 2020. № 71. С. 71—82.</mixed-citation></citation-alternatives></ref><ref id="B16"><label>16.</label><citation-alternatives><mixed-citation xml:lang="en">Definition of hierarchical numbers. [Electronic resource], 2024, update date: 03/04/2024, available at: https://kashirin.net/definition-of-hierarchical-numbers (access date: 04/16/2022).</mixed-citation><mixed-citation xml:lang="ru">Definition of hierarchical numbers. [Electronic resource]. 2024. Update date: 03/04/2024. URL: https://kashirin.net/definition-of-hierarchical-numbers (access date: 04/16/2022).</mixed-citation></citation-alternatives></ref></ref-list></back></article>
