<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root>
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ali="http://www.niso.org/schemas/ali/1.0/" article-type="review-article" dtd-version="1.2" xml:lang="en"><front><journal-meta><journal-id journal-id-type="publisher-id">Informacionnye Tehnologii</journal-id><journal-title-group><journal-title xml:lang="en">Informacionnye Tehnologii</journal-title><trans-title-group xml:lang="ru"><trans-title>Информационные технологии</trans-title></trans-title-group></journal-title-group><issn publication-format="print">1684-6400</issn><publisher><publisher-name xml:lang="en">New Technologies Publishing House</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="publisher-id">702208</article-id><article-id pub-id-type="doi">10.17587/it.31.496-503</article-id><article-categories><subj-group subj-group-type="toc-heading" xml:lang="en"><subject>Application information systems</subject></subj-group><subj-group subj-group-type="toc-heading" xml:lang="ru"><subject>Прикладные информационные системы</subject></subj-group><subj-group subj-group-type="article-type"><subject>Review Article</subject></subj-group></article-categories><title-group><article-title xml:lang="en">Automated system for detecting plagiarism in files containing program code</article-title><trans-title-group xml:lang="ru"><trans-title>Автоматизированная система проверки файлов, содержащих программный код, на наличие заимствований</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author"><name-alternatives><name xml:lang="en"><surname>Bubnova</surname><given-names>M. A.</given-names></name><name xml:lang="ru"><surname>Бубнова</surname><given-names>М. А.</given-names></name></name-alternatives><address><country country="RU">Russian Federation</country></address><bio xml:lang="en"><p>Leading Programmer</p></bio><bio xml:lang="ru"><p>вед. 
программист</p></bio><email>mbubnova@hse.ru</email><xref ref-type="aff" rid="aff1"/></contrib></contrib-group><aff-alternatives id="aff1"><aff><institution xml:lang="en">National Research University Higher School of Economics</institution></aff><aff><institution xml:lang="ru">Федеральное государственное автономное образовательное учреждение высшего образования "Национальный исследовательский университет "Высшая школа экономики"</institution></aff></aff-alternatives><pub-date date-type="pub" iso-8601-date="2025-09-15" publication-format="electronic"><day>15</day><month>09</month><year>2025</year></pub-date><volume>31</volume><issue>9</issue><issue-title xml:lang="en">Informacionnye Tehnologii</issue-title><issue-title xml:lang="ru">Информационные технологии</issue-title><fpage>496</fpage><lpage>503</lpage><history><date date-type="received" iso-8601-date="2026-02-05"><day>05</day><month>02</month><year>2026</year></date><date date-type="accepted" iso-8601-date="2026-02-05"><day>05</day><month>02</month><year>2026</year></date></history><permissions><copyright-statement xml:lang="en">Copyright © 2025, Informacionnye Tehnologii</copyright-statement><copyright-statement xml:lang="ru">Copyright © 2025, Информационные технологии</copyright-statement><copyright-year>2025</copyright-year><copyright-holder xml:lang="en">Informacionnye Tehnologii</copyright-holder><copyright-holder xml:lang="ru">Информационные технологии</copyright-holder></permissions><self-uri xlink:href="https://journals.eco-vector.com/1684-6400/article/view/702208">https://journals.eco-vector.com/1684-6400/article/view/702208</self-uri><abstract xml:lang="en"><p>The article outlines the results of developing a system designed to identify plagiarism in files containing program code. The automated system integrates multiple methods of program code analysis. 
It serves as a decision-support tool for university instructors, flagging assignments whose plagiarism level exceeds a predefined threshold. The system is designed to evaluate student submissions within a university: it compares the work of students in the same cohort to identify groups with similar or copied content. It can analyze assignments stored either locally or in cloud storage.</p></abstract><trans-abstract xml:lang="ru"><p>Обсуждается разработка системы, предназначенной для обнаружения заимствований в файлах, содержащих коды программ, основанной на нескольких видах анализа программного кода. Система выступает инструментом поддержки принятия решения для преподавателя университета, так как дает возможность обнаружить работы, в которых присутствует уровень заимствований, превышающий установленный порог. Разработанная система предназначена для выявления групп студентов, чьи работы имеют заимствования между собой.</p></trans-abstract><kwd-group xml:lang="en"><kwd>code plagiarism</kwd><kwd>text tokenization</kwd><kwd>abstract syntax tree (AST)</kwd></kwd-group><kwd-group xml:lang="ru"><kwd>заимствования в программном коде</kwd><kwd>токенизация текста</kwd><kwd>AST</kwd></kwd-group><funding-group/></article-meta></front><body></body><back><ref-list><ref id="B1"><label>1.</label><citation-alternatives><mixed-citation xml:lang="en">Potthast M., Gollub T., Hagen M., Grabegger J., Kiesel J., Michel M., Oberlander A., Tippmann M., Barron-Cenedo A., Gupta P., Rosso P., Stein B. Overview of the 4th International Competition on Plagiarism Detection, CLEF 2012 Evaluation Labs — Working Notes Papers, Rome, September 2012.</mixed-citation><mixed-citation xml:lang="ru">Potthast M., Gollub T., Hagen M., Grabegger J., Kiesel J., Michel M., Oberlander A., Tippmann M., Barron-Cenedo A., Gupta P., Rosso P., Stein B. 
Overview of the 4th International Competition on Plagiarism Detection // CLEF 2012 Evaluation Labs — Working Notes Papers. Rome, September, 2012.</mixed-citation></citation-alternatives></ref><ref id="B2"><label>2.</label><citation-alternatives><mixed-citation xml:lang="en">Haupt R. L. Plagiarism in Journal Articles, IEEE Antennas and Propagation, Aug. 2003, vol. 45, no. 4.</mixed-citation><mixed-citation xml:lang="ru">Haupt R. L. Plagiarism in Journal Articles // IEEE Antennas and Propagation. Aug. 2003. Vol. 45, N. 4.</mixed-citation></citation-alternatives></ref><ref id="B3"><label>3.</label><citation-alternatives><mixed-citation xml:lang="en">Soledad P. M., Ng Yiu-Kai. A Structural, Content-Similarity Measure for Detecting Spam Documents on the Web, International Journal of Web Information Systems, 2009, pp. 431—464.</mixed-citation><mixed-citation xml:lang="ru">Soledad P. M., Ng Yiu-Kai. A Structural, Content-Similarity Measure for Detecting Spam Documents on the Web // International Journal of Web Information Systems. 2009. P. 431—464.</mixed-citation></citation-alternatives></ref><ref id="B4"><label>4.</label><citation-alternatives><mixed-citation xml:lang="en">Chekhovich Yu. V., Belenkaya O. S. Evaluation of the Correctness of Borrowings in Scientific Publications, Scientific Publication at the International Level — 2018: Editorial Policy, Open Access, Scientific Communications: Proc. of the 7th Int. Sci.-Pract. Conf. (Moscow, April 24—27, 2018), Moscow, Your Digital Publishing House, 2018, pp. 158—162, DOI 10.24069/konf-24-27-04-2018.28.</mixed-citation><mixed-citation xml:lang="ru">Чехович Ю. В., Беленькая О. С. Оценка корректности заимствований в текстах научных публикаций // Научное издание международного уровня — 2018: редакционная политика, открытый доступ, научные коммуникации: мат. 7-й Междунар. науч.-практ. конф. (Москва, 24—27 апр. 2018 г.). М.: Ваше цифровое изд-во, 2018. С. 158—162. 
DOI 10.24069/konf-24-27-04-2018.28.</mixed-citation></citation-alternatives></ref><ref id="B5"><label>5.</label><citation-alternatives><mixed-citation xml:lang="en">Nikolaev V. V., Rakhkonen M. E. Application of Various Tools and the Use of the Chatbot "ChatGPT" in Writing Scientific Papers Checked by the "Anti-Plagiarism" Program, Prof. Legal Educ. Sci., 2023, no. 1 (9), pp. 78—81.</mixed-citation><mixed-citation xml:lang="ru">Николаев В. В., Рахконен М. Е. Применение различных инструментов и использование чат-бота "ChatGpt" при написании научных работ, проверяемых в программе "Антиплагиат" // Проф. юридич. обр. и наука. 2023. № 1 (9). С. 78—81.</mixed-citation></citation-alternatives></ref><ref id="B6"><label>6.</label><citation-alternatives><mixed-citation xml:lang="en">Roy C. K., Cordy J. R., Koschke R. Comparison and Evaluation of Code Clone Detection Techniques and Tools: A Qualitative Approach, Science of Computer Programming, 2009, vol. 74, no. 7, pp. 470—495.</mixed-citation><mixed-citation xml:lang="ru">Roy C. K., Cordy J. R., Koschke R. Comparison and Evaluation of Code Clone Detection Techniques and Tools: A Qualitative Approach // Science of Computer Programming. 2009. Vol. 74, N. 7. P. 470—495.</mixed-citation></citation-alternatives></ref><ref id="B7"><label>7.</label><citation-alternatives><mixed-citation xml:lang="en">Johnson J. Visualizing Textual Redundancy in Legacy Source, Proceedings of the 1994 Conference of the Centre for Advanced Studies on Collaborative Research, CASCON 1994, 1994, pp. 171—183.</mixed-citation><mixed-citation xml:lang="ru">Johnson J. Visualizing Textual Redundancy in Legacy Source // Proceedings of the 1994 Conference of the Centre for Advanced Studies on Collaborative Research, CASCON 1994. 1994. P. 171—183.</mixed-citation></citation-alternatives></ref><ref id="B8"><label>8.</label><citation-alternatives><mixed-citation xml:lang="en">Juergens E., Deissenboeck F., Hummel B., Wagner S. 
Do Code Clones Matter?, Proceedings of the 31st International Conference on Software Engineering, ICSE 2009, 2009, p. 1.</mixed-citation><mixed-citation xml:lang="ru">Juergens E., Deissenboeck F., Hummel B., Wagner S. Do Code Clones Matter? // Proceedings of the 31st International Conference on Software Engineering, ICSE 2009. 2009. P. 1.</mixed-citation></citation-alternatives></ref><ref id="B9"><label>9.</label><citation-alternatives><mixed-citation xml:lang="en">Tairas R., Gray J. Phoenix-Based Clone Detection Using Suffix Trees, Proceedings of the 44th Annual Southeast Regional Conference, ACM-SE 2006, 2006, pp. 679—684.</mixed-citation><mixed-citation xml:lang="ru">Tairas R., Gray J. Phoenix-Based Clone Detection Using Suffix Trees // Proceedings of the 44th Annual Southeast Regional Conference, ACM-SE 2006. 2006. P. 679—684.</mixed-citation></citation-alternatives></ref><ref id="B10"><label>10.</label><citation-alternatives><mixed-citation xml:lang="en">Wahler V., Seipel D., Gudenberg J., Fischer G. Clone Detection in Source Code by Frequent Itemset Techniques, Proceedings of the 4th IEEE International Workshop Source Code Analysis and Manipulation, SCAM 2004, 2004, pp. 128—135.</mixed-citation><mixed-citation xml:lang="ru">Wahler V., Seipel D., Gudenberg J., Fischer G. Clone Detection in Source Code by Frequent Itemset Techniques // Proceedings of the 4th IEEE International Workshop Source Code Analysis and Manipulation, SCAM 2004. 2004. P. 128—135.</mixed-citation></citation-alternatives></ref><ref id="B11"><label>11.</label><citation-alternatives><mixed-citation xml:lang="en">Liu C., Chen C., Han J., Yu P. GPLAG: Detection of Software Plagiarism by Program Dependence Graph Analysis, Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2006, 2006, pp. 872—881.</mixed-citation><mixed-citation xml:lang="ru">Liu C., Chen C., Han J., Yu P. 
GPLAG: Detection of Software Plagiarism by Program Dependence Graph Analysis // Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2006. 2006. P. 872—881.</mixed-citation></citation-alternatives></ref><ref id="B12"><label>12.</label><citation-alternatives><mixed-citation xml:lang="en">Gabel M., Jiang L., Su Z. Scalable Detection of Semantic Clones, Proceedings of the 30th International Conference on Software Engineering, ICSE 2008, 2008, pp. 321—330.</mixed-citation><mixed-citation xml:lang="ru">Gabel M., Jiang L., Su Z. Scalable Detection of Semantic Clones // Proceedings of the 30th International Conference on Software Engineering, ICSE 2008. 2008. P. 321—330.</mixed-citation></citation-alternatives></ref><ref id="B13"><label>13.</label><citation-alternatives><mixed-citation xml:lang="en">Komondoor R., Horwitz S. Using Slicing to Identify Duplication in Source Code, Proceedings of the 8th International Symposium on Static Analysis, SAS 2001, 2001, pp. 40—56.</mixed-citation><mixed-citation xml:lang="ru">Komondoor R., Horwitz S. Using Slicing to Identify Duplication in Source Code // Proceedings of the 8th International Symposium on Static Analysis, SAS 2001. 2001. P. 40—56.</mixed-citation></citation-alternatives></ref><ref id="B14"><label>14.</label><citation-alternatives><mixed-citation xml:lang="en">Kontogiannis K., DeMori R., Merlo E., Galler M., Bernstein M. Pattern Matching for Clone and Concept Detection, Journal of Automated Software Engineering, 1996, vol. 3, no. 1—2, pp. 77—108.</mixed-citation><mixed-citation xml:lang="ru">Kontogiannis K., DeMori R., Merlo E., Galler M., Bernstein M. Pattern Matching for Clone and Concept Detection // Journal of Automated Software Engineering. 1996. Vol. 3, N. 1—2. P. 77—108.</mixed-citation></citation-alternatives></ref><ref id="B15"><label>15.</label><citation-alternatives><mixed-citation xml:lang="en">Davey N., Barson P., Field S., Frank R. 
The Development of a Software Clone Detector, International Journal of Applied Software Technology, 1995, vol. 1, no. 3/4, pp. 219—236.</mixed-citation><mixed-citation xml:lang="ru">Davey N., Barson P., Field S., Frank R. The Development of a Software Clone Detector // International Journal of Applied Software Technology. 1995. Vol. 1, N. 3/4. P. 219—236.</mixed-citation></citation-alternatives></ref><ref id="B16"><label>16.</label><citation-alternatives><mixed-citation xml:lang="en">Mayrand J., Leblanc C., Merlo E. Experiment on the Automatic Detection of Function Clones in a Software System Using Metrics, Proceedings of the 12th International Conference on Software Maintenance, ICSM 1996, 1996, pp. 244—253.</mixed-citation><mixed-citation xml:lang="ru">Mayrand J., Leblanc C., Merlo E. Experiment on the Automatic Detection of Function Clones in a Software System Using Metrics // Proceedings of the 12th International Conference on Software Maintenance, ICSM 1996. 1996. P. 244—253.</mixed-citation></citation-alternatives></ref><ref id="B17"><label>17.</label><citation-alternatives><mixed-citation xml:lang="en">Evtifeeva O. A., Krass A. L., Lakunin M. A., Lysenko E. A., Schastlivtsev R. R. Analysis of Algorithms for Detecting Plagiarism in Program Source Codes, Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2007, no. 5, pp. 188—197, available at: https://cyberleninka.ru/article/n/analiz-algoritmov-poiska-plagiata-v-ishodnyh-kodah-programm (accessed: 02.02.2025).</mixed-citation><mixed-citation xml:lang="ru">Евстифеева О. А., Красс А. Л., Лакунин М. А., Лысенко Е. А., Счастливцев Р. Р. Анализ алгоритмов поиска плагиата в исходных кодах программ // Научно-технический вестник информационных технологий, механики и оптики. 2007. № 5. С. 188—197. URL: https://cyberleninka.ru/article/n/analiz-algoritmov-poiska-plagiata-v-ishodnyh-kodah-programm (дата обращения: 02.02.2025).</mixed-citation></citation-alternatives></ref><ref id="B18"><label>18.</label><citation-alternatives><mixed-citation xml:lang="en">Liu C., Chen C., Han J., Yu P. S. GPLAG: Detection of Software Plagiarism by Program Dependence Graph Analysis, Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, PA, USA, August 20—23, 2006, pp. 872—881.</mixed-citation><mixed-citation xml:lang="ru">Liu C., Chen C., Han J., Yu P. S. GPLAG: Detection of Software Plagiarism by Program Dependence Graph Analysis // Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, PA, USA, August 20-23, 2006. 2006. P. 872—881.</mixed-citation></citation-alternatives></ref><ref id="B19"><label>19.</label><citation-alternatives><mixed-citation xml:lang="en">Engels S., Lakshmanan V., Craig M. Plagiarism Detection Using Feature-Based Neural Networks, ACM SIGCSE Bulletin, 2007, vol. 39, no. 1, pp. 34—38.</mixed-citation><mixed-citation xml:lang="ru">Engels S., Lakshmanan V., Craig M. Plagiarism Detection Using Feature-Based Neural Networks // ACM SIGCSE Bulletin. 2007. Vol. 39, N. 1. P. 34—38.</mixed-citation></citation-alternatives></ref><ref id="B20"><label>20.</label><citation-alternatives><mixed-citation xml:lang="en">Hage J., Vermeer B., Verburg G. Research Paper: Plagiarism Detection for Haskell with Holmes, Proceedings of the 3rd Computer Science Education Research Conference, CSERC 2013, Arnhem, The Netherlands, April 04—05, 2013, pp. 19—30.</mixed-citation><mixed-citation xml:lang="ru">Hage J., Vermeer B., Verburg G. Research Paper: Plagiarism Detection for Haskell with Holmes // Proceedings of the 3rd Computer Science Education Research Conference, CSERC 2013, Arnhem, The Netherlands, April 04-05, 2013. 2013. P. 
19—30.</mixed-citation></citation-alternatives></ref><ref id="B21"><label>21.</label><citation-alternatives><mixed-citation xml:lang="en">Weber R., Schek H. J., Blott S. A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces, Proceedings of the 24th VLDB Conference, New York, 1998, pp. 194—205.</mixed-citation><mixed-citation xml:lang="ru">Weber R., Schek H. J., Blott S. A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces // Proceedings of the 24th VLDB Conference, New York. 1998. P. 194—205.</mixed-citation></citation-alternatives></ref><ref id="B22"><label>22.</label><citation-alternatives><mixed-citation xml:lang="en">Bubnova M. A., Melekh N. A. Automated System for Detecting Plagiarism in Program Codes, Interuniversity Scientific and Technical Conference of Students, Postgraduates, and Young Specialists Named after E. V. Armensky: Proc. of the Conf. (Moscow, February 25 — March 4, 2020), Moscow, Moscow Institute of Electronics and Mathematics, National Research University Higher School of Economics, 2020, pp. 61—62.</mixed-citation><mixed-citation xml:lang="ru">Бубнова М. А., Мелех Н. А. Автоматизированная система проверки программных кодов на наличие плагиата // Межвузовская научно-техническая конференция студентов, аспирантов и молодых специалистов им. Е. В. Арменского: мат. конф. (Москва, 25 февр. — 04 марта 2020 г.). М.: Московский институт электроники и математики НИУ ВШЭ, 2020. С. 61—62.</mixed-citation></citation-alternatives></ref></ref-list></back></article>
