A comparative analysis of the performance of large, older-generation language models in solving legal problems of varying complexity
- Authors: Dushkin R.V. (1), Podoprigora V.N. (2), Kuzmin A.A. (3), Dushkin K.R. (4)
- Affiliations:
  1. National Research Nuclear University MEPhI (Moscow Engineering Physics Institute)
  2. Plekhanov Russian University of Economics
  3. Ecosystem Digital Solutions LLC
  4. A-Z Expert LLC
- Issue: Vol 18, No 5 (2025)
- Pages: 143-150
- Section: Large language models in legal practice
- URL: https://journals.eco-vector.com/2072-3164/article/view/694184
- DOI: https://doi.org/10.33693/2072-3164-2025-18-5-143-140
- EDN: https://elibrary.ru/QIOFWU
- ID: 694184
Abstract
This article presents a comparative analysis of seven large language models (Perplexity Sonar, Claude 4.0 Sonnet, OpenAI GPT-4.1, Gemini 2.5 Pro, Grok 3, DeepSeek v3, and Qwen3-235B-A22B) on 25 legal problems spanning five difficulty levels, all based on the Family and Civil Codes of the Russian Federation. Answer quality was evaluated by an automated system built on Claude 4.0 Sonnet, which acted as an "examiner" and assigned each answer a score on a ten-point scale with a brief explanation. The main metrics of the experiment were the mean score, total token consumption (Token Usage), the economic cost of running all questions (Cost per Experiment), and the efficiency ratio (the ratio of quality to cost). The comparison of monolithic models showed that GPT-4.1 and Gemini 2.5 Pro lead in average performance, particularly on simple and conflict-based tasks, while the medium difficulty level (tasks that combine several norms) remained the most challenging for all models. The cost calculations confirmed that, when scaling legal AI systems, it is critical to consider the balance between speed, accuracy, and generation cost. The results support practical recommendations for selecting architectures and models for corporate and government applications in legal consulting.
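The four metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula for the efficiency ratio, so the code below assumes it is the mean score divided by the total cost of the run. The `Run` record, the `summarize` function, the model names, and all numbers are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Run:
    """One graded answer (hypothetical record layout, not from the article)."""
    model: str
    level: int        # difficulty level, 1..5
    score: float      # examiner score on the ten-point scale
    tokens: int       # tokens consumed for this question
    cost_usd: float   # cost of this question


def summarize(runs: list[Run]) -> dict[str, dict[str, float]]:
    """Aggregate per-model metrics over all questions in an experiment."""
    by_model: dict[str, list[Run]] = {}
    for r in runs:
        by_model.setdefault(r.model, []).append(r)

    summary: dict[str, dict[str, float]] = {}
    for model, rs in by_model.items():
        mean_score = mean(r.score for r in rs)
        total_cost = sum(r.cost_usd for r in rs)
        summary[model] = {
            "mean_score": mean_score,
            "token_usage": sum(r.tokens for r in rs),
            "cost_per_experiment": total_cost,
            # Assumed definition: quality (mean score) per unit of cost.
            "efficiency": mean_score / total_cost if total_cost > 0 else float("inf"),
        }
    return summary


# Illustrative data only; these are not results from the study.
runs = [
    Run("model_a", 1, 8.0, 1000, 0.02),
    Run("model_a", 3, 6.0, 1500, 0.03),
    Run("model_b", 1, 9.0, 2000, 0.10),
]
summary = summarize(runs)
for name, metrics in summary.items():
    print(name, metrics)
```

Under this assumed definition, a model with a slightly lower mean score can still rank higher on efficiency if its run is much cheaper, which is the trade-off the abstract highlights for scaled deployments.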
About the authors
Roman V. Dushkin
National Research Nuclear University MEPhI (Moscow Engineering Physics Institute)
Author for correspondence.
Email: drv@aia.expert
senior lecturer at Department 22 "Cybernetics"
Russian Federation, Moscow
Vladimir N. Podoprigora
Plekhanov Russian University of Economics
Email: Podoprigora.VN@rea.ru
ORCID iD: 0000-0001-6485-8135
SPIN-code: 9587-1028
Cand. Sci. (Econ.), Head of the laboratory
Russian Federation, Moscow
Alexey A. Kuzmin
Ecosystem Digital Solutions LLC
Email: a.kuzmin@edisai.tech
ORCID iD: 0009-0008-7264-2455
General Director
Russian Federation, Moscow
Kirill R. Dushkin
A-Z Expert LLC
Email: dkr@aia.expert
analyst
Russian Federation, Moscow