A comparative analysis of the performance of large, older-generation language models in solving legal problems of varying complexity

Roman V. Dushkin; Vladimir N. Podoprigora; Alexey A. Kuzmin; Kirill R. Dushkin

doi:10.33693/2072-3164-2025-18-5-143-140

Authors: Dushkin R.V.¹, Podoprigora V.N.², Kuzmin A.A.³, Dushkin K.R.⁴
Affiliations:
1. National Research Nuclear University MEPhI (Moscow Engineering Physics Institute)
2. Plekhanov Russian University of Economics
3. Ecosystem Digital Solutions LLC
4. A-Z Expert LLC
Issue: Vol 18, No 5 (2025)
Pages: 143-150
Section: Large language models in legal practice
URL: https://journals.eco-vector.com/2072-3164/article/view/694184
DOI: https://doi.org/10.33693/2072-3164-2025-18-5-143-140
EDN: https://elibrary.ru/QIOFWU
ID: 694184

Abstract

This article presents a comparative analysis of seven large language models (Perplexity Sonar, Claude 4.0 Sonnet, OpenAI GPT-4.1, Gemini 2.5 Pro, Grok 3, DeepSeek v3, and Qwen3-235B-A22B) on 25 legal problems spanning five difficulty levels, all based on the Family Code and the Civil Code of the Russian Federation. Answer quality was evaluated by an automated system built on Claude 4.0 Sonnet, which acted as an "examiner" and assigned each answer a score on a ten-point scale with a brief explanation. The main metrics of the experiment were the mean score, the total token consumption (Token Usage), the cost of running all questions (Cost per Experiment), and the efficiency ratio (the ratio of quality to cost). Among the monolithic models, GPT-4.1 and Gemini 2.5 Pro led in average performance, particularly on simple and conflict-based tasks, while the medium difficulty level (tasks requiring a combination of norms) remained the most challenging for all models. The economic calculations confirmed that scaling legal AI systems requires balancing speed, accuracy, and generation cost. The results support practical recommendations for selecting architectures and models for corporate and government applications in legal consulting.
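The four metrics named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give exact formulas, so the definition of the efficiency ratio as mean score divided by cost is an assumption, as are the `ModelRun` structure, the US dollar unit, and all names. The numbers are placeholders and are not results from the article.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ModelRun:
    """One model's results over the whole question set (hypothetical structure)."""
    name: str
    scores: list[int]   # examiner scores on the ten-point scale, one per question
    total_tokens: int   # Token Usage: tokens consumed across all questions
    cost_usd: float     # Cost per Experiment: price of running all questions


def mean_score(run: ModelRun) -> float:
    """Average examiner score across all questions."""
    return mean(run.scores)


def efficiency(run: ModelRun) -> float:
    """Quality-to-cost ratio, assumed here to be mean score per unit of cost."""
    if run.cost_usd <= 0:
        raise ValueError("cost must be positive")
    return mean_score(run) / run.cost_usd


# Placeholder data for illustration only; not taken from the article.
runs = [
    ModelRun("model_a", [8, 7, 9, 6], total_tokens=12_000, cost_usd=0.50),
    ModelRun("model_b", [7, 7, 8, 6], total_tokens=9_000, cost_usd=0.20),
]

# Rank models from most to least cost-efficient.
for run in sorted(runs, key=efficiency, reverse=True):
    print(
        f"{run.name}: mean={mean_score(run):.2f}, tokens={run.total_tokens}, "
        f"cost=${run.cost_usd:.2f}, efficiency={efficiency(run):.1f}"
    )
```

In this placeholder data the model with the lower mean score ranks first on efficiency because it is much cheaper, which is the kind of quality and cost trade-off the abstract refers to.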

Keywords

large language models, legal problems, token efficiency, generation cost, retrieval-augmented generation, monolithic system, response quality, family law, civil law, architecture comparison

About the authors

Roman V. Dushkin

National Research Nuclear University MEPhI (Moscow Engineering Physics Institute)

Author for correspondence.
Email: drv@aia.expert

Senior Lecturer, Department 22 "Cybernetics"

Russian Federation, Moscow

Vladimir N. Podoprigora

Plekhanov Russian University of Economics

Email: Podoprigora.VN@rea.ru
ORCID iD: 0000-0001-6485-8135
SPIN-code: 9587-1028

Cand. Sci. (Econ.), Head of the laboratory

Russian Federation, Moscow

Alexey A. Kuzmin

Ecosystem Digital Solutions LLC

Email: a.kuzmin@edisai.tech
ORCID iD: 0009-0008-7264-2455

General Director

Russian Federation, Moscow

Kirill R. Dushkin

A-Z Expert LLC

Email: dkr@aia.expert

Analyst

Russian Federation, Moscow

Supplementary files

1. JATS XML
2. Fig. 1. Comparison of single LLMs by average quality of responses
3. Fig. 2. Performance of large language models by complexity levels of legal issues
