A comparative analysis of the performance of large, older-generation language models in solving legal problems of varying complexity

Abstract

This article presents a comparative analysis of the performance of seven major language models (Perplexity Sonar, Claude 4.0 Sonnet, OpenAI GPT-4.1, Gemini 2.5 Pro, Grok 3, DeepSeek v3, and Qwen3-235B-A22B) on 25 legal problems spanning five difficulty levels, developed on the basis of the Family and Civil Codes of the Russian Federation. Answer quality was assessed by an automated system based on Claude 4.0 Sonnet, which acted as an "examiner", scoring each answer on a ten-point scale and giving a brief justification. The main metrics of the experiment were the mean score, total token consumption (Token Usage), the cost of running the full question set (Cost per Experiment), and the efficiency ratio (the ratio of quality to cost). A comparison of monolithic models showed that GPT-4.1 and Gemini 2.5 Pro lead in mean score, particularly on simple tasks and tasks involving conflicting norms, while the medium difficulty level (tasks that combine several legal norms) remained the most challenging for all models. The cost calculations confirm that scaling legal AI systems requires balancing speed, accuracy, and generation cost. Based on these results, the study offers practical recommendations for selecting architectures and models for legal consulting in corporate and government applications.
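The abstract names four metrics but does not give their exact formulas. The sketch below shows one plausible way to compute them from an examiner's per-task scores, under loudly stated assumptions: the efficiency ratio is taken to be the mean score divided by the cost per experiment, cost is derived from hypothetical per-million-token prices, and the five difficulty levels are coded as the integers 1 to 5. `ModelRun` and every function name here are illustrative, not the authors' code, and the demo numbers are placeholders rather than results from the study.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class ModelRun:
    """Hypothetical record of one model's pass over the task set."""
    name: str
    graded: list[tuple[int, float]]  # (difficulty level 1-5 [assumed coding], examiner score 0-10)
    input_tokens: int                # total prompt tokens over all tasks
    output_tokens: int               # total generated tokens over all tasks
    usd_per_m_input: float           # assumed price per million input tokens
    usd_per_m_output: float          # assumed price per million output tokens


def mean_score(run: ModelRun) -> float:
    """Mean examiner score over all tasks."""
    return mean(score for _, score in run.graded)


def mean_by_level(run: ModelRun) -> dict[int, float]:
    """Mean examiner score per difficulty level, ordered by level."""
    buckets: dict[int, list[float]] = defaultdict(list)
    for level, score in run.graded:
        buckets[level].append(score)
    return {level: mean(scores) for level, scores in sorted(buckets.items())}


def token_usage(run: ModelRun) -> int:
    """Total tokens consumed (input + output)."""
    return run.input_tokens + run.output_tokens


def cost_per_experiment(run: ModelRun) -> float:
    """Cost in USD of running every task once, from per-million-token prices."""
    return (run.input_tokens * run.usd_per_m_input
            + run.output_tokens * run.usd_per_m_output) / 1_000_000


def efficiency_ratio(run: ModelRun) -> float:
    """Quality-to-cost ratio: mean score per USD spent (assumed definition)."""
    cost = cost_per_experiment(run)
    if cost <= 0:
        raise ValueError("cost must be positive to compute a ratio")
    return mean_score(run) / cost


if __name__ == "__main__":
    # Placeholder numbers for illustration only; not results from the study.
    demo = ModelRun(
        name="model-A",
        graded=[(1, 9.0), (1, 8.0), (3, 5.0), (3, 6.0), (5, 7.0)],
        input_tokens=20_000,
        output_tokens=30_000,
        usd_per_m_input=2.0,
        usd_per_m_output=8.0,
    )
    print(mean_score(demo), mean_by_level(demo), token_usage(demo))
    print(round(cost_per_experiment(demo), 4), round(efficiency_ratio(demo), 2))
```

Keeping the per-level breakdown separate from the overall mean makes it easy to see the pattern the abstract reports, where a model can rank well overall yet score lower on one difficulty level. Dividing by cost rather than by token count is a design choice: two models with similar token usage can differ sharply in price, and the ratio is meant to capture that.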

About the authors

Roman Dushkin

National Research Nuclear University MEPhI (Moscow Engineering Physics Institute)

Corresponding author
Email: drv@aia.expert

Senior Lecturer, Department 22 "Cybernetics"

Russia, Moscow

Vladimir Podoprigora

Plekhanov Russian University of Economics

Email: Podoprigora.VN@rea.ru
ORCID ID: 0000-0001-6485-8135
SPIN code: 9587-1028

Cand. Sci. (Econ.), Head of Laboratory

Russia, Moscow

Alexey Kuzmin

Ecosystem Digital Solutions LLC

Email: a.kuzmin@edisai.tech
ORCID ID: 0009-0008-7264-2455

General Director

Russia, Moscow

Kirill Dushkin

A-Z Expert LLC

Email: dkr@aia.expert

Analyst

Russia, Moscow

Supplementary files

1. JATS XML
2. Fig. 1. Comparison of individual LLMs by average response quality
3. Fig. 2. Performance of large language models by complexity level of legal problems

Copyright © Yur-VAK, 2025

License description: https://www.urvak.ru/contacts/