Prospects for the generative artificial intelligence application in surgery, traumatology and orthopedics




Abstract

The review considers the use of generative artificial intelligence technologies in surgery, traumatology and orthopedics. Definitions of key generative artificial intelligence technologies are given, and the difference between discriminative and generative models of artificial intelligence is explained. Publication activity on the use of generative artificial intelligence in surgery, traumatology and orthopedics is analyzed across world macroregions. The potential role of various generative artificial intelligence models at the preoperative, intraoperative and postoperative stages of healthcare is analyzed. Data on the results of clinical application of generative artificial intelligence are provided, together with the most common problems associated with the practical use of various generative artificial intelligence applications, including issues of quality and safety of surgical care. The review proposes potential solutions and research directions to address these problems.


INTRODUCTION

Over the past 10 to 15 years, artificial intelligence (AI) technologies have advanced at an unprecedented pace, driven by growth in computing power and the availability of data. The latest surge in interest in AI is largely attributed to the emergence of generative AI, which has enabled a new class of applications: generating long, coherent texts; providing in-depth answers to questions; summarizing and comparing texts; creating images; and analyzing videos.

This technological breakthrough has significantly broadened access to AI for the general public and enhanced its practical relevance across various industries, including banking, transportation, insurance, telecommunications, production and manufacturing, education, and healthcare.

The growing attention to AI technologies is reflected in the sharp rise in investments: over the past decade, global private AI investments have increased 30-fold, reaching approximately US $90 billion in 2022, and are projected to rise to US $160 billion by 2025 [1]. According to Gartner, by 2026, more than 80% of enterprises are expected to use generative AI models or AI-enabled applications in production environments, compared with less than 5% in 2023 [2].

In Russia, the consulting firm Yakov & Partners has estimated the total economic potential of AI at 22 to 36 trillion rubles in nominal terms. By 2028, the resulting impact on revenue growth and cost reduction could reach 4.2 to 6.9 trillion rubles—equivalent to a 4% increase in gross domestic product [3]. The contribution of generative AI is projected at 0.8 to 1.3 trillion rubles, or approximately 20% of that value.

Generative AI (GenAI) encompasses a class of AI methods in which algorithms are trained on large datasets to generate new content, e.g., text, images, or video. GenAI is based on large-scale models (in terms of the number of parameters or layers in a neural network) that are pre-trained on vast datasets. These models are commonly referred to as Foundation Models. The most well-known foundation model to date is GPT-4 by the US-based company OpenAI. The first version, GPT-1, had approximately 120 million parameters, whereas GPT-4 is estimated to have up to 1.76 trillion parameters, although the exact figure has not been publicly disclosed by OpenAI [4]. Beyond the United States, more than a dozen countries are developing original foundation models, including Russia, where major domestic efforts are led by Sber (GigaChat, Kandinsky 2.2) and Yandex (YandexGPT, YandexART) [3]. For Russian-speaking users, a key advantage of domestic GenAI models is their improved performance in generating texts in Russian.

The healthcare sector lags behind other industries in adopting AI technologies, despite growing interest among healthcare professionals. This growing interest has prompted the World Health Organization (WHO) to issue several documents regulating various aspects of AI use in healthcare.

The WHO guideline Recommendations on Digital Interventions for Health System Strengthening [5] emphasizes that digital health interventions are often implemented without sufficient evidence regarding their benefits and risks. This contributes to the spread of unreliable technologies and to a multitude of digital tools whose impact on healthcare delivery and population well-being remains poorly understood. While recognizing the innovative potential of digital tools, the guideline stresses that their effectiveness must be rigorously evaluated so that investments in them do not divert resources from non-digital alternatives.

In 2021, WHO published guidance on the ethics and governance of artificial intelligence for healthcare [6]. In this report, 20 leading AI experts outlined the potential benefits and risks of using AI in healthcare and formulated principles to guide the development and implementation of AI in medical settings: protecting autonomy; promoting human well-being and safety; ensuring transparency and “explainability” of decision-making; fostering responsibility and accountability; ensuring inclusiveness and equity; and enabling adaptability and resilience of AI technologies.

In 2024, WHO published guidance on the use of large multimodal models (LMMs) in medicine and healthcare [7]. LMMs are a type of generative AI technology that can accept several types of input data and generate diverse outputs that are not limited to the type of data fed into them. The guidance outlines risks associated with the development and implementation of generative AI technologies in medicine and healthcare, along with strategies for mitigating them.

According to the National Strategy for the Development of Artificial Intelligence for the Period Until 2030,1 AI is among the most important technologies available to humanity. It drives global economic growth, accelerates innovation across scientific domains, improves quality of life, enhances access to and quality of health care and education, increases labor productivity, and enriches leisure activities. The strategy outlines key principles for the development and deployment of AI technologies, including:

  • protection of human rights and freedoms;
  • safety and the impermissibility of using AI for the intentional harm of individuals and organizations;
  • transparency and explainability of AI processes and outcomes;
  • technological sovereignty;
  • integrity of the innovation cycle;
  • efficient use of AI technologies;
  • support for competition;
  • openness and accessibility;
  • continuity;
  • security and legal protection of AI technologies;
  • reliability of input data.

The aim of this review is to examine the potential applications of generative and multimodal AI models in surgery and to explore their prospects in the fields of trauma care and orthopedics.

RESEARCH METHODOLOGY

The analysis of GenAI applications in surgery, traumatology, and orthopedics was based on publications retrieved from the PubMed database and the Russian-language eLibrary database from 2019 to October 2024.

A total of 1766 publications were identified in eLibrary during the specified period using the keywords “генеративный искусственный интеллект” (generative artificial intelligence) or “большие языковые модели” (large language models). Among these, 23 were review articles that included keywords such as “здравоохранение” (healthcare) or “медицина” (medicine), and only one publication was found with keywords such as “хирургия” (surgery), “ортопедия” (orthopedics), or “травматология” (traumatology).

A PubMed query using the search string “((‘Generative’ AND ‘artificial’ AND ‘intelligence’) OR (‘large’ AND ‘language’ AND ‘models’)) AND (‘surgery’ OR ‘orthopedics’ OR ‘traumatology’ OR ‘spine’)” yielded 872 publications. These included 37 systematic reviews and meta-analyses, 115 literature reviews, and 4 clinical studies focusing on medical documentation.

DISCUSSION

Defining Core Technologies of Generative Artificial Intelligence

Before exploring the potential applications and future directions of generative AI, it is essential to define its key underlying technologies.

According to the Russian national standard (GOST) on AI systems [8], artificial intelligence refers to a suite of technological solutions designed to simulate human cognitive functions—including self-learning, decision-making without predefined algorithms, and insight generation—and to achieve results comparable to human intellectual activity when solving specific data processing tasks. This suite includes information and communication infrastructure, software (including machine learning–based), and data processing and decision-making services. This definition is also cited in Russia’s National Strategy for the Development of Artificial Intelligence for the Period Until 2030.

The same standard [8] outlines 12 AI system categories; however, in practice, AI technologies are typically divided into two main types: discriminative AI (DAI) and generative AI.

DAI focuses on analyzing differences between datasets and classifying them into predefined categories [9]. Common DAI applications include image and speech recognition, natural language processing, and predictive analytics, often using techniques such as logistic regression, support vector machines (SVMs), and neural networks. In contrast, GenAI systems create new data or information for the user by identifying and learning patterns in existing datasets. Examples of GenAI applications include chatbots, molecular structure generation, and high-resolution image synthesis. These technologies rely on specialized machine learning systems known as large language models (LLMs) or large multimodal models (Fig. 1).

 

Fig. 1. Most common models and algorithms of generative artificial intelligence in medicine (adapted from [17]).

Note. AI, artificial intelligence; ML, machine learning; DAI, discriminative artificial intelligence; GenAI, generative artificial intelligence; LLM, large language models; CNN, convolutional neural network; RNN, recurrent neural network; LSTM, long short-term memory network; GAN, generative adversarial network; AAE, adversarial autoencoder; SAE, sparse autoencoder; VAE, variational autoencoder.
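The distinction between discriminative and generative AI can be illustrated with a deliberately minimal Python sketch (a hypothetical toy example, not drawn from any cited study). A logistic regression, a classic discriminative model, learns the probability of a predefined label given an input; the generative part instead fits a normal distribution to one class and samples new, previously unseen values from it. The data values and function names are illustrative assumptions.

```python
import math
import random
import statistics

# Hypothetical standardized measurements (e.g., z-scores) for two classes.
class_a = [-1.2, -1.0, -0.8, -1.1, -0.9]
class_b = [0.8, 1.0, 1.2, 0.9, 1.1]

# --- Discriminative: logistic regression models p(label = B | x) directly ---
def train_logistic(xs, ys, lr=0.5, epochs=500):
    """Fit weight w and bias b by batch gradient descent on the log-loss."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / len(xs)
        b -= lr * grad_b / len(xs)
    return w, b

w, b = train_logistic(class_a + class_b, [0] * len(class_a) + [1] * len(class_b))

def classify(x):
    """Assign one of the predefined labels (p > 0.5 means class B)."""
    return "B" if w * x + b > 0 else "A"

# --- Generative: model the distribution of class B itself, then sample new data ---
mu, sigma = statistics.mean(class_b), statistics.stdev(class_b)

def generate(n, seed=0):
    """Draw n synthetic class-B values from a normal distribution fitted to the data."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]

print(classify(-0.95), classify(1.05))  # A B
print([round(v, 2) for v in generate(3)])
```

The classifier can only choose between the labels it was trained on, whereas the generator produces new values that resemble, but do not copy, the training data.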

 

Interest has recently grown in hybrid models that combine the strengths of generative and discriminative approaches, including reinforcement learning and semi-supervised learning techniques [10].

An artificial neural network is a mathematical model inspired by the structure and function of biological neural networks, implemented as software or hardware [11]. It consists of interconnected simple processors (artificial neurons) that periodically receive and transmit signals to other processors. In the context of machine learning, neural networks are a subset of pattern recognition and discriminant analysis methods. A key advantage of neural networks over standard algorithms is their ability to learn, achieved by determining connection weights between neurons. During training, neural networks can identify complex relationships between input and output data and perform generalization.
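Learning by adjusting connection weights can be shown on the smallest possible network, a single artificial neuron trained with the classic perceptron rule. The Python sketch below is a hypothetical teaching example that learns the logical AND function; the learning rate, epoch limit, and names are illustrative assumptions, and modern deep networks instead adjust weights across many layers with gradient-based backpropagation.

```python
def step(x):
    """Threshold activation: the neuron 'fires' (1) only if its input is positive."""
    return 1 if x > 0 else 0

def predict(weights, bias, inputs):
    """Weighted sum of the incoming signals plus bias, passed through the activation."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)

def train_perceptron(samples, lr=1, max_epochs=20):
    """Perceptron learning rule: nudge each weight in proportion to the error."""
    weights, bias = [0] * len(samples[0][0]), 0
    for _ in range(max_epochs):
        errors = 0
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            if error:
                errors += 1
                weights = [w + lr * error * x for w, x in zip(weights, inputs)]
                bias += lr * error
        if errors == 0:  # every sample classified correctly: training has converged
            break
    return weights, bias

AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(AND_SAMPLES)
print(weights, bias)  # [2, 1] -2
print([predict(weights, bias, x) for x, _ in AND_SAMPLES])  # [0, 0, 0, 1]
```

No rule for AND is written into the code; the behavior emerges entirely from the learned weights.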

Deep learning refers to a set of machine learning methods focused on learning representations rather than task-specific algorithms. These methods include unsupervised learning systems, recurrent neural networks, and recursive neural networks. Deep learning has been applied in fields such as computer vision, speech recognition, audio processing, natural language processing, and bioinformatics.2

Natural language processing (NLP) is a field at the intersection of artificial intelligence and linguistics that deals with the recognition, generation, and processing of human language, both spoken and written; today it relies largely on machine learning.3

A language model (LM) is an algorithm that analyzes text, understands its context, and generates new text. It relies on nonlinear and probabilistic functions to predict the next word in a sequence by calculating probabilities for each possible word. The primary goal of an LM is to “understand” text based on data patterns and produce coherent responses. LMs are used for tasks such as spam detection, sentiment analysis (e.g., of customer reviews), news categorization, and extracting entities such as names, addresses, or product names from text.4 As of mid-2024, medical sources referenced language models and related generative models, including BERT, Bloomz, Claude 2 (Anthropic PBC), DALL-E, GeneGPT, Google Bard (Gemini), GPT, Flan-T5, Large Language Model Meta AI (LLaMA), Microsoft Bing AI, Pathways Language Model (PaLM), Perplexity, Stable Diffusion, and Vicuna-13B [12, 13]; of these, DALL-E and Stable Diffusion generate images rather than text.
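Predicting the next word from calculated probabilities can be illustrated with the simplest statistical language model, a bigram model, which estimates the probability of each word from how often it followed the previous word in a training corpus. This Python sketch is a hypothetical illustration with a made-up three-sentence corpus; neural LMs learn far richer, context-dependent probabilities with nonlinear functions rather than simple counts.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def next_word_probs(counts, word):
    """Estimate P(next word | word) as relative frequencies."""
    following = counts.get(word.lower(), Counter())
    total = sum(following.values())
    return {w: c / total for w, c in following.items()} if total else {}

# Hypothetical toy corpus; real LMs are trained on billions of words.
corpus = [
    "the fracture was reduced",
    "the fracture was fixed",
    "the wound was closed",
]
model = train_bigram(corpus)
probs = next_word_probs(model, "the")
print({w: round(p, 2) for w, p in probs.items()})  # {'fracture': 0.67, 'wound': 0.33}
print(max(probs, key=probs.get))  # fracture
```

Generating text then amounts to repeatedly choosing (or sampling) a next word from such a distribution and appending it to the sequence.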

A large language model (LLM) is a neural network-based language model with numerous parameters, trained on vast amounts of unlabeled text using self-supervised (often termed unsupervised) learning. While not formally defined, the term typically applies to deep learning models with a billion or more parameters [14].
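The parameter counts that separate LLMs from earlier models can be approximated from a transformer's dimensions. A common rule of thumb (an approximation, not an exact count) is that each decoder layer holds about 12·d² weights, where d is the hidden width: 4·d² in the attention projections and 8·d² in a feed-forward block four times wider, plus token and position embeddings; biases and layer norms are ignored. The Python sketch below applies it to the published configurations of GPT-1 (12 layers, d = 768, 40,478-token vocabulary, 512-token context) and GPT-3 (96 layers, d = 12,288, 50,257-token vocabulary, 2,048-token context); the function name is hypothetical.

```python
def approx_decoder_params(n_layer, d_model, vocab_size, n_ctx):
    """Rule-of-thumb parameter count for a GPT-style decoder (biases and layer norms ignored)."""
    attention = 4 * d_model ** 2     # query, key, value and output projections
    feed_forward = 8 * d_model ** 2  # d -> 4d -> d fully connected layers
    embeddings = (vocab_size + n_ctx) * d_model  # token + learned position embeddings
    return n_layer * (attention + feed_forward) + embeddings

gpt1 = approx_decoder_params(n_layer=12, d_model=768, vocab_size=40478, n_ctx=512)
gpt3 = approx_decoder_params(n_layer=96, d_model=12288, vocab_size=50257, n_ctx=2048)
print(f"GPT-1: ~{gpt1 / 1e6:.0f} million parameters")  # GPT-1: ~116 million parameters
print(f"GPT-3: ~{gpt3 / 1e9:.0f} billion parameters")  # GPT-3: ~175 billion parameters
```

The estimates are close to the commonly cited figures of 117 million and 175 billion, showing how increasing width and depth pushes a model far past the billion-parameter threshold.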

A large generative model (LGM) is an AI model capable of interpreting (e.g., providing information based on queries about objects in images or analyzed text) and generating multimodal data (text, images, videos, etc.) at a level comparable to or surpassing human intellectual output.5

Generative pre-trained transformers (GPTs) are neural language models trained on large text datasets to generate contextually relevant natural language text. Pre-training involves learning to predict the next word in a sequence, which does not require extensive labeled data.6 The term “transformer” refers to the neural network architecture underlying GPT, which uses a self-attention mechanism to relate each element of a sequence to the other elements and thus capture long-range dependencies in text. Text is divided into tokens (words or word fragments) that are processed in parallel rather than strictly one after another, which enables transformers to be trained efficiently on large datasets and subsequently fine-tuned for various NLP tasks. OpenAI introduced the first GPT version in 2018 [14]. Previously, neural LMs relied heavily on supervised learning with manually labeled datasets, which limited their scalability and increased training costs.
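The self-attention mechanism can be sketched in a few lines of Python. Each token's output is a weighted average of the value vectors of the tokens it attends to, with weights obtained by applying softmax to scaled dot products of query and key vectors; the causal mask used in GPT-style models lets each token attend only to itself and earlier tokens. For brevity, this hypothetical sketch omits the learned projection matrices and multiple attention heads of a real transformer and uses the token vectors directly as queries, keys, and values. Because each row is computed independently of the others, the loop over tokens is what real implementations parallelize.

```python
import math

def softmax(scores):
    """Numerically stable softmax; -inf scores receive zero weight."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens, causal=True):
    """Scaled dot-product self-attention over a list of token vectors.

    Queries, keys and values are the token vectors themselves (a simplification:
    real transformers first multiply them by learned projection matrices).
    """
    d_k = len(tokens[0])
    outputs, all_weights = [], []
    for i, q in enumerate(tokens):
        scores = []
        for j, k in enumerate(tokens):
            if causal and j > i:
                scores.append(float("-inf"))  # future tokens are masked out
            else:
                scores.append(sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k))
        weights = softmax(scores)
        all_weights.append(weights)
        outputs.append([sum(w * v[d] for w, v in zip(weights, tokens)) for d in range(d_k)])
    return outputs, all_weights

# Three hypothetical 2-dimensional token embeddings.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens)
for row in weights:
    print([round(w, 3) for w in row])
```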

In 2024, ChatGPT, a GPT-based chatbot, surpassed 200 million active users, with approximately 1 million weekly visitors to its website.7 It supports 26 languages, and a 2024 review indicated that ChatGPT accounted for 74% of publications on GenAI applications in healthcare [13].

A chatbot is a program designed for automated text or voice-based interaction with users. Generative chatbots create responses by analyzing each word in a query, producing answers that are not limited to predefined options.8

Publication Activity on Generative AI in Surgery, Traumatology, and Orthopedics

Of the 872 identified articles, the largest numbers were contributed by authors from the United States and China (324 and 105, respectively). Among global macroregions, Europe led with 386 publications, with Germany contributing 74 articles. Authors from developing countries participated mainly through international collaborations (Fig. 2). Russian researchers authored six publications, four of which were international projects.

 

Fig. 2. Publication activity of authors from global macroregions on the topic of generative artificial intelligence in surgery, traumatology, and orthopedics.

 

Applications of Generative AI in Surgery

Publication analysis reveals that GenAI offers a wide range of applications across various stages of surgical care (Fig. 3).

 

Fig. 3. Applications of generative artificial intelligence technologies in surgery (adapted from [17]).

 

Preoperative Stage

One promising application of GenAI in clinical medicine is diagnostic support [13, 15]. Multimodal GenAI models can analyze diverse clinical data—symptoms, laboratory and imaging results, medical records, and audio or video files—to rapidly generate potential diagnoses, enhancing diagnostic speed and accuracy [16–18]. These models can be integrated with staging systems (e.g., TNM for cancer) [19, 20], anatomical classifiers (e.g., for bone fractures), or severity assessment scales (e.g., for evaluating patient condition) [21].

LLMs can provide specialized information to physicians lacking expertise in specific pathologies or serve as reminders for diagnostic and therapeutic measures during preoperative planning [22]. For instance, GenAI models can identify “red flags” in a patient’s condition requiring immediate intervention [23].

Additionally, LLMs are valuable for explaining test results to patients, tailored to varying levels of health literacy [24, 25], and for creating preoperative [26] and postoperative [19] patient instructions or obtaining informed consent [27, 28]. For example, when ChatGPT was used to inform hypothetical patients with skin cancer, its informational materials averaged a score of 7 out of 10 for accuracy and readability [28].

Intraoperative Stage

During surgery, high-resolution visualization and modeling are increasingly critical. GenAI models can process real-time images of the surgical field from various sources, particularly for intraoperative navigation in minimally invasive procedures. Current research focuses on four main areas: 3D modeling of organs and implantable medical devices, endoscopic navigation, tissue differentiation, and augmented reality [29].

Intraoperative 3D reconstruction, based on magnetic resonance imaging (MRI), computed tomography (CT), or ultrasound, can be labor-intensive or yield low-resolution images with conventional rendering techniques. GenAI models reduce the number of images required, accelerate modeling to real-time, and enhance resolution [29, 30].

In endoscopic procedures, deep learning methods are used for depth estimation and 3D mapping. However, obtaining large volumes of high-quality paired video data is challenging due to hardware limitations and labor-intensive labeling [31, 32]. Neural networks are being explored for visual odometry in endoscopes and capsule robots [33–36].

Identifying tissue types and anatomical structures is a fundamental surgical skill essential for safe and effective surgery. In spinal surgery, for example, poor visibility, complex anatomy, and proximity to critical structures (e.g., spinal cord, peripheral nerves, aorta) complicate manipulations [37]. Tissue differentiation is particularly challenging in minimally invasive surgery due to limited direct visibility and tactile feedback. Increasingly, studies focus on intraoperative sensor technologies (e.g., optical coherence tomography, hyperspectral imaging, impedance and vibroacoustic sensing, six-degree-of-freedom strain sensors, near-infrared fluorescence, and probe-based confocal laser endomicroscopy) for tissue classification and differentiation to improve surgical navigation and robotic autonomy [38]. Deep learning models provide more reliable, faster, and more accurate data interpretation than standard signal processing methods [39].

Augmented reality (AR), an environment integrating physical and virtual objects in real time, enhances intraoperative visualization by overlaying pre- or intraoperative images onto the surgical field. AR has been described in maxillofacial [40], plastic [41], vascular [42], and spinal [43, 44] surgery. A key challenge is projecting virtual images onto deformable organs without markers. AI algorithms have been tested in laparoscopic surgery to address this issue [45, 46]. GenAI models are being developed to reconstruct AR environments from imperfect or limited images [29].

Another promising application of GenAI is automatic generation of operative reports. Accurate and detailed documentation of surgical procedures is critical for continuity of care and for legal purposes. However, manual report creation, including via electronic forms, is time-consuming, and the resulting reports often lack completeness and accuracy. An audit at the Royal Hobart Hospital (Australia) found that 45% of surgical reports had significant gaps, rendering them legally vulnerable, and none fully met documentation standards [47].

The use of ChatGPT for operative report generation has been explored in plastic [28, 48], ophthalmic [49, 50], neurosurgical [51, 52], and laparoscopic appendectomy [53] procedures, though these studies remain experimental. Authors note that ChatGPT significantly reduces documentation time with minimal need for final edits [48, 52] and improves completeness by incorporating protocol template prompts [53].

A notable publication by Pakistani plastic surgeon Fizzah Arif described querying ChatGPT about its role in operative report generation [54]. ChatGPT responded that it could standardize reports by documenting procedures using consistent terminology and format: “With ChatGPT, surgeons can ensure all relevant information is documented accurately, avoiding ambiguity or misinterpretation. By assisting surgeons in real time during procedures, ChatGPT enables rapid documentation of key information, which is particularly useful in high-pressure situations where surgeons must focus on the procedure while accurately describing its course.” The author highlighted ChatGPT’s strengths in standardization, accuracy, and real-time support but emphasized that the technology is not fully mature, requiring surgeon review and supplementation [54]. Other experts note challenges in using ChatGPT or other LLMs for rare, specialized, or novel procedures due to limited training data, as well as risks of over-reliance on automation without adequate verification [53].

Another critical application of GenAI is real-time detection of adverse events (AEs), sentinel events, and deviations from the surgical process. AEs are typically recorded retrospectively, and many go unreported, especially if they do not result in severe outcomes [55]. This hinders analysis of surgical safety and improvement efforts. Intraoperative AE detection is more sensitive and accurate but faces logistical, technological, and cost barriers. Deep learning and computer vision algorithms show promise in addressing this challenge.

A systematic review [56] of 13 publications on AI for detecting intraoperative complications, most of them published after 2020, highlighted the novelty of this field. Studies primarily used convolutional neural networks, with video recordings as the main data source, occasionally supplemented by neurophysiological monitoring or digital angiography data. Bleeding was the most commonly detected complication, and some studies successfully identified perfusion deficits, thermal injuries, and electromyography abnormalities. Surgical specialties included urology, ophthalmology, general surgery, and neurosurgery, suggesting that these methods may generalize across specialties. The pooled sensitivity of ten algorithms was 0.78 (0.64–0.88), and the pooled specificity was 0.81 (0.69–0.88).
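Sensitivity and specificity are simple ratios from a detector's confusion matrix: sensitivity is the share of actual complications (e.g., bleeding events) that the algorithm flags, and specificity is the share of complication-free segments that it correctly leaves unflagged. The Python sketch below uses hypothetical counts chosen to reproduce the pooled values; in a meta-analysis, pooled estimates and their intervals are obtained with dedicated statistical models rather than by simply adding up counts.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp: complications correctly detected      fn: complications missed
    tn: uneventful segments correctly passed  fp: false alarms
    """
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity, specificity

# Hypothetical counts: 100 video segments with bleeding, 100 without.
sens, spec = sensitivity_specificity(tp=78, fn=22, tn=81, fp=19)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")  # sensitivity=0.78, specificity=0.81
```

In this hypothetical setting, a sensitivity of 0.78 means that roughly one bleeding event in five would go undetected, which is why such systems are positioned as aids to, not replacements for, the surgical team.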

For AI to effectively identify AEs, it must first recognize the expected sequence of surgical steps. Studies have demonstrated successful identification of surgical phases and outcomes using neural networks, such as tracking laparoscopic instrument positions [57, 58]. This capability enables detection of deviations from standard procedures and assessment of surgical technical skills [55, 59, 60].
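Once a phase-recognition model has labelled each segment of a recording, detecting deviations from the expected sequence of surgical steps becomes a comparatively simple check. The Python sketch below is a hypothetical illustration: it assumes an already-labelled sequence (the labels reuse the dissection, resection, and closure phases mentioned for ORBB below) and flags skipped phases, returns to an earlier phase, and unrecognized labels. Real systems must additionally cope with misclassified segments and legitimate variations in technique.

```python
EXPECTED_PHASES = ["dissection", "resection", "closure"]  # hypothetical reference sequence

def find_deviations(observed, expected=EXPECTED_PHASES):
    """Compare per-segment phase labels with the expected order of surgical steps.

    Returns (segment index, label, description) for each detected deviation.
    """
    order = {phase: i for i, phase in enumerate(expected)}
    deviations = []
    furthest = -1  # index of the most advanced phase reached so far
    for t, phase in enumerate(observed):
        if phase not in order:
            deviations.append((t, phase, "unrecognized phase"))
            continue
        idx = order[phase]
        if idx < furthest:
            deviations.append((t, phase, "return to an earlier phase"))
        elif idx > furthest + 1:
            deviations.append((t, phase, "skipped phase(s)"))
        furthest = max(furthest, idx)
    return deviations

print(find_deviations(["dissection", "dissection", "resection", "closure"]))  # []
print(find_deviations(["dissection", "closure"]))
# [(1, 'closure', 'skipped phase(s)')]
print(find_deviations(["dissection", "resection", "dissection", "resection", "closure"]))
# [(2, 'dissection', 'return to an earlier phase')]
```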

A comprehensive solution for surgical safety monitoring is the Operating Room Black Box (ORBB), which evaluates both technical and non-technical skills of the surgical team. Modeled after aviation black boxes, ORBB enables real-time observation, continuous recording, and analysis of intraoperative events for efficiency, safety, and AE detection [61]. ORBB employs multiple computer vision models to generate a series of short video clips and a statistical dashboard. It identifies and segments key procedural phases (dissection, resection, and closure), enabling users to bypass reviewing three- or four-hour recordings and directly access specific moments of the operation, such as instances of significant bleeding or surgical stapler misfires.

The first ORBB prototype, developed under the leadership of Teodor Grantcharov, was installed at St. Michael’s Hospital in Toronto. A 2020 study analyzed 132 laparoscopic procedures using ORBB, identifying 3435 errors, with a median of 20 per procedure [62]. Auditory distractions (e.g., equipment alarms, pagers, phones, instruments) occurred 138 times per case, and the operating room door was opened 42 times per case, approximately every 2 minutes. Surgeons exposed to auditory distractions exhibited lower quality, speed, and accuracy in simulated environments.

ORBB is now installed in at least 40 hospitals across the United States, Canada, and several European countries, including for automated monitoring of WHO surgical safety checklist compliance [63].

For instance, ORBB was implemented at the University of Texas Southwestern Medical Center in Dallas in 2020. A two-year analysis of 4581 procedures evaluated checklist compliance metrics, including completeness, quality, and team engagement across three verification stages. Patients with a checklist quality score of 0 had a predicted mortality of 4.29%, compared to 0.11% for a score of 100 (p < 0.0001). A quality score of 100 reduced predicted hospital stay by 1.57 days. The authors concluded that ORBB offers unprecedented insights into checklist adherence, team engagement, and safety compliance, enabling near-real-time identification and mitigation of safety threats resulting from procedure deviations [63].

Detecting intraoperative complications is critical, but predicting them in real time based on patient status changes is equally important. Currently, risk prediction occurs primarily preoperatively, with intraoperative decisions relying heavily on clinical judgment. Unforeseen risks may lead to reactive rather than preventive responses [61]. GenAI methods can support real-time decision-making by automatically assessing risk from physiological parameters and unstructured data (e.g., text, audio, video), offering more accurate predictions than standard statistical methods [64].

In 2018, Lundberg et al. [65] developed Prescience, an explainable artificial intelligence system predicting hypoxemia during surgery 5 minutes before onset. Prescience monitors vital signs and provides the physician with a real-time risk assessment, updated continuously, while also identifying the reasons for its predictions by listing significant risk factors. This may function as an additional vital sign, alerting the anesthesiologist to current changes in risk.

Explainable AI is briefly discussed below. Here, we note only that predictive intraoperative models based on explainable AI have also been developed for pediatric cardiac surgery [66] and for predicting hypoxemia in pediatric general surgery [67].

Postoperative Stage

A significant application of GenAI in the postoperative phase is supporting decision-making for postoperative patient management, including assessing risks of complications, disease recurrence, recovery time, and long-term outcomes [68].

Unlike conventional decision support systems built on fixed rules, LLMs can be retrained or fine-tuned as new data become available, which helps keep their knowledge base current and relevant. For example, at Vanderbilt University Medical Center (Nashville, USA), 36 recommendations generated by ChatGPT were compared with 29 recommendations from 5 experts regarding automated alerts in the center’s medical information system (e.g., warfarin use without INR monitoring, vaccinations in immunocompromised patients, and nonsteroidal anti-inflammatory drugs in pregnancy). Nine of the top 20 recommendations came from ChatGPT and were rated as clear, relevant, and moderately helpful, with low bias and redundancy [69]. However, one ChatGPT recommendation contained a hallucination (a neural network error producing unverified or fabricated information), and another was only partially correct. Consequently, the authors advocate advancing explainable artificial intelligence technologies to enhance model transparency [70].

A detailed discussion of explainable AI models is beyond the scope of this review, but it is worth noting that the field received a major impetus from the Defense Advanced Research Projects Agency (DARPA) of the U.S. Department of Defense, which is investing over $2 billion in third-generation AI systems capable of either autonomous operation or collaboration with humans, with the ability to explain their decisions [71].

Explainable AI is particularly relevant in medicine, where physicians bear full responsibility for clinical decisions, regardless of whether they were suggested by AI. Lack of model transparency undermines trust in AI’s diagnostic, therapeutic, and prognostic outputs, as frequently noted in the literature. For instance, a survey of intensive care unit (ICU) physicians found that 71% either disagreed or were uncertain about the reliability of AI for clinical decision-making in the ICU [72].

This skepticism is not entirely unfounded. For example, Dizi et al. [73] developed an AI model to predict in-hospital mortality in ICU patients using the MIMIC-III database without preselecting variables. Detailed analysis revealed that a priest’s visit to a patient was a significant predictor of imminent death. Removing this factor altered the prognosis.

In the MEDLINE database from 2018 to October 2024, 1148 documents were identified with the keyword “explainable artificial intelligence,” of which 109 were surgery-related. Both model-specific and model-agnostic methods have been developed to understand how AI models generate predictions. For instance, sequentially removing each risk factor can reveal its impact on the prediction, effectively assigning a weight to each factor. By providing real-time risk assessments with justifications, explainable AI enables surgeons to leverage comprehensive predictive models while retaining the interpretability of logistic regression [74].
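The removal-based approach described above can be sketched in Python as follows. Each risk factor is in turn reset to a reference value and the model is re-run; the drop in predicted risk is taken as that factor's contribution. The logistic model, its coefficients, the feature names, and the reference values are all hypothetical and chosen only for illustration. For nonlinear models, such one-at-a-time contributions generally do not add up to the total prediction; methods based on Shapley values address this by averaging a factor's effect over all orders of removal.

```python
import math

# Hypothetical intraoperative hypoxemia risk model (coefficients are illustrative only).
COEFFICIENTS = {"spo2_decline": 0.9, "bmi": 0.05, "low_tidal_volume": 0.7}
INTERCEPT = -4.0
REFERENCE = {"spo2_decline": 0.0, "bmi": 25.0, "low_tidal_volume": 0.0}

def predict_risk(features):
    """Logistic model: probability of the adverse event given the features."""
    z = INTERCEPT + sum(COEFFICIENTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def removal_attributions(features, reference=REFERENCE):
    """Contribution of each factor = risk with it present minus risk with it reset."""
    full_risk = predict_risk(features)
    contributions = {}
    for name in features:
        ablated = dict(features, **{name: reference[name]})
        contributions[name] = full_risk - predict_risk(ablated)
    return full_risk, contributions

patient = {"spo2_decline": 3.0, "bmi": 35.0, "low_tidal_volume": 1.0}
risk, contributions = removal_attributions(patient)
print(f"predicted risk: {risk:.2f}")
for name, value in sorted(contributions.items(), key=lambda kv: -kv[1]):
    print(f"  {name}: {value:+.2f}")
```

Listing the factors by contribution alongside the risk estimate mirrors, in simplified form, the kind of justification an explainable real-time system can display to the surgical team.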

The notion of a trade-off between accuracy and explainability in AI models is being reconsidered as explainable AI research expands [75]. In medicine, where AI models are often based on detailed, structured, and pathophysiologically grounded data, the performance gap between interpretable and complex models is typically minimal [74].

Another natural application of GenAI in surgery is the generation of discharge summaries: like operative reports, they follow a standardized format intended to convey comprehensive information concisely and quickly [76]. Studies indicate that discharge summary quality is satisfactory in only 20–40% of cases [77], and 40% of summaries contain incorrect outpatient medication recommendations [78]. Manual creation of high-quality discharge summaries is time-intensive but essential for continuity of care. Improved discharge summary quality has been shown to reduce rehospitalization rates for heart failure exacerbations [79], increase denosumab prescription rates after hip fracture [80], and enhance patient satisfaction [81].

Key advantages of ChatGPT and other LLMs in generating discharge summaries include consistent, structured, and standardized formatting, clear and concise language, synthesis of complex information, and personalized recommendations [82]. For instance, analysis of ChatGPT-generated discharge summaries for neurosurgical procedures showed 81–85% accuracy [83]. In another study, all 25 ChatGPT-generated discharge summaries were deemed high-quality by experts, compared to 92% of those written by resident physicians [84], with an average data completeness of 97%. Notably, experts correctly identified ChatGPT-authored summaries only 60% of the time.

However, concerns persist about the risk of hallucinations in automatically generated documents, about data privacy (OpenAI, the developer of ChatGPT, gains access to the user data from which the texts are generated), and about challenges in handling complex or atypical cases [85, 86]. Analysis of physician-authored discharge summaries revealed that only 61% of the information came from the current medical record; the remaining 39% was drawn from other sources (e.g., referrals, prior records, or other documents), including physicians’ assumptions or recommendations not documented elsewhere (11%) [85]. This raises doubts about GenAI’s ability to produce high-quality discharge summaries from the medical record alone and underscores the need for human oversight to minimize risks and ensure personalized information [76, 85].

Educational and Administrative Tasks

In education, GenAI technologies can be used to create personalized learning pathways, automate homework grading, develop learner profiles with individualized development plans, perform big-data learning analytics, and provide automated recommendations for curriculum redesign [86].

GPT-4 illustrated GenAI’s potential in surgical education by correctly answering 76% of 280 questions on a general surgery residency examination covering all areas of the field [87].

GenAI models can also serve as training platforms for honing surgical skills. When trainees learn new techniques or complete theoretical courses, AI can analyze performance trends and provide personalized feedback that can shorten the learning curve [88]. Integrating chatbots with virtual reality (VR) enables interactive questioning and feedback that reinforce surgical skills during training [18].

For example, a randomized study at the David Geffen School of Medicine at UCLA used the Osso VR virtual surgical training platform to train students with no prior experience in tibial intramedullary osteosynthesis [88]. The training module included written instructions and prompts for each procedural step. After training, participants performed a simulated procedure on a Sawbones model. Compared with the control group, which received standard training, the VR-trained group completed the procedure 20% faster on average. It also scored significantly higher across all assessment categories (mean score 17.5 vs. 7.5, p < 0.001) and completed a higher percentage of steps correctly (63% vs. 25%, p < 0.002).

Another notable achievement is the use of machine learning algorithms to classify surgeon expertise (novice vs. experienced) [89, 90]. GenAI systems that collect and process data in real time during virtual surgeries have achieved 90% accuracy in determining surgeon proficiency [91].
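The cited studies use a variety of models. As a much simpler stand-in that shows how such a classifier can work, the sketch below assigns a trainee to the novice or experienced group according to the nearest class centroid in standardized feature space. This is a swapped-in technique, not the method of [89–91]. The two features (task time in seconds and instrument path length in metres) and all values are hypothetical.

```python
# Minimal sketch: novice-vs-experienced classification with a nearest-centroid rule.
# A simpler stand-in for the ML models in the cited studies; features and data are hypothetical.
from statistics import mean, pstdev

# (task_time_s, path_length_m) per simulated procedure, with expertise labels
TRAIN = [
    ((610.0, 14.2), "novice"),
    ((580.0, 13.1), "novice"),
    ((655.0, 15.8), "novice"),
    ((320.0, 7.4), "experienced"),
    ((295.0, 6.9), "experienced"),
    ((350.0, 8.1), "experienced"),
]


def fit(train):
    """Compute per-feature mean/std for z-scoring, then one centroid per class."""
    n_features = len(train[0][0])
    cols = [[x[i] for x, _ in train] for i in range(n_features)]
    mu = [mean(c) for c in cols]
    sd = [pstdev(c) or 1.0 for c in cols]  # guard against zero variance

    def z(x):
        return [(v - m) / s for v, m, s in zip(x, mu, sd)]

    centroids = {}
    for label in {lab for _, lab in train}:
        pts = [z(x) for x, lab in train if lab == label]
        centroids[label] = [mean(p[i] for p in pts) for i in range(n_features)]
    return z, centroids


def predict(model, x):
    """Return the label whose centroid is closest (Euclidean) to the z-scored sample."""
    z, centroids = model
    zx = z(x)
    return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(zx, centroids[lab])))


if __name__ == "__main__":
    model = fit(TRAIN)
    print(predict(model, (600.0, 13.9)))  # novice
    print(predict(model, (310.0, 7.0)))   # experienced
```

Standardizing first matters because task time (hundreds of seconds) would otherwise dominate the distance calculation over path length (a few metres).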

A systematic review of 93 studies on AI in surgical education [92] found that skill assessment often combines formalized scales with objective kinematic and physiological metrics that cannot be assessed visually. The scales include the Objective Structured Assessment of Technical Skills (OSATS) and the Global Evaluative Assessment of Robotic Skills (GEARS). The metrics include eye movement (gaze direction, fixation frequency, blink rate, pupil width, vergence), instrument position and tilt, hand movement frequency, force applied to anatomical structures, and volume of tissue removed. The review noted that most studies focus on simulation training for laparoscopic and robotic surgery. Because these procedures are performed under video-camera visualization of the surgical field, recordings are readily available. In open surgery in clinical settings, by contrast, the operative field may be partially or fully obscured by the surgeon’s head or body, and lighting and camera positioning often change, which complicates AI-based monitoring of surgical manipulations.
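Kinematic metrics of this kind are computed from tracked instrument positions rather than judged by eye. The sketch below computes two simple motion-economy measures from a hypothetical instrument-tip trajectory: total path length, and the ratio of path length to straight-line displacement. The trajectory is assumed to be sampled as (x, y, z) coordinates in millimetres. It is an illustration, not the metric definitions used in the reviewed studies.

```python
# Minimal sketch: motion-economy metrics from a tracked instrument-tip trajectory.
# Positions are assumed to be (x, y, z) in millimetres at a fixed sampling rate.
import math


def path_length(points: list[tuple[float, float, float]]) -> float:
    """Total distance travelled by the instrument tip (sum of segment lengths)."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def path_ratio(points: list[tuple[float, float, float]]) -> float:
    """Path length divided by start-to-end displacement (1.0 = perfectly direct)."""
    direct = math.dist(points[0], points[-1])
    return path_length(points) / direct if direct > 0 else math.inf


if __name__ == "__main__":
    trace = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (6.0, 8.0, 0.0), (6.0, 8.0, 5.0)]
    print(f"path length = {path_length(trace):.1f} mm")  # path length = 15.0 mm
    print(f"path ratio  = {path_ratio(trace):.2f}")      # path ratio  = 1.34
```

Lower values on both measures generally indicate more economical movement, which is why such features are natural inputs to expertise classifiers.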

In surgical process management, machine learning models are primarily used to predict procedure durations and create more accurate operating room schedules. Typically, procedure duration estimates rely on historical averages and surgeon judgment, but these methods lack reliability, leading to schedule disruptions, cancellations, or delays. Studies have shown that neural networks and other machine learning models, using administrative and perioperative data, improve procedure duration prediction accuracy by 24–35% [93–95], increase schedule adherence to 90% [95], and optimize staff and resource allocation [96]. AI models have also been used to predict recovery room stays (66–82% accuracy) and procedure cancellation risks (68–72% accuracy) [97].
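Accuracy gains of this kind can be measured against the historical-average baseline described above. The sketch below compares the mean absolute error (MAE) of that baseline with that of a one-variable least-squares line. The predictor (number of planned levels in a spinal procedure) and all durations are hypothetical. The cited studies use richer feature sets and more complex models.

```python
# Minimal sketch: historical-average baseline vs. a one-feature least-squares line
# for predicting operating time. Predictor and all data are hypothetical.
from statistics import mean

# (planned_levels, actual_duration_min) for past cases of one procedure type
HISTORY = [(1, 95), (1, 105), (2, 140), (2, 150), (3, 190), (3, 200), (4, 245)]
NEW_CASES = [(1, 100), (2, 148), (4, 238)]


def fit_line(pairs):
    """Ordinary least squares for y = intercept + slope * x; returns (intercept, slope)."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def mae(predict, cases):
    """Mean absolute error of a prediction function over (x, actual) cases."""
    return mean(abs(predict(x) - y) for x, y in cases)


if __name__ == "__main__":
    baseline = mean(y for _, y in HISTORY)  # one estimate used for every case
    intercept, slope = fit_line(HISTORY)
    mae_base = mae(lambda x: baseline, NEW_CASES)
    mae_line = mae(lambda x: intercept + slope * x, NEW_CASES)
    print(f"baseline MAE {mae_base:.1f} min, regression MAE {mae_line:.1f} min")
    print(f"relative improvement {1 - mae_line / mae_base:.0%}")
```

The toy data are deliberately clean and exaggerate the gain relative to the 24–35% improvements reported in the cited studies.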

LLMs can streamline documentation for insurance companies and other routine administrative tasks [98], support disease coding, plan and prepare physician consultations [99], assist in creating nursing care plans [100], and serve as reminders for scheduled tasks, reducing administrative burdens and freeing time for clinical work [101].

Key Applications of Generative Artificial Intelligence in Traumatology and Orthopedics

The successful integration of GenAI into traumatology and orthopedics is facilitated by several characteristics of the discipline: well-established diagnostic and treatment algorithms for most orthopedic conditions, the reproducibility and high efficacy of orthopedic interventions, and the availability of national and international databases with extensive procedural data, which can serve as a foundation for developing AI applications [102].

All GenAI technologies and multimodal models described previously are applicable to traumatology and orthopedic interventions. This section focuses on specific examples of GenAI applications in these fields.

Radiographic imaging is central to diagnosing musculoskeletal injuries and diseases, making it a longstanding area for AI application in traumatology and orthopedics. Early uses included AI algorithms for detecting bone tumors, assessing bone mineral density, and analyzing trabecular structures in long bones. More recent advances include neural networks that identify fractures on radiographs and CT scans, as well as meniscal injuries, ligament tears, and bone marrow edema on MRI [103]. Discriminative AI models achieve 72–100% accuracy in detecting and classifying fractures (of long bones, vertebrae, and ribs, as well as intra-articular and periprosthetic fractures) and endoprosthesis instability [104, 105].
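Reported accuracy figures summarize a confusion matrix. When fractures are rare in the test set, accuracy alone can hide poor sensitivity. A minimal sketch with hypothetical counts, not taken from the cited studies:

```python
# Minimal sketch: diagnostic metrics from a 2x2 confusion matrix.
# Counts are hypothetical, not from the cited studies.


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Accuracy, sensitivity (recall) and specificity for a binary classifier."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


if __name__ == "__main__":
    # 1000 radiographs, 50 with a fracture; the model misses 20 of those 50
    m = diagnostic_metrics(tp=30, fp=10, fn=20, tn=940)
    for name, value in m.items():
        print(f"{name:12s} {value:.1%}")
```

Here accuracy is 97% even though the model misses 40% of fractures. For this reason, sensitivity and specificity are more informative when comparing diagnostic models.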

Generative and hybrid neural networks are used to create pseudo-CT images from MRI data or to enhance low-dose CT scans through noise reduction. Both approaches reduce radiation exposure, accelerate diagnosis, and improve accuracy [106]. Conversely, CT images can be transformed into MRI-like images for diagnosing spinal pathology when MRI is contraindicated or to reduce diagnostic costs [107]. AI models also classify scoliosis type and severity from torso surface topography with up to 86% accuracy, minimizing radiation exposure [108].

A second promising application of GenAI, particularly of deep learning neural networks, is the planning of orthopedic interventions. AI-based software for creating and applying orthopedic templates can rapidly assess joint morphology and select the optimal endoprosthesis size with 90% accuracy. Studies show that AI-assisted planning reduces manual adjustments to surgical plans by an average of 40% [109]. In spinal surgery, AI models guide the selection of transpedicular screw trajectories and provide intraoperative monitoring based on CT and MRI data. Manual screw trajectory planning is error-prone and time-consuming, whereas neural networks accomplish this task in seconds or less [110]. During surgery, integrating GenAI with navigation systems reduces radiation exposure by 50% compared with conventional fluoroscopy [111].
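Whichever model proposes the plan, a screw trajectory is ultimately a line from an entry point to a target point. It is commonly described by its angulation in the axial and sagittal planes. The minimal geometric sketch below assumes both points are given in millimetres in a hypothetical patient-aligned frame. The axis convention is stated in the code and is an assumption, not the convention used in [110].

```python
# Minimal sketch: express a planned pedicle-screw trajectory as two angles.
# Assumed coordinate frame (hypothetical): x = medial (+) / lateral (-) for this screw,
# y = anterior (+) / posterior (-), z = cranial (+) / caudal (-), all in millimetres.
import math


def trajectory_angles(entry, target):
    """Return (medial_angle_deg, cranial_angle_deg) of the entry->target vector.

    medial_angle: angle from the anteroposterior axis in the axial (x-y) plane.
    cranial_angle: angle from the anteroposterior axis in the sagittal (y-z) plane.
    """
    dx, dy, dz = (t - e for t, e in zip(target, entry))
    if dy <= 0:
        raise ValueError("target must lie anterior to the entry point")
    medial = math.degrees(math.atan2(dx, dy))
    cranial = math.degrees(math.atan2(dz, dy))
    return medial, cranial


if __name__ == "__main__":
    entry = (0.0, 0.0, 0.0)
    target = (12.0, 40.0, -5.0)  # hypothetical planned screw tip
    medial, cranial = trajectory_angles(entry, target)
    print(f"medial angulation {medial:.1f} deg, cranio-caudal {cranial:.1f} deg")
```

A negative cranio-caudal angle here means the trajectory points caudally under the assumed frame.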

Automated and semi-automated robotic surgical systems equipped with AI provide real-time feedback to surgeons and ensure adherence to preoperative plans. For instance, the MAKO robotic system improves knee endoprosthesis placement accuracy by 32% compared to conventional methods and enhances limb alignment [112].

Another intraoperative application of GenAI in musculoskeletal surgery is augmented reality. In spinal surgery, it can enhance surgical navigation and implant placement (e.g., of transpedicular screws) and reduce surgery time. It can also lower procedural error rates from 7% to 2%, particularly for novice surgeons [113].

GenAI shows significant potential in the rehabilitation of trauma and orthopedic patients. Patient adherence to rehabilitation protocols typically does not exceed 50% [114]. AI-based virtual assistants can provide personalized exercise plans, track progress, offer real-time feedback, and adapt rehabilitation programs to individual needs and functional recovery rates [115]. For example, the Exer Health application (Exer Labs, USA) (https://www.exer.ai/) uses AI to monitor patients’ physical activity and exercise performance at home. Installed on a smartphone or tablet, it uses computer vision to measure range and trajectory of movement during exercises. These data are sent to the orthopedist as a treatment progress report, and patients receive daily exercise reminders.
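Smartphone-based monitoring of this kind typically relies on a pose-estimation model that returns 2D body keypoints for each video frame. Joint angles and range of motion then follow from simple geometry. The minimal sketch below assumes hip, knee and ankle keypoints are already available. The keypoint source and values are hypothetical, and this is not Exer Health’s implementation.

```python
# Minimal sketch: knee flexion and range of motion (ROM) from 2D keypoints.
# Keypoints (hip, knee, ankle) per frame are assumed to come from a pose-estimation
# model; the values below are hypothetical image coordinates.
import math


def knee_flexion_deg(hip, knee, ankle):
    """Flexion = 180 deg minus the hip-knee-ankle angle (0 deg = full extension).

    Assumes the three keypoints are distinct (non-zero segment lengths).
    """
    v1 = (hip[0] - knee[0], hip[1] - knee[1])
    v2 = (ankle[0] - knee[0], ankle[1] - knee[1])
    cos_a = (v1[0] * v2[0] + v1[1] * v2[1]) / (math.hypot(*v1) * math.hypot(*v2))
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))  # clamp rounding error
    return 180.0 - angle


def range_of_motion(frames):
    """ROM over an exercise session = maximum flexion minus minimum flexion."""
    flexions = [knee_flexion_deg(*f) for f in frames]
    return max(flexions) - min(flexions)


if __name__ == "__main__":
    session = [
        ((0, 1), (0, 0), (0, -1)),  # straight leg
        ((0, 1), (0, 0), (1, -1)),  # partially bent
        ((0, 1), (0, 0), (1, 0)),   # bent to a right angle
    ]
    print(f"ROM = {range_of_motion(session):.0f} deg")  # ROM = 90 deg
```

Angles measured in a 2D image depend on camera placement. A real application would need to control the viewing angle or estimate 3D pose.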

Risks of Applying Generative Artificial Intelligence in Medicine

The World Health Organization’s guidance on ethics and governance of large multimodal AI models notes that the adoption rate of LLM-based technologies surpasses that of any other consumer software in history [7]. However, this rapid expansion carries risks.

A common risk in medical applications of GenAI is hallucination: the generation of incorrect or nonsensical information that appears plausible but is not grounded in reality or in the training data. To mitigate this risk, automated cross-checking against standard clinical guidelines and human review of all model-generated text are recommended [25]. The publications reviewed consistently emphasize that neural networks cannot replace surgeons in decision-making and should serve only as supplementary tools.
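Automated cross-verification can be as simple as checking every dose in a generated text against a reference table before the text reaches a clinician. The sketch below assumes a hypothetical table of maximum single doses. The drug names `DrugA`/`DrugB` and their limits are placeholders, not clinical values. Real systems would also need far more robust medication extraction than a regular expression.

```python
# Minimal sketch: cross-check doses in model-generated text against a reference table.
# DrugA/DrugB and their limits are placeholders, NOT clinical guidance.
import re

MAX_SINGLE_DOSE_MG = {"druga": 1000, "drugb": 40}  # hypothetical reference table

# "<word> <number> mg", e.g. "DrugA 500 mg"
DOSE_PATTERN = re.compile(r"\b([A-Za-z]+)\s+(\d+(?:\.\d+)?)\s*mg\b")


def check_doses(text: str) -> list[str]:
    """Return human-readable issues; an empty list means no dose was flagged."""
    issues = []
    for drug, dose in DOSE_PATTERN.findall(text):
        limit = MAX_SINGLE_DOSE_MG.get(drug.lower())
        if limit is None:
            issues.append(f"{drug}: not in reference table, needs manual review")
        elif float(dose) > limit:
            issues.append(f"{drug} {dose} mg exceeds reference maximum of {limit} mg")
    return issues


if __name__ == "__main__":
    generated = "Continue DrugA 500 mg twice daily and DrugB 80 mg once daily."
    for issue in check_doses(generated):
        print(issue)
```

Such a check only narrows what the human reviewer must inspect. It does not replace that review.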

Another challenge is the availability of the large, annotated datasets required for training models. Most datasets cover common conditions and are generated within a single institution or a small number of institutions. Training on non-representative data, i.e., data lacking diversity in populations, diagnoses, imaging modalities, or healthcare settings, can compromise model performance and reliability. For example, a GenAI model applied to 480,000 pelvic radiographs identified six differences in bone structure between Caucasian and African populations: acetabular distance, osteoarthritis severity, obturator foramen shape, femoral neck-shaft angle, pelvic ring shape, and femoral cortical thickness [116]. Unless accounted for, such differences can distort diagnostic and prognostic imaging analyses.
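A first step in assessing representativeness is to compare subgroup shares in the training set with those in the population the model will serve. The minimal sketch below does this. The group names, counts, population shares and the 0.8 threshold are all hypothetical choices for illustration.

```python
# Minimal sketch: compare subgroup shares in a training set with a reference population
# and flag under-represented groups. Group names, counts and shares are hypothetical.


def underrepresented(train_counts: dict[str, int], population_share: dict[str, float],
                     min_ratio: float = 0.8) -> dict[str, float]:
    """Return {group: train_share / population_share} for groups below min_ratio.

    min_ratio = 0.8 is an arbitrary illustrative threshold, not a standard.
    """
    total = sum(train_counts.values())
    flagged = {}
    for group, pop_share in population_share.items():
        ratio = (train_counts.get(group, 0) / total) / pop_share
        if ratio < min_ratio:
            flagged[group] = round(ratio, 2)
    return flagged


if __name__ == "__main__":
    train = {"group_a": 8200, "group_b": 1300, "group_c": 500}
    population = {"group_a": 0.60, "group_b": 0.25, "group_c": 0.15}
    print(underrepresented(train, population))
```

A ratio of 0.33 means that a group is represented at one third of its population share in the training data.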

Addressing systematic biases requires collaboration among healthcare organizations to share data and create publicly accessible datasets, to assess sample representativeness, and to apply algorithms that correct distortions in the results. One technological solution is generative adversarial networks (GANs), which can synthesize realistic additional training images, for example for underrepresented groups or rare conditions. A GAN combines two neural networks. A generator (G) creates synthetic samples, and a discriminator (D) tries to distinguish real (“authentic”) samples from the training data from those produced by the generator. Their opposing objectives create an adversarial dynamic in which the generator’s outputs become progressively more realistic. In traumatology and orthopedics, GANs have been used in diagnosing fractures, osteoarthritis, and other conditions, with promising results [117].
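The competition between generator and discriminator is usually written as a two-player minimax game. In the standard formulation, x is a real training sample drawn from the data distribution p_data, and z is a random noise vector drawn from a prior p_z. D(·) is the discriminator’s estimated probability that its input is real, and G(z) is a generated sample:

```latex
\min_{G}\max_{D} V(D,G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z}}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

The discriminator maximizes V by assigning high probability to real images and low probability to generated ones. The generator minimizes V by producing images that the discriminator cannot tell apart from real ones. Applied GAN variants often add further loss terms to this basic objective.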

However, data availability raises ethical concerns about protecting patient privacy, particularly with publicly accessible LLMs such as ChatGPT. When patient data (e.g., medical records) are entered into such models, the information may be stored by the service provider, creating risks of unauthorized access, re-identification, or leaks. An analysis of data from the U.S. National Health and Nutrition Examination Survey (NHANES) showed that re-identification was possible for 85.6% of adults and 69.8% of children despite claimed de-identification [118]. Because de-identification is complex and potentially unreliable, inadequate anonymization methods can delay or halt projects. Simpler anonymization processes and robust de-identification algorithms are needed to ensure data security.
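One common way to assess re-identification risk is k-anonymity. Records are grouped by quasi-identifiers, which are attributes such as age band, sex, or postcode that are not names but can be linked to external data. Any group smaller than k effectively singles out its members. A minimal sketch, with hypothetical field names and records:

```python
# Minimal sketch: k-anonymity check over quasi-identifiers.
# Field names and records are hypothetical.
from collections import Counter

QUASI_IDENTIFIERS = ("age_band", "sex", "postcode_prefix")


def qi_key(record: dict) -> tuple:
    """Combination of quasi-identifier values that an attacker could link externally."""
    return tuple(record[q] for q in QUASI_IDENTIFIERS)


def records_at_risk(records: list[dict], k: int = 5) -> list[dict]:
    """Return records whose quasi-identifier combination occurs fewer than k times."""
    counts = Counter(qi_key(r) for r in records)
    return [r for r in records if counts[qi_key(r)] < k]


if __name__ == "__main__":
    data = [
        {"age_band": "60-69", "sex": "F", "postcode_prefix": "115", "dx": "hip fracture"}
        for _ in range(6)
    ]
    data.append({"age_band": "90-99", "sex": "M", "postcode_prefix": "115", "dx": "hip fracture"})
    risky = records_at_risk(data, k=5)
    print(f"{len(risky)} of {len(data)} records fall in groups smaller than k=5")
```

k-anonymity addresses only linkage through the chosen quasi-identifiers. Re-identification from other signals, such as the activity data analyzed in [118], needs additional safeguards.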

Significant barriers to integrating GenAI into clinical practice include the difficulty of achieving seamless integration with existing medical infrastructure, as well as high development and implementation costs. At the same time, a 2022 analysis by the U.S. National Bureau of Economic Research [119] estimated that broader AI adoption could save 5–10% of U.S. healthcare expenditures, approximately $200–360 billion annually in 2019 dollars. These estimates are based on specific AI use cases that could be implemented within five years without compromising care quality or access. Such applications could also yield non-financial benefits, including improved care quality, expanded access to services, better patient experiences, and greater physician satisfaction [119].

This review does not cover GenAI applications in scientific research or patient-facing tools, though these areas also hold significant potential for development in the coming years.

CONCLUSION

Artificial intelligence is increasingly regarded as a transformative technology capable of substantially improving clinical practice. In traumatology, orthopedics, and surgery more broadly, GenAI can be applied at every stage of care, from diagnosis to home-based rehabilitation monitoring.

However, most publications on GenAI in surgery, traumatology, and orthopedics describe pilot implementations or laboratory experiments. Data on routine clinical integration, sustained use, acceptability, effectiveness, and economic impact are notably lacking. Economic analyses suggest that wider AI adoption could generate substantial savings from use cases implementable within five years without compromising care quality, along with significant non-financial benefits.

As outlined in Russia’s National AI Development Strategy, the country has substantial potential to become a global leader in AI development and application. This potential is supported by a strong tradition of mathematical and scientific education and by expertise in modeling and programming.

The application of GenAI requires balanced approaches and collaborative efforts from all stakeholders to translate theoretical concepts into practical clinical solutions. Further research grounded in evidence-based medicine is essential to ensure the safe and effective integration of GenAI into clinical practice.

ADDITIONAL INFO

Author contribution. A.G. Nazarenko: development of system requirements, editing of the article; E.B. Kleymenova: requirements for review structure, editing of the article; A.I. Molodchenkov: search and selection of literature sources, editing of the article; N.M. Kakabadze: literature review, paper preparation; L.P. Yashina: search and analysis of literature, writing the article. All authors approved the version to be published and agree to be accountable for all aspects of the work, ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Funding source. This work was supported by the Russian Science Foundation (RSF project No. 24-14-00310).

Disclosure of interests. The authors declare the absence of relationships, activities and interests (personal, professional or financial) related to third parties (commercial, non-profit, private), whose interests may be affected by the content of the article, as well as other relationships, activities and interests over the past three years, which must be reported.

Provenance and peer-review. This paper was submitted to the journal on an initiative basis and reviewed according to the usual procedure. Two external reviewers, a member of the editorial board and the scientific editor of the publication participated in the review.

 

1 Approved by Decree of the President of the Russian Federation No. 490 dated October 10, 2019 (as amended by Decree of the President of the Russian Federation No. 124 dated February 15, 2024).

2 https://ru.ruwiki.ru/wiki/Generative_pre-trained_transformer#mw-header

3 https://blog.skillfactory.ru/glossary/nlp/

4 https://habr.com/ru/companies/skillfactory/articles/837366/

5 National Strategy for the Development of Artificial Intelligence for the Period Until 2030. Approved by Decree of the President of the Russian Federation No. 490 dated October 10, 2019 (as amended by Decree of the President of the Russian Federation No. 124 dated February 15, 2024).

6 https://ru.ruwiki.ru/wiki/Generative_pre-trained_transformer#cite_ref-gpt1paper_1-0

7 https://explodingtopics.com/blog/chatgpt-users

8 https://руни.рф/Виртуальный_собеседник


About the authors

Anton G. Nazarenko

Priorov National Medical Research Center for Traumatology and Orthopedics

Email: NazarenkoAG@cito.priorov.ru
ORCID iD: 0000-0003-1314-2887
SPIN-code: 1402-5186

MD, Dr. Sci. (Medicine), Professor RAS

Russian Federation, 9 Novospassky per., 115172 Moscow

Elena B. Kleimenova

Priorov National Medical Research Center for Traumatology and Orthopedics

Email: KleymenovaEB@cito-priorov.ru
ORCID iD: 0000-0002-8745-6195
SPIN-code: 2037-7164

MD, Dr. Sci. (Medicine), Professor

Russian Federation, 9 Novospassky per., 115172 Moscow

Nodari M. Kakabadze

Priorov National Medical Research Center for Traumatology and Orthopedics

Email: KakabadzeNM@cito-priorov.ru
ORCID iD: 0000-0002-2380-2394
SPIN-code: 6321-6733
Russian Federation, 9 Novospassky per., 115172 Moscow

Alexey I. Molodchenkov

Federal Research Center «Computer Science and Control» of the Russian Academy of Sciences; Peoples’ Friendship University of Russia

Email: aim@isa.ru
ORCID iD: 0000-0003-0039-943X
SPIN-code: 3378-7234

Cand. Sci. (Engineering)

Russian Federation, Moscow; Moscow

Liubov P. Yashina

Priorov National Medical Research Center for Traumatology and Orthopedics

Author for correspondence.
Email: YashinaLP@cito-priorov.ru
ORCID iD: 0000-0003-1357-0056
SPIN-code: 1910-0484

Cand. Sci. (Biology)

Russian Federation, 9 Novospassky per., 115172 Moscow

References

  1. Goldman Sachs. AI investment forecast to approach $200 billion globally by 2025. [2023 Aug 1]. Available from: https://www.goldmansachs.com/insights/articles/ai-investment-forecast-to-approach-200-billion-globally-by-2025 Accessed: 14.10.2024.
  2. Gartner says more than 80% of enterprises will have used generative AI APIs or deployed generative AI-enabled applications by 2026. [2023 Oct 11]. Available from: https://www.gartner.com/en/newsroom/press-releases/2023-10-11-gartner-says-more-than-80-percent-of-enterprises-will-have-used-generative-ai-apis-or-deployed-generative-ai-enabled-applications-by-2026 Accessed: 15.10.2024.
  3. Masyuk D, Sergienko Ya. Artificial Intelligence in Russia — 2023: Trends and Prospects. Moscow: Yakov and Partners; 2023. 80 p. Available from: https://yakovpartners.ru/upload/iblock/c5e/c8t1wrkdne5y9a4nqlicderalwny7xh4/20231218_AI_future.pdf (in Russ.)
  4. Bastian M. GPT-4 has more than a trillion parameters. [2023 Mar 25]. Available from: https://the-decoder.com/gpt-4-has-a-trillion-parameters/#summary
  5. WHO Guideline: Recommendations on digital interventions for health system strengthening. Geneva: WHO; 2019. 124 p. Available from: https://www.who.int/reproductivehealth/publications/digital-interventions-health-system-strengthening/en.
  6. WHO guidance. Ethics and governance of artificial intelligence for health. Geneva: World Health Organization; 2021. 165 p. Available from: https://www.who.int/publications/i/item/9789240029200
  7. WHO guidance. Ethics and governance of artificial intelligence for health. Guidance on large multi-modal models. Geneva: World Health Organization; 2024. 98 p. Available from: https://www.who.int/publications/i/item/9789240084759
  8. GOST R 59277-2020. Artificial intelligence systems. Classification of artificial intelligence systems (appr. 23.12.2020). Moscow: Standartinform; 2021 (in Russ.)
  9. Muhammad A, Vissa S. Using the capabilities of hybrid generative-discriminative models. [2024 Apr 17]. Available from: https://skine.ru/articles/737238.
  10. Generative AI vs. discriminative AI: understanding the key differences. [2024 Jul 24]. Available from: https://www.geeksforgeeks.org/difference-between-generative-ai-and-discriminative-ai
  11. Osipov YuS, editor. Neural network. The Great Russian Encyclopedia: [in 35 volumes]. Moscow: The Great Russian Encyclopedia; 2004–2017. (in Russ.)
  12. Zsidai B, Kaarre J, Narup E, et al.; ESSKA Artificial Intelligence Working Group. A practical guide to the implementation of artificial intelligence in orthopaedic research-Part 2: A technical introduction. J Exp Orthop. 2024;11(3):e12025. doi: 10.1186/s40634-023-00662-4
  13. Moulaei K, Yadegari A, Baharestani M, et al. Generative artificial intelligence in healthcare: A scoping review on benefits, challenges and applications. Int J Med Inform. 2024;188:105474. doi: 10.1016/j.ijmedinf.2024.105474
  14. Manning CD. Human language understanding and reasoning. Daedalus. 2022;151(2):127–138. doi: 10.1162/DAED_a_01905
  15. Rodler S, Ganjavi C, Backerd PD. Generative artificial intelligence in surgery. Surgery. 2024;175(6):1496–1502. doi: 10.1016/j.surg.2024.02.019
  16. Bugaj M, Kliestik T, Lăzăroiu G. Generative artificial intelligence-based diagnostic algorithms in disease risk detection, in personalized and targeted healthcare procedures, and in patient care safety and quality. Contemp Read Law Soc Justice. 2023;15:9–26.
  17. Tanveer SA, Fatima B, Ghafoor R. Diagnostic accuracy of artificial intelligence versus manual detection in marginal bone loss around fixed prosthesis. a systematic review. J Pak Med Assoc. 2024;74(4 Suppl):S37–S42. doi: 10.47391/JPMA.AKU-9S-06
  18. Michelutti L, Tel A, Zeppieri M, et al. Generative adversarial networks (GANs) in the field of head and neck surgery: current evidence and prospects for the future-a systematic review. J Clin Med. 2024;13(12):3556. doi: 10.3390/jcm13123556
  19. Schukow C, Smith SC, Landgrebe E, et al. Application of ChatGPT in routine diagnostic pathology: promises, pitfalls, and potential future directions. Adv Anat Pathol. 2023;31(1):15–21. doi: 10.1097/PAP.0000000000000406
  20. Haemmerli J, Sveikata L, Nouri A, et al. ChatGPT in glioma adjuvant therapy decision making: Ready to assume the role of a doctor in the tumor board? BMJ Health Care Inform. 2023;30(1):e100775. doi: 10.1136/bmjhci-2023-100775
  21. Chen TC, Kaminski E, Koduri L, et al. Chat GPT as a Neuro-score Calculator: Analysis of a large language model’s performance on various neurological exam grading scales. World Neurosurg. 2023;179:e342–e347. doi: 10.1016/j.wneu.2023.08.088
  22. Rizwan A, Sadiq T. The use of AI in diagnosing diseases and providing management plans: a consultation on cardiovascular disorders with ChatGPT. Cureus. 2023;15(8):e43106. doi: 10.7759/cureus.43106
  23. Gala D, Makaryus AN. The utility of language models in cardiology: a narrative review of the benefits and concerns of ChatGPT-4. Int J Environ Res Public Health. 2023;20(15):6438. doi: 10.3390/ijerph20156438
  24. Bugaj M, Kliestik T, Lăzăroiu G. Generative artificial intelligence-based diagnostic algorithms in disease risk detection, in personalized and targeted healthcare procedures, and in patient care safety and quality. Contemp Read Law Soc Justice. 2023;15:9–26.
  25. Eppler MB, Ganjavi C, Knudsen JE, et al. Bridging the gap between urological research and patient understanding: the role of large language models in automated generation of layperson’s summaries. Urol Pract. 2023;10(5):436–443. doi: 10.1097/UPJ.0000000000000428
  26. Javaid M, Haleem A, Singh RP. ChatGPT for healthcare services: An emerging stage for an innovative perspective. BenchCouncil Trans Benchmarks Stand Eval. 2023;3:100105. doi: 10.1016/j.tbench.2023.100105
  27. Sharma SC, Ramchandani JP, Thakker A, Lahiri A. ChatGPT in plastic and reconstructive surgery. Indian J Plast Surg. 2023;56(4):320–325. doi: 10.1055/s-0043-1771514
  28. Aljindan FK, Shawosh MH, Altamimi L, et al. Utilization of ChatGPT-4 in plastic and reconstructive surgery: a narrative review. Plast Reconstr Surg Glob Open. 2023;11(10):e5305. doi: 10.1097/GOX.0000000000005305
  29. Zhou XY, Guo Y, Shen M, Yang GZ. Application of artificial intelligence in surgery. Front Med. 2020;14(4):417–430. doi: 10.1007/s11684-020-0770-0
  30. Zhou XY, Yang GZ, Lee SL. A real-time and registration-free framework for dynamic shape instantiation. Med Image Anal. 2018;44:86–97. doi: 10.1016/j.media.2017.11.009
  31. van der Stap N, van der Heijden F, Broeders IA. Towards automated visual flexible endoscope navigation. Surg Endosc. 2013;27(10):3539–3547. doi: 10.1007/s00464-013-3003-7
  32. Shen M, Gu Y, Liu N, Yang GZ. Context-aware depth and pose estimation for bronchoscopic navigation. IEEE Robot Autom Lett. 2019;4(2):732–739. doi: 10.1109/LRA.2019.2893419
  33. Ozyoruk KB, Gokceler GI, Bobrow TL, et al. EndoSLAM dataset and an unsupervised monocular visual odometry and depth estimation approach for endoscopic videos. Med Image Anal. 2021;71:102058. doi: 10.1016/j.media.2021.102058
  34. Yang Z, Lin S, Simon R, Linte CA. Endoscope Localization and Dense Surgical Scene Reconstruction for Stereo Endoscopy by Unsupervised Optical Flow and Kanade-Lucas-Tomasi Tracking. Annu Int Conf IEEE Eng Med Biol Soc. 2022;2022:4839–4842. doi: 10.1109/EMBC48229.2022.9871588
  35. Song J, Wang J, Zhao L, et al. MIS-SLAM: real-time large-scale dense deformable SLAM system in minimal invasive surgery based on heterogeneous computing. IEEE Robot Autom Lett. 2018;3(4):4068–4075. doi: 10.1109/lra.2018.2856519
  36. Turan M, Almalioglu Y, Araujo H, et al. Deep endovo: a recurrent convolutional neural network (RCNN) based visual odometry approach for endoscopic capsule robots. Neurocomputing. 2018;275:1861–1870.
  37. Kalfas IH. Machine vision navigation in spine surgery. Front Surg. 2021;8:1–7. doi: 10.3389/fsurg.2021.640554
  38. Massalimova A, Timmermans M, Esfandiari H, et al. Intraoperative tissue classification methods in orthopedic and neurological surgeries: A systematic review. Front Surg. 2022;9:952539. doi: 10.3389/fsurg.2022.952539
  39. Timmermans M, Massalimova A, Li R, et al. State-of-the-art of non-radiative, non-visual spine sensing with a focus on sensing forces, vibrations and bioelectrical properties: a systematic review. Sensors (Basel). 2023;23(19):8094. doi: 10.3390/s23198094
  40. Wang J, Suenaga H, Hoshi K, et al. Augmented reality navigation with automatic marker-free image registration using 3-D image overlay for dental surgery. IEEE Trans Biomed Eng. 2014;61(4):1295–1304. doi: 10.1109/TBME.2014.2301191
  41. Gouveia PF, Costa J, Morgado P, et al. Breast cancer surgery with augmented reality. Breast. 2021;56:14–17. doi: 10.1016/j.breast.2021.01.004
  42. Carl B, Bopp M, Benescu A, Saß B, Nimsky C. Indocyanine green angiography visualized by augmented reality in aneurysm surgery. World Neurosurg. 2020;142:e307–e315. doi: 10.1016/j.wneu.2020.06.219
  43. Carl B, Bopp M, Saß B, Nimsky C. Microscope-based augmented reality in degenerative spine surgery: initial experience. World Neurosurg. 2019;128:e541–e551. doi: 10.1016/j.wneu.2019.04.192
  44. Edström E, Burström G, Omar A, et al. Augmented reality surgical navigation in spine surgery to minimize staff radiation exposure. Spine. 2020;45(1):E45–E53. doi: 10.1097/BRS.0000000000003197
  45. Zhang X, Wang J, Wang T, et al. A markerless automatic deformable registration framework for augmented reality navigation of laparoscopy partial nephrectomy. Int J CARS. 2019;14(8):1285–1294. doi: 10.1007/s11548-019-01974-6
  46. Luo H, Yin D, Zhang S, et al. Augmented reality navigation for liver resection with a stereoscopic laparoscope. Comput Methods Programs Biomed. 2020;187:105099. doi: 10.1016/j.cmpb.2019.105099
  47. Lefter LP, Walker SR, Dewhurst F, Turner RW. An audit of operative notes: facts and ways to improve. ANZ J Surg. 2008;78:800–2. doi: 10.1111/j.1445-2197.2008.04654.x
  48. Abdelhady AM, Davis CR. Plastic surgery and artificial intelligence: how ChatGPT improved operation note accuracy, time, and education. Mayo Clin Proc Digit Health. 2023;1:299–308. doi: 10.1016/j.mcpdig.2023.06.002
  49. Singh S, Djalilian A, Ali MJ. ChatGPT and ophthalmology: exploring its potential with discharge summaries and operative notes. Semin Ophthalmol. 2023;38(5):503–507. doi: 10.1080/08820538.2023.2209166
  50. Waisberg E, Ong J, Masalkhi M, et al. GPT-4 and ophthalmology operative notes. Ann Biomed Eng. 2023;51(11):2353–2355. doi: 10.1007/s10439-023-03263-5
  51. Ali A, Kumar RP, Polavarapu H, et al. Bridging the gap: can large language models match human expertise in writing neurosurgical operative notes? World Neurosurg. 2024;192:e34–e41. doi: 10.1016/j.wneu.2024.08.062
  52. Dubinski D, Won SY, Trnovec S, et al. Leveraging artificial intelligence in neurosurgery-unveiling ChatGPT for neurosurgical discharge summaries and operative reports. Acta Neurochir (Wien). 2024;166(1):38. doi: 10.1007/s00701-024-05908-3
  53. Robinson A, Aggarwal S Jr. When precision meets penmanship: ChatGPT and surgery documentation. Cureus. 2023;15(6):e40546. doi: 10.7759/cureus.40546
  54. Arif F. ChatGPT for the standardized operative notes in plastic surgery. J Liaquat Natl Hosp. 2023;1(2):111–113.
  55. Jung JJ, Elfassy J, Jüni P, Grantcharov T. Adverse events in the operating room: Definitions, prevalence, and characteristics. A systematic review. World J Surg. 2019;43(10):2379–2392. doi: 10.1007/s00268-019-05048-1
  56. Eppler MB, Sayegh AS, Maas M, et al. Automated capture of intraoperative adverse events using artificial intelligence: a systematic review and meta-analysis. J Clin Med. 2023;12(4):1687. doi: 10.3390/jcm12041687
  57. Yamazaki Y, Kanaji S, Matsuda T, et al. Automated surgical instrument detection from laparoscopic gastrectomy video images using an open source convolutional neural network platform. J Am Coll Surg. 2020;230(5):725–732.e1. doi: 10.1016/j.jamcollsurg.2020.01.037
  58. Hegde SR, Namazi B, Iyengar N, et al. Automated segmentation of phases, steps, and tasks in laparoscopic cholecystectomy using deep learning. Surg Endosc. 2024;38(1):158–170. doi: 10.1007/s00464-023-10482-3
  59. Lee D, Yu HW, Kwon H, et al. Evaluation of surgical skills during robotic surgery by deep learning-based multiple surgical instrument tracking in training and actual operations. J Clin Med. 2020;9:1964. doi: 10.3390/jcm9061964
  60. Cacciamani GE, Anvar A, Chen A, et al. How the use of the artificial intelligence could improve surgical skills in urology: State of the art and future perspectives. Curr Opin Urol. 2021;31(4):378–384. doi: 10.1097/MOU.0000000000000890
  61. Peregrin T. Black box technology shines light on improving OR safety, efficiency. ACS Bulletin. 2023;108(7):17–23.
  62. Jung JJ, Jüni P, Lebovic G, Grantcharov T. First-year analysis of the operating room black box study. Ann Surg. 2020;271(1):122–127. doi: 10.1097/SLA.0000000000002863
  63. Al Abbas AI, Meier J, Daniel W, et al. Impact of team performance on the surgical safety checklist on patient outcomes: an operating room black box analysis. Surg Endosc. 2024;38:5613–5622. doi: 10.1007/s00464-024-11064-7
  64. Gordon L, Grantcharov T, Rudzicz F. Explainable artificial intelligence for safe intraoperative decision support. JAMA Surg. 2019;154(11):1064–1065. doi: 10.1001/jamasurg.2019.2821
  65. Lundberg SM, Nair B, Vavilala MS, et al. Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Nat Biomed Eng. 2018;2(10):749–760. doi: 10.1038/s41551-018-0304-0
  66. Zeng X, Hu Y, Shu L, et al. Explainable machine-learning predictions for complications after pediatric congenital heart surgery. Sci Rep. 2021;11:17244. doi: 10.1038/s41598-021-96721-w
  67. Park J-B, Lee H-J, Yang H-L, et al. Machine learning-based prediction of intraoperative hypoxemia for pediatric patients. PLoS ONE. 2023;18(3):e0282303. doi: 10.1371/journal.pone.0282303
  68. Bektaş M, Pereira JK, Daams F, et al. ChatGPT in surgery: a revolutionary innovation? Surg Today. 2024;54:964–971. doi: 10.1007/s00595-024-02800-6
  69. Liu S, Wright AP, Patterson BL, et al. Using AI-generated suggestions from ChatGPT to optimize clinical decision support. J Am Med Inform Assoc. 2023;30(7):1237–1245. doi: 10.1093/jamia/ocad072
  70. Liu S, McCoy AB, Peterson JF, et al. Leveraging explainable artificial intelligence to optimize clinical decision support. J Am Med Inform Assoc. 2024;31(4):968–974. doi: 10.1093/jamia/ocae019
  71. Averkin AN. Explainable artificial intelligence as part of the third generation artificial intelligence. Speech technologies. 2023;(1):4–10. (In Russ.) EDN: ONAIEY
  72. Van De Sande D, Van Genderen ME, Braaf H, et al. Moving towards clinical use of artificial intelligence in intensive care medicine: business as usual? Intensive Care Med. 2022;48(12):1815–1817. doi: 10.1007/s00134-022-06910-y
  73. Deasy J, Liò P, Ercole A. Dynamic survival prediction in intensive care units from heterogeneous time series without the need for variable selection or curation. Sci Rep. 2020;10(1):22129. doi: 10.1038/s41598-020-79142-z
  74. Abgrall G, Holder AL, Chelly Dagdia Z, et al. Should AI models be explainable to clinicians? Crit Care. 2024;28:301. doi: 10.1186/s13054-024-05005-y
  75. Savage N. Breaking into the black box of artificial intelligence. Nature. 2022. doi: 10.1038/d41586-022-00858-1
  76. Patel SB, Lam K. ChatGPT: the future of discharge summaries? Lancet Digit Health. 2023;5(3):e107–e108. doi: 10.1016/S2589-7500(23)00021-3
  77. Earnshaw CH, Pedersen A, Evans J. Improving the quality of discharge summaries through a direct feedback system. Future Healthc J. 2020;7(2):149–154. doi: 10.7861/fhj.2019-0046
  78. Unnewehr M, Schaaf B, Marev R, et al. Optimizing the quality of hospital discharge summaries — a systematic review and practical tools. Postgrad Med. 2015;127(6):630–639. doi: 10.1080/00325481.2015.1054256
  79. Salim Al-Damluji M, Dzara K, Hodshon B, et al. Association of discharge summary quality with readmission risk for patients hospitalized with heart failure exacerbation. Circ Cardiovasc Qual Outcomes. 2015;8:109–111. doi: 10.1161/CIRCOUTCOMES.114.001476
  80. Wood H, Lewis H, Ward R, et al. Improving community prescribing of post-fracture denosumab after discharge. Br J Hosp Med. 2017;78(1):20–22. doi: 10.12968/hmed.2017.78.1.20
  81. Carter A, Warner E, Roberton A, et al. Tonsillectomy discharge information – an improvement in both patient safety and satisfaction. BMJ Open Quality. 2014;2(2):u203433.w1546. doi: 10.1136/bmjquality.u203433.w1546
  82. Lee P, Bubeck S, Petro J. Benefits, limits, and risks of GPT-4 as an AI chatbot for medicine. N Engl J Med. 2023;388(13):1233–1239. doi: 10.1056/NEJMsr2214184
  83. Dubinski D, Won SY, Trnovec S, et al. Leveraging artificial intelligence in neurosurgery — unveiling ChatGPT for neurosurgical discharge summaries and operative reports. Acta Neurochir. 2024;166:38. doi: 10.1007/s00701-024-05908-3
  84. Clough RAJ, Sparkes WA, Clough OT, et al. Transforming healthcare documentation: harnessing the potential of AI to generate discharge summaries. BJGP Open. 2024;8(1):BJGPO.2023.0116. doi: 10.3399/BJGPO.2023.0116
  85. Ando K, Okumura T, Komachi M, et al. Is artificial intelligence capable of generating hospital discharge summaries from inpatient records? PLOS Digit Health. 2022;1(12):e0000158. doi: 10.1371/journal.pdig.0000158
  86. Konstantinova LV, Vorozhikhin VV, Petrov AM, Titova ES. Generative Artificial Intelligence: Challenges for Traditional Education: Results of Monitoring Information on Trends in the Development of Higher Education in the World and in Russia. Moscow: Plekhanov Russian University of Economics; 2023. 85 p. (In Russ.)
  87. Oh N, Choi GS, Lee WY. ChatGPT goes to the operating room: evaluating GPT-4 performance and its potential in surgical education and training in the era of large language models. Ann Surg Treat Res. 2023;104(5):269–273. doi: 10.4174/astr.2023.104.5.269
  88. Blumstein G, Zukotynski B, Cevallos N, et al. Randomized trial of a virtual reality tool to teach surgical technique for tibial shaft fracture intramedullary nailing. J Surg Educ. 2020;77:969–977. doi: 10.1016/j.jsurg.2020.01.002
  89. Mirchi N, Bissonnette V, Ledwos N, et al. Artificial neural networks to assess virtual reality anterior cervical discectomy performance. Oper Neurosurg (Hagerstown). 2019;19:65–75. doi: 10.1093/ons/opz359
  90. Guerrero DT, Asaad M, Rajesh A, et al. Advancing surgical education: the use of artificial intelligence in surgical training. Am Surg. 2023;89(1):49–54. doi: 10.1177/00031348221101503
  91. Winkler-Schwartz A, Yilmaz R, Mirchi N, et al. Machine learning identification of surgical and operative factors associated with surgical expertise in virtual reality simulation. JAMA Netw Open. 2019;2:e198363. doi: 10.1001/jamanetworkopen.2019.8363
  92. Bilgic E, Gorgy A, Yang A, et al. Exploring the roles of artificial intelligence in surgical education: A scoping review. Am J Surg. 2022;224(1 Pt A):205–216. doi: 10.1016/j.amjsurg.2021.11.023
  93. Caserta M, Romero AG. A novel approach to forecast surgery durations using machine learning techniques. Health Care Manag Sci. 2024;27:313–327. doi: 10.1007/s10729-024-09681-8
  94. ShahabiKargar Z, Khanna S, Good N, et al. Predicting procedure duration to improve scheduling of elective surgery. Lecture Notes in Computer Science. 2014;8862:998–1009. doi: 10.1007/978-3-319-13560-1_86
  95. Hassanzadeh H, Boyle J, Khanna S, Biki B, Syed F. Daily surgery caseload prediction: towards improving operating theatre efficiency. BMC Med Inform Decis Mak. 2022;22(1):151. doi: 10.1186/s12911-022-01893-8
  96. Riahi V, Hassanzadeh H, Khanna S, et al. Improving preoperative prediction of surgery duration. BMC Health Serv Res. 2023;23:1343. doi: 10.1186/s12913-023-10264-6
  97. Bellini V, Russo M, Domenichetti T, et al. Artificial intelligence in operating room management. J Med Syst. 2024;48:19. doi: 10.1007/s10916-024-02038-2
  98. Javaid M, Haleem A, Singh RP. ChatGPT for healthcare services: an emerging stage for an innovative perspective. BenchCouncil Trans Benchmarks Stand Eval. 2023;3(1):100105. doi: 10.1016/j.tbench.2023.100105
  99. Letourneau-Guillon L, Camirand D, Guilbert F, Forghani R. Artificial intelligence applications for workflow, process optimization and predictive analytics. Neuroimaging Clin N Am. 2020;30(4):e1–e15. doi: 10.1016/j.nic.2020.08.008
  100. Dos Santos FC, Johnson LG, Madandola OO, et al. An example of leveraging AI for documentation: ChatGPT-generated nursing care plan for an older adult with lung cancer. J Am Med Inform Assoc. 2024;31(9):2089–2096. doi: 10.1093/jamia/ocae116
  101. Pressman SM, Borna S, Gomez-Cabello CA, et al. Clinical and surgical applications of large language models: a systematic review. J Clin Med. 2024;13(11):3041. doi: 10.3390/jcm13113041
  102. Mehdian R, Howard M. Artificial intelligence in trauma and orthopedics. In: Artificial intelligence in medicine. Lidströmer N, Ashrafian H, editors. Springer Nature Switzerland; 2022. P. 873–886. doi: 10.1007/978-3-030-64573-1_256
  103. Pankhania M. Artificial intelligence in musculoskeletal radiology: past, present, and future. Indian J Musculoskelet Radiol. 2020;2(2):89–96. doi: 10.25259/IJMSR_62_2020
  104. Ajmera P, Kharat A, Botchu R, et al. Real-world analysis of artificial intelligence in musculoskeletal trauma. J Clin Orthop Trauma. 2021;22:101573. doi: 10.1016/j.jcot.2021.101573
  105. Kurmis AP, Ianunzio JR. Artificial intelligence in orthopedic surgery: evolution, current state and future directions. Arthroplasty. 2022;4(1):9. doi: 10.1186/s42836-022-00112-z
  106. Merali ZA, Colak E, Wilson JR. Applications of machine learning to imaging of spinal disorders: current status and future directions. Global Spine J. 2021;11(1_suppl):23S–29S. doi: 10.1177/2192568220961353
  107. Dai G, Su J, Zhang M, et al. A novel structure preserving generative adversarial network for CT to MR modality translation of spine. Neural Comput Applic. 2024;36:4101–4114. doi: 10.1007/s00521-023-09254-w
  108. Komeili A, Westover L, Parent EC, et al. Monitoring for idiopathic scoliosis curve progression using surface topography asymmetry analysis of the torso in adolescents. Spine J. 2015;15(4):743–751. doi: 10.1016/j.spinee.2015.01.018
  109. Anwar A, Zhang Y, Zhang Z, Li J. Artificial intelligence technology improves the accuracy of preoperative planning in primary total hip arthroplasty. Asian J Surg. 2024;47(7):2999–3006. doi: 10.1016/j.asjsur.2024.01.133
  110. Abel F, Lebl DR, Gorgy G, et al. Deep-learning reconstructed lumbar spine 3D MRI for surgical planning: pedicle screw placement and geometric measurements compared to CT. Eur Spine J. 2024;33(11):4144–4154. doi: 10.1007/s00586-023-08123-3
  111. Racadio JM, Nachabe R, Homan R, et al. Augmented reality on a C-arm system: a preclinical assessment for percutaneous needle localization. Radiology. 2016;281(1):249–255. doi: 10.1148/radiol.2016151040
  112. Nam CH, Lee SC, Kim JH, et al. Robot-assisted total knee arthroplasty improves mechanical alignment and accuracy of component positioning compared to the conventional technique. J Exp Orthop. 2022;9(1):108. doi: 10.1186/s40634-022-00546-z
  113. Dennler C, Jaberg L, Spirig J, et al. Augmented reality-based navigation increases precision of pedicle screw insertion. J Orthop Surg Res. 2020;15(1):174. doi: 10.1186/s13018-020-01690-x
  114. Holden MA, Haywood KL, Potia TA, et al. Recommendations for exercise adherence measures in musculoskeletal settings: a systematic review and consensus meeting. Syst Rev. 2014;3(1):10. doi: 10.1186/2046-4053-3-10
  115. McDermott ER, DeFoor MT, Dekker TJ, DePhillipo NN. Artificial intelligence in rehabilitation. In: Artificial intelligence in orthopaedic surgery made easy. Familiari F, Galasso O, Gasparini G, editors. Springer; 2024. P. 197–204. doi: 10.1007/978-3-031-70310-2_15
  116. Khosravi B, Rouzrokh P, Erickson BJ, et al. Analyzing racial differences in imaging joint replacement registries using generative artificial intelligence: advancing orthopaedic data equity. Arthroplast Today. 2024;29:101503. doi: 10.1016/j.artd.2024.101503
  117. Ahn G, Choi BS, Ko S, et al. High-resolution knee plain radiography image synthesis using style generative adversarial network adaptive discriminator augmentation. J Orthop Res. 2023;41(1):84–93. doi: 10.1002/jor.25325
  118. Na L, Yang C, Lo CC, et al. Feasibility of reidentifying individuals in large National Physical Activity data sets from which protected health information has been removed with use of machine learning. JAMA Netw Open. 2018;1(8):e186040. doi: 10.1001/jamanetworkopen.2018.6040
  119. Sahni N, Stein G, Zemmel R, Cutler DM. The potential impact of artificial intelligence on healthcare spending. NBER Working Paper No. 30857, 2023. Available from: https://www.nber.org/system/files/working_papers/w30857/w30857.pdf Accessed: 22.11.2024

Supplementary files

Fig. 1. Most common models and algorithms of generative artificial intelligence in medicine (adapted from [17]). Note. AI, artificial intelligence; ML, machine learning; DAI, discriminative artificial intelligence; GenAI, generative artificial intelligence; LLM, large language models; CNN, convolutional neural network; RNN, recurrent neural network; LSTM, long short-term memory network; GAN, generative adversarial network; AAE, adversarial autoencoder; SAE, sparse autoencoder; VAE, variational autoencoder.

Fig. 2. Publication activity of authors from global macroregions on the topic of generative artificial intelligence in surgery, traumatology, and orthopedics.

Fig. 3. Applications of generative artificial intelligence technologies in surgery (adapted from [17]).

Copyright (c) 2025 Eco-Vector

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

The media outlet is registered by the Federal Service for Supervision of Communications, Information Technology and Mass Media (Roskomnadzor).
Registration number and date of the registration decision: series PI No. FS 77-76249 of 19.07.2019.