Research Presentation Session: Artificial Intelligence and Imaging Informatics

RPS 905 - Typing your question instead of googling it: how chatbots are changing radiology practice

March 5, 13:00 - 14:00 CET

6 min
Promoting Sustainable Breast Imaging and Interventional Practices with an AI-Based Chatbot
Gianmarco Della Pepa, Milan / Italy
Author Block: G. Della Pepa, G. Irmici, C. De Berardinis, E. D'Ascoli, L. Corradini, G. Rossini, C. Depretto, G. P. Scaperrotta; Milan/IT
Purpose: To develop and evaluate an educational chatbot powered by a low-footprint Large Language Model (LLM), aimed at increasing awareness and knowledge of sustainable clinical practices in breast imaging among radiology professionals.
Methods or Background: GreenBreastBot was developed as a Custom GPT using GPT-3.5, a pre-trained LLM, ensuring negligible energy consumption per interaction. The chatbot was populated with bilingual (Italian/English) structured content derived from the ESR Green Radiology position paper, WHO climate-health recommendations, and internal Breast Unit guidelines. The tool adapts its explanations based on user expertise and delivers microlearning units, interactive quizzes, flashcards, and clinical scenarios. A four-week pilot was conducted in a tertiary Breast Unit. Participants included radiologists, residents, and radiographers. Outcomes included usage frequency, satisfaction (5-point Likert scale), and self-reported awareness before and after interaction.
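As a hedged illustration only (the authors built GreenBreastBot as a Custom GPT rather than via direct API calls), the sketch below shows how a guideline-grounded educational assistant of this kind could be wired up; the file name, model identifier, and system prompt are assumptions, not the authors' implementation.

```python
# Minimal sketch, assuming the OpenAI Python SDK (>= 1.x) and a local file of
# distilled guideline content; the model name and prompt wording are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Bilingual guideline content (e.g. distilled from the ESR Green Radiology
# position paper) is injected as grounding context.
with open("green_radiology_guidelines.md", encoding="utf-8") as f:
    guidelines = f.read()

SYSTEM_PROMPT = (
    "You are GreenBreastBot, an educational assistant on sustainable breast "
    "imaging. Answer in the user's language (Italian or English), adapt the "
    "depth of explanation to the stated expertise level, and ground every "
    "answer in the guideline excerpts provided.\n\n"
    f"Guideline excerpts:\n{guidelines}"
)

def ask(question: str, expertise: str = "resident") -> str:
    """Return a microlearning-style answer tailored to the user's expertise."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"[expertise: {expertise}] {question}"},
        ],
        temperature=0.3,
    )
    return response.choices[0].message.content

print(ask("How can we reduce paper use in the consent workflow?"))
```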
Results or Findings: Twenty-nine professionals participated: 12 consultants, 11 residents, and 6 radiographers. Users completed an average of 3.6 chatbot sessions per week. Overall satisfaction was high (mean 4.5/5); 91% found the chatbot useful or very useful. Post-intervention, 67% of participants reported improved awareness of sustainable imaging practices, with greatest gains in understanding paperless consent workflows, appropriateness in follow-up imaging, and environmentally conscious interventional preparation. No significant technical barriers were reported.
Conclusion: GreenBreastBot demonstrates that a pre-trained, low-energy LLM chatbot can effectively deliver eco-education in breast imaging. Its integration of institutional and international guidelines enables scalable, impactful, and environmentally coherent training for radiology teams.
Limitations: The main limitations are the monocentric design and the absence of objective performance measures.
Funding for this study: None
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information:
6 min
ChatGPT for Structured Reporting in CT Brain: Resident Experience
Karanvir Singh Chhabra, Jalandhar / India
Author Block: K. S. Chhabra1, D. B. Dahiphale2, S. S. Sarda2; 1Jalandhar/IN, 2Aurangabad/IN
Purpose: To assess the feasibility and utility of ChatGPT in generating structured CT brain reports, evaluate its impact on reporting time and accuracy, and document resident perspectives in a tertiary care setting.
Methods or Background: This pilot study included 20 radiology residents who reported 200 CT brain examinations between January and April 2025. Each case was reported twice: once using conventional free-text reporting and once with ChatGPT-assisted structured reporting. Residents used a standardised prompt library covering common CT brain findings such as haemorrhage, infarct, mass effect, hydrocephalus, and extra-axial collections. Metrics assessed included reporting time, completeness (based on a 10-point checklist), inter-observer consistency, and resident satisfaction. Accuracy was validated against consultant-reviewed reference reports.
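For illustration, a minimal sketch of how such a standardised reporting prompt could be assembled; the checklist items and template wording are assumptions, not the study's actual prompt library.

```python
# Illustrative sketch of a structured CT brain reporting prompt; checklist items
# and wording are assumptions, not the study's standardised prompt library.
CHECKLIST = [
    "haemorrhage (type, location, volume)",
    "infarct (territory, age)",
    "mass effect / midline shift (mm)",
    "ventricular size and hydrocephalus",
    "extra-axial collections",
    "bone windows / fractures",
    "paranasal sinuses and mastoids",
    "orbits",
    "visualised soft tissues",
    "comparison with prior imaging",
]

def build_prompt(findings_dictation: str) -> str:
    """Turn a resident's free-text dictation into a structured-report request."""
    items = "\n".join(f"- {item}" for item in CHECKLIST)
    return (
        "You are assisting a radiology resident. Rewrite the findings below as "
        "a structured CT brain report with one heading per checklist item, "
        "using standardised terminology and explicitly stating pertinent "
        "negatives. Do not add findings that are not described.\n\n"
        f"Checklist:\n{items}\n\nResident findings:\n{findings_dictation}"
    )

print(build_prompt("Acute right MCA infarct, 4 mm midline shift to the left."))
```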
Results or Findings: ChatGPT-assisted structured reporting reduced mean reporting time from 14.2 minutes to 9.1 minutes (36% reduction). Completeness scores improved significantly (mean 9.3/10 vs 7.8/10, p<0.01), with better coverage of critical elements such as haemorrhage location, mass effect, and ventricular status. Inter-observer agreement improved, particularly for standardised terminology. Accuracy compared with consultant reports was maintained, with no significant increase in errors. Resident feedback highlighted improved clarity and confidence, though some noted occasional generic or redundant phrasing requiring manual refinement.
Conclusion: ChatGPT shows promise as a practical tool for structured reporting in CT brain studies, enhancing efficiency, completeness, and inter-observer consistency without compromising accuracy. While human oversight remains essential, integration of AI-driven structured templates can support radiology training and streamline reporting in high-volume settings.
Limitations: Single-centre pilot design.
Funding for this study: No funding was received for this study.
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information:
6 min
A Fine-Tuned Large Language Model Chatbot for Multi-Scenario Radiology Cancer Care: Randomized Controlled Trial on Interaction Optimization, Emotional Support, and Provider Burnout Reduction
Liqiang Zhang, Chongqing / China
Author Block: L. Zhang; Chongqing/CN
Purpose: To develop and validate a scenario-specific fine-tuned LLM chatbot for optimizing clinical interactions between cancer patients and radiology healthcare providers (RHPs).
Methods or Background: An RCT across three hospitals collected 36,511 minutes of dialogue from 12 sites in three scenarios: Appointment Triage (AT), Pre-examination Preparation (PP), and Radiology Clinic Services (RCS). The recordings were transcribed and curated into 27,120 validated dialogues. The REC chatbot was developed by fine-tuning DeepSeek R1 on 80% of the dialogues with scenario-specific prompts. Two sub-trials evaluated REC: Sub-trial 1 included 1,424 patients in AT/PP; Sub-trial 2 included 638 in RCS. Both randomized patients 1:1 to RHP+REC or RHP alone. A total of 150 RHPs were similarly randomized. Primary outcomes were patient-rated dialogue quality (empathy, frustration, emotional regulation, factuality, integrity, and satisfaction); secondary outcomes included provider burnout and image quality.
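A minimal sketch of how scenario-tagged dialogues might be curated into a chat-format supervised fine-tuning set; the field names, scenario system prompts, and JSONL layout are assumptions, not the study's pipeline.

```python
# Sketch of curating scenario-tagged dialogues into a supervised fine-tuning
# dataset (JSONL of chat-style records). Field names, scenario prompts, and the
# 80% split are illustrative; the study fine-tuned DeepSeek R1 on such data.
import json
import random

SCENARIO_PROMPTS = {
    "AT": "You are a radiology appointment-triage assistant.",
    "PP": "You are a pre-examination preparation assistant.",
    "RCS": "You are a radiology clinic services assistant.",
}

def to_sft_record(dialogue: dict) -> dict:
    """Convert one curated dialogue into a chat-format training example."""
    return {
        "messages": [
            {"role": "system", "content": SCENARIO_PROMPTS[dialogue["scenario"]]},
            {"role": "user", "content": dialogue["patient_turn"]},
            {"role": "assistant", "content": dialogue["provider_turn"]},
        ]
    }

def write_train_split(dialogues: list[dict], train_path: str, train_frac: float = 0.8) -> None:
    """Shuffle, keep the training fraction, and write one JSON record per line."""
    random.shuffle(dialogues)
    cut = int(len(dialogues) * train_frac)
    with open(train_path, "w", encoding="utf-8") as f:
        for d in dialogues[:cut]:
            f.write(json.dumps(to_sft_record(d), ensure_ascii=False) + "\n")
```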
Results or Findings: 1. Dialogue Quality:
AT/PP: RHP+REC significantly improved factuality (AT: 4.12 vs. 3.39; PP: 4.52 vs. 3.79; both P < 0.001), integrity, and satisfaction, and reduced frustration (PP: 3.24 vs. 3.95, P = 0.002).
RCS: RHP+REC excelled in factuality (4.58 vs. 3.69, P < 0.001) and satisfaction (4.03 vs. 3.52, P = 0.003) but underperformed in empathy (3.88 vs. 4.42, P = 0.002).
2. Burnout: RHP+REC reduced exhaustion (1.85 vs. 2.40, P < 0.01) and depersonalization (2.18 vs. 3.96, P = 0.003).
3. Image Quality: REC improved CT (4.35 vs. 4.00, P < 0.01) and MRI (4.12 vs. 3.79, P = 0.02) image quality.
Conclusion: REC optimized radiology workflows and reduced burnout.
Limitations: First, it requires provider validation, limiting scalability. Future versions should enhance autonomous validation while ensuring safety. Second, effectiveness relies on training data quality; continuous updates and broader datasets are needed for generalizability. Third, future work should improve clinical adaptability and multimodal integration (e.g., imaging, physiological data), with real-time feedback for continuous learning.
Funding for this study: None
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: The research protocol has been approved by the ethics review committees of all participating hospitals (H1, H2, H3), and the entire research process strictly follows the ethical principles of the Helsinki Declaration.
6 min
Achieving Truly Informed Consent? A Prospective Controlled Trial Using Retrieval-Augmented Generation Before CT Examinations
Felix Busch, Munich / Germany
Author Block: F. Busch, T. Lemke, S. Ziegelmayer, M. Graf, A. W. Marka, P. Prucker, M. R. Makowski, K. K. Bressem, L. C. Adams; Munich/DE
Purpose: This prospective comparative study aimed to investigate the feasibility, usability, and effectiveness of a Retrieval-Augmented Generation (RAG)-powered Patient Information Assistant (PIA) chatbot for pre-CT information counseling, compared to standard physician-led consultation and informed consent procedures.
Methods or Background: Eighty-six patients scheduled for CT imaging (November-December 2024) were randomly assigned to either the PIA group (n=43), receiving pre-CT information via a RAG-powered chatbot, or the control group (n=43), receiving standard doctor-led consultation. Patient satisfaction, information clarity, comprehension, and concerns were assessed using six ten-point Likert-scale questions. Consultation duration was recorded, and patients in the PIA group indicated their preferred mode of future counseling. Two radiologists independently evaluated each PIA session based on five criteria: overall quality, scientific and clinical evidence, clinical usefulness and relevance, consistency, and up-to-dateness.
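A minimal sketch of a retrieval-augmented generation loop of the kind described (chunked institutional CT information material, embedding-based retrieval, grounded answer generation); the model names and source file are illustrative assumptions, not the study's implementation.

```python
# Minimal RAG sketch, assuming the OpenAI Python SDK (>= 1.x); the embedding and
# chat model names and the local leaflet file are illustrative placeholders.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    out = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in out.data])

# Index: split institutional CT patient-information material into chunks.
with open("ct_patient_information.txt", encoding="utf-8") as f:
    chunks = [c.strip() for c in f.read().split("\n\n") if c.strip()]
chunk_vecs = embed(chunks)

def answer(question: str, k: int = 3) -> str:
    """Retrieve the k most similar chunks and answer grounded in them only."""
    q = embed([question])[0]
    sims = chunk_vecs @ q / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q))
    context = "\n\n".join(chunks[i] for i in np.argsort(sims)[-k:])
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": (
                "Answer patient questions about the upcoming CT examination "
                "using only the context provided; advise the patient to ask "
                "the physician when the context is insufficient.")},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content
```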
Results or Findings: Both groups reported similarly high ratings for information clarity (PIA: 8.64 ± 1.69; control: 8.86 ± 1.28; p=0.82) and overall comprehension (PIA: 8.81 ± 1.40; control: 8.93 ± 1.61; p=0.35). Physician-led consultations more effectively alleviated patient concerns (8.30 ± 2.63 vs. 6.46 ± 3.29; p=0.003). Patients in the PIA group required significantly shorter subsequent consultation times (median: 120 s [IQR: 100-140] vs. 195 s [IQR: 170-220]; p=0.04). Radiologists rated PIA chats favorably across all evaluated categories.
Conclusion: A RAG-powered PIA chatbot can effectively deliver pre-CT information while reducing physician consultation time. Although patient satisfaction and comprehension were comparable to standard consultations, physician-led interactions remained superior in addressing patient concerns. These findings highlight the potential of AI-based chatbot solutions to streamline patient counseling for imaging procedures. At the same time, physician engagement remains crucial for addressing patient worries, suggesting a complementary role for both approaches in clinical practice.
Limitations: Single-center design, limited sample size, and reliance on self-reported outcomes.
Funding for this study: None.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: Technical University of Munich (2024–469-S-KK)
6 min
The Resident, The Robot, and the Rules: Structured Prompts Drive Better AI Chest Readouts
Dhanush Jayanna, Bangalore / India
Author Block: D. Jayanna, S. Reddy Kankara, S. R. Kankara, A. Josephine, A. A. Monteiro, A. Mahesh Krishna, V. M. Tellis, R. S. Patel, S. D. Antoinette; Bangalore/IN
Purpose: To determine whether structured, role-specific prompting can reduce AI hallucinations and improve differential diagnosis (DDx) quality versus an unprompted baseline and a radiology resident reference across 120 thoracic imaging cases.
Methods or Background: Retrospective, paired, within-case evaluation of two advanced models (GPT-5 Thinking; Gemini 2.5 Pro) across six prompt tiers. Outcomes: DDx quality (grades 1–5) and hallucination burden (0–20 across four domains), scored by a cardiothoracic radiologist. Statistics: Friedman/Wilcoxon tests with multiplicity control; the resident DDx served as the fixed reference.
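For readers reproducing the analysis design, a sketch of the described statistics (Friedman test across prompt tiers, pairwise Wilcoxon signed-rank tests with Holm multiplicity control) on placeholder data; the array of DDx grades is synthetic, not study data.

```python
# Sketch of the described statistics on synthetic placeholder grades
# (120 cases x 6 prompt tiers); SciPy provides the Friedman and Wilcoxon tests.
import numpy as np
from itertools import combinations
from scipy.stats import friedmanchisquare, wilcoxon

rng = np.random.default_rng(0)
ddx = rng.integers(1, 6, size=(120, 6)).astype(float)  # placeholder DDx grades 1-5

# Global prompt effect across the six tiers (Friedman test).
chi2, p_global = friedmanchisquare(*[ddx[:, t] for t in range(ddx.shape[1])])
print(f"Friedman chi2={chi2:.1f}, p={p_global:.2e}")

# Pairwise Wilcoxon signed-rank tests with Holm step-down correction.
pairs = list(combinations(range(ddx.shape[1]), 2))
pvals = [wilcoxon(ddx[:, a], ddx[:, b]).pvalue for a, b in pairs]
order = np.argsort(pvals)
adjusted, running_max = {}, 0.0
for rank, idx in enumerate(order):
    running_max = max(running_max, pvals[idx] * (len(pvals) - rank))
    adjusted[pairs[idx]] = min(1.0, running_max)
for (a, b), p in adjusted.items():
    print(f"tier {a + 1} vs tier {b + 1}: Holm-adjusted p = {p:.3g}")
```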
Results or Findings: Across 120 cases, DDx improved stepwise for both models (GPT-5: 2.78→4.49; Gemini: 2.98→4.45) while hallucinations declined (GPT-5: 6.95→0.54; Gemini: 8.02→1.04). Global prompt effects were decisive for DDx (χ²=201.6 and 176.6; p<10⁻³⁵) and hallucinations (χ²=279.0 and 261.3; p<10⁻⁵⁴). At the exemplar-guided, role-specific level (SR+ST+E), both AIs exceeded the resident’s mean DDx (4.49/4.45 vs 3.85; p<10⁻⁷), inverting their baseline inferiority. Head-to-head, GPT-5 showed fewer hallucinations than Gemini at no prompt (6.95 vs 8.02; p≤0.004) and a borderline advantage at tier 6 prompt (0.54 vs 1.04; p≈0.06); DDx was comparable between models at both extremes. Domain-level improvements (factual consistency, extraneous information, misinterpretation, clinical risk) paralleled total-score declines.
Conclusion: Prompt architecture is a powerful, controllable lever: structured, exemplar-guided, role-specific prompts markedly enhance DDx quality and suppress hallucinations to near-floor levels, enabling AI performance that matches or exceeds a resident reference with improved safety characteristics. These data support standardised prompting for safe, reliable deployment of general-purpose models in thoracic imaging workflows.
Limitations: Single-centre design with two human comparators may limit generalizability.
We used a novel four-domain, thresholded rubric—among the first studies to assess AI hallucinations in imaging diagnosis—which lacks external validation; despite rater calibration, some subjectivity and inter-rater variability may affect absolute scores and effect sizes.
Funding for this study: None
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information:
6 min
AI-Supported MR Safety Assessment of Implanted Devices: First Clinical Evaluation
Hanna Kreutzer, Aachen / Germany
Author Block: H. Kreutzer, D. Rashid, D. Truhn, S. Nebelung; Aachen/DE
Purpose: MR safety checks for patients with implanted devices are time-consuming and error-prone. Clinicians must identify the exact device model, retrieve the manufacturer’s handbook, and extract applicable scanning conditions. We developed an AI-agent that streamlines device-specific MR eligibility assessment using manufacturer documentation and scientific literature.
Methods or Background: The agent is built in LangGraph with a router node classifying user queries. Device-specific queries are directed to a retrieval-augmented generation (RAG) pipeline that utilizes manufacturer handbooks. General MR safety queries are handled by a separate RAG pipeline that utilizes peer-reviewed scientific literature. A central GPT-4.1 node composes the final output.
A chatbot-like web-based interface accepts free-text queries or image uploads of implant ID cards, which are analysed with GPT-4.1 Vision. The interface displays both the reasoning steps and the retrieved handbook/literature pages for transparency.
Evaluation was performed using consecutive patients with cardiac devices from our hospital. An MR-physicist documented the final safety decision (scan eligibility and protocol parameters), which served as the reference standard.
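A plain-Python sketch of the routing logic described in the Methods (router node, device-handbook RAG, literature RAG, composing LLM); it mirrors the LangGraph structure conceptually rather than using the library, and the keyword routing and stubbed retrieval/LLM calls are assumptions.

```python
# Plain-Python sketch mirroring the described agent layout (router node, two
# retrieval-augmented pipelines, composing LLM node). Retrieval and model calls
# are stubbed placeholders; routing keywords are illustrative assumptions.
from dataclasses import dataclass, field

def search_handbooks(query: str) -> list[str]:
    return [f"[manufacturer-handbook excerpt relevant to: {query}]"]   # stub

def search_literature(query: str) -> list[str]:
    return [f"[peer-reviewed literature excerpt relevant to: {query}]"]  # stub

def llm_compose(query: str, sources: list[str]) -> str:
    return f"Answer to '{query}' grounded in {len(sources)} retrieved source(s)."  # stub

@dataclass
class AgentState:
    query: str
    route: str = ""
    retrieved: list[str] = field(default_factory=list)
    answer: str = ""

def router(state: AgentState) -> AgentState:
    """Send device-specific questions to the handbook pipeline, all others to literature."""
    device_terms = ("pacemaker", "icd", "stent", "implant", "model", "serial")
    state.route = "device" if any(t in state.query.lower() for t in device_terms) else "general"
    return state

def run(query: str) -> AgentState:
    state = router(AgentState(query=query))
    state.retrieved = (search_handbooks if state.route == "device" else search_literature)(state.query)
    state.answer = llm_compose(state.query, state.retrieved)  # central composing LLM node
    return state

print(run("Is a patient with pacemaker model X safe to scan at 1.5 T?").answer)
```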
Results or Findings: The agent’s recommendation was correct in 15/19 cases. In the remaining four cases, the system flagged missing documentation, thereby avoiding unsupported recommendations. Importantly, no incorrect recommendations were made. Correct guideline source pages were displayed in 13 of the 15 correct cases.
Conclusion: An AI-agent grounded in manufacturer guidance can reliably answer MR safety questions. Early testing demonstrates promising accuracy and interpretability, with transparent display of reasoning and sources. If scaled beyond cardiac devices and expanded into comprehensive device databases, such agents have the potential to fundamentally transform MR safety practice by accelerating workflows, reducing errors, and setting new standards for patient safety in radiology.
Limitations: Some device manuals were unavailable. Evaluation was restricted to cardiac devices. Use of GPT-4.1 requires anonymised data.
Funding for this study: This research is supported by the Deutsche Forschungsgemeinschaft - DFG (701010997, 517243167, 515639690), the German Federal Ministry of Research, Technology and Space (Transform Liver - 031L0312C, DECIPHER-M, 01KD2420B) and the European Union Research and Innovation Programme (ODELIA - GA 101057091, SAGMA - GA 101222556).
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information:
6 min
Evaluating The Role Of ChatGPT In Orthopaedic Virtual Fracture Clinics: Potential For Clinical Management And Decision Support
Arnav Gupta, Birmingham / United Kingdom
Author Block: A. Gupta, R. Botchu; Birmingham/UK
Purpose: The Orthopaedic Virtual Fracture Clinic (VFC) is an innovative healthcare model that leverages digital technology to manage and triage patients with musculoskeletal conditions remotely. Advanced AI tools like ChatGPT have shown promise in assisting with patient triage by providing initial assessments based on patient-reported symptoms and history and offering detailed explanations of treatment protocols. Integrating ChatGPT into orthopaedic VFCs could significantly save time in patient triaging, thereby enhancing the VFC process. However, no study has yet investigated whether ChatGPT can aid clinicians in VFCs by generating adequate clinical management plans. This study explores the potential of ChatGPT in orthopaedic VFCs and examines whether it can replace or support clinicians during VFCs.
Methods or Background: We conducted a retrospective study reviewing 50 consecutive patient records referred to our virtual fracture clinic (VFC). We compared outcome measures between clinicians and ChatGPT 4, analysing the differences in decision-making.
Results or Findings: Our findings reveal distinct differences in the recommendations provided by ChatGPT 4 compared to human clinicians across various outcome measures, highlighting both the strengths and limitations of AI in this domain.
Conclusion: Significant differences were observed between ChatGPT's recommendations and those of human clinicians, with the AI tending toward more conservative approaches. While these tendencies could enhance patient care, they may also lead to unnecessary resource utilisation. Further refinement and calibration of ChatGPT's algorithms are necessary to align its recommendations with clinical best practices.
Limitations: Limitations include the small sample size of 50 cases and the retrospective design, which may affect generalizability. Additionally, ChatGPT’s conservative recommendations highlight the need for algorithm refinement, and the study does not assess real-time clinical integration or patient outcomes following AI-supported decisions.
Funding for this study: None
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: N/A
6 min
Energy Usage of Large Language Models and Segmentation Models in Radiology
Martin Segeroth, Basel / Switzerland
Author Block: M. Segeroth, S. Yang, J. Wasserthal, J. Cyriac, T. Heye, E. M. Merkle, M. Bach, J. Vosshenrich; Basel/CH
Purpose: Neural networks, in particular large language models (LLMs), are increasingly valuable tools that support human tasks rather than simply automating them. However, their use requires substantial amounts of energy. In clinical practice, justified privacy concerns favor open-source models, whose local deployment also allows their energy consumption to be assessed.
Methods or Background: Within our institutional healthcare network, we deployed privateGPT and Ollama as the primary platforms for LLM utilization, and Nora for image analysis. The models were hosted on a server equipped with eight NVIDIA A100 GPUs (80 GB each). For LLM experiments, we tested Llama3-70B, and for medical image segmentation, we used TotalSegmentator. Task scheduling was managed with Slurm 23.11.4, while energy consumption was monitored using nvidia-smi 550.163.01 and turbostat 2023.11.07. Additional overall server-level measurements were performed.
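A minimal sketch of GPU energy measurement by polling nvidia-smi power draw and integrating over a task's runtime; the polling interval and workload duration are illustrative, and the study additionally used turbostat and server-level measurements.

```python
# Sketch of GPU energy measurement: poll nvidia-smi power draw and integrate
# over time (Wh = mean W x hours). Interval and duration are illustrative.
import subprocess
import time

def gpu_power_draw_watts() -> list[float]:
    """Read instantaneous power draw (W) for all GPUs via nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=power.draw", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [float(line) for line in out.strip().splitlines()]

def measure_energy_wh(duration_s: float, interval_s: float = 1.0) -> float:
    """Integrate total GPU power over a task's runtime into watt-hours."""
    samples, start = [], time.time()
    while time.time() - start < duration_s:
        samples.append(sum(gpu_power_draw_watts()))
        time.sleep(interval_s)
    mean_watts = sum(samples) / len(samples)
    return mean_watts * duration_s / 3600.0  # W * s -> Wh

# Example: measure GPU energy during a 60-second inference workload.
print(f"{measure_energy_wh(60):.2f} Wh")
```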
Results or Findings: Each of the server’s eight GPUs allows a maximum power draw of 400 W, yet during our tests total peak power consumption reached 4235 W, with more than 1000 W attributable to non-GPU components. Idle consumption was 63 W per GPU and 1150 W for the full server. A single LLM request consumed 5.94 Wh (95% CI: 5.87–5.98 Wh), with GPU utilization at 86.39% (CI: 86.39–86.39%). TotalSegmentator training for MRI segmentations required 8389.14 Wh (CI: 8193.84–8730.37 Wh), with GPU utilization at 78.93% (CI: 78.93–78.93%). Inference with TotalSegmentator consumed 0.96 Wh (CI: 0.96–0.97 Wh) per case for tissue types, and complete MRI segmentation required 1.47 Wh (CI: 1.47–1.48 Wh).
Conclusion: Neural networks in clinical deployment consume a noticeable amount of energy, with individual tasks requiring 1–6 Wh, several times more than a typical Google search (~0.2 Wh). Nonetheless, their ability to augment clinical performance and support decision-making can justify the additional energy expenditure.
Limitations: Additional models and hardware are under evaluation.
Funding for this study: None
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: