Research Presentation Session: Artificial Intelligence and Imaging Informatics

RPS 2205 - AI beyond diagnosis: education, regulation and implementation

March 8, 08:00 - 09:00 CET

6 min
When to validate what: a framework for local performance assessment of AI in radiology
Stephan Romeijn, Utrecht / Netherlands
Author Block: K. G. Van Leeuwen, M. Horton, S. Krijgsman, S. Romeijn; Utrecht/NL
Purpose: To present a structured framework that guides decisions on local performance assessment steps needed for safe and efficient AI adoption.
Methods or Background: Local quality assurance is crucial for the responsible implementation of commercially available AI solutions in radiology. A common belief is that every AI tool must always be extensively validated on local data before use. However, this is often impractical, particularly in non-academic hospitals, where resources are limited. To enable responsible yet feasible adoption, we present a framework that aligns the extent of performance validation with the available clinical evidence and the clinical risk of the application.
Results or Findings: The framework starts by assessing whether sufficient clinical evidence is already available for the relevant population. The next step is to consider the clinical risk of the application (low or high). Based on these two factors, different validation types are recommended:
• Low risk with sufficient evidence: an acceptance test and post-deployment monitoring may suffice (e.g., bone age prediction).
• High risk with sufficient evidence: an acceptance test, a pilot (shadow-mode or restricted implementation), and post-deployment monitoring are indicated (e.g., breast cancer screening AI).
• Low risk with limited evidence: a retrospective analysis, an acceptance test, and post-deployment monitoring are advised (e.g., vertebral fracture detection for opportunistic osteoporosis screening).
• High risk with limited evidence: a retrospective analysis, an acceptance test, a pilot (shadow-mode or restricted implementation), and post-deployment monitoring are recommended (e.g., stand-alone stroke detection AI).
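The decision logic above reduces to a simple lookup over the two factors. A minimal illustrative sketch (the step lists are paraphrased from the bullets above; the function and dictionary names are hypothetical, not part of any published implementation):

```python
# Hypothetical sketch of the framework's decision logic: the recommended
# local validation steps follow from clinical risk and available evidence.
VALIDATION_STEPS = {
    # (risk, sufficient_evidence) -> recommended local assessment steps
    ("low",  True):  ["acceptance test", "post-deployment monitoring"],
    ("high", True):  ["acceptance test", "pilot",
                      "post-deployment monitoring"],
    ("low",  False): ["retrospective analysis", "acceptance test",
                      "post-deployment monitoring"],
    ("high", False): ["retrospective analysis", "acceptance test", "pilot",
                      "post-deployment monitoring"],
}

def recommended_steps(risk: str, sufficient_evidence: bool) -> list[str]:
    """Return the validation steps the framework recommends."""
    return VALIDATION_STEPS[(risk, sufficient_evidence)]

# e.g. a stand-alone stroke-detection tool with limited local evidence:
print(recommended_steps("high", False))
```

The point of the sketch is that the framework is deterministic once risk and evidence level are established, which is what makes it feasible for resource-limited, non-academic hospitals.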
Conclusion: By tailoring local performance assessment to risk and evidence, this framework balances patient safety with feasibility, supporting responsible AI implementation in both academic and non-academic hospitals.
Limitations: The scope of this framework is limited to commercially available AI systems used in accordance with their intended use and does not address IT or data flow quality assurance.
Funding for this study: No funding was received for this study.
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information:
6 min
Clinician-Centric Explainable AI: Building Trust in Radiology Diagnosis
Tician Schnitzler, Aarau / Switzerland
Author Block: T. Schnitzler1, H. Zaytoun1, A. P. Gehret1, O. E. Oğretmen2, M. Yavuz2; 1Aarau/CH, 2Istanbul/TR
Purpose: Artificial intelligence (AI) in radiology continues to face adoption barriers due to opaque decision-making processes that limit clinical trust. Traditional explainable AI (XAI) tools—such as saliency maps or feature importance rankings—rarely align with radiologists’ diagnostic reasoning. We propose a Clinician-Centric Explainable AI (CC-XAI) framework to embed clinically meaningful, anatomy- and workflow-rooted explanations directly into radiological decision support.
Methods or Background: A conceptual and applied analysis of CC-XAI was performed, contrasting its interpretability with general-purpose XAI. We developed a five-category taxonomy comprising: (1) Visual (anatomy-aware saliency maps), (2) Textual (clinical language generation), (3) Example-Based (retrieval of annotated similar cases), (4) Quantitative (comparative measurement over time), and (5) Rule-Based (integration of clinical decision logic). A modular CC-XAI system was illustrated using a representative lung nodule case, demonstrating integration of lesion localization, volumetric growth analysis, similar case retrieval, and risk-factor-based reasoning into one cohesive explanation output.
Results or Findings: The proposed CC-XAI framework translates AI predictions into structured, interpretable outputs aligned with clinical workflows and cognitive heuristics. In our example, a spiculated nodule with growth from 8 mm to 15 mm over one year (estimated doubling time of 1119 days) was explained via anatomy-aware localization, growth kinetics, retrieval of two malignant-confirmed prior cases, and rule-based consideration of smoking, cancer history, and pulmonary fibrosis. Each module contributes to a cohesive final interpretation that improves explainability and trust calibration.
Conclusion: Clinician-centric XAI addresses a fundamental barrier to AI adoption in radiology by offering domain-specific, multi-dimensional explanations that mirror radiologists' decision-making processes. Its modular design supports integration into clinical workflows for safer, more interpretable, and diagnostically aligned AI use, especially in high-stakes areas such as lung cancer evaluation.
Limitations: No limitations.
Funding for this study: No Funding.
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information:
6 min
Environmental Impact Reduction through Deep Learning Reconstruction in Clinical CT Imaging: A Hospital-Based Assessment
Paolo Niccolò Franco, Monza / Italy
Author Block: P. N. Franco, C. Maino, C. R. G. L. O. M. Talei Franzesi, D. G. Gandola, I. Bianco, R. Corso, D. Ippolito; Monza/IT
Purpose: The healthcare sector accounts for approximately 4.4% of global greenhouse gas (GHG) emissions, with diagnostic imaging being one of the most energy-intensive activities. Computed Tomography (CT) is a major contributor due to high electricity demand and associated emissions. This study aims to evaluate the environmental impact of implementing deep learning-based image reconstruction (DLIR) in CT imaging in a large hospital, focusing on electricity, CO₂ emissions, and iodinated contrast media (ICM).
Methods or Background: Over the course of one year, data were collected from four CT scanners: two newly installed scanners utilizing DLIR technology and two older scanners operating with hybrid and model-based iterative reconstruction algorithms. Energy consumption, CO₂ emissions, and ICM usage were monitored and compared between the two groups. DLIR protocols were introduced with reduced tube voltage (from 120 kV to 80–100 kV) and optimized ICM dose. Energy savings were estimated from the quadratic relationship between kV and power. Emission reduction was quantified in CO₂-equivalent (CO₂e) tons. Water savings were derived from ICM production impact.
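The stated quadratic relationship between tube voltage and power makes the reported savings range easy to verify. A back-of-the-envelope sketch (assuming, as the abstract states, that power scales with kV² and ignoring all other scan parameters; the function name is illustrative):

```python
def energy_saving_fraction(kv_old: float, kv_new: float) -> float:
    """Fractional energy saving if power scales with the square of tube voltage."""
    return 1.0 - (kv_new / kv_old) ** 2

# Lowering 120 kV to 100 kV or 80 kV:
print(round(energy_saving_fraction(120, 100), 2))  # 0.31 (about 31%)
print(round(energy_saving_fraction(120, 80), 2))   # 0.56 (about 56%)
```

These crude estimates of roughly 31–56% bracket the 30–55% reduction reported in the findings.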
Results or Findings: Approximately 28,000 CT exams were performed (54% on iterative scanners, 40% on DLIR). Standard CT scanners consumed ~41,000 kWh/year, equating to ~10.25 tons CO₂/year per device. DLIR implementation allowed for a 30–55% energy reduction depending on scan type. Total annual savings per CT unit were up to 22,550 kWh and 5.6 tons CO₂e. ICM was reduced by 30%, saving ~2,100 liters/year per DLIR scanner and preventing ~81 tons of CO₂e.
Conclusion: Deep learning reconstruction in CT imaging can significantly reduce energy use, CO₂ emissions, and contrast media consumption without compromising diagnostic quality. These results support its implementation as a sustainable innovation in clinical radiology.
Limitations: This study was conducted in a single center, limiting generalizability.
Funding for this study: None.
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information:
6 min
Algorithmic Fairness Unfolded: Collaborative Ethnography within a Medical Imaging AI Lab for Lung Cancer Screening
Michel Vitale, Nijmegen / Netherlands
Author Block: M. Vitale, M. Vegter, C. Jacobs, M. Boenink; Nijmegen/NL
Purpose: This study reports on collaborative efforts within an AI lab developing solutions for lung cancer screening (LCS) to investigate and negotiate how fairness is framed and operationalized in practice, and to develop actionable recommendations for better algorithmic fairness practices in medical imaging AI. It contributes to calls for more context-sensitive, morally robust, and interdisciplinary approaches to algorithmic fairness.
Methods or Background: We draw on a two-year collaboration between engineers and an ethicist in a UMC AI lab, considering fieldnotes and observations from meetings, ethics roundtables, and a six-month project assessing the fairness of AI-based risk models for LCS (Sybil, Venkadesh21, PanCan2b). Leveraging “ethics parallel research” alongside elements of the JustEFAB framework, we reconstruct and critically examine practices, assumptions, and constraints shaping engineers’ approach. We trace how fairness was framed, bias assessed, and explore how fairness practices can be improved. Findings are validated in a follow-up roundtable, informing an agenda for more comprehensive, situated, and auditable fairness practices.
Results or Findings: We observe tensions between comprehensive theoretical frameworks of algorithmic fairness and narrow approaches prevalent in engineering practices, where fairness is often overlooked or reduced to technical optimization. Practical constraints (data availability, timelines, unclear responsibilities, limited resources and interdisciplinary opportunities) foster statistical approaches with limited connection to clinical needs or broader contextual ethical considerations. Bridging this gap requires reflexivity tools, transparent decision-making, interdisciplinary research, and recognition of fairness as a multifaceted ethical issue.
Conclusion: Current fairness assessments insufficiently engage with AI socio-technical and ethical dimensions. Interdisciplinary collaboration can bridge theory and practice, helping developers move beyond technical metrics. Ethics parallel research made these dimensions visible, contextual, and clinically relevant, while highlighting barriers and enablers of better fairness practices.
Limitations: Qualitative, single-center study focused on AI for medical imaging only.
Funding for this study: This research is funded by a public-private project with funding from the Dutch Science Foundation, the Dutch Ministry of Economic Affairs, and MeVis Medical Solutions (Bremen, Germany).
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information:
6 min
The Influence of Artificial Intelligence on Radiologists’ Diagnostic Performance in Fatigue
Mahta Khoobi, Aachen / Germany
Author Block: M. Khoobi, D. Truhn, C. K. Kuhl, S. Nebelung, R. Siepmann; Aachen/DE
Purpose: Night-shift duty can cause fatigue in radiologists, potentially impairing diagnostic performance and compromising patient safety. This study aims to evaluate the impact of night-shift-induced fatigue on radiologists’ diagnostic accuracy and whether artificial intelligence (AI) can mitigate the expected decline in performance.
Methods or Background: This prospective intra-individual reader study (July 2024–March 2025) involved 10 radiology residents (mean clinical experience: 35 months; three women) interpreting three sets of 33 chest radiographs before and after a first-call on-site night shift. Sessions included two pre-shift readings (without AI) and two post-shift readings (with and without AI). Eye tracking captured fixation duration and saccade count, and fatigue was classified as a ≥5% reduction in the slope of the saccadic main sequence between pre- and post-shift sessions. Diagnostic accuracy (Cohen’s κ vs expert majority vote), efficiency (mean reporting time per image), eye-tracking metrics, and surveys were analyzed using generalized linear mixed models with Tukey-adjusted post-hoc comparisons.
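The fatigue criterion described above, a ≥5% drop in the main-sequence slope between sessions, reduces to a simple threshold rule. A hypothetical sketch (slope values are illustrative, not taken from the study):

```python
def is_fatigued(slope_pre: float, slope_post: float,
                threshold: float = 0.05) -> bool:
    """Classify fatigue as a >=5% reduction in the slope of the
    saccadic main sequence between pre- and post-shift sessions."""
    reduction = (slope_pre - slope_post) / slope_pre
    return reduction >= threshold

# Illustrative slopes (peak velocity per degree of saccade amplitude):
print(is_fatigued(slope_pre=42.0, slope_post=38.5))  # 8.3% drop -> True
```

Under this rule, six of the ten readers in the study fell into the fatigued group.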
Results or Findings: Six of 10 radiologists were classified as fatigued. In fatigued radiologists, diagnostic accuracy declined significantly post-shift (from κ=0.71±0.05 to κ=0.61±0.10; P<.001), accompanied by reductions in fixation duration (P=.010) and saccade count (P=.015). Non-fatigued radiologists showed a smaller decline in accuracy (from κ=0.72±0.03 to κ=0.65±0.03, P=.037). Reporting times remained stable pre- and post-shift (P≥.463). AI assistance modestly increased accuracy in both groups (Δκ=0.03–0.04; P≥.525) and reduced reporting times in fatigued (P=.122) and non-fatigued radiologists (P=.031). User surveys indicated increased mental exhaustion in fatigued radiologists.
Conclusion: Night-shift-induced fatigue impaired diagnostic accuracy without slowing reporting speed, thereby raising concerns about patient safety. AI modestly improved performance but failed to restore pre-shift performance. Institutional fatigue-management strategies are essential, with AI serving as a supportive adjunct.
Limitations: A small sample size and restriction to chest radiographs limited the study.
Funding for this study: This research is supported by the Deutsche Forschungsgemeinschaft (DFG 701010997, 517243167, 515639690), the German Federal Ministry of Research, Technology, and Space (Transform Liver - 031L0312C, DECIPHER-M, 01KD2420B), and the European Union Research and Innovation Programme (ODELIA - GA 101057091).
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: Ethical approval was obtained (Medical Faculty, RWTH Aachen University, Germany, reference number 24/376), and informed consent was waived.
6 min
RadGame: an AI-powered platform for radiology education
Mohammed Baharoon, Jamaica Plain / United States
Author Block: M. Baharoon1, J. Jun1, S. Raissi1, T. Heintz2, M. Alabbad3, A. Alburkani4, M. Mohammed4, H. R. Alomaish4, P. Rajpurkar1; 1Boston, MA/US, 2Maastricht/NL, 3Hofuf/SA, 4Riyadh/SA
Purpose: The purpose of this study is to improve radiology education by teaching trainees two essential skills: localizing imaging findings and generating structured clinical reports. Traditional methods often lack interactive feedback and rely on limited datasets, restricting learning opportunities. RadGame addresses these gaps by combining gamification with large-scale datasets and AI-driven feedback to provide scalable, personalized, and engaging training.
Methods or Background: We conducted a prospective, multi-institutional user study with 18 medical students who completed both RadGame modules (Localize and Report) under gamified and traditional learning conditions. Participants were evaluated with pre- and post-tests, and performance was measured using radiologist-verified ground truths, the CRIMSON reporting metric, and efficiency metrics such as time per case.
Results or Findings: Students using RadGame demonstrated significantly greater gains than those in traditional learning, with a 68% improvement in localization accuracy versus 17% and a 31% improvement in report-writing accuracy versus 4%. Additionally, the gamified group showed increased efficiency, completing cases faster over time while maintaining higher diagnostic accuracy.
Conclusion: RadGame demonstrates that AI-powered gamification can significantly enhance radiology training by integrating structured, real-time feedback into localization and report-writing tasks. Compared to traditional passive methods, the platform produced larger improvements in diagnostic accuracy and efficiency among medical trainees. Beyond its educational impact, RadGame also serves as a human-in-the-loop testbed for refining AI evaluation metrics such as CRIMSON, making them more clinically meaningful. These findings highlight the potential for repurposing medical AI resources into scalable, learner-centered educational tools.
Limitations: The study is limited by a relatively small sample size of 18 students, which reduced the statistical power of some comparisons. Additionally, the gamified module required more interaction time than traditional methods, reflecting a trade-off between active engagement and efficiency.
Funding for this study: No funding was received for this study.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: This study was reviewed by the Harvard Faculty of Medicine Institutional Review Board (Protocol #IRB25-0694) and determined to be exempt under 45 CFR 46.104(d)(2)(3).
6 min
Economic Evaluations of AI Applications in Radiology: A Systematic Review
Federica Zanca, Leuven / Belgium
Author Block: F. Zanca1, L. Gregory2, F. Lock2, H. Harvey2; 1Leuven/BE, 2Haywards Heath/UK
Purpose: Artificial intelligence (AI) in radiology offers promising applications to enhance clinical and patient outcomes. However, adoption has been slow, partly due to a lack of robust health economic evidence increasingly required for technology integration. While established tools focus on clinical evidence, they often overlook economic impact. This review synthesises literature on economic evaluations of AI in radiology.
Methods or Background: A systematic review of health economic evaluations of radiology AI was conducted and reported following PRISMA guidelines and economic review recommendations. MEDLINE and Cochrane Central were searched using a targeted strategy. Eligible studies were screened, with conflicts resolved by consensus review. Data were extracted into economic, patient, and clinical outcome domains, and reporting quality assessed using CHEERS-AI for evaluations using decision-analytic models.
Results or Findings: Thirty-one studies were included. Reported outcomes varied widely. Most focused on direct costs, cost-effectiveness, and diagnostic accuracy with limited attention to productivity or workflow metrics. Quality-adjusted life years (QALYs) were the predominant outcome, although alternative measures such as cost per patient screened or per correct diagnosis were also used. Half of the studies applied decision-analytic modelling, predominantly in opportunistic imaging contexts, and most originated from the US and UK. Evidence from European settings was scarce.
Conclusion: There is a significant gap between the rapid proliferation of radiology AI tools and the availability of health economic evidence to support their adoption. Economic outcomes and metrics have largely been based on pharmaceutical best practices. Future studies should broaden the outcomes assessed to include productivity, workflow efficiency, and access to care. Cross-disciplinary collaboration and harmonised international guidance are essential to ensure that radiology AI is adopted responsibly.
Limitations: Many included studies predated CHEERS-AI. We excluded cardiology and invasive imaging studies. All included studies focused on deep learning.
Funding for this study: This work was supported by the European Innovation Council and SMEs Executive Agency (EISMEA).
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information:
6 min
AI Act and AI-based radiology software: Implications for clinical use
Joel Kohler, Heidelberg / Germany
Author Block: J. Kohler1, J. Jesser1, A. Seehafer2, B. Kohler3, S. Heiland1, M. Bendszus1, J. Kernbach1, M. Schell1; 1Heidelberg/DE, 2Berlin/DE, 3Zurich/CH
Purpose: Hospitals and other healthcare providers increasingly implement AI-based radiology software in their clinical practice. We investigated how the European AI Act (AIA) might affect this use from 2027 onwards.
Methods or Background: We applied qualitative and quantitative legal interpretation methods to simulate how the AIA could apply to hospitals. To analyze market effects, we evaluated the landscape of commercial clinical radiology AI applications. We used data provided in the EU Impact Assessment to estimate costs of compliance.
Results or Findings: The AIA intensely regulates 'high-risk' AI systems (assumed to be 5-15% of all AI applications). 98.6% of current products and most future clinical radiology AI software will qualify as such high-risk AI systems. If a hospital uses AI software provided by a third-party company, the hospital will act as a 'deployer' (with a low level of legal duties and moderate associated costs) and the company will be the 'provider' (with a high level of legal duties). However, not only when placing its own AI system on the market but also in certain other scenarios, a hospital might itself become a 'provider', in some cases unexpectedly and unintentionally. This provider role may entail roughly EUR 30,000 in compliance costs per application, plus up to EUR 330,000 one-off and EUR 71,000 annual costs on top if a quality management system has to be set up.
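The cost figures above combine into a rough first-year estimate for a hospital that slips into the provider role. A hypothetical sketch using only the numbers reported in the abstract (roughly 30,000 EUR per application, plus 330,000 EUR one-off and 71,000 EUR annually if a quality management system must be established; the function is illustrative, not part of the study's methodology):

```python
def first_year_provider_cost(n_applications: int, needs_qms: bool) -> int:
    """Rough first-year compliance cost (EUR) for a hospital acting as
    'provider', using the per-application, one-off QMS setup, and annual
    QMS figures reported in the abstract."""
    cost = 30_000 * n_applications
    if needs_qms:
        cost += 330_000 + 71_000  # one-off setup plus first annual cost
    return cost

# A hospital deploying three self-provided applications and needing a QMS:
print(first_year_provider_cost(n_applications=3, needs_qms=True))  # 491000
```

Even this simplified arithmetic shows the provider role approaching half a million euros in the first year once a quality management system is required.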
Conclusion: Clinical stakeholders should prepare for potentially severe legal and economic implications of AI usage. The AIA might hamper broad adoption of innovation and limit the clinical benefits that providers of radiology AI systems can realise. Public authorities should consider mitigating these effects.
Limitations: The limitations of the study include legal and economic uncertainties.
Funding for this study: No funding was received for this study.
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information:
6 min
PAIR: A Framework for Standardized Reporting of Real-World AI Implementation in Hospitals
Ramprabananth Sivanandan, Asker / Norway
Author Block: R. Sivanandan1, W. Grootjans2; 1Asker/NO, 2Leiden/NL
Purpose: Artificial intelligence (AI) is moving rapidly from development to deployment in radiology. While reporting standards such as CLAIM, CONSORT-AI, DECIDE-AI, STARD-AI, and TRIPOD-AI address validation and trials, they overlook the complex realities of implementation in radiology workflows. This lack of structure hinders comparison and reproducibility across institutions and potentially limits the added value of AI solutions.
Methods or Background: We developed the Protocol for AI Implementation Reporting (PAIR) to provide a standardized, practice-oriented framework for documenting and guiding AI implementation in hospitals. PAIR integrates and extends existing reporting guidelines, and it is organized around four domains. The first domain, problem and purpose, describes the clinical need, clarifies the value proposition, and supports application selection. The second is the AI system, which details the technical specifications, regulatory status, and intended use. The third domain addresses implementation, including governance, workflow integration, interoperability, user training, and change management. The fourth domain, results and sustainability, encompasses safety monitoring, drift and bias control, equity, cost-effectiveness, adoption, and retirement criteria.
Results or Findings: PAIR includes a reporting checklist, manuscript template, and workflow diagram to support transparent documentation and knowledge sharing. Beyond reporting, it also functions as a practical guide for hospitals to structure AI adoption and assess whether an application adds measurable value to radiological practice.
Conclusion: PAIR fills a gap left by existing reporting standards, enabling transparent and reproducible reporting while guiding safe, equitable, and value-driven AI implementation in clinical practice. Its uptake will accelerate the adoption of trustworthy AI in radiology and beyond.
Limitations: At present, PAIR remains conceptual and awaits empirical validation.
Funding for this study: No funding was received for this study.
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: