Research Presentation Session: Imaging Informatics and Artificial Intelligence

RPS 205 - Meta-level topics in AI: cost-effectiveness, non-interpretive use-cases and evidence

February 26, 10:00 - 11:00 CET

  • ACV - Research Stage 2
  • ECR 2025
  • 8 Lectures
  • 60 Minutes
  • 8 Speakers

Description

7 min
Early health technology assessment for an artificial intelligence tool to detect incidental pulmonary embolisms on computed tomography
Erik Hermanus Marcellinus Kemper, Rotterdam / Netherlands
Author Block: E. H. M. Kemper1, K. Redekop1, F. Vos2, M. Ijzerman1, M. P. A. Starmans1, J. J. Visser1; 1Rotterdam/NL, 2Delft/NL
Purpose: Incidental pulmonary embolisms (IPE) on computed tomography (CT) are missed in up to 70% of cases. While artificial intelligence (AI) tools for IPE detection exist, an evaluation of whether and how these tools can provide actual value, e.g., by fitting the needs of patients and end-users (i.e., radiologists), has never been performed. The aim of this early health technology assessment (eHTA) is to determine the requirements for a value-based AI tool for IPE detection on CT.
Methods or Background: A comprehensive eHTA process for radiology AI was proposed and conducted for IPE. A literature search, structured interviews, a focus group, and evaluation meetings were performed with the identified stakeholders to define criteria and scenarios for a multiple criteria decision analysis (MCDA). A representative survey was developed and circulated to weigh the importance of the criteria and assess the performance of four possible AI designs. MCDA of the survey responses helps quantify the value requirements.
Results or Findings: Consultations with radiologists, treating physicians, patients, radiology technologists, AI specialists, legal experts, and ethicists resulted in 14 sub-criteria and five main criteria: patient impact, model performance, physician support, environmental impact, and costs. Preliminary outcomes indicate that a short follow-up time for diagnosing IPE is more important than a high sensitivity for IPE detection.
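The abstract does not state which MCDA aggregation model was used. As a minimal sketch, assuming a simple additive (weighted-sum) model, the five main criteria named above could be combined as follows. The weights, design names, and scores are hypothetical placeholders chosen only to make the example run; they are not survey results from the study.

```python
# Additive (weighted-sum) MCDA sketch. ASSUMPTION: the abstract does not say
# which aggregation model was used. All numbers below are hypothetical.

def normalize(weights):
    """Rescale raw importance weights so that they sum to 1."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

def overall_value(scores, weights):
    """Weighted sum of per-criterion scores (each score on a 0-1 scale)."""
    w = normalize(weights)
    return sum(w[name] * scores[name] for name in w)

# Hypothetical importance weights for the five main criteria.
weights = {
    "patient impact": 5,
    "model performance": 4,
    "physician support": 3,
    "environmental impact": 1,
    "costs": 2,
}

# Hypothetical performance scores for two made-up AI designs.
designs = {
    "design A (hypothetical)": {
        "patient impact": 0.9, "model performance": 0.6,
        "physician support": 0.7, "environmental impact": 0.5, "costs": 0.5,
    },
    "design B (hypothetical)": {
        "patient impact": 0.5, "model performance": 0.9,
        "physician support": 0.6, "environmental impact": 0.5, "costs": 0.5,
    },
}

# Rank designs from highest to lowest overall value.
ranking = sorted(designs, key=lambda d: overall_value(designs[d], weights), reverse=True)
for name in ranking:
    print(f"{name}: {overall_value(designs[name], weights):.3f}")
```

In a real analysis the weights would come from the stakeholder survey and the scores from the assessed performance of each candidate design.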
Conclusion: A value-based AI tool for IPE detection should be focused on triage to reduce the impact of the diagnosis of IPE on the patient, mainly because delay of diagnosis can result in progression of the IPE and preventable stress for the patient, while an improved detection rate is considered to result in significant overtreatment.
Limitations: The scope of this analysis has been within Europe. Outcomes might not be applicable elsewhere.
Funding for this study: E.H.M.K., K.R., M.P.A.S., F.V., and J.J.V. acknowledge funding by LSH-TKI (Health~Holland Dutch Top Sector Life Sciences and Health) 23024
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: Not applicable
7 min
Potential costs and benefits of AI for fracture detection in cervical spine CT scans at hospital level
Gaby Van Den Wittenboer, Zwolle / Netherlands
Author Block: G. Van Den Wittenboer1, I. M. Nijholt1, M. Maas2, M. F. Boomsma1; 1Zwolle/NL, 2Amsterdam/NL
Purpose: The aim of this study was to assess healthcare costs at the hospital level for patients screened for cervical spine (CS) fractures using CT, and to estimate the costs and benefits of incorporating artificial intelligence (AI) to detect CS fractures in clinical practice.
Methods or Background: The diagnostic accuracy of on-duty radiologists and AI in detecting CS fractures on CT scans from a retrospective database (n=2321, ≥18 years, 2007-2014) was compared with a reference standard. Healthcare costs for patients were inventoried up to 7 months after their emergency department visit. Total and average costs per patient based on the radiologist diagnosis were calculated for four categories: true positive, true negative, false positive, and false negative. Finally, a scenario analysis was conducted to estimate the diagnostic accuracy of radiologists combined with AI, and the corresponding total healthcare costs per diagnostic category.
Results or Findings: Radiologists identified 193 out of 219 scans with fractures and 2085 out of 2102 scans without fractures, whereas AI identified 177 out of 219 fractures and 2065 out of 2102 scans without fractures. AI identified 23 fractures missed by the radiologists and correctly classified 16 non-fracture scans that had been misclassified as fractures by the radiologists. This resulted in a potential sensitivity of 216/219 (98.6%) and specificity of 2101/2102 (>99%) for the combined radiologist-AI approach. On average, €5,978 less was spent per missed fracture. The total cost for the AI-assisted scenario was €61,132 (0.3%) higher than for radiologists alone.
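The accuracy figures follow directly from the reported counts. The short sketch below recomputes them. It assumes, as the scenario in the abstract appears to, that every AI correction is accepted and that AI never overturns a correct radiologist read, so the combined values are a best case.

```python
# Counts taken from the abstract.
FRACTURES = 219        # scans with a fracture (reference standard)
NON_FRACTURES = 2102   # scans without a fracture

rad_tp, rad_tn = 193, 2085   # radiologists: true positives, true negatives
ai_tp, ai_tn = 177, 2065     # AI alone: true positives, true negatives

ai_rescued_fn = 23   # fractures missed by radiologists but found by AI
ai_rescued_fp = 16   # radiologist false positives correctly cleared by AI

# Best-case combination: radiologist result plus every AI correction.
combined_tp = rad_tp + ai_rescued_fn   # 216
combined_tn = rad_tn + ai_rescued_fp   # 2101

for label, tp, tn in [
    ("radiologists", rad_tp, rad_tn),
    ("AI alone", ai_tp, ai_tn),
    ("radiologists + AI", combined_tp, combined_tn),
]:
    sensitivity = tp / FRACTURES
    specificity = tn / NON_FRACTURES
    print(f"{label}: sensitivity {sensitivity:.1%}, specificity {specificity:.1%}")
```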
Conclusion: In this scenario analysis, the use of AI appears to increase hospital costs by 0.3% due to more accurate diagnoses. A next step could be to complement these results with non-hospital costs and quality-adjusted life years to further investigate the cost-effectiveness of this AI.
Limitations: No limitations were identified.
Funding for this study: The radiology department of Isala received a grant from AIDOC Medical to have a third party specialized in early health technology assessments (THINC, Utrecht, the Netherlands) perform the analyses for this study. AIDOC Medical had no role in the data analyses themselves. Neither AIDOC Medical nor THINC had a role in data collection or drafting of the abstract.
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: The study uses retrospective data.
7 min
Cost-effectiveness of AI-assisted digital mammography – results from a Swedish model-based analysis
Pantelis Gialias, Linkoping / Sweden
Author Block: P. Gialias, J. Lyth, M. Kristoffersen Wiberg, T. Bjerner, M. Husberg, L. Bernfort, H. Gustafsson, L-Å. Levin; Linköping/SE
Purpose: To evaluate the cost-effectiveness of AI-assisted biennial digital mammography (AI-DM) in comparison to conventional digital mammography (cDM) with double reading of screening mammograms (screening interval ages 40-74).
Methods or Background: We used a Markov decision analytic model with a lifetime horizon. The analysis was conducted from a healthcare perspective. Model parameters were based on Swedish registry data and published randomized AI-DM studies. The model estimates the costs and quality-adjusted life-years (QALYs) related to mammography and breast cancer. Mammography-related costs were collected from the university hospital in Linköping. Stage-specific cancer costs and QALY weights were obtained from the literature. Scenario analyses were performed with different screening strategies.
Results or Findings: Per 1000 individuals, AI-DM gained 10.8 QALYs compared to cDM. The costs per 1000 individuals were USD 3,752,278 and USD 3,816,443 for AI-DM and cDM, respectively. AI-DM thus resulted in a cost saving of USD 64,165, which makes it a dominant strategy. The isolated screening costs were slightly higher in the AI-DM setting used (USD 597), but this was offset by reduced lifetime costs of cancer treatment. A screening strategy with AI plus one radiologist for all examinations saves USD 9,128 in screening costs compared to cDM; however, the QALYs gained decreased to 8.8.
Conclusion: AI-DM is cost saving in our setting and generates more quality-adjusted life-years. One of the add-on benefits is the possibility of freeing up radiologists' time for other clinical work. These benefits could be further improved by changing the AI-DM triaging strategy.
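The "dominant" label can be checked from the reported totals: a strategy is dominant when it costs less and yields more QALYs than its comparator. The sketch below uses only the per-1000-individual figures from the abstract. The `compare` helper is a hypothetical illustration, not part of the authors' Markov model.

```python
# Per-1000-individual figures reported in the abstract.
COST_AI_DM = 3_752_278    # USD
COST_CDM = 3_816_443      # USD
QALY_GAIN_AI_DM = 10.8    # QALYs gained by AI-DM relative to cDM

def compare(delta_cost, delta_qaly):
    """Classify a strategy against its comparator (hypothetical helper).

    Ties (no cost or QALY difference) are counted as dominant for simplicity.
    """
    if delta_cost <= 0 and delta_qaly >= 0:
        return "dominant"
    if delta_cost >= 0 and delta_qaly <= 0:
        return "dominated"
    # Remaining cases are trade-offs, so delta_qaly is non-zero here.
    return f"ICER = {delta_cost / delta_qaly:,.0f} USD per QALY"

incremental_cost = COST_AI_DM - COST_CDM   # negative means AI-DM is cheaper
print(f"Incremental cost of AI-DM: USD {incremental_cost:,}")
print(compare(incremental_cost, QALY_GAIN_AI_DM))
```

When a strategy is dominant, no incremental cost-effectiveness ratio is usually reported, since there is no trade-off between cost and effect to summarize.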
Limitations: We based AI parameters in the model mainly on two Swedish randomized trials and cancer data from the population-based cancer registry from Sweden. However, cost data are highly dependent on the Swedish health care system and the generalizability to other health care systems might be limited.
Funding for this study: None
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: No ethics committee approval was needed for this study.
7 min
AI Tools to Reduce Claims and Compensation Payments of Missed Fractures on Radiographs: A Potential Game Changer?
Nor-Eddine Regnard / France
Author Block: M. Tordjman1, L. Gracia1, E. Guillo1, R. Amar1, J. Ventre1, N-E. Regnard2, R. Y. Carlier1, J-L. Marmorat1, J-D. Laredo1; 1Paris/FR, 2Lieusaint/FR
Purpose: To evaluate the potential of BoneView, an AI tool for fracture detection on radiographs, in claim files for missed fractures that led to financial compensation.
Methods or Background: This retrospective study included all the files of patients who submitted a claim and received financial compensation for missed fractures on radiographs from January 2013 to December 2019 in the 38 university hospitals of the Greater Paris area (APHP, France). Of the 29 patients who filed claims, 26 were finally included (3 were not available in the system). For each patient with a claim, 5 patients with radiographs of the same anatomical areas (with or without fracture) were included from consecutive patients who had radiographs at a university hospital in 2022. Two readers (one fellow in musculoskeletal radiology and one expert radiologist in musculoskeletal imaging with more than 20 years of experience) read the radiographs, blinded to which patients had missed fractures.
Results or Findings: 156 patients were included (26 patients with missed fractures and 130 control patients). The AI software detected 80.8% of fractures (21/26) in the patients who filed claims for missed fractures. Reader sensitivity also improved with AI for these patients: the junior reader had a sensitivity of 61.5% without AI and 69.2% with AI, and the expert reader had a sensitivity of 73.1% without AI and 84.6% with AI. The total potentially avoided financial compensation would have been 265,314 euros.
Conclusion: The sensitivity of the two readers improved with AI in a cohort of patients with missed fractures who submitted claims and received financial compensation. AI was able to detect most of these fractures.
Limitations: A limitation was the small number of claims files.
Funding for this study: There was no funding for this study.
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: Not applicable
7 min
Overlooked and underpowered: a meta-research study addressing sample size in radiomics research
Jingyu Zhong, Shanghai / China
Author Block: J. Zhong1, J. Lu2, Y. Xing1, Y. Hu1, D. Ding1, X. Liu1, S. Dai1, H. Zhang1, W. Yao1; 1Shanghai/CN, 2Stanford, CA/US
Purpose: To investigate how studies determine the sample size when developing radiomics models, and whether it is sufficient.
Methods or Background: We identified radiomics studies published from January to December 2023 in seven leading peer-reviewed radiological journals owned by the European Society of Radiology and the Radiological Society of North America. We reviewed the sample size justification methods and the actual sample size used. We calculated the minimum sample size according to 3 criteria proposed by Riley et al, and compared the estimated and the actual sample size used. We investigated which study characteristics were associated with a sufficient sample size.
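The abstract does not reproduce the formulas. As a hedged illustration, the sketch below implements the closed-form expressions commonly cited from Riley et al for prediction models with a binary outcome: a target shrinkage factor, a small optimism in apparent R², and a precise estimate of overall risk. The function names and default targets (shrinkage 0.9, margins of 0.05) are assumptions for this example, the criteria are deliberately not numbered, and the authors' exact implementation may differ.

```python
import math

def n_for_shrinkage(n_params, r2_cs, shrinkage=0.9):
    """Minimum n so that the expected uniform shrinkage factor is at least `shrinkage`.

    n_params: number of candidate predictor parameters.
    r2_cs: anticipated Cox-Snell R-squared of the model.
    """
    return math.ceil(n_params / ((shrinkage - 1) * math.log(1 - r2_cs / shrinkage)))

def max_r2_cs(prevalence):
    """Largest attainable Cox-Snell R-squared for a given outcome prevalence."""
    ln_l_null = prevalence * math.log(prevalence) + (1 - prevalence) * math.log(1 - prevalence)
    return 1 - math.exp(2 * ln_l_null)

def n_for_r2_optimism(n_params, r2_cs, prevalence, delta=0.05):
    """Minimum n for a small difference between apparent and adjusted Nagelkerke R-squared."""
    s = r2_cs / (r2_cs + delta * max_r2_cs(prevalence))
    return n_for_shrinkage(n_params, r2_cs, shrinkage=s)

def n_for_overall_risk(prevalence, margin=0.05):
    """Minimum n to estimate the overall outcome risk within +/- `margin`."""
    return math.ceil((1.96 / margin) ** 2 * prevalence * (1 - prevalence))

def minimum_sample_size(n_params, r2_cs, prevalence):
    """The required n is the largest value across the three criteria."""
    return max(
        n_for_shrinkage(n_params, r2_cs),
        n_for_r2_optimism(n_params, r2_cs, prevalence),
        n_for_overall_risk(prevalence),
    )

# Illustrative inputs only: 24 predictor parameters, anticipated R2_CS of 0.288.
print(n_for_shrinkage(24, 0.288))
print(n_for_overall_risk(0.5))
```

Comparing `minimum_sample_size(...)` with the training sample size actually used gives the kind of difference reported in the results, where negative values indicate an underpowered model.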
Results or Findings: We included 116 studies. Of these, 11/116 justified the sample size, of which 6/11 performed an a priori sample size calculation. The mean ± standard deviation (SD) and median (first and third quartiles, Q1, Q3) of the total sample size of models were 451 ± 871 and 223 (130, 463), and those of the training sample size were 292 ± 676 and 150 (90, 288). The mean ± SD and median (Q1, Q3) of the difference between the total sample size and the minimum sample size according to Riley et al criterion 3 were 120 ± 888 and -100 (-216, 183), and those of the difference between the training sample size and the minimum sample size according to all 3 Riley et al criteria were -386 ± 1264 and -268 (-427, -157). The model testing method and specialty of topic were associated with sufficient sample size.
Conclusion: Radiomics models are often designed without sample size justification; as a consequence, many are developed on samples too small to avoid overfitting, noise, and outliers. Researchers should be encouraged to justify, perform, and report sample size calculations when developing radiomics models.
Limitations: A limitation of the study is the limited number of leading peer-reviewed radiological journals included.
Funding for this study: Funding was provided by National Natural Science Foundation of China (82302183, 82471935, 82271934), Yangfan Project of Science and Technology Commission of Shanghai Municipality (22YF1442400), Research Fund of Health Commission of Changning District, Shanghai Municipality (2023QN01), Laboratory Open Fund of Key Technology and Materials in Minimally Invasive Spine Surgery (2024JZWC-ZDA03, 2024JZWC-YBA07), and Research Fund of Tongren Hospital, Shanghai Jiao Tong University School of Medicine (TRKYRC-XX202204, TRYJ2021JC06, TRYXJH18, TRYXJH28).
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: The study is a meta-research study with a protocol available on OSF (https://osf.io/pbukc/), and no human participants or animals were included in the study.
7 min
Evolution of commercially available artificial intelligence in radiology: a follow-up on peer-reviewed evidence of 179 products
Ignas Bernardus Houben, Zwolle / Netherlands
Author Block: N. Antonissen1, I. B. Houben2, O. Tryfonos3, M. De Rooij1, K. G. Van Leeuwen4; 1Nijmegen/NL, 2Zwolle/NL, 3Amsterdam/NL, 4De Bilt/NL
Purpose: To investigate changes in peer-reviewed evidence on commercially available radiologic artificial intelligence (AI) products from 2020 to 2023.
Methods or Background: A comprehensive review of the literature published between January 2015 and March 2023 on CE-certified radiological AI products (according to www.healthairegister.com) was performed. In line with the previous systematic review, this follow-up study categorized the publications according to the hierarchical model of efficacy: from technical and diagnostic accuracy (levels 1 and 2) to impacts on clinical decision-making and patient outcomes (levels 3-5) or socio-economic impact (level 6).
Results or Findings: By March 2023, 91 vendors were identified, offering a total of 179 products, with 120 of these (67%) having peer-reviewed evidence, compared to 36% in 2020. In 2023, there were 662 publications on these 120 products, compared to 237 publications on 36 products in 2020. An increase (22 to 25%) was found in publications focusing on technical or potential clinical efficacy. The majority of publications described the diagnostic accuracy of the product (level 2), although their relative share decreased (55 to 52%). For the higher levels of efficacy (levels 3-6), the respective contribution to the total remained the same as in 2020 (23%).
Conclusion: While there is an increase in the number of publications validating AI products, the majority of publications continue to describe the lower levels of efficacy. This suggests that even though the field has been maturing, we still have limited knowledge and evidence of the clinical impact of AI products in radiology.
Limitations: Several products have a high number of publications, which may cause them to be overrepresented in the total.
Funding for this study: No funding was received for this study.
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: Not applicable
7 min
Flexible Deep Learning MR Image Enhancement with Performance Monitoring
Srivathsa Pasumarthi Venkata, Santa Clara / United States
Author Block: Z. Zhou, C. Arnold, H. Gandhi, P. Gulaka, A. Shankaranarayanan, S. Pasumarthi Venkata; Menlo Park, CA/US
Purpose: Deep learning (DL) MR image enhancement allows scan time reduction while maintaining diagnostic quality. However, its performance may deteriorate over time. This study aims to develop an adaptive image enhancement DL model and to investigate a non-reference-based metric without human annotation for performance monitoring.
Methods or Background: A single DL model with a ConvNeXt backbone was trained on 3027 paired MR data. High-quality images were enhanced by a commercial algorithm as targets. Low-quality input images were acquired with various acceleration methods (0-80%) for the model to learn adaptive enhancement.

The trained DL model was evaluated on another diverse set of 205 cases. Line profiles and regions of interest (ROIs) were manually labeled for each case. The slope/gradient was extracted from line profiles to measure image sharpness, and signal-to-noise ratio (SNR) was derived from ROIs to evaluate noise level. In addition, gradient entropy (GE) as a non-reference-based metric (lower GE indicates higher quality) was compared with line/ROI-based metrics.
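The abstract does not define GE or the SNR estimator. A minimal sketch, assuming one common definition (the Shannon entropy of the normalized gradient-magnitude map) and SNR taken as the ROI mean divided by its sample standard deviation, is shown below in dependency-free Python. The function names and the synthetic test image are assumptions for illustration, not the authors' implementation.

```python
import math
import random

def gradient_entropy(image):
    """Shannon entropy of normalized forward-difference gradient magnitudes.

    ASSUMPTION: one common definition of gradient entropy; the abstract
    does not give its formula. `image` is a 2D list of pixel intensities.
    """
    rows, cols = len(image), len(image[0])
    mags = []
    for r in range(rows - 1):
        for c in range(cols - 1):
            gx = image[r][c + 1] - image[r][c]
            gy = image[r + 1][c] - image[r][c]
            mags.append(math.hypot(gx, gy))
    total = sum(mags)
    if total == 0:
        return 0.0  # perfectly flat image
    return -sum((m / total) * math.log(m / total) for m in mags if m > 0)

def roi_snr(roi_values):
    """SNR of a homogeneous ROI: mean divided by sample standard deviation."""
    n = len(roi_values)
    mean = sum(roi_values) / n
    variance = sum((v - mean) ** 2 for v in roi_values) / (n - 1)
    return mean / math.sqrt(variance)

# Synthetic example: a clean vertical step edge versus the same edge plus noise.
random.seed(0)
clean = [[0.0] * 8 + [1.0] * 8 for _ in range(16)]
noisy = [[v + random.gauss(0, 0.2) for v in row] for row in clean]

print(f"GE clean: {gradient_entropy(clean):.3f}")
print(f"GE noisy: {gradient_entropy(noisy):.3f}")
```

Under this definition, noise spreads gradient energy across many pixels and raises the entropy, which is consistent with the abstract's statement that lower GE corresponds to higher quality.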
Results or Findings: Compared to inputs, over 90% of model outputs achieved 45% SNR increase and 8% sharpness increase. On average, SNR and sharpness were improved by 73% and 27%, respectively. GE measured on outputs was reduced by 0.5% for 95% of test cases. For test cases with >0.5% GE reduction, the Pearson correlation of the relative change between GE and SNR is -0.333 (p < 0.05), and between GE and sharpness is 0.214 (p < 0.05), showing a weak but significant correlation between GE and annotated image quality (IQ) metrics.
Conclusion: The developed DL model can adaptively improve IQ, supporting flexible protocol acceleration. Its strong denoising also enables MR scans with higher acceleration/resolution. In addition, gradient entropy can be simply deployed for performance monitoring and can mitigate the risk of misinterpretation.
Limitations: Not applicable
Funding for this study: NIH SBIR grant (R44MH135725)
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: Not applicable
7 min
Radiologist-Guided Active Learning for Medical Image Segmentation: Moving Beyond the Dice Score to Clinically Relevant Targets
Bernhard Föllmer, Berlin / Germany
Author Block: B. Föllmer, V. Serafimoski, K. Schulze, F. Biavati, M. Bosserdt, M. Dewey; Berlin/DE
Purpose: Deep learning models for medical image segmentation typically require extensive pixel-wise annotations, which are costly and time-consuming. Active learning can mitigate this challenge by labeling only the most informative (i.e., uncertain) cases in multiple annotation and training rounds. However, conventional active learning methods do not account for clinically relevant segmentation targets. This study introduces a radiologist-in-the-loop approach for targeted active learning, to optimize model performance beyond standard metrics like the Dice score, focusing on clinically significant segmentation objectives.
Methods or Background: We propose a targeted active learning framework consisting of four iterative steps: (1) Automated identification of uncertain cases for review by the radiologist, (2) Radiologist selection of cases relevant to predefined clinical segmentation targets, (3) Combined selection of uncertain and clinically relevant cases, and (4) Efficient partial annotation and model retraining. We applied this approach to multi-class segmentation of coronary arteries using the SCCT 18-segment model, evaluating it on CTAs from 300 patients of the DISCHARGE (NCT02400229) and CAD-Man trials. Initial model training was conducted using standard active learning, followed by targeted active learning with three predefined objectives: (1) segmentation of rare vessels (e.g., Ramus Intermedius), (2) segmentation of thin vessels (e.g., R-PDA, R-PLB), and (3) segmentation of heavily calcified segments.
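Steps (1) to (3) can be sketched as a small selection routine. This is only an illustration under stated assumptions: uncertainty is taken as the entropy of a per-case class-probability vector (a real segmentation model would aggregate per-voxel uncertainty), the radiologist's choice is passed in as a set of flagged case IDs, and the combination rule (flagged cases first, then the most uncertain remaining cases) is a guess, since the abstract does not specify it. All case IDs and probabilities are made up.

```python
import math

def predictive_entropy(probs):
    """Entropy of a class-probability vector; higher means more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def rank_by_uncertainty(pool):
    """Step 1: order unlabeled cases from most to least uncertain."""
    return sorted(pool, key=lambda case_id: predictive_entropy(pool[case_id]), reverse=True)

def select_for_annotation(pool, flagged_relevant, review_size, budget):
    """Steps 2-3: combine radiologist-flagged cases with uncertain ones.

    flagged_relevant: case IDs the radiologist marked as relevant to a
    predefined clinical target (hypothetical input for this sketch).
    """
    candidates = rank_by_uncertainty(pool)[:review_size]  # shown to the radiologist
    relevant = [cid for cid in candidates if cid in flagged_relevant]
    fill = [cid for cid in candidates if cid not in flagged_relevant]
    return (relevant + fill)[:budget]

# Made-up model outputs for five unlabeled cases.
pool = {
    "case_01": [0.98, 0.01, 0.01],
    "case_02": [0.40, 0.35, 0.25],
    "case_03": [0.60, 0.30, 0.10],
    "case_04": [0.34, 0.33, 0.33],
    "case_05": [0.90, 0.05, 0.05],
}

selected = select_for_annotation(
    pool, flagged_relevant={"case_03", "case_05"}, review_size=4, budget=3
)
print(selected)
```

The selected cases would then go to step (4), partial annotation and retraining, after which the loop repeats with the updated model.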
Results or Findings: Our framework demonstrated improved segmentation performance and time-efficiency over standard active learning for the three predefined targets (rare vessels, thin vessels, and calcified segments).
Conclusion: The proposed targeted active learning framework enables more time-efficient, radiologist-guided model training focused on clinically relevant segmentation targets, improving performance beyond conventional accuracy metrics like the Dice score.
Limitations: This framework was evaluated exclusively on coronary artery segmentation in cardiac CT, with only three segmentation targets considered. Broader validation is needed for other anatomical structures and imaging modalities.
Funding for this study: This work was funded by the German Research Foundation through the graduate program BIOQIC (GRK2260, project-ID: 289347353) and the DISCHARGE project (603266-2, HEALTH-2012.2.4.-2) funded by the FP7 Program of the European Commission.
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: This study does not require any approval of the ethics committee.

Notice

This session will not be streamed, nor will it be available on-demand!

CME Information

This session is accredited with 1 CME credit.

Moderators

  • Emanuele Neri

    Pisa / Italy

Speakers

  • Erik Hermanus Marcellinus Kemper

    Rotterdam / Netherlands
  • Gaby Van Den Wittenboer

    Zwolle / Netherlands
  • Pantelis Gialias

    Linkoping / Sweden
  • Nor-Eddine Regnard

    France
  • Jingyu Zhong

    Shanghai / China
  • Ignas Bernardus Houben

    Zwolle / Netherlands
  • Srivathsa Pasumarthi Venkata

    Santa Clara / United States
  • Bernhard Föllmer

    Berlin / Germany