Research Presentation Session: Artificial Intelligence and Imaging Informatics

RPS 1905 - Building the AI breast imaging service: from deployment to clinical practice

March 7, 12:30 - 13:30 CET

6 min
Improving time to breast cancer detection: AI identification of screening-detected cancers on prior-year digital breast tomosynthesis
Manisha Bahl, Cambridge / United States
Author Block: M. Bahl, H. Kim, K-S. Kim, S. Do, L. Lamb; Boston, MA/US
Purpose: To evaluate whether a commercial AI-based computer-assisted detection/diagnosis (CADe/x) algorithm for digital breast tomosynthesis (DBT) could have identified screening-detected breast cancers on the prior year’s DBT examination, potentially enabling earlier diagnosis.
Methods or Background: This retrospective study included consecutive women with screening-detected breast cancers on DBT from 2016 to 2019 who also had a prior screening DBT within 18 months interpreted as negative. Both the index examination (the screening examination on which cancer was detected by the radiologist) and the prior examination were analyzed using a commercial CADe/x algorithm (Genius AI® Detection 2.0; Hologic, Inc.). AI scores ranged from 0 to 100, with values ≥22 considered positive per vendor recommendation. A breast imaging radiologist reviewed all AI-positive cases to determine whether AI marks corresponded to the site of the subsequently diagnosed cancer.
Results or Findings: Four hundred women (mean age, 64±10 years) met inclusion criteria. On the index mammogram, AI detected and correctly localized 88.8% (355/400) of cancers. Among these 355 AI-detected cases, 44.5% (158/355) were also identified and correctly localized by AI on the prior-year exam (radiologist-negative). Thus, 39.5% (158/400) of patients in the cohort may have benefited from earlier cancer detection. Of the 158 cancers flagged by AI on the prior exam, 79.7% (126/158) were invasive. Among these invasive cancers, 73.8% (93/126) were grade 2-3, and 9.5% (12/126) were node-positive at diagnosis.
Conclusion: A commercial AI algorithm retrospectively identified and localized nearly 40% of screening-detected breast cancers on prior DBT - approximately one year before radiologic diagnosis - suggesting a potential role for AI in facilitating earlier detection, even outside classic false-negative or interval cancer categories.
Limitations: Limitations include the single-center, retrospective design and the use of a single vendor's algorithm with a vendor-defined threshold, which may limit generalizability.
Funding for this study: Funding was provided by Hologic, Inc. The authors, none of whom are employees of Hologic, maintained full control over the data and the submitted information.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: This retrospective Health Insurance Portability and Accountability Act (HIPAA)-compliant study was granted an exemption from the requirement for written informed consent by the institutional review board at the Massachusetts General Hospital (Protocol #: 2023P003130).
6 min
An Automated Deep Learning Framework for Breast Cancer Segmentation and Survival Prediction
Shushan Dong, Shanghai / China
Author Block: K. Wang1, S. Wang1, S. Huang2, J. Xie3, S. Dong2, M. Xu1, R. Zhang1; 1Hangzhou/CN, 2Beijing/CN, 3Shanghai/CN
Purpose: Manual lesion segmentation on MRI and separate prognostic modeling are labor-intensive. This study aims to develop and validate FA-SurvNet, a fully automated survival analysis network framework that integrates deep-learning segmentation and end-to-end survival prediction to streamline breast cancer prognosis.
Methods or Background: This retrospective study enrolled 573 female breast cancer patients from two medical centers who underwent preoperative MRI for prognostic evaluation. We developed the FA-SurvNet model, which integrates an nnU-Net for the tumor segmentation task with a CNN-Cox regression network for the survival analysis task, enabling joint breast cancer lesion delineation and prognosis prediction. Segmentation performance was quantified using standard metrics, including the dice similarity coefficient (DSC), positive predictive value (PPV), and sensitivity (SEN). To validate the predictive performance of the FA-SurvNet model, we constructed two baseline survival models using conventional Cox regression. The predictive performance and clinical utility of the three models were comprehensively evaluated using Harrell's concordance index (C-index) for discriminative ability.
Results or Findings: The nnU-Net-based segmentation model achieved high performance in both the training (DSC: 0.85, PPV: 0.85, SEN: 0.88) and testing cohorts (DSC: 0.84, PPV: 0.87, SEN: 0.84). The FA-SurvNet model demonstrated good performance in breast cancer recurrence risk prediction, with C-indexes of 0.88 and 0.84 in the training and testing cohorts, respectively. Comparative analysis revealed the superior performance of the FA-SurvNet model over the conventional radiomics model (training C-index: 0.84; testing C-index: 0.75). The time-dependent AUC of the FA-SurvNet model for 5-year recurrence-free survival (RFS) prediction in the training cohort was 0.90 (95% CI: 0.84-0.94).
Conclusion: By combining lesion segmentation and survival prediction, the FA-SurvNet model simplifies the imaging prognostic diagnosis research process, provides an efficient tool for clinical decision-making, and realizes the complete automation of breast cancer prognosis.
Limitations: None
Funding for this study: None
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: None
6 min
Comparative Study of Radiologists’ Diagnostic Performance With and Without Artificial Intelligence Support for Breast Lesion Detection in 2D Mammography
Louis Lassalle, Paris / France
Author Block: L. Lassalle1, J. Ventre2, V. Marty2, L. Clovis2, N. Nitche2, J. Hadchiti3, N-E. Regnard1, A-L. Hermann4, E. Kotter5; 1Lieusaint/FR, 2Paris/FR, 3Villejuif/FR, 4Lyon/FR, 5Freiburg Im Breisgau/DE
Purpose: Mammography is a key domain where AI may improve early breast cancer detection. The study compared the diagnostic performance of radiologists with and without support from a new AI system (BreastView, Gleamer) in detecting breast lesions.
Methods or Background: We retrospectively collected mammograms from three imaging centers across France (2018-2023), acquired on systems from three manufacturers (Hologic, Siemens, GE). Eligible patients were women over 18 who underwent 2D mammography with CC and MLO views and had either a biopsy or 18-month follow-up. Poor-quality exams were excluded.
Ground truth was determined by an experienced breast radiologist who had access to the entire patient file, including prior and subsequent mammograms and, when available, digital breast tomosynthesis, MRI, ultrasound, clinical reports, and biopsy reports for cancer cases.
Nine radiologists participated: five “non-subspecialists” (between 250 and 750 mammograms/year) and four “subspecialists” (>1000 mammograms/year). They annotated all visible lesions and assigned malignancy scores from 0 to 100. Each completed two reading sessions, unaided and with AI support, separated by a 12-month washout.
Results or Findings: The dataset included 319 patients (age: 58 ± 13 years): 159 with a malignant biopsy-proven lesion, 39 with only benign lesions confirmed by biopsy or 18-month follow-up, and 121 with no lesion confirmed by 18-month follow-up.
The stand-alone AI achieved an AUC of 0.911 [0.881–0.942] for malignant lesion detection, outperforming the mean radiologist AUC of 0.801 [0.769–0.832]. With AI assistance, radiologists significantly improved their AUC (0.880 [0.865–0.896]), sensitivity (+18.7 points, p<.001), and specificity (+2.6 points, p=.042). Gains were lower for subspecialists but still significant.
Conclusion: The AI system showed robust performance, enhancing radiologists’ diagnostic accuracy and supporting its potential as a clinical decision-support tool.
Limitations: The study was retrospective, with an enriched cancer dataset, and readers had no access to clinical information.
Funding for this study: Gleamer
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information:
6 min
Should AI results be disclosed in mammography reports? A randomized survey study of patient responses to concordant and discordant interpretations
Giovanni Irmici, Milan / Italy
Author Block: G. Irmici, F. Pesapane, C. Depretto, L. Nicosia, G. Della Pepa, A. Rotili, S. Santicchia, G. P. Scaperrotta, E. Cassano; Milan/IT
Purpose: To assess how disclosing artificial intelligence (AI) results, particularly discordant findings, affects patient trust, anxiety, follow-up intentions, and attitudes toward AI in mammography. The study also evaluated whether a brief explanatory note mitigates adverse reactions.
Methods or Background: A cross-sectional randomized experimental survey was conducted among 600 women (mean age 55.4 ± 6.8 years) undergoing mammography in two academic breast imaging centers in Milan, Italy, between January 2023 and June 2024. Participants were randomized into four hypothetical BI-RADS 1 scenarios: Radiologist Only (control), AI No-Flag (AI concordant with radiologist), AI Flagged (AI discordant false-positive), and AI Flagged + Explanation (discordant AI with contextual information).
Outcomes included trust (0–100 scale), worry, second-opinion intent, legal action intent, and AI approval. Statistical analyses involved ANOVA, chi-square tests, and logistic regression with Bonferroni correction.
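The Bonferroni correction applied to the pairwise comparisons above divides the significance level by the number of tests in the family. A minimal sketch, assuming a hypothetical set of six pairwise p-values among the four study arms (the function name and toy values are illustrative, not study data):

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni correction for a family of pairwise comparisons:
    each test is judged against alpha divided by the number of tests."""
    adj_alpha = alpha / len(p_values)
    return adj_alpha, [p < adj_alpha for p in p_values]

# Toy p-values for the six pairwise comparisons among four arms
pvals = [0.001, 0.020, 0.004, 0.300, 0.049, 0.008]
adj_alpha, significant = bonferroni(pvals)
print(round(adj_alpha, 4))  # 0.0083
print(significant)          # [True, False, True, False, False, True]
```

Note that a comparison with p = 0.020, nominally significant at 0.05, no longer survives the corrected threshold of 0.05/6.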
Results or Findings: Discordant AI disclosure significantly reduced trust in the radiologist (73.0 vs. 90.1; p<0.001), increased anxiety (58.0% vs. 16.0%; OR=15.4), second-opinion intent (50.0% vs. 8.7%; OR=10.2), and legal action consideration (60.7% vs. 38.7%; OR=2.49). Adding explanatory context significantly alleviated these effects (e.g., anxiety: 25.3%; OR=0.26). AI approval remained high (>85%) across AI scenarios.
Conclusion: Disclosing discordant AI results negatively impacts patient trust and anxiety, prompting increased intentions for second opinions and legal actions. Providing explanatory context mitigates these adverse effects, supporting transparent but contextually informed disclosure as essential for patient communication strategies in AI-integrated mammography.
Limitations: Limitations of our study include its hypothetical-scenario design. The explanation tested in this study focused on false-positive (FP) rates; future work should explore the effects of alternative or additional types of contextual information. Additionally, we measured stated intentions rather than observed behaviours.
Funding for this study: No funding was received for this study.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: Institutional Review Board approval was obtained.
6 min
AI performance on an interval cancer mammography dataset and its role in audit triage
Yan Chen, Nottingham / United Kingdom
Author Block: A. Taib1, K. Wells1, I. Darker1, J. James1, N. Sharma2, Y. Chen1; 1Nottingham/UK, 2Leeds/UK
Purpose: Interval cancers (ICs, cancers arising between screening rounds) are an important performance indicator in breast cancer screening. The NHSBSP requires a minimum of two readers to classify the previous screening mammograms into one of three categories: (1) Satisfactory, (2) Satisfactory with learning points, and (3) Unsatisfactory. We assessed whether artificial intelligence (AI) could be applied retrospectively to reliably classify ICs and help standardise this process.
Methods or Background: IC cases were collected from January 2015 to December 2024 from two large UK screening centres. An AI model (Lunit) analysed all cases and provided a continuous malignancy probability score (0-100) for each breast. A ROC analysis was performed to determine the ability of AI to distinguish Category 1 cases from Categories 2 and 3 using malignancy scores and thresholding: cases with scores below the threshold were classified as Category 1, and those at or above it as Category 2 or 3.
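The thresholding rule described amounts to a one-line classifier. A minimal sketch (the function name and toy scores are illustrative assumptions, not study data):

```python
def triage_category1(scores, threshold):
    """Triage rule: cases whose AI malignancy score falls below the
    threshold are classified as Category 1 (satisfactory); all others
    are referred for human review as potential Category 2/3."""
    return [score < threshold for score in scores]

# Toy per-case malignancy scores on the 0-100 scale
scores = [0.2, 4.1, 9.9, 15.0, 35.7, 72.3]
print(sum(triage_category1(scores, 0.5)))  # 1 case triaged as Category 1
print(sum(triage_category1(scores, 10)))   # 3 cases triaged as Category 1
```

Raising the threshold triages more cases automatically but risks more misclassified Category 2/3 cases, which is the trade-off the two operating points in the Results illustrate.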
Results or Findings: 409 IC cases were included, previously classified by readers at each centre (79% Category 1, 20% Category 2, and 1% Category 3). At a threshold of 0.5, AI classified 65 cases as Category 1 (63 correct, 2 misclassified Category 2 cases). At a threshold of 10, AI classified 229 cases as Category 1 (206 correct, 23 misclassified Category 2 cases). No Category 3 cases were misclassified as Category 1 at either threshold.
Conclusion: AI shows promise as a triage tool to improve the IC audit process by reliably classifying Category 1 cases, which make up most interval cancers, while still correctly identifying most Category 2 and 3 cases for human review.
Limitations: The dataset contained predominantly Category 1 cases, with few Category 3 cases. The limited number of centres may restrict generalisability.
Funding for this study: Funding was provided by Lunit Inc.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: The study was approved by the local research and ethics committee of St James's University Hospital, Leeds, England.
6 min
Beyond Screening: AI Performance in Mammographic Breast Cancer Recurrence Detection
Zahra Aghdam, Amsterdam / Netherlands
Author Block: Z. Aghdam, X. Wang, A. Portaluri, J. Kroes, J. Teuwen, K. Lipman, R. Mann; Amsterdam/NL
Purpose: To evaluate the diagnostic performance of a commercial mammography-based AI system for recurrence detection in the post-operative breast, addressing a critical evidence gap.
Methods or Background: This retrospective single-center diagnostic accuracy study included patients after breast-conserving surgery undergoing digital mammography for follow-up between 2004 and 2022. Recurrence was defined as ipsilateral malignancy irrespective of tumor biology. The reference standard was pathology. The most recent mammogram ≤3 months before recurrence was analysed. Mahalanobis-distance matching was used to select one mammogram per control (≥24 months of negative follow-up), aligning age and time from the primary tumor between cohorts.
4,235 exams (384 recurrences, 3,851 controls) were included. BI-RADS categories were extracted from reports; sensitivity and specificity were calculated at BI-RADS ≥3. For each exam, AI (Transpara version 2.1.0) yielded the highest region score at breast level (0–100) and a risk category (low, intermediate, elevated). AI sensitivity was assessed at the radiologists’ specificity. A combined AI+BI-RADS score was derived via logistic regression. Risk category changes from the preceding year’s exams were evaluated using chi-square tests with Bonferroni-adjusted pairwise comparisons (p<0.05).
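A combined AI+BI-RADS score of the kind described can be sketched with a logistic model. The coefficients below are placeholders chosen for illustration, not the study's fitted values:

```python
import math

def combined_score(ai_score, birads, w_ai=0.05, w_birads=0.8, bias=-4.0):
    """Illustrative logistic combination of an AI region score (0-100)
    and a BI-RADS category into a single malignancy probability.
    Weights here are assumed for the sketch, not fitted to study data."""
    z = w_ai * ai_score + w_birads * birads + bias
    return 1.0 / (1.0 + math.exp(-z))

# A high AI score with a suspicious BI-RADS category should outrank
# a low AI score with a benign category
high = combined_score(ai_score=80, birads=5)
low = combined_score(ai_score=10, birads=1)
print(high > low)  # True
```

In practice, the weights would be estimated by fitting the logistic regression to labeled exams, so the combined score can improve discrimination over either input alone.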
Results or Findings: Radiologists’ sensitivity and specificity were 0.75 [0.68-0.81] and 0.98 [0.98-0.99], respectively. AI sensitivity at matched specificity was 0.28 [0.21-0.34] and at the Youden-derived threshold was 0.77 [0.71-0.84]. The AUC of AI was 0.88 [0.85-0.90]. The combined AI+BI-RADS score had a sensitivity of 0.77 [0.70-0.83] while improving the AUC to 0.95 [0.93-0.97] (vs AI alone, p<0.001).
AI risk category transitions over one year differed significantly between cases and controls (132 recurrences vs 2,985 controls; χ²=336.0, p<0.001). Recurrences were more likely to transition to a higher risk category (76.5% vs 14.8%, RR=5.17 [4.55–5.87]) and less likely to remain in the same category (22% vs 65.4%, RR=0.34 [0.24–0.46]) or show a decreased risk (1.5% vs 19.8%, RR=0.08 [0.02–0.30]).
Conclusion: The sensitivity of stand-alone AI for recurrence detection was lower than that of radiologists at the same specificity. An increase in risk category was, however, associated with local recurrence.
Limitations: Single-center retrospective design, use of varied mammography systems, and assessment of only one AI model.
Funding for this study: Health Holland
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: IRBd21-060
6 min
Comparing the Performance of Top Ranked AI Models from the RSNA 2023 Screening Mammography Breast Cancer Detection AI Challenge to Commercial AI Models
Yan Chen, Nottingham / United Kingdom
Author Block: G. J. W. Partridge1, H. Jupp1, T. Zhang2, X. Wang2, R. Mann2, L. Moy3, Y. Chen1; 1Nottingham/UK, 2Nijmegen/NL, 3New York, NY/US
Purpose: In 2023, the RSNA hosted a Screening Mammography Breast Cancer Detection Artificial Intelligence (AI) Challenge, where participants were invited to develop AI models for cancer detection in mammograms. Here we assess the performance of the Top 7 ranked models compared to 4 commercially available AI products, using a multi-national dataset.
Methods or Background: A large multi-national evaluation dataset was sourced from the USA, UK, Australia and Brazil as part of the 2023 RSNA Challenge, consisting of 20,365 cases. Cases consisted of 2-view 2D screening mammography exams, where cancers were pathology-proven and non-cancer cases had at least 1 year of normal follow-up. The Top 7 ranked RSNA Challenge AI models (ranked by pF1 score on the Challenge leaderboard: https://www.kaggle.com/competitions/rsna-breast-cancer-detection/leaderboard) and 4 commercially available AI products (Lunit, Transpara, Therapixel, iCAD) will analyse all cases, and the performance of all AI models will be compared.
Results or Findings: In the current dataset including 4,811 cases (USA and Australia), equating to 9,622 single-breast exams, 193 (2.0%) had biopsy-proven cancer and 9,429 (98.0%) were non-cancer. The Top 7 Challenge algorithms achieved AUCs between 0.903 and 0.947, and the commercial AI product achieved an AUC of 0.933. The top-ranked Challenge model's AUC did not differ significantly from the commercial AI AUC (DeLong's method: P = .18).
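The AUC being compared here is the probability that a randomly chosen cancer case scores above a randomly chosen non-cancer case. A minimal empirical-AUC sketch (DeLong's variance estimate for the significance test is omitted; the function name and toy data are assumptions, not study data):

```python
def empirical_auc(labels, scores):
    """Empirical AUC: fraction of (positive, negative) pairs in which
    the positive case scores higher; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy labels (1 = biopsy-proven cancer) and model scores
labels = [1, 1, 0, 0, 0, 0]
scores = [0.90, 0.60, 0.70, 0.30, 0.20, 0.10]
print(empirical_auc(labels, scores))  # 0.875
```

DeLong's method then compares two such correlated AUCs, computed on the same cases by different models, while accounting for their covariance.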
Conclusion: The Top 7 ranked challenge algorithms performed very well compared to the commercial product on the current dataset sourced from the USA and Australia. Inclusion of data from the UK and Brazil will enable an analysis of AI generalisability and robustness across different populations.
Limitations: Relatively small size of evaluation test-set; low cancer prevalence (but screening setting).
Funding for this study: N/A
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information:
6 min
Large-scale AI implementation in Radiology: technical, operational, and behavioral adoption patterns across a 20-center Swiss Imaging Network
Benoît Rizk, Villars-Sur-Glane / Switzerland
Author Block: S. Morozov, N. Heracleous, B. Rizk, C. Thouly, B. Dufour, O. Novarina; Sion/CH
Purpose: To evaluate technical efficiency and radiologist adoption of AI tools across musculoskeletal, chest, breast, and neurologic imaging in Switzerland’s largest radiology network, establishing benchmarks for processing, workflow benefit, and user acceptance.
Methods or Background: A retrospective analysis included 397,694 radiological studies processed by AI at 20 Swiss centers between January 2022 and June 2025. Survey data (53/58 radiologists, 91.4% response) captured usage frequency, trust, satisfaction (Net Promoter Score), and desired improvements. Time metrics included AI processing latency, report turnaround with vs. without AI, and the proportion of results available at reporting. AI implementation covered multiple modalities and anatomies.
Results or Findings: Musculoskeletal AI accounted for 67.8% of use (trauma radiography 37%). Breast, chest, and brain AI demonstrated 76% specialty adoption. Frequent AI use was reported by 66.1% of radiologists, primarily mid-career. Perceived benefits were error protection (71.7%) and time savings (56.6%). Key requests were faster processing (48%) and automated report integration (34%). NPS varied: +86.7 for bone age, +65.2 for orthopedic radiography, −55.3 for spine MRI, and +38.5 for chest CT (78% use). Technical bottlenecks were data transfer for radiography/mammography (2.1–3.5 min) and AI computation for brain/spine MRI (8.2 min). Only 21.6% of AI results were ready before reporting; 5% were missing at validation. AI reduced report times by 12–57% in targeted modalities (knee MRI 16→14 min, trauma radiography 7→3 min, bone age 3.5→1.6 min).
Conclusion: AI tools produced measurable workflow gains (12–57% turnaround reduction) and 90%+ adoption, but improvements in speed, integration, and interoperability are required. Addressing technical and workflow barriers and focusing on user feedback will be critical for effective specialty adaptation.
Limitations: AI evaluation focused on technical and operational metrics, with indirect assessment of clinical outcomes. Results reflect the experience of a mature, multi-site Swiss network, which may limit generalizability to other settings.
Funding for this study: Self-funded
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information:
6 min
Efficient external validation of AI models using nested case–control designs: Application to breast cancer risk prediction
Jim Peters, Nijmegen / Netherlands
Author Block: J. Peters1, D. Sergeev2, C. Jacobs1, D. Van Der Waal1, M. Broeders1; 1Nijmegen/NL, 2Heidelberg/DE
Purpose: External validation of AI models in large cohorts is critical but computationally and resource-intensive, especially when multiple models are compared or evaluations repeated for quality control. Nested case-control (NCC) designs could enable faster and more sustainable validation by reducing data requirements, but it is unclear whether they yield unbiased estimates across all recommended performance metrics. We investigated whether an NCC design with weighted estimators can accurately evaluate discrimination and calibration of breast cancer risk prediction models.
Methods or Background: We used data from the PRISMA study, a Dutch population-based breast cancer screening cohort including 38,742 women with questionnaire data and mammograms. Two prediction models for 5-year breast cancer risk were evaluated: the Tyrer–Cuzick model and Mirai, a mammography-based AI model. For the NCC design, all cases were matched to four controls. Weighted estimators were applied for the concordance index (C-index), time-dependent AUC (tAUC), observed/expected ratio (O/E), and calibration slope. To assess sampling variability, NCC samples were drawn 100 times. Averaged estimates of performance metrics were compared with the full cohort.
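The weighting idea behind the NCC estimators can be illustrated with the observed/expected ratio: sampled controls are up-weighted by the inverse of the control sampling fraction so that expected events reflect the full cohort. This is a simplified sketch with toy risks, not PRISMA data; the weighted C-index and tAUC estimators involve more elaborate matched-set weights:

```python
def weighted_oe(case_risks, control_risks, sampling_fraction):
    """Weighted O/E ratio in a nested case-control sample: every case
    is kept (weight 1); each sampled control stands in for
    1/sampling_fraction cohort members when summing expected events."""
    observed = len(case_risks)
    expected = sum(case_risks) + sum(control_risks) / sampling_fraction
    return observed / expected

# Toy predicted 5-year risks: 3 cases and 6 sampled controls drawn
# with a 10% control sampling fraction
oe = weighted_oe([0.10, 0.08, 0.12],
                 [0.02, 0.03, 0.01, 0.04, 0.02, 0.03], 0.10)
print(round(oe, 2))  # 1.67
```

Without the up-weighting, expected events would be grossly underestimated because controls are deliberately under-sampled in an NCC design.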
Results or Findings: There were 571 breast cancer cases after a median follow-up of 4.3 years. Average performance of Tyrer-Cuzick in 100 NCC samples (all cases, 2,284 controls) was: C-index 0.598 (SD 0.007), tAUC 0.587 (0.010), O/E ratio 0.701 (0.010), slope 0.636 (0.041). In the full cohort, results were: C-index 0.585 (95% CI 0.558–0.613), tAUC 0.574 (0.536–0.607), O/E ratio 0.693 (0.625–0.775), slope 0.563 (0.399–0.726).
Conclusion: Using only 8% of the data, the NCC design produced performance estimates closely matching full-cohort results. This approach may enable faster, more resource-efficient external validation, particularly useful for comparing AI models or repeated quality control evaluations.
Limitations: Current results are based on one model; analyses with Mirai will be available at ECR 2026.
Funding for this study: Dutch Cancer Society (KWF7626) and The Netherlands Organisation for Health Research and Development (ZonMw 200500004)
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: CMO Arnhem-Nijmegen reference no. 2014/177