Research Presentation Session: Breast

RPS 1902 - Introduction of artificial intelligence in breast screening

March 1, 12:30 - 13:30 CET

  • ACV - Research Stage 1
  • ECR 2025
  • 7 Lectures
  • 60 Minutes
  • 5 Speakers

Description

7 min
Impact on quality performance indicators after implementing AI in a breast cancer screening program in Germany
Alejandro Rodriguez Ruiz, Nijmegen / Netherlands
Author Block: K. Hamm1, A. Rodriguez Ruiz2, T. Jordan1, B. Vetter1, C. Engel3, C. Entrup4, M. Engelke5; 1Chemnitz/DE, 2Nijmegen/NL, 3Leipzig/DE, 4Koblenz/DE, 5Hamburg/DE
Purpose: To evaluate breast cancer screening quality indicators after implementation of an AI system to support the reading of mammograms.
Methods or Background: Two prospective, consecutively collected cohorts of women attending breast cancer screening with mammography in a region of Germany were identified, just before and after implementation of an AI decision support system to aid radiologists reading mammograms (Transpara version 1.7, ScreenPoint Medical). Before AI implementation, all mammograms were double read without AI. Afterwards, mammograms were double read using AI as concurrent decision support. All mammograms were acquired with the same devices (Siemens Mammomat Inspiration). A total of X radiologists assessed the exams in this screening program.

Screening quality indicators (cancer detection rate, recall rate, false positive rate, PPV2) were compared in the cohorts of women before and after implementation of AI using multivariate logistic models adjusted for age, breast density, and interval from previous examination.
Results or Findings: 59,676 women attending screening before AI implementation (2020-2021) and 58,546 women attending after AI implementation (2022-2023) were included in the analysis. Average age was 60 years in both cohorts. The average interval between rounds was 838 days in the no-AI cohort and 815 days in the AI cohort.

After implementing AI, cancer detection rate increased (349 screen-detected cancers, 6.0/1000 vs 286 screen-detected cancers, 4.8/1000, p=0.01), recall rate remained stable (2.5% vs 2.6%, p=0.29), false positive rate was reduced (1.9% vs 2.1%, p=0.002), and PPV2 increased (69%, 349/509 vs 60%, 286/477, p=0.009).
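The crude rates quoted above can be re-derived from the reported counts. The following is a minimal sketch, not the authors' analysis: it only recomputes unadjusted proportions, the function names are illustrative, and it does not reproduce the adjusted logistic models or the p-values.

```python
def rate_per_1000(events: int, screened: int) -> float:
    """Events per 1000 screened women."""
    return 1000 * events / screened


def proportion_pct(numerator: int, denominator: int) -> float:
    """Proportion expressed as a percentage."""
    return 100 * numerator / denominator


# Counts as reported in the abstract (AI cohort first, then no-AI cohort).
cdr_ai = rate_per_1000(349, 58546)
cdr_no_ai = rate_per_1000(286, 59676)
ppv2_ai = proportion_pct(349, 509)
ppv2_no_ai = proportion_pct(286, 477)

print(f"CDR: {cdr_ai:.1f} vs {cdr_no_ai:.1f} per 1000")
print(f"PPV2: {ppv2_ai:.0f}% vs {ppv2_no_ai:.0f}%")
```

Rounded to the precision used in the abstract, these crude values agree with the reported 6.0 vs 4.8 per 1000 and 69% vs 60%.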
Conclusion: Implementing AI to support radiologists reading mammograms in a breast cancer screening program in Germany is safe and effective, improving cancer detection rates and reducing false positives.
Limitations: This prospective study has a non-paired non-randomized design.
Funding for this study: None.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: Approved by local ethics committee.
7 min
Breast Cancer Characteristics after the introduction of Artificial Intelligence-supported double-reading in a Mammography Screening Program: comparison of baseline and subsequent rounds
Claudia Maria Weiss, Villorba / Italy
Author Block: C. M. Weiss, E. Di Gaetano, E. Cattarin, R. Cerniato, G. Soppelsa, I. Vinci; Treviso/IT
Purpose: To analyse the prognostic factors of breast cancers (BCs) detected with artificial intelligence-supported double-reading (AI-DR) and attempt to determine the long-term impact of these changes, particularly on possible overdiagnosis.
Methods or Background: AI-DR was applied to all digital screening mammograms (DSM) from November 2021 to June 2024: 99320 in the AI-baseline-screen (AIBS) and 21367 in the AI-subsequent-screen (AISS). The collected data were compared by retrospective analysis to determine whether AIBS screen-detected BCs differed from AISS. We used the Z-test to compare the proportions of the data between AIBS and AISS.
Results or Findings: With a total of 1093 screen-detected BCs (AIBS: 944/99320; AISS: 149/21367), the study revealed a decrease (-26.6%) in the cancer detection rate (CDR) per 1000 in AISS compared to AIBS (6.97 vs 9.5). The recall rate (RR) was lower (-42.3%) in AISS than in AIBS (1.8% vs 3.1%). No significant differences were found in the percentage of invasive BCs (AIBS 82% vs AISS 81.9%) and in situ BCs (AIBS 18% vs AISS 18.1%). Higher percentages of luminal BCs were observed in AIBS than in AISS (91.1% vs 82.6%), while in AISS there were higher percentages of high-grade (AIBS 25.3% vs AISS 33.6%), HER2-positive (AIBS 8.3% vs AISS 12.4%) and triple-negative BCs (AIBS 2.6% vs AISS 5%).
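The abstract names a Z-test for comparing proportions but does not say which variant was used. As an illustration only, the sketch below assumes a pooled two-proportion Z-test with a two-sided p-value and applies it to the cancer counts given in the Results (944/99320 vs 149/21367). The function name is hypothetical.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion Z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Screen-detected cancers: AIBS vs AISS, counts from the Results.
z, p_value = two_proportion_z_test(944, 99320, 149, 21367)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

Under this assumed variant, the difference in CDR between the two rounds is significant at the conventional 0.05 level.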
Conclusion: The reduction of RR and CDR in AISS aligns with the expectation that later screening rounds mainly detect cases of disease onset or progression. In AISS, compared to AIBS, a higher proportion of more aggressive BCs and a lower proportion of less aggressive BCs were detected. This might suggest an improved performance in the second round of screening, with a positive impact on the reduction of overdiagnosis.
Limitations: Data on interval and advanced cancers are lacking, which would allow for an analysis of long-term clinical outcomes.
Funding for this study: No funding
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: Not requested
7 min
Multicenter Analysis of AI Assessed Mammography Technologist Positioning Variability Between Breast Screening Programs
Georgia Spear, Park Ridge / United States
Author Block: G. Spear1, L. R. Margolies2, J. Payne3, S. E. E. Iles3, J. Seely4, N. Sharma5, S. H. Heywang-Köbrunner6, T. W. W. Vomweg7, M. Abdolell3; 1Chicago, IL/US, 2New York, NY/US, 3Halifax, NS/CA, 4Ottawa, ON/CA, 5Leeds/UK, 6Munich/DE, 7Koblenz/DE
Purpose: Variability in mammographic positioning quality, both between breast screening programs (BSPs) and across different positioning errors, presents a challenge to establishing standardized mammography quality service delivery. Although high-quality mammography helps ensure diagnostic accuracy, and training can enhance image quality, there is a lack of supporting population-level empirical data. This study aims to quantitatively assess mammography technologists’ positioning error rates across BSPs.
Methods or Background: The MAMMO.IQ study encompassed a total of 249,817 screening mammograms acquired between December 1, 2019, and February 28, 2021, from seven BSPs across North America and Europe. The positioning errors assessed included: exaggeration, portion cut off, posterior tissues missing, nipple not in profile, too high on IR, pectoralis shape/position, sagging, IMF missing/obscured, PNL difference, and compression. The coefficient of variation (CV) was used to assess variability in error rates (1) between BSPs and (2) between positioning errors. The within-BSP CV for each unmet positioning criterion was computed using rates for all technologists within a BSP.
Results or Findings: Images acquired by 310 technologists were analyzed. Over/under compression had the lowest variability (CV=16.44%), indicating consistent practices. Too High on IR exhibited the highest variability (CV=71.52%), reflecting a high level of inconsistency. The MLO Inadequate Pectoralis Length had a CV of 50.45%, representing the median level of variability.
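For readers unfamiliar with the CV, it is the standard deviation of a set of rates divided by their mean, expressed as a percentage. The sketch below uses made-up error rates, not study data, and assumes the sample standard deviation; the abstract does not state whether the sample or population form was used.

```python
import statistics


def coefficient_of_variation(rates: list[float]) -> float:
    """CV in percent: sample standard deviation divided by the mean."""
    return 100 * statistics.stdev(rates) / statistics.mean(rates)


# Hypothetical error rates (%) for one positioning criterion
# across three screening programs; illustrative values only.
hypothetical_rates = [10.0, 20.0, 30.0]
cv = coefficient_of_variation(hypothetical_rates)
print(f"CV = {cv:.1f}%")
```

A higher CV means the error rate for that criterion differs more between programs (or technologists) relative to its average level.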
Conclusion: This study highlights variability in mammography technologists’ positioning errors between and within BSPs. While some positioning criteria show consistent practices, others may benefit from improved standardization. Understanding inconsistencies in mammography service delivery helps identify opportunities to standardize positioning practices and reduce variability, leading to more equitable, high-quality care and fewer positioning errors.
Limitations: Missing data on technologist experience, staffing, and COVID-19 response measures limits the understanding of factors driving disparities.
Funding for this study: No funding was received for this study.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: Ethics approvals were obtained from participating BSPs (NSHA-REB#1026590).
7 min
Characteristics of Breast Cancers before and after the introduction of Artificial Intelligence-supported double reading in a Mammography Screening Program
Claudia Maria Weiss, Villorba / Italy
Author Block: C. M. Weiss, E. Di Gaetano, E. Cattarin, R. Cerniato, G. Soppelsa, I. Vinci; Treviso/IT
Purpose: Screening supported by artificial intelligence (AI) increased cancer detection compared to screening without AI. However, it is still unclear whether the additional cancer detection improves outcomes or leads to overdiagnosis of breast cancers (BCs).
Methods or Background: From January 2019 to October 2021, 134259 women underwent digital screening mammography (DSM) with human-double-reading (HDR), and from November 2021 to June 2024, 131406 women underwent DSM with AI-supported HDR (AI-HDR). The collected data (cancer detection rate [CDR], recall rate [RR] and tumour characteristics) were compared by retrospective analysis to determine whether the BCs detected by screening differed between HDR and AI-HDR. We used the Z-test to compare the proportions of the data between HDR and AI-HDR.
Results or Findings: With a total of 2044 screen-detected BCs (HDR: 938/134259; AI-HDR: 1106/131406), the study revealed a significant increase (+20.5%) in CDR per 1000 with AI-HDR compared to HDR (8.42 vs 6.99 per 1000, respectively). The RR was lower (-14.9%) with AI-HDR than with HDR (2.6% vs 3.1%). AI-HDR showed the following differences in BC rates compared to HDR: lower for invasive BCs (82.1% vs 84.1%) and higher for in situ BCs (17.9% vs 15.9%); higher for moderate-grade BCs (65.5% vs 61.5%) and lower for high-grade BCs (26.4% vs 27.9%); higher for luminal BCs (88.2% vs 83.2%), lower for HER2-positive BCs (8.7% vs 13.2%), and lower for triple-negative BCs (2.9% vs 3.6%).
Conclusion: It can be concluded that the use of AI-HDR produced statistically significant differences in detecting various tumour subtypes compared to HDR. In particular, it seems to have increased the detection of less aggressive BCs and reduced unnecessary recalls.
Limitations: Data on interval and advanced cancers are lacking, which would allow for an analysis of long-term clinical outcomes.
Funding for this study: No funding
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: Not requested
7 min
Artificial intelligence as an initial reader for double reading in breast cancer screening: A prospective initial study of 32,822 mammograms of the Egyptian population
Yasmin Mohamed Nada, Cairo / Egypt
Author Block: S. A. Mansour, R. M. Kamal, M. M. Gomaa, E. Sweed, S. Hussien, E. Abdalla, Y. M. Nada, G. Mohamed, A. F. I. Moustafa; Cairo/EG
Purpose: Although artificial intelligence (AI) has potential in breast cancer screening, open issues remain. It is vital to make sure AI does not overlook cancers or cause needless recalls. The aim of this work was to investigate the effectiveness of including AI in combination with one radiologist in the routine double reading of mammograms for breast cancer screening.
Methods or Background: The study prospectively analyzed 32822 screening mammograms. Reading was performed in a blinded, paired fashion by i) two radiologists and ii) one radiologist paired with AI. A heatmap and abnormality scoring percentage were provided by AI for abnormalities detected on mammograms. Negative mammograms and benign-looking lesions that were not biopsied were confirmed by a 2-year follow-up.
Results or Findings: Double reading by one radiologist and AI detected 1324 cancers (6.4%), whereas reading by two radiologists detected 1293 cancers (6.2%), a relative proportion of 1.02 (p<0.0001). At the recall stage, suspicion and biopsy recommendation were more frequent with the AI plus one radiologist combination than with the two radiologists. Interpretation of the mammogram by AI plus one radiologist showed a sensitivity of 94.03%, a specificity of 99.75%, a positive predictive value of 96.571%, a negative predictive value of 99.567%, and an accuracy of 99.369% (from 99.252% to 99.472%). The positive likelihood ratio was 387.260, the negative likelihood ratio was 0.060, and the area under the curve (AUC) was 0.969 (0.967 to 0.971).
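The likelihood ratios follow from sensitivity and specificity by their standard definitions: LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The sketch below applies these to the rounded percentages quoted above; the function name is illustrative. Because the inputs are rounded, the result only approximates the reported figures, and LR+ in particular is very sensitive to the last digit of a specificity this close to 100%.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) from sensitivity and specificity given as fractions."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


# Rounded values from the abstract: 94.03% sensitivity, 99.75% specificity.
lr_pos, lr_neg = likelihood_ratios(0.9403, 0.9975)
print(f"LR+ ~ {lr_pos:.0f}, LR- ~ {lr_neg:.3f}")
```

With these rounded inputs, LR- matches the reported 0.060, while LR+ comes out somewhat below the reported 387.260, presumably because the abstract's value was computed from unrounded counts.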
Conclusion: AI could be used as an initial reader for the evaluation of screening mammography in routine workflow. Implementation of AI enhanced the opportunity to reduce false negative cases and supported the decision to recall or biopsy.
Limitations: This is a single-institution study.
Funding for this study: No source of funding.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: The study has been approved by the Baheya Charity Hospital research center.
7 min
Simulating single reading for high-risk examinations in the randomized controlled Mammography Screening with Artificial Intelligence trial (MASAI)
Viktoria Josefsson, Malmö / Sweden
Author Block: V. Josefsson, D. Schmidt, H. Sartor, O. Hagberg, K. Lang; Malmö/SE
Purpose: To assess the value of double reading of high-risk examinations in the MASAI trial.
Methods or Background: In the randomised controlled MASAI trial, AI-supported screening was compared to standard double reading. AI was used to triage exams to single or double reading depending on malignancy risk and as detection support. Of the 53 048 participants in the intervention arm, 3800 exams were high risk and underwent double reading while the remaining exams underwent single reading. In this retrospective study, we assessed the relative performance in the intervention arm, comparing simulated single and factual double reading of high-risk exams and its effect on cancer detection, recalls, and false positives. Cancers solely detected by the second reader were described.
Results or Findings: The simulated single reading scenario resulted in 8.9% (308 vs. 338) fewer detected cancers and 5.9% fewer recalls (1045 vs. 1110) compared to the factual outcome in the intervention arm. Corresponding simulated vs. factual rates were 5.8/1000 vs. 6.4/1000 for cancer detection, 2.0% vs. 2.1% for recalls and 1.4% vs. 1.5% for false positives. Of the 30 cancers solely detected by the second reader, 24 (80.0%) were invasive and 21 (70.0%) were classified as T1. Of the invasive cancers, 23 (95.8%) were lymph-node negative and 8 (33.3%) non-luminal A, of which four were triple negative.
Conclusion: Double reading of high-risk exams improved cancer detection without unduly increasing false positives. The additional cancers detected were mostly small, lymph-node negative invasive cancers, including those of significant prognostic subtypes. These findings support the continued use of double reading for high-risk exams in AI-supported screening.
Limitations: Single-institution trial.
Retrospective simulation.
Funding for this study: The Swedish Cancer Society
Regional Cancer Centers in Collaboration
Lund University ALF-funds
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: The Swedish Ethical Review Authority
2020-04936
2023-026848-02
7 min
Using prior mammograms to improve specificity of an AI system for breast cancer detection: a large-scale retrospective multi-site validation
Alejandro Rodriguez Ruiz, Nijmegen / Netherlands
Author Block: A. Rodriguez Ruiz1, S. Pires1, R. Peeters1, G. Rodriguez-Esteban1, D. Sperber1, C. De Wolf2, J. L. Raya Povedano3, S. Romero Martin3, R. Mann1; 1Nijmegen/NL, 2Geneva/CH, 3Cordoba/ES
Purpose: To investigate how the use of prior mammograms impacts breast cancer detection performance of an AI system.
Methods or Background: Mammograms from women attending three European screening programs were collected based on availability of prior images and at least 2 years follow-up, including original radiologists' assessments.

Each case was analyzed by a breast cancer detection AI product (Transpara, ScreenPoint Medical, v2.1), resulting in two cancer risk scores: one using the current mammogram alone as input and one also using prior mammograms. AI specificity was compared between using priors or not, matching the single radiologist sensitivity. Subsequently, the combination of a single radiologist and AI was modelled and compared to double human reading. P-values were computed using the McNemar test, together with binomial confidence intervals.
Results or Findings: 37,148 cases were included (20,300 from Switzerland, 916 from Spain, and 15,932 from The Netherlands), with 1,034 recalled cases (2.8%), 247 screen-detected cancers (6.6/1000), and 59 interval cancers (1.6/1000). 56% of cases had 1 prior mammogram, 44% had 2 or 3. Images were acquired with Hologic, Siemens, GE, Planmed and Philips machines.

At the average sensitivity of a single radiologist (71.2%), AI specificity increased when using prior images, from 98.1% (98.0-98.2%) to 98.8% (98.7-98.9%), representing a 37% reduction in false positives (from 1.9% to 1.2%, P<0.001).

Combining AI using priors with a radiologist achieved comparable sensitivity (83.0% vs 82.0%, P=0.66) and higher specificity (96.3%, 96.0-96.5%) than double human reading before consensus (95.4%, 95.2-95.6%), representing 20% fewer false positives (P<0.001). The improved specificity was higher for cases with breast density C/D (+0.8%, P<0.001) than for A/B (+0.3%, P=0.005).
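The relative reductions in false positives quoted above follow directly from the specificities, since the false-positive rate is 100% minus specificity. A minimal sketch of that arithmetic, using only the percentages stated in the abstract (the function name is illustrative):

```python
def relative_fp_reduction(spec_before_pct: float, spec_after_pct: float) -> float:
    """Relative reduction (%) in false-positive rate implied by two specificities."""
    fpr_before = 100 - spec_before_pct
    fpr_after = 100 - spec_after_pct
    return 100 * (fpr_before - fpr_after) / fpr_before


# AI alone: without priors (98.1%) vs with priors (98.8%).
ai_reduction = relative_fp_reduction(98.1, 98.8)
# Double human reading (95.4%) vs radiologist plus AI with priors (96.3%).
combo_reduction = relative_fp_reduction(95.4, 96.3)

print(f"{ai_reduction:.0f}% and {combo_reduction:.0f}% fewer false positives")
```

Rounded to whole percentages, this gives the 37% and 20% reductions reported in the abstract.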
Conclusion: A higher specificity was achieved by a breast cancer detection AI system using prior mammograms, potentially offering better aid to radiologists in breast cancer screening.
Limitations: Retrospective design.
Funding for this study: None
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: Not applicable

Notice

This session will not be streamed, nor will it be available on-demand!

CME Information

This session is accredited with 1 CME credit.

Moderators

  • Simone Schiaffino

    Lugano / Switzerland

Speakers

  • Alejandro Rodriguez Ruiz

    Nijmegen / Netherlands
  • Claudia Maria Weiss

    Villorba / Italy
  • Georgia Spear

    Park Ridge / United States
  • Yasmin Mohamed Nada

    Cairo / Egypt
  • Viktoria Josefsson

    Malmö / Sweden