Research Presentation Session: Breast

RPS 1202 - Artificial intelligence (AI) in breast imaging

March 1, 08:00 - 09:00 CET

7 min
Adding artificial intelligence (AI) case malignancy scoring in a breast screening programme to reduce screen-reading workload: a retrospective study
Andrea Nitrosi, Reggio Emilia / Italy
Author Block: A. Nitrosi, P. Giorgi Rossi, L. Verzellesi, N. Cucurachi, R. Vacondio, C. Campari, M. Bertolini, P. Pattacini, M. Iori; Reggio Emilia/IT
Purpose: The purpose of this study was to evaluate a strategy of integrating an AI mammography case malignancy score (AI-CMS) to reduce breast screen-reading workload by avoiding human second reading for mammograms with a low AI-CMS.
Methods or Background: We retrospectively analysed 31,747 consecutively collected screening exams from the Reggio Emilia breast screening programme (BSP), including 92 proven tumours and 5 pending diagnoses, to assess decision to recall (RD), recall rate (RR) and tumour detection of two simulated integrated AI and human reading protocols (ProFound AI 2D system, iCAD Inc.). The iCAD AI-CMS is a relative score representing the AI algorithm’s confidence that a case is malignant on a 0% to 100% scale.
To estimate the potential reduction in the number of human readings, iCAD acts as reader C1, recalling women with an AI-CMS greater than a predefined threshold (10%/15%/20%). If the first radiologist (reader RH1) disagrees with iCAD, the case goes to a second radiologist (RH2) and, in case of human disagreement, to arbitration by a third radiologist (RH3), as in the standard screening protocol.
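The reading cascade described above can be sketched as follows. This is a minimal illustration under one reading of the protocol, not the authors' implementation: the names `Outcome` and `simulate_case` are hypothetical, and the radiologists' recall decisions are assumed to be known in advance, as they would be in a retrospective simulation.

```python
from typing import NamedTuple


class Outcome(NamedTuple):
    recall: bool          # final decision to recall
    human_readings: int   # radiologist reads consumed by this exam


def simulate_case(ai_cms: float, threshold: float,
                  rh1: bool, rh2: bool, rh3: bool) -> Outcome:
    """Apply the AI-as-reader-C1 cascade to a single exam.

    ai_cms and threshold are percentages (0-100). rh1, rh2 and rh3 are the
    recall decisions each radiologist would give if asked (assumption).
    """
    c1 = ai_cms > threshold      # C1 recalls when the score exceeds the threshold
    if rh1 == c1:                # AI and first radiologist agree: one human read
        return Outcome(rh1, 1)
    if rh2 == rh1:               # second radiologist agrees with the first
        return Outcome(rh1, 2)
    return Outcome(rh3, 3)       # human disagreement: third reader arbitrates
```

Summing `human_readings` over all exams would give the total human workload for a chosen threshold.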
Results or Findings: Assuming respectively 10%/ 15%/ 20% AI-CMS threshold, RD for C1 was 49.4%/ 37.7%/ 29.7%, for RH1 4.6%/ 4.6%/ 4.6%, for RH2 8.7%/ 10.6%/ 12.5% and for RH3 69.9%/ 69.7%/ 69.1%. The final RR was 3.70%/ 3.62%/ 3.58% versus actual RR of 3.86%.
This corresponds to 48,975/ 45,180/ 42,785 versus 65,097 total human readings (a human workload reduction of 24.8%/ 30.6%/ 34.3%). There was no increase in false negatives with the 10% and 15% thresholds, whereas the 20% threshold resulted in one additional false negative.
Conclusion: Adding AI-CMS support to a standard screening scenario could result in a substantially lower screen-reading workload and a modest decrease in RR without any additional false negatives.
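The workload figures can be checked directly from the reported reading counts. The sketch below uses only the numbers quoted above; the variable and function names are illustrative.

```python
BASELINE_READINGS = 65_097                          # human readings, standard protocol
SIMULATED = {10: 48_975, 15: 45_180, 20: 42_785}    # AI-CMS threshold (%) -> readings


def workload_reduction(simulated: int, baseline: int = BASELINE_READINGS) -> float:
    """Percentage of human readings saved relative to the baseline."""
    return 100.0 * (baseline - simulated) / baseline


reductions = {t: round(workload_reduction(n), 1) for t, n in SIMULATED.items()}
print(reductions)
```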
Limitations: The study only examines data from the Reggio Emilia BSP.
Funding for this study: This study was partially supported by the Italian Ministry of Health—Ricerca Corrente
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: Compliance with Ethical Standards: Institutional Review Board approval was not required because this is a clinical audit of a technical development. This study was conducted in accordance with the routine quality assurance procedures established by the Local Health Authority for its screening programmes. The Reggio Emilia Cancer Registry, which routinely collects the screening history of each case of breast cancer, has been approved by the Provincial Ethics Committee.
7 min
An artificial intelligence tool to empower junior radiologists in breast cancer screening; AI as a second pair of eyes in mammography reading
Mehran Arab Ahmadi, Tehran / Iran
Author Block: N. Ahmadinezhad, N. Sadighi, R. Ghavami Modegh, M. Arab Ahmadi, M. Rahmani, A. Arian, H. Dashti, H. R. Rabiee, M. Gity; Tehran/IR
Purpose: The aim of this study was to assess whether an artificial intelligence (AI) application can assist non-expert radiologists in improving their performance in detecting the possibility of cancer using digital mammography in an adjunctive workflow.
Methods or Background: A retrospective study was conducted using 2060 digital mammography (DM) images of 515 women acquired from 2018 to 2022, including 120 positive and 910 negative breasts, with four junior radiologists participating in the study. Radiologists first independently reviewed and interpreted each case without AI assistance. Immediately after submitting their initial interpretations, they were shown the AI-generated interpretations and could revise and resubmit their own. Radiologists' performance before and after receiving AI assistance was compared using AUC, sensitivity, and specificity metrics.
Results or Findings: According to our findings, the integration of AI technology resulted in a notable enhancement in the performance of junior radiologists. The area under the curve (AUC) improved significantly from 0.812 to 0.837, sensitivity increased from 75.0% to 92.9%, and balanced accuracy enhanced by 3.63%. Additionally, AI proved to be highly beneficial for radiologists in identifying previously missed lesions across various types, including mass, calcification, distortion, and asymmetry.
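For readers less familiar with the metrics, balanced accuracy is the mean of sensitivity and specificity. The sketch below shows the arithmetic on made-up counts; the counts are hypothetical and are not taken from the study.

```python
def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of sensitivity and specificity from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of positive breasts flagged
    specificity = tn / (tn + fp)   # fraction of negative breasts cleared
    return (sensitivity + specificity) / 2


# Hypothetical example: 120 positive and 910 negative breasts.
example = balanced_accuracy(tp=90, fn=30, tn=819, fp=91)
```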
Conclusion: AI can improve the diagnostic capabilities and detection rates of radiologists with less than five years of experience, enhancing their medical imaging performance.
Limitations: One limitation of this study is that we did not assess the effect of AI on final recall rates, which would require a live large-scale survey with a normal distribution to yield reliable results.
Funding for this study: No funding was received for this study.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: The clinical study was conducted under the Research Ethic Certificate No. IR.TUMS.IKHC.REC.1399.490 issued by Tehran University of Medical Sciences on Feb. 24, 2021.
7 min
Is it worth reading low-risk breast cancer screening mammograms as determined by an artificial intelligence (AI) system? A prospective, population-based study for DM and DBT (AITIC trial)
Esperanza Elías Cabot, Cordoba / Spain
Author Block: E. Elías Cabot1, S. Romero Martin1, J. L. Raya Povedano1, A. Rodriguez Ruiz2, M. Álvarez Benito1; 1Cordoba/ES, 2Nijmegen/NL
Purpose: The purpose of this study was to prospectively evaluate AI for safe workload reduction by excluding low risk cases for human reading and applying double reading to the rest in screening with digital mammography (DM) and digital breast tomosynthesis (DBT).
Methods or Background: Participants in a breast cancer screening programme in Córdoba, Spain, (women, aged 50-71) are included in this prospective study and imaged with either DM or DBT. Two reading strategies are independently applied to each exam: Double blind and non-consensual reading of all exams (control arm) and an AI-based triaging (intervention arm), where an AI system (Transpara, ScreenPoint Medical) evaluates the cancer risk of all exams. Cases identified by AI as low risk (operating point pre-defined to yield approximately 70% of exams in this category) are automatically assessed as negative, while cases with intermediate and elevated risk are double read with AI-support. Readers are randomly assigned to each reading and blinded to other reading outcomes. We hypothesise that an AI-based screening workflow allows for substantial workload reduction and non-inferior cancer detection rate (CDR) and recall positive predictive value (PPV).
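The triage step can be illustrated with a simple percentile cut-off. This is a sketch only: how the commercial system defines its risk categories is not described here, so the cut-off rule, the function names and the two routing labels are assumptions chosen to mirror the "approximately 70% low risk" operating point.

```python
def low_risk_cutoff(scores, low_risk_fraction=0.70):
    """Score at or below which roughly low_risk_fraction of exams fall."""
    ranked = sorted(scores)
    k = max(1, round(low_risk_fraction * len(ranked)))
    return ranked[k - 1]


def triage(scores, cutoff):
    """Route each exam: low-risk exams are auto-negative, the rest are double read."""
    return ["auto-negative" if s <= cutoff else "double-read" for s in scores]


# Toy example with ten exam scores (hypothetical values).
toy_scores = list(range(1, 11))
toy_cutoff = low_risk_cutoff(toy_scores)
toy_routes = triage(toy_scores, toy_cutoff)
```

In a prospective setting the cut-off would be fixed in advance rather than derived from the study data, which is why the abstract describes it as pre-defined.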
Results or Findings: Between March 2022 and June 2023, 19243 women participated. AI-based triaging, in which only 6583 exams were read (the 34% of the total scored by AI as intermediate or elevated risk), achieved a superior CDR compared to double reading of all cases (6.7/1000 [130 cancers] versus 5.9/1000 [114 cancers], p=0.017), a non-inferior recall PPV (12.6% [10.6-14.8%], 130/1032, versus 12.0% [10.0-14.3%], 114/947, p=0.699), and an increased RR (5.4% [5.0-5.7%] versus 4.9% [4.6-5.2%], p=0.016).
Conclusion: AI-based triaging, excluding low risk mammograms from human reading, leads to a substantial reduction in reading workload in breast cancer screening without negatively affecting performance.
Limitations: The results include 70% of the target population (27000 women). We expect to complete the study in February 2024.
Funding for this study: Funding was received from a SEDIM foundation grant to the value of 20,000 euros.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: Approval granted on 30.03.2021.
7 min
Correlating breast lesions in tomosynthesis CC and MLO views using artificial intelligence (AI)
Sarah Maier Friedewald, Chicago / United States
Author Block: S. M. Friedewald1, A. Dsouza2, C. Parghi3, A. Kshirsagar2; 1Chicago, IL/US, 2Marlborough, MA/US, 3Houston, TX/US
Purpose: The purpose of this study was to evaluate a deep learning (DL) model for matching regions of interest (ROI) corresponding to the same lesion on tomosynthesis craniocaudal (CC) and mediolateral oblique (MLO) views.
Methods or Background: A CC-MLO lesion correlation system (CMCS) was developed to automatically match ROIs flagged by an AI breast lesion detection algorithm (Genius AI Detection, Hologic) in both CC and MLO views. The system combines geometric information with similarity between pairs of ROIs to assign a lesion correlation score. ROI pairs above a pre-defined threshold are presented to the reader for potential workflow enhancement.
864 consecutive subjects with biopsy-proven malignant cancers were collected retrospectively under an IRB-approved protocol from two large multi-centre breast imaging networks and one breast imaging facility at an independent cancer centre. Ground truth was determined by an expert using available image data. The pairs of ROIs flagged by the AI algorithm on malignant lesions were analysed by the CMCS and compared with ground truth to estimate accuracy.
Results or Findings: Out of 864 patients with biopsy-proven malignancies, 614 lesion ROI pairs identified by experts were detected by the AI algorithm on both views. Analysis of these by the CMCS yielded 555 correctly matched pairs, giving an overall accuracy for all findings of 90.4% (95% CI: 88.1, 92.7) for biopsy-proven cancer cases.
Conclusion: The CC-MLO lesion correlation system was able to correctly match pairs of ROIs in CC and MLO views over 90% of the time for biopsy-proven malignant lesions that were correctly flagged by the AI algorithm. This matching algorithm can be used to assist radiologists in triangulating one-view findings in the orthogonal view.
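The reported accuracy and interval can be reproduced from the counts with a normal-approximation (Wald) interval. The abstract does not state which interval method was used, so the choice of the Wald formula here is an assumption; it happens to agree with the published figures to one decimal place.

```python
import math


def wald_ci(successes: int, n: int, z: float = 1.96):
    """Proportion with a normal-approximation (Wald) 95% confidence interval."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width


# 555 correctly matched pairs out of 614 detected on both views.
p, lower, upper = wald_ci(555, 614)
print(f"{100 * p:.1f}% ({100 * lower:.1f}, {100 * upper:.1f})")
```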
Limitations: This study is retrospective and not performed in a clinical setting.
Funding for this study: Funding was received from Hologic Inc.
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: Not applicable.
7 min
Evaluating performance of an artificial intelligence (AI) detection system on prior screening tomosynthesis studies of breast cancer patients
Sarah Maier Friedewald, Chicago / United States
Author Block: S. M. Friedewald1, B. Shi2, C. Parghi3, A. Kshirsagar2; 1Chicago, IL/US, 2Marlborough, MA/US, 3Houston, TX/US
Purpose: The purpose of this study was to evaluate an AI detection system in identifying breast cancer in up to two prior screening tomosynthesis studies in patients with biopsy-proven cancer detected during their most recent screening examinations.
Methods or Background: Tomosynthesis screening and diagnostic studies with one or two associated prior screening examinations acquired between 2014 and 2021 were consecutively collected from 814 biopsy-proven cancer patients. At least one prior screening study (prior1) was available for 814 patients, while two prior studies (prior1 and prior2) were available for 272. In the index exam where the cancer was mammographically detected, the cancer was annotated by an expert using all available data. The annotator also retrospectively reviewed prior1 and prior2 examinations and annotated corresponding lesions if visible, irrespective of actionability. The AI algorithm (Genius AI Detection 2.0, Hologic) independently analysed tomosynthesis examinations and marked potentially malignant findings with a score corresponding to the overall level of suspicion. Study-level sensitivity was calculated by comparing the location of any AI marks with ground truth.
Results or Findings: Sensitivity for cancer in 814 cases was 90.7% (738/814, 95% CI: 88.6%-92.6%) in index studies. Sensitivity was 89.9% for studies with non-calcified malignancies and 92.6% for studies with malignant calcifications. Sensitivity for retrospectively visible findings amongst prior1 studies was 68.5% (341/498, 95% CI: 63.1%-74.2%) and amongst prior2 studies was 48.8% (79/162, 95% CI: 39.9%-57.6%). For the 272 patients with two prior studies, the average AI case score was 34.7 (SD: 28.7) for prior2 studies, 40.9 (SD: 26.8) for prior1 studies, and 63.8 (SD: 24.7) for index studies diagnosed with biopsy-proven cancers.
Conclusion: This AI system can assist in identifying cancer on prior mammograms interpreted as normal. The temporal increase in case score for each study potentially correlates with cancer progression.
Limitations: This is a retrospective study.
Funding for this study: Funding was received from Hologic, Inc.
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: Not applicable.
7 min
AI detection of interval cancers: does size, grade, and time since screening affect sensitivity?
Muzna Nanaa, Cambridge / United Kingdom
Author Block: M. Nanaa, T. van Nijnatten, N. Stranz, S. Carriero, N. Payne, I. Allajbeu, E. Giannotti, R. Manavaki, F. Gilbert; Cambridge/UK
Purpose: The objective of this study was to evaluate AI detection of interval cancers (IC) on screening mammograms by tumour size, grade, and time since screening.
Methods or Background: Two radiologists (8 and 3–13 years’ experience) classified 488 ICs (2011–2018) as visible or non-visible on screening mammography. Tumour volume doubling time (TVDT) was calculated for visible cancers as TVDT = ln(2) × Δt / [3 × (ln d1 − ln d2)]. The median TVDT for grade and receptor status of visible cancers was used as a surrogate to estimate cancer size for non-visible cancers as T(SS) = T(SD) × e^(−(ln(2)/TVDT) × Δt), where T(SS) is the tumour size at screening and T(SD) is the tumour size at diagnosis. The sensitivity of a commercial AI algorithm was analysed by tumour size, grade, receptor status, and time from screen to diagnosis at its default threshold for cancer detection (score 10).
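The two growth formulas quoted above can be written out as a short sketch. Both functions follow the expressions as they appear in the abstract. The mapping of d1 to the diameter at diagnosis and d2 to the diameter at screening is an assumption (it makes TVDT positive for a growing tumour), as are the function names and the choice of days and millimetres as units.

```python
import math


def tvdt(delta_t_days: float, d_diagnosis_mm: float, d_screening_mm: float) -> float:
    """Tumour volume doubling time: ln(2) * dt / (3 * (ln d1 - ln d2))."""
    growth = math.log(d_diagnosis_mm) - math.log(d_screening_mm)
    return math.log(2) * delta_t_days / (3 * growth)


def size_at_screening(size_at_diagnosis: float, tvdt_days: float,
                      delta_t_days: float) -> float:
    """Back-estimate size at screening: T(SD) * exp(-(ln 2 / TVDT) * dt)."""
    return size_at_diagnosis * math.exp(-(math.log(2) / tvdt_days) * delta_t_days)


# Toy check: a diameter doubling from 10 mm to 20 mm over 300 days.
example_tvdt = tvdt(300, 20, 10)
example_size = size_at_screening(20, 100, 100)
```

The factor of 3 in the first formula converts diameter growth to volume growth. The second formula is applied exactly as quoted, and the abstract does not say whether the "size" it refers to is a diameter or a volume.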
Results or Findings: Median screening size was 12 mm (IQR 9–18) for visible cancers (280/488), with median estimated size 2.65 mm (IQR 1.26–5) for non-visible cancers (208/488).
AI detected 58.2% (163/280) of visible and 30.7% (64/208) of non-visible cancers, p<0.001. AI localised 58.4% (31/53) of grade 1, 46.3% (103/222) of grade 2, 43.2% (87/201) of grade 3 cancers, p=0.14, 49.3% (195/395) of ER-positive cancers and 31.7% (27/85) of ER-negative cancers, p=0.003. The median time to interval was 666 days (IQR 405–895) for localised cancers and 708 days (IQR 480–929) for non-localised, p=0.057.
The median size was 13 mm (IQR 9–19) and 12 mm (IQR 8–17) for localised and non-localised visible cancers, p=0.027, and 3.15 mm (IQR 1.93–5.51) and 2.27 mm (IQR 0.97–4.25) for non-visible cancers, p=0.002, respectively. Sensitivity for cancers <5mm, 5–9.9mm, and >=10mm was 33.3% (1/3), 49.4% (43/87), 62.6% (119/190) for visible; 27% (42/155), 32.2% (10/30), 54.5% (12/22) for non-visible.
Conclusion: AI is more likely to detect larger, ER-positive cancers, with a trend towards grade 1.
Limitations: This was a single-site study, and tumour size was estimated rather than measured in 42.6% of cases.
Funding for this study: This research was supported by the NIHR Cambridge Biomedical Research Centre (BRC-1215-20014) and the CRUK early detection programme grant. The views expressed are those of the authors and not necessarily those of the NIHR or the Department of Health and Social Care. Council for At-Risk Academics (Cara) funded the research fellowship for M.N. (award no. 210211). We would like to thank the company for taking part in this study.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: This retrospective study used anonymised mammograms from two National Health Service Breast Screening Programme (NHSBSP) centres under ethical approval [Health Research Authority Research Ethics Committee (HRA REC) 20/LO/0104, HRA Confidentiality Advisory Group (CAG) 20/CAG/0009, and Public Health England (PHE) Research Advisory Committee (RAC) BSPRAC_090].
7 min
How much has AI improved over the last five years? A benchmark evaluation of different versions of an AI mammography interpretation system
Alejandro Rodriguez Ruiz, Nijmegen / Netherlands
Author Block: A. Rodriguez Ruiz, A-K. Brehl, N. Karssemeijer, I. Sechopoulos, R. M. Mann; Nijmegen/NL
Purpose: The study aimed to retrospectively evaluate the breast cancer detection performance of different versions of the same mammography AI system developed since 2018.
Methods or Background: Two enriched datasets (A: 60 exams, 24 cancers, read by 107 radiologists; B: 60 exams, 20 cancers, read by 73 radiologists) and one consecutively collected double-read screening dataset (22,961 exams with 370 cancers, including 163 screen-detected, 48 interval, and 159 next-round screen-detected) were gathered. All exams and radiologists are part of the Dutch breast cancer screening program.
Each exam was processed by four versions of the same AI system (v1.3, v1.5, v1.7, and Beta-2023, Transpara, ScreenPoint Medical), developed between 2018 and 2023. All exams were independent of the AI development process. The sensitivity of AI was compared to that of the radiologists at the average radiologist specificity on each dataset, using parametric t-tests.
Results or Findings: In dataset A, the average radiologist specificity and sensitivity were 92% (CI: 91%-94%) and 83% (CI: 81%-85%). At this specificity, AI system versions v1.3-v1.7 achieved sensitivities ranging from 62% to 79%, while Beta-2023 achieved 88% sensitivity (CI: 68%-97%, P=0.92). In dataset B, the average radiologist specificity and sensitivity were 80% (CI: 78%-83%) and 85% (CI: 83%-88%). AI v1.3-v1.7 versions achieved 75%-85% sensitivity, while Beta-2023 achieved 95% sensitivity (CI: 75%-99%, P=0.73).
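Comparing AI with readers at a matched specificity means choosing the AI score threshold that clears the same share of non-cancer exams as the average radiologist, then measuring sensitivity at that threshold. The sketch below shows one simple way to do this; the thresholding rule, the function name and the toy data are assumptions, not the method used in the study.

```python
import math


def sensitivity_at_specificity(scores, labels, target_specificity):
    """Return (threshold, sensitivity, specificity) at a matched specificity.

    labels: 1 = cancer, 0 = no cancer. An exam is flagged when its score
    exceeds the threshold (higher score = more suspicious).
    """
    negatives = sorted(s for s, y in zip(scores, labels) if y == 0)
    positives = [s for s, y in zip(scores, labels) if y == 1]
    # Number of negatives that must fall at or below the threshold; the small
    # tolerance guards against floating-point noise in the product.
    k = max(1, math.ceil(target_specificity * len(negatives) - 1e-9))
    threshold = negatives[k - 1]
    specificity = sum(s <= threshold for s in negatives) / len(negatives)
    sensitivity = sum(s > threshold for s in positives) / len(positives)
    return threshold, sensitivity, specificity


# Toy data (hypothetical): five non-cancer and four cancer exams.
toy_scores = [1, 2, 3, 4, 5, 3.5, 6, 0.5, 4.5]
toy_labels = [0, 0, 0, 0, 0, 1, 1, 1, 1]
toy_result = sensitivity_at_specificity(toy_scores, toy_labels, 0.8)
```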
In the screening dataset, the average radiologist specificity was 97.7% (CI: 97.4%-97.8%). Due to interval and next-round cancers, average radiologist sensitivity was only 40% (CI: 35%-45%). AI Beta-2023 achieved a sensitivity of 48% (CI: 43%-53%), statistically higher sensitivity than average single reading (P=0.002).
Conclusion: AI systems are continuously improving performance. In the evaluated system, the breast cancer detection performance has improved over time to surpass that of an average radiologist.
Limitations: An identified limitation was that the study includes data from a single country.
Funding for this study: No funding was received for this study.
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: Data was collected following IRB waiver at the institution.

This session will not be streamed, nor will it be available on-demand!