Research Presentation Session: Artificial Intelligence & Machine Learning & Imaging Informatics

RPS 2405 - AI in breast cancer screening

March 3, 11:30 - 12:30 CET

7 min
Bimodal CADx assessment in mammography and tomosynthesis: a standalone study for breast screening
Hubert Beaumont, Nice / France
Author Block: H. Beaumont, A. Iannessi, S. Pacilè, T. Louis; Nice/FR
Purpose: Digital mammography (FFDM) is the standard for breast cancer screening. Digital breast tomosynthesis (DBT), compared to FFDM, enhances cancer detection and reduces unnecessary biopsies. Despite DBT's adoption, critical questions remain about its higher radiation dose, time, cost, and clinical benefit, particularly for systematic breast screening. In the era of AI CADx (Computer-Aided Diagnosis) for breast screening, one unresolved question is the role of bimodal algorithms in predicting cancer risk and offering guidance when opinions differ. This study aims to address that question.
Methods or Background: We analysed a cohort of 1071 patients screened for breast cancer who underwent both mammography and tomosynthesis. CADx software assigned a malignancy score to each tumour, allowing us to compute the joint distribution of the paired mammography/tomosynthesis scores. From the joint distribution, we defined areas of “perpendicular diagnosis” (PD) as the areas of highly discordant scoring. We evaluated the potential of systematic reclassification of perpendicular scores through sensitivity and specificity, for both tumoural masses and calcifications.
Results or Findings: We observed a modest inter-modality agreement, indicated by a kappa of 0.19 (95% CI: 0.18; 0.22). PD scoring was present in 32.7% (95% CI: 29.7; 35.8) of mass cases and 38.6% (95% CI: 30.1; 47.6) of calcification cases. Specific reclassification rules significantly increased mass sensitivity from 0.74 (95% CI: 0.71; 0.77) to 0.80 (95% CI: 0.77; 0.83), resolving 29.1% of PD cases. For calcifications, reclassification addressed 12% of discrepancies, improving specificity from 0.85 (95% CI: 0.76; 0.91) to 0.86 (95% CI: 0.78; 0.92).
Conclusion: Our analysis revealed a significant proportion of CADx algorithmic discrepancies. In cases where FFDM classified masses as benign while DBT CADx suggested malignancy, DBT CADx aided decision-making. Conversely, for calcifications initially assessed as negative with tomosynthesis, FFDM CADx proved valuable. Exploring alternative reclassification methods is essential.
Limitations: No limitations were applicable for this study.
Funding for this study: No funding was obtained for this study.
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: Not applicable for this study.
7 min
Are AI-detected interval cancers actionable for recall in a real screening setting? An informed review of 120 interval cancer cases with high AI scores in BreastScreen Norway
Henrik Wethe Koch, Stavanger / Norway
Author Block: H. W. Koch1, M. Larsen2, S. Hofvind2; 1Stavanger/NO, 2Oslo/NO
Purpose: Retrospective studies have suggested that using artificial intelligence (AI) systems in breast cancer screening might help us detect 30-40% of interval cancers. However, it is uncertain whether the AI-markings match the location of the tumour on diagnostic mammograms, and whether the findings are actionable for recall in a real screening setting. This study aims to address these questions.
Methods or Background: In 2022, we conducted a retrospective study comparing the performance of an AI-system with independent double reading by radiologists with respect to cancer detection. The AI-system (Transpara v.1.7.0) scored mammograms from 1-10 based on risk of malignancy. 42% (120/289) of the interval cancers had an AI-score of 10. In this study, four radiologists did a consensus review of the interval cancers with AI-score 10 and compared AI-markings with cancer location on diagnostic mammograms. Interval cancers were classified as false negative, minimal sign (actionable or non-actionable) or true negative. Mammographic breast density was classified as BI-RADS a-d.
Results or Findings: Of 120 interval cancers with AI-score 10 (group 1), 77.5% (93/120) had AI-markings matching the cancer location (group 2), and 20.8% (25/120) had AI-markings matching the cancer location and were considered actionable for recall (false negative or minimal sign actionable) (group 3). Density distribution, as a percentage of all 289 interval cancers within each BI-RADS density category:
Group 1: a: 17% (1/6), b: 42% (46/110), c: 41% (56/138), d: 49% (17/35)
Group 2: a: 17% (1/6), b: 33% (36/110), c: 38% (53/138), d: 9% (3/35)
Group 3: a: 0% (0/6), b: 10% (11/110), c: 10% (14/138), d: 0% (0/35)
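The per-density percentages above can be re-derived from the counts reported in the abstract. The following is a minimal Python sketch that uses only those numerators and denominators; the variable and function names are illustrative and not part of the study.

```python
# Interval cancers per BI-RADS density category (denominators reported in the abstract).
TOTAL_BY_DENSITY = {"a": 6, "b": 110, "c": 138, "d": 35}

# Numerators reported for each group in the abstract.
GROUP_COUNTS = {
    "group1": {"a": 1, "b": 46, "c": 56, "d": 17},  # AI-score 10
    "group2": {"a": 1, "b": 36, "c": 53, "d": 3},   # AI-marking matches cancer location
    "group3": {"a": 0, "b": 11, "c": 14, "d": 0},   # matching and actionable for recall
}


def density_percentages(counts):
    """Return the share (whole percent) of interval cancers per density category."""
    return {d: round(100 * n / TOTAL_BY_DENSITY[d]) for d, n in counts.items()}


percentages = {g: density_percentages(c) for g, c in GROUP_COUNTS.items()}
group_totals = {g: sum(c.values()) for g, c in GROUP_COUNTS.items()}

for group, pct in percentages.items():
    print(group, group_totals[group], pct)
```

The group totals (120, 93 and 25) and the category denominators (summing to 289) agree with the figures quoted in the text.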
Conclusion: Our results indicate that the true effect of AI in screen reading regarding earlier detection of interval cancers is still uncertain. Although 49% of interval cancers in extremely dense breasts had AI-score 10, none were considered actionable for recall in an informed consensus review.
Limitations: The retrospective study design and the informed consensus review were the limitations of this study.
Funding for this study: No funding was obtained for this study.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: The study was approved by the Regional Committee for Medical and Health Research Ethics (#13294).
7 min
Integration of artificial intelligence (AI) in double-read population-based mammography screening: simulated replacement of one reader and beyond
Mohammad Talal Elhakim, Odense / Denmark
Author Block: M. T. Elhakim1, S. Wordenskjold Stougaard1, O. Graumann2, M. Nielsen3, O. Gerke1, L. B. Larsen1, B. Schnack Brandt Rasmussen1; 1Odense/DK, 2Aarhus/DK, 3Copenhagen/DK
Purpose: The aim of this study was to compare the accuracy and feasibility of three AI-integrated screening scenarios with those of double reading with arbitration (combined reading).
Methods or Background: A study sample of 249,402 consecutive screening mammograms representative of an entire screening population was obtained from the Region of Southern Denmark. The AI system Lunit INSIGHT MMG v. (Lunit Inc.) processed all mammograms. In Scenario 1, AI replaced first reader. In Scenario 2, AI replaced second reader when it agreed with the decision of the original first reader. In Scenario 3, AI was applied as a standalone triage tool, with AI replacing both readers for assessing low- and high-risk screenings, while moderate-risk screenings were assessed by the original combined reading. AI cut-offs were chosen partly based on a previous validated threshold and partly based on maintaining a workload reduction at around 50% for comparability across scenarios. Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), recall rate (RR), and arbitration rate (AR) were calculated.
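As an illustration of the outcome measures listed above, the sketch below computes sensitivity, specificity, PPV, NPV and recall rate from a 2x2 confusion matrix using their standard definitions. The function name and the example counts are hypothetical and are not taken from the study. The arbitration rate is omitted because it depends on reader-level decisions rather than on the confusion matrix.

```python
def screening_metrics(tp, fp, fn, tn):
    """Standard accuracy measures from confusion-matrix counts.

    Assumes every denominator is non-zero (hypothetical helper, not the
    authors' code).
    """
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),      # detected cancers / all cancers
        "specificity": tn / (tn + fp),      # correct negatives / all non-cancers
        "ppv": tp / (tp + fp),              # cancers among recalled screenings
        "npv": tn / (tn + fn),              # non-cancers among non-recalled screenings
        "recall_rate": (tp + fp) / total,   # recalled screenings / all screenings
    }


# Hypothetical counts, for illustration only.
example = screening_metrics(tp=80, fp=20, fn=20, tn=880)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```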
Results or Findings: AI cut-off scores were 80.99% (Scenarios 1 and 2), and <3.36% and ≥95.29% for low-risk and high-risk screenings, respectively (Scenario 3). Compared to combined reading, AI-integrated screening showed no statistically significant difference in any outcome measure other than a higher AR by +1.0% (p<0.0001) in Scenario 1, and a higher sensitivity by +1.5% (p=0.001) and lower AR by -0.8% (p<0.0001) in Scenario 3. In Scenario 2, AI-integrated screening had statistically significantly lower sensitivity (-4.5%; p<0.0001), NPV (-0.1%; p=0.001), RR (-0.6%; p<0.001), and AR (-1.5%; p<0.0001), and higher specificity (+0.6%; p<0.0001) and PPV (+4.7%; p<0.0001).
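The Scenario 3 triage rule can be sketched as a simple routing function built on the two cut-offs reported above. This is a minimal illustration, not the study's implementation: the function name and the routing labels are hypothetical, and the abstract does not state which final decision the AI issues for each risk group.

```python
LOW_RISK_CUTOFF = 3.36    # scores below this are low-risk (from the abstract)
HIGH_RISK_CUTOFF = 95.29  # scores at or above this are high-risk (from the abstract)


def route_screening(ai_score_percent):
    """Return who assesses a screening under the Scenario 3 triage rule.

    Labels are hypothetical; only the cut-offs come from the abstract.
    """
    if ai_score_percent < LOW_RISK_CUTOFF:
        return "ai_low_risk"        # AI replaces both readers
    if ai_score_percent >= HIGH_RISK_CUTOFF:
        return "ai_high_risk"       # AI replaces both readers
    return "combined_reading"       # moderate risk: original double reading with arbitration


for score in (1.0, 3.36, 50.0, 95.29):
    print(score, route_screening(score))
```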
Conclusion: Partial or full replacement of one or both readers in double reading with AI seems feasible without markedly affecting accuracy in screening.
Limitations: Retrospective design and correlated radiologist readings with reference standard were the limitations of this study.
Funding for this study: Region of Southern Denmark funded this study.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: This study was approved by the Danish National Committee on Health Research Ethics (identifier D1763009).
7 min
RSNA 2023 screening mammography breast cancer detection AI challenge results
Yan Chen, Nottingham / United Kingdom
Author Block: G. Partridge1, M. Vazirabad2, R. Ball3, H. Trivedi4, F. C. Kitamura5, H. Frazer6, R. M. Mann7, L. Moy8, Y. Chen1; 1Nottingham/UK, 2Chicago, IL/US, 3Bar Harbor, ME/US, 4Atlanta, GA/US, 5São Paulo/BR, 6South Yarra/AU, 7Nijmegen/NL, 8New York, NY/US
Purpose: Artificial intelligence (AI), used alongside human readers, in breast cancer screening could revolutionise the screening workflow. The RSNA hosted the 2023 screening mammography breast cancer detection AI challenge, where participants were invited to develop AI algorithms to interpret mammograms. The purpose was to assess the performance of submitted algorithms and explore the potential for improving performance by combining high-performing algorithms.
Methods or Background: Teams were provided a training-set of 11,913 2-view 2D digital mammogram (2DDM) screening cases, from two institutions (US and Australia) for AI training. AI performance was evaluated using an independent test-set of 5,415 2DDM cases from the same source. Cancer cases were pathology proven and non-cancer cases had at least 1-year of normal follow-up. Algorithms were ranked in the challenge using the pF1 accuracy score (incorporating sensitivity and PPV). In the current study, all algorithms were assessed independently. In addition, combined models were constructed from top-ranked algorithms.
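For readers unfamiliar with the ranking metric, the sketch below shows one common formulation of a probabilistic F1 (pF1) score, in which predicted probabilities replace hard counts when computing precision (PPV) and recall (sensitivity). It assumes this formulation matches the one used in the challenge, which the abstract does not spell out. The function name and example data are illustrative.

```python
def pf1(labels, predictions):
    """Probabilistic F1: harmonic mean of probabilistic precision and recall.

    labels      -- iterable of 0/1 ground-truth values
    predictions -- iterable of predicted cancer probabilities in [0, 1]
    (Assumed formulation; not confirmed by the abstract.)
    """
    pairs = list(zip(labels, predictions))
    p_tp = sum(p for y, p in pairs if y == 1)   # probability mass on true cancers
    p_fp = sum(p for y, p in pairs if y == 0)   # probability mass on non-cancers
    n_pos = sum(1 for y, _ in pairs if y == 1)  # number of true cancers
    if n_pos == 0 or p_tp + p_fp == 0:
        return 0.0
    precision = p_tp / (p_tp + p_fp)
    recall = p_tp / n_pos
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical examples: a perfect classifier and an uninformative one.
perfect = pf1([1, 0, 1, 0], [1.0, 0.0, 1.0, 0.0])
uninformative = pf1([1, 1, 0, 0], [0.5, 0.5, 0.5, 0.5])
print(perfect, uninformative)
```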
Results or Findings: A total of 1,687 teams participated in the challenge, each submitting their own algorithm. Median specificity and NPV were high across algorithms (98.7% and 98.5%, respectively), yet median cancer detection was low (sensitivity: 27.6%, PPV: 36.9%), with a median recall rate (RR) of 1.7%. The highest ranked algorithm (as per pF1) had a RR of 1.5%, specificity of 99.5%, NPV of 99.0%, sensitivity of 48.6%, and a PPV of 64.6%. Combining the top 3 and top 10 ranked algorithms increased the RR (2.4% and 3.5%, respectively) and markedly improved sensitivity (60.7% and 67.8%), while specificity remained more stable (98.7% and 97.8%).
Conclusion: Variation in performance of submitted AI algorithms to the RSNA challenge is substantial. Combining the highest-performing algorithms demonstrated improvement in performance.
Limitations: The relatively small evaluation test-set and the low cancer prevalence (albeit consistent with a screening setting) limit this study.
Funding for this study: No funding was provided for this study.
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: Not applicable for this study.
7 min
Improving breast cancer recurrence forecasts: combining multi-time-point mammography and medical reports
Chunyao Lu, Amsterdam / Netherlands
Author Block: C. Lu1, X. Wang1, L. Han1, T. Zhang1, Y. Gao1, T. Tan1, R. M. Mann2; 1Amsterdam/NL, 2Nijmegen/NL
Purpose: Predicting the risk of recurrence post-breast cancer surgery continues to be a challenging task, despite having access to complete medical records. The purpose of this study was to develop a deep learning model based on multi-time-point mammogram breast images and medical reports to predict the risk of postoperative recurrence of breast cancer.
Methods or Background: At a large academic medical center, we collected consecutive digital screening mammograms and medical reports from 3188 patients between January 1, 2000, and December 31, 2020. Our model integrates risk factors derived from multi-time-point mammograms with patients’ preoperative clinical data. We compared our method with commonly used machine learning methods based on clinical data and image-based deep learning models.
Results or Findings: We conducted a comprehensive comparison between our model and common machine learning models as well as deep learning methods, demonstrating that our model attained the highest AUCs in three datasets of patients who relapsed at different times, with scores of 0.72, 0.76, and 0.83 within 5, 10, and 20 years respectively. We discovered that while traditional risk factors are significant contributors, our model enhances the accuracy of predicting cancer recurrence risk by deducing potential risk factors from multi-time-point mammography images.
Conclusion: Our model underscores the advantages of incorporating complete and consecutive medical data into predictive algorithms, enhancing accuracy in forecasting recurrence and informing health policies for post-surgical treatment of breast cancer patients.
Limitations: This study needs to be validated in more external datasets.
Funding for this study: This study is supported by the Chinese Scholarship Council Studentship and the Guangzhou Elite Program (TZ-JY201948).
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: Not applicable for this study.
7 min
Microwave breast cancer screening and early detection: SAFE clinical study
Aleksandar Janjic, Istanbul / Turkey
Author Block: A. Janjic, I. Akduman, M. Cayoren, A. Yurtseven, O. Bugdayci, M. E. Aribal; Istanbul/TR
Purpose: The SAFE (Scan and find early) system is a microwave breast cancer imaging (MBI) device designed for non-invasive and non-ionising breast cancer screening and early detection. This technology relies on distinguishing dielectric properties between cancerous and healthy tissue to provide valuable clinical insights. This study aims to evaluate SAFE's capability to precisely identify lesions within the patient's breast.
Methods or Background: This study exclusively enrolled patients scheduled for biopsy, following approval from the ethics committee of Marmara University School of Medicine. Breast lesions were identified by analysing the difference in backscattered signals between healthy and cancerous breast tissue. Furthermore, we employed a machine learning approach, specifically extreme gradient boosting (XGBoost), to discern the presence of cancerous tissue within the breast. In tandem, a qualitative microwave imaging method was employed to pinpoint the location of the tumour.
Results or Findings: Our dataset comprised 394 samples, with 284 originating from healthy tissue and 110 from tissue containing lesions. Among the 110 lesion cases, 69 were identified as benign and 41 as malignant findings. The detection model achieved a sensitivity, specificity, and accuracy of 91%, 92%, and 92%, respectively.
Conclusion: The findings from our study demonstrate the capability of our MBI system in detecting a significant majority of breast lesions. This suggests that SAFE holds promise in positively influencing breast cancer screening and early detection, given its non-invasive and safe characteristics. We are in the process of planning further clinical studies to validate the results obtained.
Limitations: The study's limitations include a modest sample size, potential selection bias from enrolling only patients scheduled for biopsy, reliance on a single machine learning algorithm, and the need for external validation to strengthen the findings.
Funding for this study: This research was funded by the Scientific and Technological Research Council of Turkey (TUBITAK) grant number 120N388 and by the European Union’s Horizon 2020 research and innovation program under the Marie Sklodowska Curie grant agreement No. 764479.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: The study was approved by the Ethics committee of Marmara University School of Medicine (Protocol number: 70737436-050.06.04; Date of approval: Jun. 09, 2014). All protocols and procedures were in accordance with both institutional and national ethical standards in research and with the World Medical Association Declaration of Helsinki.
7 min
Clinical and operational benefits of artificial intelligence (AI) in prospective UK breast screening service evaluation
Gerald Lip, Aberdeen / United Kingdom
Author Block: G. Lip1, A. Ng2, C. F. De Vries1, L. A. Anderson1, R. Staff1, G. Fox2, C. Oberije2, P. Kecskemethy2; 1Aberdeen/UK, 2London/UK
Purpose: The objective of this study was to assess the clinical and operational benefits of AI workflows in a live breast screening service.
Methods or Background: An AI system was evaluated in a prospective paired design at NHS Grampian to assess the impact of a combination of workflows that use AI as: (1) an independent reader, either when it agrees on ‘no recall’ with Reader 1 (the Double Reader Triage (DRT) workflow) or when it agrees on ‘recall’ or ‘no recall’ with Reader 1 (the Supporting Independent Reader (sIR) workflow), to provide workload savings; and (2) an Extra Reader (XR) that triages positive cases not recalled by standard double reading (DR) for additional arbitration, providing an opportunity for increased cancer detection. Over 10,000 women who had not opted out and whose four-view FFDM was processed by the AI were included. All screens were human double-read, maintaining the standard of care. DRT and sIR performance outcomes were simulated, while XR outcomes were measured from live use. Planned analyses included assessment of recall rate, cancer detection rate, arbitration rate, positive predictive value, sensitivity, specificity, and workload savings for DR, DRT, sIR, and XR and for the combination workflows DRT+XR and sIR+XR, as well as non-inferiority and superiority tests for the combination workflows against DR.
Results or Findings: Interim analyses have identified that cancers from at least six women have been found through the XR process, resulting in a relative 13% (6/47) increase in cancer detection (0.9/1000 absolute) compared to DR. The sIR and DRT workflows are expected to provide significant workload savings that offset the additional arbitration workload that XR requires by 6x.
Conclusion: Interim analyses indicate clinical and operational benefits when using AI in breast cancer screening, which will be confirmed and presented in the final evaluation analysis (expected Q1 2024).
Limitations: Single-site evaluation limits this study.
Funding for this study: NHSE/AAC/NIHR AI in Health and Care Award funded this study.
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: Ethics review not required for service evaluations
7 min
Adding artificial intelligence (AI) case scoring in a breast screening programme to overcome delay in most probably true positive cases: a retrospective study
Andrea Nitrosi, Reggio Emilia / Italy
Author Block: A. Nitrosi, R. Vacondio, C. Coriani, L. Verzellesi, N. Cucurachi, C. Campari, M. Bertolini, P. Pattacini, M. Iori; Reggio Emilia/IT
Purpose: The objective of this study was to retrospectively evaluate a strategy based on AI case scores for bringing forward the reading of the most probably true positive cases, so as to keep reading times within the two weeks required by local regulations.
Methods or Background: We analysed 32,012 2D mammography screening exams, including 71 proven tumours and 61 with pending diagnosis, consecutively acquired in the Reggio Emilia Breast Screening Program (BSP) from October 2022 to July 2023 and processed by the iCAD Inc. ProFound AI 2D system. iCAD Case Scores represent the AI algorithm’s relative confidence that a case is malignant on a scale of 0% to 100%. A pool of nine radiologists performs double-blinded readings with arbitration; each reads about 200 mammograms per 6-hour work shift, and exams are evaluated in “virtual sessions” defined by mammography unit and exam date. Due to a known chronic shortage of medical personnel, reading times can exceed the local regulatory standard of 15 days (and can even exceed 30 days). An AI-based prioritised reading protocol was developed by analysing tumour incidence as a function of Case Score.
Results or Findings: For cases with a Case Score greater than 30%, 40%, 50% and 60%, the percentages of total tumours found were 89%, 85%, 69% and 61%, respectively, while the percentages of cases to be read were 20%, 13.8%, 9.8% and 5.4%. Notably, by prioritising each week the reading of a “case-score-based virtual session” comprising only exams with a Case Score >40% (4406 of 32,012; an average of 174 exams per week), the majority (85%) of true-positive women could be recalled within a very short time.
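The threshold analysis above can be sketched as follows: for each Case Score cut-off, compute the share of exams that would enter the prioritised session and the share of tumours they contain. This is a minimal sketch with hypothetical data and names, not the authors' analysis code.

```python
def threshold_summary(cases, thresholds):
    """Summarise prioritised reading for each Case Score threshold.

    cases      -- list of (case_score_percent, is_tumour) tuples
    thresholds -- Case Score cut-offs; exams with score > cut-off are prioritised
    Assumes at least one case and at least one tumour.
    """
    n_cases = len(cases)
    n_tumours = sum(1 for _, is_tumour in cases if is_tumour)
    summary = {}
    for cutoff in thresholds:
        selected = [(score, is_tumour) for score, is_tumour in cases if score > cutoff]
        tumours_selected = sum(1 for _, is_tumour in selected if is_tumour)
        summary[cutoff] = {
            "share_of_cases_to_read": len(selected) / n_cases,
            "share_of_tumours_captured": tumours_selected / n_tumours,
        }
    return summary


# Hypothetical exams (Case Score in %, tumour status), for illustration only.
exams = [
    (5, False), (12, False), (35, True), (45, False), (55, True),
    (65, True), (20, False), (70, True), (8, False), (42, True),
]
summary = threshold_summary(exams, thresholds=[30, 40, 60])
for cutoff, row in summary.items():
    print(cutoff, row)
```

With the study's own figures, the >40% threshold corresponds to 4406 of 32,012 exams, that is, about 13.8% of the reading workload.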
Conclusion: This scenario would not undermine the screening reading workflow, while guaranteeing early diagnosis and, hopefully, not affecting readers' competence.
Limitations: Cases came only from the Reggio Emilia BSP, which limits this study.
Funding for this study: This study was partially supported by the Italian Ministry of Health—Ricerca Corrente.
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: Compliance with ethical standards: Institutional Review Board approval was not required because this is a clinical audit of a technical development. This study was conducted in accordance with the routine quality assurance procedures established by the Local Health Authority for its screening programmes. The Reggio Emilia Cancer Registry, which routinely collects the screening history of each case of breast cancer, has been approved by the Provincial Ethics Committee.

This session will not be streamed, nor will it be available on-demand!