Research Presentation Session: Breast

RPS 1002 - Exploring the role of artificial intelligence in breast imaging

February 27, 14:00 - 15:30 CET

  • ACV - Research Stage 4
  • ECR 2025
  • 12 Lectures
  • 90 Minutes
  • 12 Speakers

Description

7 min
A 10-year image-derived AI risk model for use in primary prevention of breast cancer
Mikael Eriksson, Stockholm / Sweden
Author Block: M. Eriksson1, K. Czene1, C. Scott2, P. Hall1, C. Vachon2; 1Stockholm/SE, 2Rochester, MN/US
Purpose: Image-derived artificial intelligence (AI) risk models have shown significant potential in enhancing breast cancer (BC) screening through short-term risk assessment. A long-term image-derived AI risk model for primary prevention has yet to be developed and externally validated.
Methods or Background: This study utilized a case-cohort approach, including women aged 35-94 recruited between 2009-2017 from population-based screenings in Olmsted County, Minnesota (U.S.), and the KARMA cohort in Sweden. Median follow-up was 10 years, with BCs diagnosed before 5/2022. An image-derived AI risk model, initially developed in a Swedish population, was validated independently in the Olmsted/KARMA cohorts. At study entry, 10-year absolute risks were estimated. Time-dependent discriminatory performance (AUC(t)) and expected-to-observed event ratios (E/O) were calculated.
Results or Findings: The combined Olmsted/KARMA cohorts included 8,721 women, with a mean age of 54.4 years (±10.6) in the subcohort and 1,633 incident BC cases with a mean age of 57.0 years (±10.6). The AI-derived 10-year average risks were 3.83% and 3.14%, with E/O ratios of 0.99 (95%CI 0.94-1.05) in Olmsted and 0.99 (95%CI 0.91-1.08) in KARMA. The 10-year AUC(t) values were 0.70 (95%CI 0.68-0.73) for Olmsted and 0.73 (95%CI 0.69-0.77) for KARMA. Using the U.S. Preventive Services Task Force (USPSTF) guidelines, 41% of cases in KARMA were identified as high-risk, compared to 15% with Tyrer-Cuzick-v8 and 5.1% with BCSC-v3 (p<0.01). Under the National Institute for Health and Care Excellence (NICE) guidelines, these figures were 31%, 7.4%, and 0.2%, respectively.
Conclusion: The 10-year image-derived AI risk model demonstrated strong predictive performance in both U.S. and Swedish case-cohorts, outperforming traditional clinical risk models in KARMA. This AI model holds significant potential for clinical application in primary prevention, targeting up to 40% of BCs.
Limitations: The study population was mainly White women.
Funding for this study: Swedish Research Council
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: Mayo Clinic and Olmsted Medical Center Institutional review board and the Swedish Ethical Review Authority
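Note on the calibration measure: the expected-to-observed (E/O) ratio reported above compares the number of cancers a model predicts with the number actually diagnosed, so a value near 1 indicates good calibration. The sketch below is a minimal illustration under simplifying assumptions: it treats the data as a plain cohort and ignores the sampling weights a case-cohort design would require, and the function name and toy numbers are hypothetical rather than taken from the study.

```python
def expected_to_observed(predicted_risks, observed_events):
    """Return E/O: the sum of predicted absolute risks divided by the
    number of observed events (simple cohort, no sampling weights)."""
    expected = sum(predicted_risks)
    observed = sum(observed_events)
    if observed == 0:
        raise ValueError("E/O is undefined when no events were observed")
    return expected / observed


# Hypothetical toy cohort: 20 women, each with a 5% predicted 10-year risk,
# one of whom is diagnosed during follow-up.
toy_risks = [0.05] * 20
toy_events = [0] * 19 + [1]
toy_ratio = expected_to_observed(toy_risks, toy_events)
print(f"E/O = {toy_ratio:.2f}")
```

A ratio above 1 would mean the model predicts more cancers than occurred, and a ratio below 1 fewer.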
7 min
Cracking the Code: Predicting Pathogenic Mutations in Breast Cancer with Ultrasound Radiomics
Roxana Maria Pintican, Cluj-Napoca / Romania
Author Block: R. M. Pintican, N. Antone; Cluj-Napoca/RO
Purpose: To evaluate the potential of US-based radiomics in predicting the pathogenic mutational status of breast cancer patients, which is relevant to prophylactic mastectomy recommendations.
Methods or Background: This retrospective study included 73 breast cancer patients tested with multigene panels covering all seven pathogenic mutations (BRCA1, BRCA2, TP53, PTEN, CDH1, PALB2, and STK11). US images were acquired prior to any treatment, and radiomics data were extracted from the tumoral and peritumoral areas. The study population was divided into testing and validation groups, each including patients with and without pathogenic mutations. Radiomics features were analyzed using machine learning models, alone and in combination with clinical features (Ki-67%).
Results or Findings: We observed significant differences in radiomics features between tumors driven by pathogenic and non-pathogenic mutations. The prediction models were developed using a three-step feature selection process (Mann-Whitney U test, Spearman correlation, and LASSO regression). Rad-score 1 (tumor) achieved an accuracy of 78.6% in identifying pathogenic mutation carriers, while Rad-score 2 (tumor + peritumoral) increased accuracy to 85%. Rad-Clin 1 and Rad-Clin 2 achieved 83% and 95% accuracy, respectively, in predicting mutational status. In the validation cohort, the AUCs were: Rad-score 1 = 66%; Rad-score 2 = 91%; Rad-Clin 1 = 58%; Rad-Clin 2 = 83%.
Conclusion: Radiomics models based on US images of breast tumors may provide a promising alternative for predicting pathogenic mutation status in BC patients. The highest accuracy was reached when we combined radiomics data extracted from the tumor and peritumoral area. This approach could reduce dependence on costly genetic testing and expedite the diagnostic process.
Limitations: Small sample size; single-centre study.
Funding for this study: No funding
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: Retrospective study - the informed consent was waived.
7 min
Do we still need to double read the most suspicious screening mammograms when using AI for decision support? A sub-analysis from the AITIC breast cancer screening prospective trial
Esperanza Elías Cabot, Córdoba / Spain
Author Block: E. Elías Cabot1, A. Rodriguez Ruiz2, J. L. Raya Povedano1, S. Romero Martin1, M. Álvarez Benito1; 1Cordoba/ES, 2Nijmegen/NL
Purpose: To evaluate the differences between single and double reading of the most suspicious mammograms after the introduction of AI in breast cancer screening.
Methods or Background: This was a sub-analysis of the AITIC paired prospective trial in the breast cancer screening program in Cordoba, Spain. In this trial, between March 2022 and January 2024, 31,301 women (age 50-71) were included and imaged with either DM or DBT based on equipment availability. Two reading strategies were independently applied to each exam: double-blind, non-consensus reading of all exams (standard strategy) and AI-based triaging (AI strategy), in which an AI system (Transpara v1.7, ScreenPoint Medical) evaluated the cancer risk of all exams. Cases identified by AI as Low risk were automatically assessed as negative, while exams with Intermediate or Elevated risk were double read with concurrent AI support. For the latter group, cancer detection rates (CDR) and false positive rates (FPR) were compared between single and double reading. P values were computed using the McNemar test, along with binomial confidence intervals (CI).
Results or Findings: The AI strategy, double reading only 36% of the total screening mammograms, resulted in 228 screen-detected cancers (CDR=7.3/1000, CI: 6.4-8.2/1000) and 1,723 recalls (FPR=4.8%, CI: 4.5-5.0%). Had these exams been single read with AI support, there would have been 190 screen-detected cancers (CDR=6.0/1000, CI: 5.2-7.0/1000) and 1,082 recalls (FPR=2.9%, CI: 2.7-3.0%), reductions of 17% (P<0.05) and 42% (P<0.05), respectively, relative to double reading. The standard strategy resulted in 1,501 recalls (FPR=4.2%, CI: 4.0-4.4%) and 198 cancers (CDR=6.3/1000, CI: 5.5-7.2/1000).
Conclusion: After introduction of AI for triage and decision support in screening, increased cancer detection rates were achieved in comparison to standard of care by still double reading a subgroup of the most suspicious exams.
Limitations: Single-site.
Funding for this study: None.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: Local IRB approval.
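Note on the reported rates: the CDR and FPR figures in this abstract can be approximately reproduced from the stated counts. The sketch below rests on two assumptions not stated in the abstract: that false positives are recalls without a screen-detected cancer, divided by women without a screen-detected cancer, and that a Wilson score interval is an acceptable stand-in for the unspecified binomial interval. Function names are illustrative only.

```python
import math

WOMEN_SCREENED = 31301  # cohort size stated in the abstract


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% for z=1.96)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def screening_rates(cancers, recalls, women=WOMEN_SCREENED):
    """Return (cancer detection rate, false positive rate) as proportions."""
    cdr = cancers / women
    # Assumption: recalls that are not screen-detected cancers are false
    # positives, and the denominator is women without a screen-detected cancer.
    fpr = (recalls - cancers) / (women - cancers)
    return cdr, fpr


cdr_ai, fpr_ai = screening_rates(cancers=228, recalls=1723)    # AI strategy
cdr_std, fpr_std = screening_rates(cancers=198, recalls=1501)  # standard strategy
ci_low, ci_high = wilson_interval(228, WOMEN_SCREENED)

print(f"AI strategy: CDR {cdr_ai * 1000:.1f}/1000, FPR {fpr_ai * 100:.1f}%")
print(f"Standard:    CDR {cdr_std * 1000:.1f}/1000, FPR {fpr_std * 100:.1f}%")
print(f"Wilson CI for AI CDR: {ci_low * 1000:.1f}-{ci_high * 1000:.1f}/1000")
```

Small differences from the published intervals are to be expected if the authors used a different interval method.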
7 min
Benefits and risks of AI use for reviewing negative screening mammograms
Christophorus De Wolf, Onex / Switzerland
Author Block: C. De Wolf1, K. Brändle2, J-L. Bulliard2; 1Geneva/CH, 2Lausanne/CH
Purpose: Breast cancer remains a global health concern, with artificial intelligence (AI) offering promising advancements in improving screening accuracy. Traditional methods, requiring high-volume readings, often lead to fatigue and reading errors. AI addresses these limitations by providing fatigue-free, reproducible results. This study assesses the benefits and costs of AI in detecting high-risk lesions in mammograms initially classified as negative by radiologists.
Methods or Background: Risk scores (Transpara® version 1.7.3) were calculated for 54,300 mammograms from a public Swiss screening program (2018–2021). Data included screen-detected cancers (n=321) and interval cancers (n=94), lesion location, and double-blind radiologist readings. We included risk score thresholds considered as elevated risk (61 to 90). Key outcomes included additional workload (additional mammograms in consensus conference), avoided false-negative interval cancers (FN-IC, n=39), and increased false-positive (FP) rates. Multivariable logistic regression was used to predict the rise in FP cases across thresholds.
Results or Findings: The FN-IC rate reduction ranged from 8.3% (threshold 90) to 31.3% (threshold 61), with an additional workload of 2 to 67 extra mammograms per 1,000 participants. Avoiding one FN-IC case required 28 to 242 extra readings, resulting in 12 to 86 additional false positives (FP). FP rates rose by 2.1% to 59.3%, with the workload increasing by a third for every 5-point threshold drop up to 75. With an AI threshold set to 85, the false positive rate increased by 2.3‰ (from 40.5‰ to 42.8‰) and the workload would increase by 6 mammograms per 1,000 participations.
Conclusion: AI assistance may enhance mammography sensitivity. However, this comes with a relatively high cost in terms of FP results and additional readings. Therefore, the critical threshold must be determined for each specific context to achieve an optimal benefit-risk ratio.
Limitations: Retrospective design.
Funding for this study: No external funding
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: All women signed an informed consent that their anonymized screening data could be used for quality assurance purposes.
7 min
Patient perceptions and attitudes towards the use of AI in the symptomatic breast unit
Sneha Singh, Dublin / Ireland
Author Block: S. Singh, R. P. Crean, H. Briody, R. Bruen, N. Hambly, M. Bambrick, D. Duke, M. Mullooly, N. Healy; Dublin/IE
Purpose: Artificial intelligence (AI) has been evaluated in a number of breast screening settings with favourable results. While a limited number of studies have looked at patient attitudes to AI in breast screening, none have examined perceptions of AI in the symptomatic setting. The aim of this study was to determine attitudes towards AI among patients attending the symptomatic breast unit.
Methods or Background: An anonymous, voluntary 15-question questionnaire was given to all patients attending the symptomatic breast clinic imaging department of Beaumont Hospital from 01/07/2024 to 30/09/2024. Results were collated in a password-protected Excel database and descriptive statistics were performed. Likert responses were coded numerically so that a mean of 1 denotes strong agreement and 5 denotes strong disagreement.
Results or Findings: Of the 1500 patients who were surveyed, most were aged 40–59 years (62.1%). Almost one-quarter had either a personal (364/1500) or family history of breast cancer (360/1500). 62% (927/1500) had some or strong interest in AI.

Regarding the use of AI in healthcare, 46% agreed it was a good idea, 8% disagreed and 46% were indifferent. There was support for AI assisting radiologists in reading mammograms (mean (M)=2.43, 95% CI: 2.39-2.48) but disapproval of AI being the sole reader (M=3.82, 95% CI: 3.77-3.87). Respondents strongly preferred human radiologists over AI for reading mammograms, even if AI were more efficient (M=1.95, 95% CI: 1.90-1.99) or more accurate (M=2.17, 95% CI: 2.13-2.22). 75% of patients would blame both the AI developer and the human radiologist for an incorrect result. All results were statistically significant (p<0.001).
Conclusion: Respondents hold favourable views towards the use of AI in healthcare. They welcome use of AI as an adjunct for radiologists but disagree with AI being the only reader of their mammogram.
Limitations: N/A
Funding for this study: RCSI seed funding
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: Approval has been obtained from the hospital audit committee (CA2024/126). Formal ethical approval was not deemed necessary as this is an anonymised, voluntary study.
7 min
The effect of an artificial intelligence decision support system on radiologists’ screening mammography performance and visual search patterns
Jessie Gommers, Nijmegen / Netherlands
Author Block: J. Gommers, S. D. Verboom, M. Broeders, I. Sechopoulos; Nijmegen/NL
Purpose: To investigate the effect of using a commercial artificial intelligence (AI) decision support system on the diagnostic performance and visual search patterns of radiologists interpreting screening mammograms.
Methods or Background: A multi-reader, multi-case study was performed with 12 Dutch screening radiologists interpreting 150 screening mammography examinations (75 normal, 75 malignant). Radiologists read the examinations without and with AI support while an eye tracker recorded their eye movements. AI classified the examinations as low (maximum region scores:<40), intermediate (40-59), medium-high (60-79), or very-high risk (80-100). Radiologists provided a probability of malignancy score (0-100) and recall decision for each examination. The performance under the two reading conditions was compared using the area under the receiver operating characteristics curve (AUC), sensitivity, and specificity through mixed-model analysis of variance. Reading time and eye tracking outcomes were compared by bootstrap resampling (n=20,000).
Results or Findings: The average AUC increased significantly from 0.93 without AI support to 0.97 with AI support (P<.001). There was no evidence of a significant change in sensitivity (81.7% vs 87.2%, P=.06) or specificity (89.0% vs 91.1%, P=.46), although sensitivity tended to increase for AI-classified high-risk examinations (medium-high: 54.9% vs 61.8%, very-high: 89.5% vs 95.6%). Overall reading time did not change significantly (29.4 vs 30.8 seconds, P=.32), but decreased for AI-classified low-risk examinations (25.1 vs 20.1 seconds, P<.001). When using AI, radiologists covered less of the breast area with fixations (11.1% vs 9.5%, P=.005), while spending more time fixating in lesion areas (4.0 vs 5.1 seconds, P<.001).
Conclusion: Reading with an AI decision support system increased radiologists’ screening performance and allowed them to focus more on lesion-specific areas without increasing overall reading time, indicating a more efficient search.
Limitations: Enriched case set and one AI system only.
Funding for this study: aiREAD financed by KWF Dutch Cancer Society and the Dutch Research Council (NWO) Domain Applied and Engineering Sciences (AES), as part of their joint strategic research program Technology for Oncology II. The collaboration project is co-funded by the PPP Allowance made available by Health-Holland, Top Sector Life Sciences & Health, to stimulate public-private partnerships.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: The need for ethical approval for this retrospective multi-reader multi-case study was waived by the Research Ethics Committee of Radboud University Medical Center (registration number 2021–13186).
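Note on the bootstrap comparison: this abstract compares reading time and eye-tracking outcomes by bootstrap resampling (n=20,000). The sketch below shows one common form, a percentile bootstrap for a paired mean difference. The abstract does not say whether readers or cases were resampled, so resampling readers is an assumption here, and the reading times are invented for illustration.

```python
import random


def bootstrap_mean_diff(without_ai, with_ai, n_resamples=20000, seed=0):
    """Percentile-bootstrap 95% CI for the mean paired difference
    (with AI minus without AI). Returns (observed, lower, upper)."""
    if len(without_ai) != len(with_ai):
        raise ValueError("paired samples must have equal length")
    rng = random.Random(seed)
    diffs = [b - a for a, b in zip(without_ai, with_ai)]
    n = len(diffs)
    means = []
    for _ in range(n_resamples):
        # Resample the paired differences with replacement.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lower = means[int(0.025 * n_resamples)]
    upper = means[int(0.975 * n_resamples) - 1]
    return sum(diffs) / n, lower, upper


# Hypothetical mean reading times (seconds) for 12 readers; not study data.
times_without = [29.0, 31.5, 27.2, 30.1, 28.4, 33.0, 26.8, 29.9, 30.7, 28.8, 31.2, 27.6]
times_with = [30.2, 32.0, 28.9, 31.4, 29.0, 34.1, 28.5, 31.0, 32.3, 29.5, 33.0, 29.7]

observed, ci_lower, ci_upper = bootstrap_mean_diff(times_without, times_with)
print(f"mean difference {observed:.2f} s, 95% CI {ci_lower:.2f} to {ci_upper:.2f} s")
```

If the interval excludes zero, the difference would conventionally be read as significant at the 5% level.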
7 min
Mammographic features of false positive AI markings on screening mammograms from BreastScreen Norway
Marit Almenning Martiniussen, Graalum / Norway
Author Block: M. A. Martiniussen1, M. B. Bergan2, J. Gjesvik2, M. Undrum Kristiansen1, S. Hofvind2; 1Graalum/NO, 2Oslo/NO
Purpose: False positive AI markings are an expected challenge when implementing artificial intelligence (AI) in mammographic screening and might contribute to an unsustainable increase in the workload for the radiologists. The aim of this study was to gain knowledge about false positive AI markings from two AI systems on screening mammograms.
Methods or Background: In this retrospective study, 129 385 screening examinations from BreastScreen Norway, performed at Ostfold Hospital Trust, 2008-2018, were run through two AI systems. System A was Lunit INSIGHT MMG version 1.1.7.2, and system B was a non-commercial system developed by the Norwegian Computing Center and the Cancer Registry of Norway. Each system provided a score on a scale from 0 to 100 and marked the most suspicious areas; a higher score indicated a higher risk of cancer. Two radiologists performed a consensus-based informed review of examinations with an AI score among the highest 5% from both systems that were interpreted as negative at index screening and had no cancer diagnosed at index or the two consecutive screening rounds. Mammographic features corresponding to the AI markings were classified according to the Breast Imaging Reporting and Data System (BI-RADS). The results were analyzed using descriptive statistics.
Results or Findings: Among the examinations that met the inclusion criteria (n=252), 120 examinations from 120 women were randomly selected for review. The mammographic feature corresponding to the AI markings was calcifications for 71.7% (86/120) for system A and 67.5% (81/120) for system B, a mass for 12.5% (15/120) for system A and 14.2% (17/120) for system B, while asymmetry accounted for 10.8% (13/120) for system A and 11.7% (14/120) for system B.
Conclusion: Calcifications were the main mammographic feature in screening mammograms with a high AI score and no diagnosed cancer.
Limitations: No limitations were identified.
Funding for this study: The South-Eastern Norway Regional Health Authority and Ostfold Hospital Trust
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: The study was approved by the Regional Committees for Medical and Health Research Ethics (#13294, #11022)
7 min
Re-attendance in BreastScreen Norway after a false positive screening result
Marthe Larsen, Oslo / Norway
Author Block: M. Larsen, N. Moshina, J. Gjesvik, S. Sagstad, Å. S. Holen, M. B. Bergan, T. E. Nilsen, S. Hofvind; Oslo/NO
Purpose: Higher risk of breast cancer after a false positive versus a negative screening result has been reported. We aimed to compare re-attendance for women with a false positive versus negative screening result using more than 25 years of screening data.
Methods or Background: BreastScreen Norway invites women aged 50-69 to biennial screening. The study sample included 3 990 388 screening examinations from 921 309 women where an invitation to the subsequent screening round was available (eligible for re-attendance). Attendance in the subsequent screening round was analysed using mixed logistic regression with age at screening and screening history as covariates and screening outcome as exposure. Predicted probabilities (re-attendance) and 95% confidence intervals (CI) were calculated using average marginal effects.
Results or Findings: A false positive result at the prevalent screening examination was followed by a re-attendance rate of 88.3%, compared with 90.3% for women with a negative result. In the 9th screening round, re-attendance was 89.0% after a false positive result and 91.1% after a negative result.
Predicted re-attendance rate was 88.1% (95% CI: 88.0-88.3%) after a false positive result and 88.9% (95% CI: 88.9-89.0%) after a negative result. With negative result, false positive without invasive procedure, and false positive with invasive procedure as the exposure categories, the predicted probabilities of re-attendance were 88.9% (95% CI: 88.9-89.0%), 88.4% (95% CI: 88.2-88.6%), and 87.6% (95% CI: 87.3-88.0%), respectively.
Conclusion: Despite small differences in re-attendance after a false positive versus negative screening result, we consider the difference clinically important. Women should be informed about the importance of re-attending the screening programme after a false positive result.
Limitations: We do not have patient reported data on reasons for non-attendance.
Funding for this study: No funding was received for this study.
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: Programme quality assurance is covered by the Cancer Registry Regulations.
7 min
The application of artificial intelligence to enhance the identification of previously missed non-palpable breast carcinomas
Yasmin Mohamed Nada, Cairo / Egypt
Author Block: S. A. Mansour, R. M. Kamal, S. Hussien, M. Emara, Y. Kassab, S. Taha, M. M. Gomaa, Y. M. Nada; Cairo/EG
Purpose: To investigate the impact of artificial intelligence (AI) applied to digital mammograms on the detection of missed breast cancer, to study the early morphological indicators detected by AI but overlooked by the radiologist, and to correlate these with the pathological types of the missed cancers.
Methods or Background: Screening and diagnostic mammograms (performed in 2020-2023) showing breast carcinomas (n = 1998) were analyzed together with prior mammograms from one year earlier (2019-2022) that had been assessed as negative or benign. Present mammograms were reviewed for the mammographic descriptors asymmetry, distortion, mass, and microcalcifications. The AI analyzed the mammograms and presented abnormalities by overlaying a color hue and a percentage score for the degree of suspicion of malignancy.
Results or Findings: Artificial intelligence detected 555 (54%) lesions in the prior mammograms and targeted 904 (88%) carcinomas in the present mammograms (2020-2023). Asymmetry was the most common presentation of missed breast carcinoma in the prior mammograms (n=356/555, 64.1%), and the highest AI detection rate was for distortion (100%), followed by grouped microcalcifications (80%). The AI's performance in predicting malignancy in mammograms previously assessed as negative or benign showed a sensitivity of 73.4%, a specificity of 89%, and an accuracy of 78.4%.
Conclusion: Reading mammograms with artificial intelligence enhanced the detection of early cancerous changes. AI detection rate is not correlated with certain pathological types of breast cancer. Close follow-up is required for AI abnormality scoring of low values to minimize the potential for missed breast carcinoma.
Limitations: The study is limited by its retrospective design and by being based at two institutions; larger multi-institutional studies are recommended.
Funding for this study: The study has no source of funding
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: The study has been ethically approved by the research center of the affiliated institute
7 min
AI-assisted Breast Mass Classification in Digital Breast Tomosynthesis (DBT): Applicability and insights from a single academic centre
Gaia Cura Curà, Vercelli / Italy
Author Block: G. Cura Curà1, G. Bartoli2, M. Costa2, E. Regini2, E. Puglisi2, F. Piccione2, F. Schettini2, M. Durando2, P. Fonio2; 1Vercelli/IT, 2Torino/IT
Purpose: In previous research, we trained a deep-learning model to classify benign and malignant masses identified on DBT images (convolutional neural network: EfficientNetB0; dataset: 448 masses, size < 6 cm, 221 malignant, 227 benign; accuracy 94%, sensitivity 95.6%, specificity 91.7%). The aim of this study was to evaluate its applicability to the diagnostic work-up of breast masses in clinical practice.
Methods or Background: In this single-centre multireader study, we prospectively collected DBT images from patients with biopsy-proven breast masses (size < 6 cm). For each case, masses were manually delineated with orthogonal axes on the best focused slice in both DBT standard views. A preliminary set of 64 DBTs (46 benign, 18 malignant) was reviewed by three independent dedicated breast radiologists with different levels of experience, then assessed with the AI model. The software provides a benign/malignant classification combined with a prediction confidence score. The software output was compared to biopsy results, focusing on BI-RADS classification, error rates, inter-reader agreement, and reading time.
Results or Findings: In 7% of cases, there was inter-reader disagreement on the software prediction. The model correctly classified 84% of masses and confirmed 92.8% of the lesions categorized as BI-RADS 3 as benign. Software-assisted reading did not change the reading time compared to conventional methods.
Conclusion: In these preliminary results, the highest agreement between radiologists and the AI model was observed with BI-RADS 3 lesions, highlighting the benefit of software-assisted characterization of benign masses. However, variability in inter-reader agreement on the software’s predictions limits its reliability in real practice. Further investigations and model refinement are necessary to improve the model robustness.
Limitations: Small sample size and single-centre study
Funding for this study: No funding was provided for this study
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: Not applicable
7 min
Multi-site validation of an image-based AI breast cancer risk model for mammography to drive personalized screening after a negative screening
Andreas David Lauritzen, Copenhagen / Denmark
Author Block: A. D. Lauritzen1, A. Rodriguez-Ruiz2, N. Karssemeijer2, C. De Wolf3, R. Mann2, M. Nielsen1, I. Vejborg4, M. Lillholm1; 1Copenhagen/DK, 2Nijmegen/NL, 3Geneva/CH, 4Gentofte/DK
Purpose: To validate the performance of an image-based AI breast cancer risk model to stratify women attending screening after a negative screening.
Methods or Background: Exams from women attending two European screening programs (Denmark and Switzerland) and from a public U.S. database (EMBED) were consecutively sampled. All exams were screen-negative (cancer-free for 180 days) and had follow-up information of between two and six years. Mammography exams were processed by an AI breast cancer risk model (Transpara Risk, ScreenPoint Medical, trial version for research). The risk model computes three image biomarkers (suspicious findings, volumetric breast density, breast texture), and combined with age, it generates a five-year breast cancer risk score per exam. All exams were fully independent from the development of the risk model. Risk model AUCs were computed for each cohort along with sensitivity for women with the highest 10% risk and breast density, respectively.
Results or Findings: In total, 98,084 exams were included (31,349, 17,445, and 49,290 from Switzerland, US, and Denmark, respectively) with 1,336 breast cancers diagnosed within 5 years from screening. Images were acquired with machines from four manufacturers (Hologic, Siemens, GE, Philips). The AUCs of the AI risk model were 0.73 (95% CI: 0.69-0.76), 0.74 (95% CI: 0.69-0.79) and 0.74 (95% CI: 0.73-0.76) for Switzerland, US, and Denmark, respectively. When simulating using risk to offer supplemental imaging to 10% of women, after a negative screening, sensitivity was 37% (95% CI: 34%-39%), in comparison to 15% (95% CI: 13-17%) when using density alone.
Conclusion: An image-based AI breast cancer risk model shows high accuracy and robustness to stratify women attending screening according to risk and could support personalized screening with higher sensitivity than breast density.
Limitations: The retrospective study design is a limitation of this study.
Funding for this study: Supported in part by Eurostars (grant E9714 IBSCREEN)
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: The Danish Patient Safety Authority and Danish Data Protection Agency approved this retrospective study and the use of relevant Danish data, and waived the need for informed consent (ref. 3–3013–2118, addendum 2019/2023).
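Note on the top-10% sensitivity: this abstract reports the share of future cancers that fall among the 10% of women with the highest risk scores. The sketch below shows how such a figure could be computed from per-exam scores and outcomes. The function name and toy data are hypothetical, and ties in the scores are broken arbitrarily by sort order.

```python
def sensitivity_at_top_fraction(scores, has_cancer, fraction=0.10):
    """Share of all cancers found among the exams whose risk score
    falls in the top `fraction` of the cohort."""
    if len(scores) != len(has_cancer):
        raise ValueError("scores and outcomes must have equal length")
    total_cancers = sum(has_cancer)
    if total_cancers == 0:
        raise ValueError("sensitivity is undefined without any cancers")
    n_flagged = max(1, round(fraction * len(scores)))
    # Rank exam indices from highest to lowest score and flag the top ones.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:n_flagged])
    detected = sum(1 for i in flagged if has_cancer[i])
    return detected / total_cancers


# Hypothetical toy cohort of 20 exams with distinct scores 1..20.
toy_scores = list(range(1, 21))
toy_outcomes = [0] * 20
for idx in (2, 11, 19):  # three cancers; only index 19 is in the top 10%
    toy_outcomes[idx] = 1

toy_sensitivity = sensitivity_at_top_fraction(toy_scores, toy_outcomes)
print(f"sensitivity in top 10%: {toy_sensitivity:.2f}")
```

The same function can be applied to a density measure in place of the risk score to mirror the comparison with density alone.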
7 min
Replacing one radiologist with AI for independent double reading in mammographic screening
Marie Burns Bergan, Oslo / Norway
Author Block: M. B. Bergan, M. Larsen, J. Gjesvik, N. Moshina, S. Sagstad, T. Hovda, H. W. Koch, M. A. Martiniussen, S. Hofvind; Oslo/NO
Purpose: The aim of this study was to explore how replacing one radiologist with artificial intelligence (AI) for independent double reading in mammographic screening would affect cancer detection.
Methods or Background: This study sample consisted of 1,027,430 screening examinations, including 5786 screen-detected cancers, that were independently interpreted by two radiologists in BreastScreen Norway, 2004-2018. The radiologists scored each breast from 1, negative for abnormality, to 5, high suspicion of malignancy, and score ≥2 was considered positive. All examinations were processed by the AI system Lunit INSIGHT MMG version 1.1.7.2, assigning a continuous malignancy score from 0, no risk, to 100, very high risk. Cancer detection was presented for the combination of one radiologist and AI at various AI thresholds for positive examinations.
Results or Findings: Of all screen-detected cancers, 86.9% (5028/5786) were classified as positive (score ≥2) by one radiologist. When defining 10% of the examinations with the highest AI score as positive by AI, 79.9% (4622/5786) of the screen-detected cancers and 7.5% (134/1783) of the interval cancers would be detected. When 5% with the highest AI scores were considered positive, 75.5% (4348/5786) of the screen-detected and 5.7% (102/1783) of the interval cancers would be detected. In a scenario where 1% of the examinations were classified as positive by AI, 58.2% (3369/5786) of the screen-detected and 2.4% (42/1783) of the interval cancers would be detected.
Conclusion: At an AI threshold of 5%, replacing one of the radiologists with AI in independent double reading of screening mammograms will reduce the reading volume by 50% at the cost of missing 24.5% of screen-detected cancers, but with the possibility of detecting 5.7% of the interval cancers.
Limitations: We assume that all cancers classified as positive by the radiologist and AI were detected.
Funding for this study: Funding was provided by the Norwegian Cancer Society (Pink Ribbon)
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: The study was approved by the Regional Committees for Medical and Health Research Ethics (#2018/2574).

Notice

This session will not be streamed, nor will it be available on-demand!

CME Information

This session is accredited with 1.5 CME credits.

Moderators

  • Isabelle Thomassin-Naggara

    Paris / France

Speakers

  • Mikael Eriksson

    Stockholm / Sweden
  • Roxana Maria Pintican

    Cluj-Napoca / Romania
  • Esperanza Elías Cabot

    Córdoba / Spain
  • Christophorus De Wolf

    Onex / Switzerland
  • Sneha Singh

    Dublin / Ireland
  • Jessie Gommers

    Nijmegen / Netherlands
  • Marit Almenning Martiniussen

    Graalum / Norway
  • Marthe Larsen

    Oslo / Norway
  • Yasmin Mohamed Nada

    Cairo / Egypt
  • Gaia Cura Curà

    Vercelli / Italy