Research Presentation Session: Breast

RPS 1102 - Revolutionising breast imaging with artificial intelligence

March 5, 16:30 - 18:00 CET

6 min
The effect of AI on retrospectively visible interval cancers in mammography screening – results from the randomised controlled MASAI trial
Veronica Hernström, Lund / Sweden
Author Block: V. Hernström, H. Sartor, O. Hagberg, K. Lang; Lund/SE
Purpose: To investigate whether the distribution of interval cancer classification groups differs with or without the use of artificial intelligence (AI) in mammography screening.
Methods or Background: The MASAI trial is a randomised, controlled, non-inferiority, screening study comparing AI-supported mammography screening with standard double reading. In the intervention group, AI was used for triage to single or double reading and for detection support. Participants with interval cancers were retrospectively reviewed by a panel of breast radiologists and classified as true negative, showing minimal signs or missed at screening. The distribution of classification groups was compared between the intervention and control groups, and in relation to AI risk scores. Differences in distribution were assessed with a Chi-2 test.
Results or Findings: There were fewer interval cancers showing minimal signs (14 [17%] vs 26 [28%]) and a similar number of missed (9 [11%] vs 10 [11%]) and true negatives (59 [72%] vs 57 [61%]), in the intervention group compared to the control group. The distribution was however not statistically significantly different (p=0.22). Of the retrospectively visible interval cancers (missed or minimal signs) in the intervention group, 78% (18/23) had intermediate or high AI risk scores and 65% (15/23) were correctly localised by AI.
Conclusion: The use of AI in mammography screening yielded fewer interval cancers with minimal signs at screening compared with standard double reading, indicating its ability to aid in detecting subtle malignancies. A further reduction of interval cancer may be achievable since a substantial proportion of the retrospectively visible interval cancers were assigned elevated risk scores and were correctly localised by AI.
Limitations: Single-institution trial, single AI-system, informed review of classification groups
Funding for this study: Swedish Cancer Society, Confederation of Regional Cancer Centres, and governmental funding for clinical research.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: Swedish ethical review authority, number NCT04838756.
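The chi-square comparison described in the Methods above can be re-run from the counts given in the Results. The following is a minimal Python sketch, not the trial's analysis code: the function name `chi2_independence` is hypothetical, and the closed-form p-value relies on the fact that a 2x3 table has 2 degrees of freedom, for which the chi-square survival function equals exp(-x/2).

```python
import math

def chi2_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Counts from the Results above: minimal signs, missed, true negative.
intervention = [14, 9, 59]
control = [26, 10, 57]

stat, dof = chi2_independence([intervention, control])
# For dof == 2 the chi-square survival function is exactly exp(-stat / 2).
p_value = math.exp(-stat / 2)
print(f"chi2={stat:.2f}, dof={dof}, p={p_value:.2f}")
```

Under these assumptions the sketch gives a p-value of about 0.22, in line with the value reported in the abstract.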
6 min
Influence of AI-informed Disease Prevalence on Radiologist Performance: Insights from the Mammography Screening with Artificial Intelligence trial (MASAI)
Jessie Gommers, Nijmegen / Netherlands
Author Block: J. Gommers1, C. Abbey2, O. Hagberg3, K. Lang3; 1Nijmegen/NL, 2Santa Barbara, CA/US, 3Malmö/SE
Purpose: To investigate how knowledge of AI-informed disease prevalence influenced radiologists’ screening performance by comparing outcomes across AI risk scores when reading with and without AI.
Methods or Background: This study used data from the prospective MASAI trial, designed to compare AI-supported mammography screening, where AI triaged examinations to single (AI scores 1-9) or double (AI score 10) reading and was used as decision support, with standard double reading without AI. Recall rates (RR), cancer detection rates (CDR), and false positive rates (FPR) were compared between the AI-supported group (intervention) and double reading group (control) using Fisher’s exact tests with results stratified by AI risk scores. Single-reader assessments in the control group were approximated for examinations with AI scores 1-9.
Results or Findings: In total, 105,087 women were included in this study, of whom 52,576 were in the control group and 52,511 in the intervention group. For examinations classified as low risk (AI score 1–7), AI-supported reading led to a reduction in RR (0.50% vs 0.61%, P=.043) and FPR (0.49% vs 0.59%, P=.049), without significantly affecting CDR (0.17‰ vs 0.22‰, P=.804), compared to standard reading. For intermediate-risk examinations (AI scores 8–9), AI-supported reading resulted in increased RR (2.29% vs 1.58%, P<.001) and FPR (1.90% vs 1.30%, P=.003), with no significant change in CDR (3.94‰ vs 2.75‰, P=.224). For high-suspicion examinations (AI score 10), AI-supported reading increased RR (14.41% vs 9.44%, P<.001), FPR (6.69% vs 3.40%, P<.001), and CDR (77.23‰ vs 60.34‰, P=.004).
Conclusion: Knowledge about AI-informed prevalence affects radiologists differently across AI risk categories, enhancing cancer detection in high-risk examinations and reducing false positives in low-risk examinations.
Limitations: Interval cancer data not yet included.
Funding for this study: Swedish Cancer Society, Confederation of Regional Cancer Centres, and Swedish governmental funding for clinical research.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: The Swedish Ethical Review Authority approved the study (2020-04936, 2023-026848-02) and waived the need for obtaining written informed consent.
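The Methods above compare rates between groups with Fisher's exact tests, stratified by AI risk score. As a hedged illustration of what one such test computes for a single stratum, the sketch below implements a two-sided Fisher's exact test with the Python standard library. The function name and the toy counts are hypothetical; the abstract does not report the per-stratum counts, so no trial result is reproduced here.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x events in row 1 given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1))
               if p <= p_observed * (1 + 1e-9))

# Hypothetical toy stratum: rows = intervention / control,
# columns = recalled / not recalled.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(f"p = {p_value:.4f}")
```

In a stratified analysis the same function would be called once per AI risk category, each with its own 2x2 table.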
6 min
Artificial Intelligence in Mammography Screening in Norway (AIMS Norway): a randomized controlled trial
Solveig Hofvind, Oslo / Norway
Author Block: Å. S. Holen, M. Larsen, S. Hofvind; Oslo/NO
Purpose: The randomized controlled trial Artificial Intelligence in Mammography Screening in Norway (AIMS Norway) evaluates the performance of artificial intelligence (AI) combined with one or two radiologists, compared to standard independent double reading in BreastScreen Norway. The objective is to demonstrate that AI-assisted screening is non-inferior to the current standard in detecting breast cancer.
Methods or Background: This randomized, controlled, non-inferiority, parallel-group, single-blind trial is recruiting participants through written consent at the time of mammography screening. Participants are randomized into either a study or control group. In the study group, examinations are analyzed using Transpara® v2.1 (ScreenPoint Medical) and triaged by AI score, indicating risk of malignancy. Exams with low scores (1–7) are single-read, while those with intermediate to high scores (8–10) undergo independent double reading (standard of care). Radiologists are blinded to both the AI results and whether the exam is single- or double-read during primary reading; AI scores and AI-generated annotations are only available during consensus meetings. The control group follows standard double reading.
Results or Findings: Recruitment for the trial began in November 2024 in Western Norway and in September 2025 in the Central region. Recruitment in Northern Norway is planned to start by the end of 2025. The trial aims to enroll 140,000 women. As of October 2025, 79% of women attending the screening program in participating regions have consented to participate in the study.
Conclusion: The AIMS Norway randomized controlled trial has begun recruiting participants in two health regions, with a 79% acceptance rate. Recruitment will continue until the target number of participants is reached to demonstrate non-inferiority in screen-detected cancer rates.
Limitations: None
Funding for this study: Norwegian Cancer Society (Pink Ribbon, #214931) and the Western (#F-12858-D11417), Central (#2024-36740) and Northern (#HNF1723-24) Norway Regional Health Authorities.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: Regional Ethical Committee in South East Norway.
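The score-based triage described in the Methods above (scores 1–7 single-read, 8–10 double-read) amounts to a simple threshold rule. The sketch below is illustrative only: the function name, parameter name, and returned labels are assumptions and are not part of the trial protocol or of any vendor API.

```python
def reading_pathway(ai_score: int, double_read_from: int = 8) -> str:
    """Map a 1-10 AI score to a reading pathway.

    double_read_from=8 mirrors the AIMS Norway description above
    (scores 1-7 single read, 8-10 double read). Names and labels
    are illustrative assumptions.
    """
    if not 1 <= ai_score <= 10:
        raise ValueError("AI score must be between 1 and 10")
    return "double" if ai_score >= double_read_from else "single"

pathways = {score: reading_pathway(score) for score in range(1, 11)}
print(pathways)
```

Passing `double_read_from=10` would correspond to the MASAI triage rule described earlier in this session (scores 1-9 single reading, score 10 double reading).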
6 min
Improving the performance of AI on Australian screening mammograms using prior cases
Sarah Jayne Lewis, Penrith / Australia
Author Block: S. J. Lewis1, P. D. Trieu2, S. Tavakoli Taba2, M. Barron2, Z. (. Jiang2; 1Campbelltown/AU, 2The University Of Sydney/AU
Purpose: Population-based mammographic screening programs can improve the early detection of breast cancer, but workload pressures for radiologists remain. Artificial intelligence (AI) may mitigate this pressure, but the lack of training with prior cases, which research shows are important for high specificity, is a limitation that has not been rigorously documented. This study compares the performance of an AI trained on mammograms with and without priors.
Methods or Background: The training dataset had 1458 cases (729 malignant, 729 normal), all with prior mammograms from a previous screening round. Current and prior mammograms were aligned using a fully convolutional network with a multiresolution strategy. The Globally-aware Multiple Instance Classifier (GMIC) AI with transfer learning was applied to generate malignant features. A correlational neural network with 6-fold cross-validation learned the relationships between two patches corresponding to the current/prior malignant features. AI performance was assessed using the BreastScreen Reader Assessment Strategy dataset (374 cases, 118 malignant, 256 normal) across different cancer types, with and without priors, using a one-way ANOVA test.
Results or Findings: The AI demonstrated a significant improvement in specificity (93.1% with priors; 90.4% without; p =0.037). Significant improvement in sensitivity for spiculated masses (p=0.024) and architectural distortions (p=0.012) was shown with a non-significant difference in overall sensitivity (92.6% vs 91.2%) and for calcification (p=0.26), discrete mass (p=0.19), non-specific density (p=0.08), and stellate (p=0.13) lesions.
Conclusion: AI performance was improved through training with prior screening cases, notably through specificity, and the detection of two cancer types. The importance of training with longitudinal screening rounds should be considered by AI vendors as well as a national approach to screening program data storage to advance AI technologies.
Limitations: Australian screening context with small training and testing datasets.
Funding for this study: National Breast Cancer Foundation (Australia) and the Australian Commonwealth Department of Health and Aging.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: The University of Sydney
6 min
AI-only for assessing breast cancer screening mammograms – the evolution with AI as an independent reader
Karin Elisabeth Dembrower, Stockholm / Sweden
Author Block: K. E. Dembrower1, F. Pilblad1, M. Arango Lievano2; 1Stockholm/SE, 2Montpellier/FR
Purpose: Since June 2023, an AI algorithm has replaced one human reader in a population-based breast cancer screening program, redefining the standard of care at this center as AI plus one radiologist instead of two radiologists. This study aims to assess the safety of an AI-only pathway for high-confidence normal examinations.
Methods or Background: If either AI or the human reader flagged an examination, it was referred to a consensus discussion in which two breast radiologists made the final decision to declare the woman healthy or recall her for further work-up. The AI algorithm generated an abnormality score (AS) from zero to one hundred, where zero is no abnormality detected and one hundred is most likely cancer. Images were generated from equipment by one manufacturer. We reported which cancer cases were flagged by a human reader, by AI, and whether the examination was flagged because of clinical symptoms (and/or/not AI-positive). The ground truth was pathologically proven breast cancer. We analyzed the AS of all examinations to determine a safety threshold at which no screen-detected cancer (SDC) was missed by AI.
Results or Findings: All screening mammograms between 15/01/2025 and 31/08/2025 were assessed independently by AI and one human reader. In total, 25,172 women were screened. A total of 158 cancer cases were diagnosed, with abnormality scores between 3.95 and 99.2; the lowest score for a cancer case was thus 3.95. With the safety threshold set at AS=2, the proportion of triaged examinations is 17.2% (4,330 examinations).
Conclusion: These results demonstrate that AI-only assessment of low-score screening examinations (AI score below two) is safe for almost 20% of the screening examinations, further decreasing radiologists’ workload.
Limitations: Not applicable
Funding for this study: None
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information:
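The safety-threshold analysis described in the abstract above can be sketched as follows: find the lowest abnormality score among confirmed cancers, then report how many examinations fall below a candidate threshold and whether any cancer does. This is a minimal illustration under stated assumptions; the function name, dictionary keys, and toy data are hypothetical and do not come from the study.

```python
def triage_summary(scores, is_cancer, threshold):
    """Summarise an AI-only pathway for examinations scoring below `threshold`."""
    cancer_scores = [s for s, c in zip(scores, is_cancer) if c]
    below = [s < threshold for s in scores]
    missed = sum(1 for s in cancer_scores if s < threshold)
    return {
        "lowest_cancer_score": min(cancer_scores),
        "proportion_triaged": sum(below) / len(scores),
        "cancers_below_threshold": missed,
    }

# Hypothetical toy data: abnormality scores (0-100) and cancer ground truth.
scores = [0.4, 1.1, 1.9, 2.5, 3.95, 12.0, 47.3, 99.2, 0.8, 5.0]
is_cancer = [False, False, False, False, True, False, True, True, False, False]
summary = triage_summary(scores, is_cancer, threshold=2)
print(summary)
```

A threshold would only be considered safe in this sense when `cancers_below_threshold` is zero, which is the criterion the abstract applies to its own data.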
6 min
Artificial intelligence reveals early mammographic signs of breast cancers diagnosed at subsequent screening
Eleonora Di Gaetano, Treviso / Italy
Author Block: E. Di Gaetano, C. M. Weiss, R. Cerniato, E. Cattarin, G. Soppelsa, I. Vinci, G. Morana; Treviso/IT
Purpose: Artificial intelligence (AI) generates continuous risk scores (RS) and identifies regions of interest (ROI) on screening mammograms (SM). We evaluated whether baseline RS and ROI in an AI-supported screening (AISS) could predict breast cancers (BC) diagnosed at the subsequent round, and whether results were consistent across BI-RADS breast density (BD) and lesion types.
Methods or Background: We retrospectively analysed 321 women diagnosed with BC at the subsequent AISS round (November 2023–July 2025), performed on average after 777 (median 768; range 691–1,192) days following a negative SM. Side-specific RS at baseline and subsequent round were compared using the Wilcoxon signed-rank test. ROI were deemed concordant if they overlapped the tumour in at least one projection. McNemar’s test assessed within-subject ROI changes, and conditional odds ratios (OR) with 95% confidence intervals (CI) were calculated. Analyses were stratified by BD and lesion type.
Results or Findings: Baseline RS was significantly higher in breasts that developed BC than in cancer-free sides (16.9 vs 10.1; p<0.001), and increased further at the subsequent round (47.6 vs 10.2; p<0.001). Concordant ROI were present at baseline in 111/321 BCs (34.6%). Mean baseline RS was similar for opacities (15.9), microcalcifications (15.8), and distortions (19.6); combined lesions had higher values (20.4). Baseline ROI concordance was found in 31.3% (50/160) of opacities, 28.8% (21/73) of microcalcifications, 40.0% (18/45) of distortions, and 51.2% (22/43) of combined lesions. When grouped by BD, concordant ROI were more frequent in BD C–D (46/122; 37.7%) than in A–B (65/199; 32.7%).
Conclusion: AI assigned higher RS to the breast that later developed cancer and localized one third of BCs at baseline with concordant ROI, especially in combined lesions. These findings highlight the potential of AI to anticipate BC detection in population-based screening.
Limitations: No limitations
Funding for this study: No Funding for this study
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information:
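The paired analysis named in the Methods above (McNemar's test with conditional odds ratios and 95% CIs) depends only on the discordant pairs. The sketch below shows one common way to compute it, assuming an exact binomial McNemar test and a Wald interval on the log scale; the abstract does not state which variants were used, and the counts are hypothetical.

```python
from math import comb, exp, log, sqrt

def mcnemar_exact(b, c):
    """Exact two-sided McNemar p-value from the two discordant-pair counts."""
    n, k = b + c, min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def conditional_odds_ratio(b, c, z=1.96):
    """Conditional OR b/c with a Wald 95% CI on the log scale (needs b, c > 0)."""
    estimate = b / c
    half_width = z * sqrt(1 / b + 1 / c)
    return (estimate,
            exp(log(estimate) - half_width),
            exp(log(estimate) + half_width))

# Hypothetical discordant pairs: b = ROI absent at baseline but present later,
# c = ROI present at baseline but absent later.
b, c = 10, 2
p_value = mcnemar_exact(b, c)
odds_ratio, ci_low, ci_high = conditional_odds_ratio(b, c)
print(f"p={p_value:.4f}, OR={odds_ratio:.1f} ({ci_low:.2f}-{ci_high:.2f})")
```

Concordant pairs (ROI present at both rounds or absent at both) do not enter either calculation, which is why only `b` and `c` are needed.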
6 min
AI risk score for a new versus old version of a CE-marked AI model for breast cancer detection
Solveig Hofvind, Oslo / Norway
Author Block: M. Larsen, Å. S. Holen, N. Moshina, M. B. Bergan, S. Hofvind; Oslo/NO
Purpose: To explore screening examinations, with and without breast cancer, that received a higher AI risk score from the new than from the old version of an AI model applied to the same screening mammograms.
Methods or Background: This retrospective cohort study used data from 117,709 screening examinations performed in BreastScreen Norway 2009-2018. The same screening mammograms were processed by two versions of the commercially available AI model, Transpara (version 1.7 and 2.1, ScreenPoint Medical). We used the categorical AI risk score assessment, AI score 1-10. Scores between 1-7 were defined as low risk, 8-9 as intermediate risk, and 10 as high risk of malignancy. Changes in risk scores below refer to the low, intermediate and high-risk groups.
Results or Findings: A total of 5.3% (6272/117,709) of all screening examinations had a higher AI score in version 2.1 than in 1.7. Among the screen-detected cancer cases, 7.9% (58/737) had a higher AI score in the new versus old version, while 85.6% (631/737) had a stable high AI risk score. We found 1.5% (11/737) to have a lower score (10 -> 1-9) and 5.0% (37/737) to have a stable low or stable intermediate score (1-9 -> 1-9). Among the interval cancers, 11.5% (23/200) had a higher score in the new versus old version.
Conclusion: An updated version of the AI model resulted in a higher detection of cancers in the high AI risk score group. However, the total number of examinations in the high-risk group also increased, which resulted in a higher proportion of examinations with a high AI score but no cancer detected.
Limitations: Exploring location of AI markings and comparison with true cancer location were not included in this study.
Funding for this study: The Norwegian Cancer Society and the Pink Ribbon Campaign have supported performance of the study.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: Approved by the Regional Ethical Committee in South Eastern Norway
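The comparison in the abstract above is made at the level of risk groups (low 1-7, intermediate 8-9, high 10) rather than raw scores. A minimal sketch of that bookkeeping is given below; the function names and the "higher"/"lower"/"stable" labels are illustrative assumptions rather than the authors' code.

```python
def risk_group(score: int) -> str:
    """Group a 1-10 AI score as defined in the Methods above."""
    if not 1 <= score <= 10:
        raise ValueError("AI score must be between 1 and 10")
    if score <= 7:
        return "low"
    return "intermediate" if score <= 9 else "high"

ORDER = {"low": 0, "intermediate": 1, "high": 2}

def group_change(old_score: int, new_score: int) -> str:
    """Compare risk groups between the old and new model version."""
    old, new = ORDER[risk_group(old_score)], ORDER[risk_group(new_score)]
    if new > old:
        return "higher"
    return "lower" if new < old else "stable"

# Hypothetical examination scored 7 by the old version and 8 by the new one.
print(group_change(7, 8))
```

Note that a change within a group, for example from 3 to 6, counts as stable under this definition.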
6 min
Empirically determined effect of dataset size and enrichment on threshold selection of a commercial mammography AI algorithm
Nicholas Payne, Cambridge / United Kingdom
Author Block: N. Payne1, J. Rothwell1, R. Black1, S. Hickman2, F. Kilburn-Toppin1, B. Kasmai3, A. Juette3, F. J. Gilbert1; 1Cambridge/UK, 2London/UK, 3Norwich/UK
Purpose: To empirically show the varying precision and accuracy of selecting AI operating thresholds based on dataset size and composition in prospective use.
Methods or Background: 22,608 full-field digital consecutive mammograms from a single site and vendor (including 952 screen-detected cancers (SDC) and 471 interval cancers (IC)) were scored by a commercial AI tool (Lunit INSIGHT MMG v1.1.7) to establish ‘true’ performance. Nine data subsets were defined by degree of cancer enrichment (representative [8/1000 SDC, 4/1000 IC], mildly enriched [20/1000 SDC, 10/1000 IC], and heavily enriched [100/1000 SDC, 50/1000 IC]) and size (small [n=250], medium [n=1000], large [n=5000]). Each subset was used to measure area under the receiver operating characteristic curve (AUC) and to set thresholds at 90% specificity, 70% sensitivity, and 5% recall, which were then applied to the full data to find their ‘actual’ performance. Subsampling was repeated 1000 times.
Results or Findings: ‘True’ AUC was 0.899. Using larger and enriched subsets increased the likelihood of an ‘accurate’ measure of AUC. Enrichment was beneficial for threshold setting based on sensitivity; however, even with large, heavily enriched subsets, the ‘actual’ sensitivity varied by more than 5 percentage points. Enrichment has a detrimental impact when thresholds are set based on recall rate, as oversampling of cancer cases leads to a reduced recall rate when the threshold is applied to the population. Mean recall rates of thresholds set using the representative, mildly enriched, and heavily enriched subsets were ~5%, ~4% and ~0.5%, respectively.
Conclusion: When selecting a dataset to evaluate AI tools and set operating thresholds, larger datasets are beneficial provided they are relevant to the metric used. Enrichment allows smaller datasets to better assess AUC and sensitivity, however, it is vital to use representative datasets when assessing recall rates.
Limitations: Single site and single vendor.
Funding for this study: Funding was provided by the Future Dreams Breast Cancer charity, the National Institute for Health and Care Research (NIHR) Cambridge Biomedical Research Centre (NIHR203312) and the Cancer Research UK early detection program grant (C543/A26884). The views expressed are those of the authors and not necessarily those of the NIHR or the Department of Health and Social Care.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: This study used the Cambridge Cohort Mammography East Anglia Digital Imaging Archive (‘CC-MEDIA’) dataset, which has ethical approval in which informed consent is waived (Health Research Authority Research Ethics Committee 25/LO/0220, Health Research Authority Confidentiality Advisory Group 20/CAG/0009, Public Health England Research Advisory Committee BSPRAC_090).
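The recall-rate effect reported in the abstract above can be illustrated with a small simulation: a threshold chosen to recall 5% of a cancer-enriched subset sits higher than one chosen on a representative subset, so it recalls fewer cases when applied to the population. The sketch below uses synthetic Gaussian scores drawn independently for each set, which is an assumption made for illustration; the study itself subsampled real AI scores from its full dataset, and all names and parameters here are hypothetical.

```python
import random

def threshold_for_recall(scores, recall_rate):
    """Score cut-off such that roughly `recall_rate` of `scores` lie above it."""
    ranked = sorted(scores, reverse=True)
    return ranked[int(len(ranked) * recall_rate)]

def recall(scores, threshold):
    """Fraction of examinations scoring above the threshold."""
    return sum(s > threshold for s in scores) / len(scores)

def sample_scores(rng, n, cancer_fraction):
    """Synthetic scores: cancers drawn from a higher-mean Gaussian (assumption)."""
    n_cancer = round(n * cancer_fraction)
    return ([rng.gauss(2.0, 1.0) for _ in range(n_cancer)]
            + [rng.gauss(0.0, 1.0) for _ in range(n - n_cancer)])

rng = random.Random(0)
population = sample_scores(rng, 20000, 0.012)      # 12/1000 cancers overall
representative = sample_scores(rng, 5000, 0.012)   # same prevalence as population
enriched = sample_scores(rng, 5000, 0.15)          # 150/1000 cancers

recall_rep = recall(population, threshold_for_recall(representative, 0.05))
recall_enr = recall(population, threshold_for_recall(enriched, 0.05))
print(f"representative: {recall_rep:.3f}, heavily enriched: {recall_enr:.3f}")
```

The exact numbers depend on the assumed score distributions, but the direction of the effect matches the abstract: the enriched-subset threshold yields a population recall rate well below the 5% target.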
6 min
A Fully Automated Deep Learning Framework for Lesion Segmentation and Survival Prediction in Breast Cancer Patients
Shushan Dong, Shanghai / China
Author Block: K. Wang1, S. Wang1, S. Huang2, J. Xie3, S. Dong2, M. Xu1, R. Zhang1; 1Hangzhou/CN, 2Beijing/CN, 3Shanghai/CN
Purpose: To develop and validate a fully automated deep learning framework (FA-SurvNet) that integrates lesion segmentation and survival prediction for breast cancer patients using preoperative MRI.
Methods or Background: In this retrospective study, 573 female breast cancer patients from two medical centers who underwent preoperative MRI were enrolled. We developed the FA-SurvNet model, which integrates an nnU-Net for automated tumor segmentation with a CNN-Cox regression network for survival analysis. The segmentation performance was evaluated using the Dice Similarity Coefficient (DSC), Positive Predictive Value (PPV), and Sensitivity (SEN). The prognostic performance of FA-SurvNet was compared against two baseline models: a clinical Traditional Chinese Medicine (TCM) model and a radiomics model, both based on conventional Cox regression. Model evaluation included Harrell's concordance index (C-index), time-dependent area under the curve (AUC), and decision curve analysis (DCA) for net clinical benefit.
Results or Findings: The nnU-Net segmentation model achieved high performance in the training (DSC: 0.85, PPV: 0.85, SEN: 0.88) and testing cohorts (DSC: 0.84, PPV: 0.87, SEN: 0.84). The FA-SurvNet model demonstrated excellent predictive ability for recurrence risk, with C-indexes of 0.88 (training) and 0.84 (testing). It outperformed the radiomics model (training C-index: 0.84; testing C-index: 0.75) and performed comparably to the clinical TCM model (training C-index: 0.89; testing C-index: 0.88). The time-dependent AUCs of FA-SurvNet for predicting 3- and 5-year recurrence-free survival (RFS) were 0.89 and 0.90 in the training cohort, and 0.84 and 0.87 in the testing cohort, respectively.
Conclusion: The FA-SurvNet framework successfully automates the entire prognostic pipeline from lesion segmentation to survival risk estimation, offering a powerful and efficient tool for personalizing breast cancer management.
Limitations: None
Funding for this study: None
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: None
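Harrell's concordance index, used above to compare the prognostic models, measures how often the subject with the higher predicted risk is the one who experiences the event earlier. The following is a minimal pure-Python sketch of the standard definition, using hypothetical follow-up data; it is not the FA-SurvNet evaluation code and ignores pairs with tied event times.

```python
def harrell_c_index(times, events, risks):
    """Harrell's C: share of comparable pairs in which the higher predicted
    risk belongs to the subject with the shorter observed time to event."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is anchored on a subject with an observed event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count one half
    return concordant / comparable

# Hypothetical follow-up data: months to recurrence or censoring,
# event indicator (1 = recurrence observed), and predicted risk score.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.5, 0.1]
c_index = harrell_c_index(times, events, risks)
print(f"C-index = {c_index:.2f}")
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives context for the 0.84 to 0.89 values reported in the Results.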
6 min
Artificial intelligence–assisted risk stratification of atypical breast lesions: correlation of pathology, imaging features and Lunit abnormality score
Rafael Boava Souza, São Paulo / Brazil
Author Block: R. B. Souza, L. F. Chala, G. G. N. Mello, T. C. d. M. Tucunduva, M. Siepcich, B. A. Rocha, I. Oliveria, M. P. F. Ananias, V. L. N. Aguillar; São Paulo/BR
Purpose: Atypical breast lesions (B3), including atypical ductal hyperplasia (ADH), lobular neoplasia (LN), and flat epithelial atypia (FEA), diagnosed by stereotactic vacuum-assisted biopsy (VAB) of suspicious calcifications carry variable risk of underestimation and upgrade to ductal carcinoma in situ (DCIS) or invasive carcinoma. This study evaluated upgrade rates of B3 lesions, predictive factors, and the role of Lunit INSIGHT MMG abnormality score as an AI-based biomarker.
Methods or Background: This retrospective study included 271 women (Jan 2020–Dec 2024) with VAB-diagnosed B3; 157 underwent surgery and 114 clinical-radiological follow-up. Analyzed variables included age, personal/family history, calcification morphology and distribution, residual calcifications, and pathological subtype. Lunit scores were retrieved for 100 cases. Analyses included chi-square, Fisher’s exact, Mann-Whitney U, and ROC.
Results or Findings: Overall upgrade rate was 16.6% (26/157): 73% DCIS, 27% invasive carcinoma. Pathology was the strongest predictor. Borderline ADH demonstrated the highest upgrade risk (77.8%, p<0.001), followed by ADH (21.9%, p=0.012), FEA (13%), and LN (10%). No difference was observed between isolated and multiple atypias (p=0.801), both lower than borderline ADH. Pleomorphic linear/branching calcifications were significantly associated with malignancy (p=0.007). Residual calcifications showed a non-significant trend toward higher risk (22% vs. 11.4%). Clinical variables were not predictive. Lunit scores were higher in upgraded vs non-upgraded cases (mean 36.2 vs 23.2; median 32.2 vs 11.9), though not statistically significant (p=0.149). ROC analysis yielded an AUC of 0.615, with optimal cut-off of approximately 29 (sensitivity 62.5%, specificity 67.9%).
Conclusion: Borderline ADH is the strongest predictor of upgrade, followed by suspicious calcification morphology. Although the difference was not statistically significant, Lunit scores tended to be higher in upgraded lesions and showed moderate discriminatory performance. AI-assisted mammographic analysis may complement pathology and imaging in risk models, supporting individualized management of atypical calcifications.
Limitations: No limitations were identified.
Funding for this study: No funding was received for this study.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: This study was approved by the Research Ethics Committee of Grupo Fleury.
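The ROC analysis in the Results above reports an AUC and an optimal cut-off. The sketch below shows how both can be derived from two groups of scores, assuming the cut-off is chosen by Youden's index (sensitivity + specificity - 1); the abstract does not state which criterion was used, and the scores shown are hypothetical rather than study data.

```python
def roc_auc(pos, neg):
    """AUC as the probability that a positive case outscores a negative one
    (ties count one half); equals the Mann-Whitney U statistic / (n_pos * n_neg)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def youden_cutoff(pos, neg):
    """Cut-off maximising sensitivity + specificity - 1
    (a score >= cut-off is called positive)."""
    best = None
    for cut in sorted(set(pos) | set(neg)):
        sens = sum(p >= cut for p in pos) / len(pos)
        spec = sum(n < cut for n in neg) / len(neg)
        if best is None or sens + spec - 1 > best[0]:
            best = (sens + spec - 1, cut, sens, spec)
    return best[1:]

# Hypothetical abnormality scores for upgraded and non-upgraded lesions.
upgraded = [12, 30, 45, 60]
not_upgraded = [5, 10, 20, 35, 8]
auc = roc_auc(upgraded, not_upgraded)
cutoff, sensitivity, specificity = youden_cutoff(upgraded, not_upgraded)
print(auc, cutoff, sensitivity, specificity)
```

With small samples such as the 100 cases with available scores in the study, a cut-off chosen this way can shift considerably from one sample to the next.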
6 min
Real versus Virtual Contrast Enhancement for AI-Based Breast MRI Classification
Tri-Thien Nguyen, Erlangen / Germany
Author Block: T-T. Nguyen, S. Heidarikahkesh, H. Schreiter, L. Brock, L. A. Kapsner, A. Liebert, A. Maier, M. Uder, S. Bickelhaupt; Erlangen/DE
Purpose: Breast MRI offers high sensitivity and specificity; however, widespread use remains limited. Contrast-free protocols and AI tools might improve accessibility, for example in screening. This study evaluated the diagnostic performance of an AI model trained on early contrast-enhanced subtraction (T1sub) images for breast cancer classification, and tested its application on both real contrast-enhanced T1sub and virtually generated contrast-enhanced (T1virtual) images from non-contrast sequences.
Methods or Background: This IRB-approved retrospective study included 1,712 women undergoing routine multiparametric 3T breast MRI, some with multiple exams. Images were split by breast side, yielding 3,685 samples, including 488 test cases. A Medical Slice Transformer was trained on T1sub images to classify BI-RADS 1–3 versus 4–6. Model performance was evaluated on both T1sub and T1virtual test images. Statistics used McNemar’s test for accuracy and DeLong’s test for AUC; specificity at 90%, 95%, and 97.5% sensitivity was reported. Attention map analysis assessed whether the model focused on clinically relevant regions, using a three-point Likert scale (good, moderate, poor) rated by a board-certified radiologist.
Results or Findings: The model achieved higher performance on T1sub compared with T1virtual images (AUC 0.80±0.03 vs 0.74±0.03, p=0.035; accuracy 0.85 vs 0.81, p=0.0056). At sensitivity thresholds of 90%, 95%, and 97.5%, specificity was 0.33/0.21/0.12 on T1sub and 0.21/0.13/0.09 on T1virtual. Attention maps showed similar lesion focus, with most cases rated ‘Good’ (55–60%) or ‘Moderate’ (30–35%).
Conclusion: An AI model trained on contrast-enhanced T1sub achieved 4 percentage points higher accuracy on T1sub than on T1virtual (0.85 vs 0.81), a statistically significant difference. While performance was lower on virtual contrast, the approach remains promising and warrants further validation to expand accessibility of AI-assisted breast cancer detection, particularly in screening scenarios.
Limitations: Single-center retrospective study; model trained exclusively on contrast-enhanced data and applied without re-training to virtual images.
Funding for this study: This project is partially funded by the Bavarian Ministry of Economic Affairs, Regional Development and Energy
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: This study was approved by the Institutional Review Board of the Friedrich-Alexander-Universität Erlangen-Nürnberg (Ethik-Kommission) with waived informed consent (approval number 23-281-Br) due to its retrospective nature and use of pseudonymized data.
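The Methods above report specificity at fixed sensitivity levels (90%, 95%, 97.5%). One simple way to obtain such a figure is to pick the highest score threshold that still captures the required share of positive cases and then measure specificity at that threshold. The sketch below illustrates this under stated assumptions; the function name and the toy scores are hypothetical, and the study may have used a different interpolation of the ROC curve.

```python
import math

def specificity_at_sensitivity(pos, neg, target_sensitivity):
    """Specificity at the highest threshold whose sensitivity meets the target
    (a score >= threshold is called positive)."""
    ranked = sorted(pos, reverse=True)
    # Number of positives that must be captured; the small epsilon guards
    # against floating-point round-up in the product.
    needed = math.ceil(target_sensitivity * len(ranked) - 1e-9)
    threshold = ranked[needed - 1]
    specificity = sum(n < threshold for n in neg) / len(neg)
    return threshold, specificity

# Hypothetical model outputs for positive (BI-RADS 4-6) and negative (1-3) cases.
pos = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.95, 0.85]
neg = [0.1, 0.25, 0.35, 0.5, 0.05]
threshold, specificity = specificity_at_sensitivity(pos, neg, 0.90)
print(threshold, specificity)
```

Raising the sensitivity target pushes the threshold down and can only keep or lower specificity, which is the pattern visible in the reported 0.33/0.21/0.12 sequence.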
6 min
Can artificial intelligence detect additional cancers on screening digital breast tomosynthesis?
Victor Carl Martin Dahlblom, Tygelsjö / Sweden
Author Block: V. C. M. Dahlblom, K. Johnson, M. Dustler, A. Tingberg, S. Zackrisson; Malmö/SE
Purpose: Several studies have indicated that AI systems can detect additional cancers on screening digital mammography (DM) examinations compared with unaided radiologists, but whether this also applies to digital breast tomosynthesis (DBT) is less well studied. We investigated whether a DBT AI system can detect cancers that were not detected in screening.
Methods or Background: The study is based on the Malmö Breast Tomosynthesis Screening Trial, in which women were screened once with double-read DBT; subsequent screening rounds used DM only. DBT examinations from 14,368 women were analysed with ScreenPoint Transpara 2.1, which gives a score between 1 and 10, where 10 indicates the highest suspicion of cancer. The AI scores from the DBT screening examinations of women with screening-detected cancers in the two subsequent screening rounds (1.5- or 2-year interval depending on age), or with interval cancers, were compared with scores from women without breast cancer.
Results or Findings: The mean AI score at screening DBT was clearly higher for women with cancer diagnosed in the interval until (mean 6.1) or at the first following screening round (mean 5.9), compared to women without cancer diagnosed during the two subsequent screening rounds (mean 3.3). During the first interval and screening round, 23% (6/22) of the interval cancers and 26% (15/57) of the screening-detected cancers had score 10 at DBT screening, compared to 3.8% (527/13986) among women without cancer. For cancers in the second following screening round, the DBT examination had score 10 in 11% (3/27) of interval cancers and 10% (6/56) of screening-detected cancers.
Conclusion: AI analysis of DBT screening images could potentially detect a substantial amount of additional cancers compared to radiologist double reading of DBT.
Limitations: Single-centre, single-vendor, single-view DBT at one screening occasion.
Funding for this study: Skåne University Hospital
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: Covered by the ethical approval for the MBTST (official records number: 2009/770)
6 min
Impact of Annotation-Free AI system for Simultaneous Detection and Diagnosis on Breast Ultrasound: A Multi-Reader Multi-Case Study across Diverse Professionals
Hasang Park, Seoul, Republic of Korea / Korea, Republic of
Author Block: H. Park1, K. H. Han2, H-W. Kim3, W. H. Kim4, J. Kim4, J. H. Yoon1; 1Seoul, Republic of Korea/KR, 2Seoul/KR, 3iksan/KR, 4Daegu/KR
Purpose: To evaluate how a deep learning-based artificial intelligence (AI) system that provides autonomous lesion detection and simultaneous differential diagnosis for breast ultrasound (US) affects the diagnostic performance of readers with diverse professional backgrounds.
Methods or Background: This study included 1,000 US images (500 cancer, 250 benign, 250 negative). Fifteen readers from various medical professions were recruited: 6 radiologists, 3 breast surgeons, 3 physicians, and 3 radiographers. Image interpretation was conducted in two sessions: session 1 without AI and, after a 2-week washout, session 2 with AI. Reader performance was evaluated and compared between the two sessions.
Results or Findings: Overall reader-averaged area under the localization receiver operating characteristic curve (LROC) significantly increased in session 2 vs. session 1 [0.910 (95% CI: 0.888, 0.931) vs 0.864 (95% CI: 0.831, 0.898)] (P=0.002). LROC of standalone AI was 0.909 (95% CI: 0.889, 0.930). Among the professional subgroups, radiographers showed significantly improved LROC in session 2 vs. session 1 (0.922 vs 0.868, P=0.032). Average sensitivity and accuracy were significantly improved for all 15 readers in session 2 vs. session 1: 95.0% vs. 85.3% and 85.7% vs. 82.3%, respectively (all P<0.001). Physicians (82.3% vs. 75.7%) and radiographers (81.7% vs. 76.7%) demonstrated a significant decrease in specificity in session 2 vs. session 1 (all P<0.001). Radiologists without fellowship training or with <8 years of experience showed significant improvement in sensitivity and accuracy but decreased specificity in session 2 (all P<0.05).
Conclusion: Using AI for breast US interpretation significantly enhanced overall diagnostic performance across readers from diverse healthcare professions. AI application may have different consequences for readers with different levels of expertise, which should be considered in clinical application.
Limitations: Cancer enriched population was used for reader study.
Funding for this study: This study was supported by the Korea Medical Device Development Fund grant funded by the Korea government (the Ministry of Science and ICT, the Ministry of Trade, Industry and Energy, the Ministry of Health & Welfare, the Ministry of Food and Drug Safety) (Project Number: 1711197554, RS-2023-00227526).
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: This study has been approved by the institutional review board (IRB) of three institutions (Severance Hospital, IRB No. 1-2023-0073, Wonkwang University Hospital IRB No. 2023-07-017-002, Kyungpook National University Chilgok Hospital IRB No. KNUCH 2023-11-029), with a waiver for informed consent.
6 min
Application of an artificial intelligence-assisted diagnostic system for breast ultrasound: a prospective study
Zhi Li Wang, Beijing / China
Author Block: Z. L. Wang; Beijing/CN
Purpose: Accurate diagnosis of breast cancer is of great importance for improving patient prognosis. AI-assisted diagnostic systems for breast ultrasound are increasingly being applied to differentiate benign from malignant breast lesions. This study aimed to evaluate the diagnostic performance and optimal application of AI-assisted ultrasonography for breast lesions in a clinical setting.
Methods or Background: A total of 501 consecutive patients with 679 breast lesions were prospectively included in the study. Junior and senior radiologists were asked to interpret images of lesions both with and without AI assistance. Three application modes of AI were employed: AI alone, adjusted BI-RADS, and second reading mode. The diagnostic performances of these application modes were analyzed and compared.
Results or Findings: The AUC of junior radiologists increased from 0.879 to 0.921 in the second reading mode, which was higher than that in the adjusted BI-RADS mode (0.901), similar to that of AI alone (0.924), and lower than that obtained by senior radiologists (0.950). Using BI-RADS category 4A as the threshold, the sensitivity of junior radiologists increased from 0.83 to 0.92 (P<0.001). Furthermore, their specificity increased from 0.79 to 0.85, which was higher than those of AI alone and the adjusted BI-RADS mode (P<0.001). The unnecessary biopsy rate decreased by 14.70% (P=0.01). For senior radiologists, the sensitivity increased from 0.91 to 0.96 (P=0.01). Similar results were observed in the subgroup analysis of lesions ≤2 cm. For lesions >2 cm, only the specificity of junior radiologists increased (from 0.39 to 0.52, P=0.03).
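The sensitivity and specificity figures above depend on dichotomising BI-RADS categories at 4A. The following is a minimal sketch of that step, assuming that category 4A or higher counts as a positive call and using a simplified category ordering; the toy data and function names are hypothetical and are not taken from the study.

```python
# Illustrative only: dichotomise BI-RADS at 4A and compute sensitivity/specificity.
# Assumed ordering of categories from least to most suspicious.
BIRADS_ORDER = ["2", "3", "4A", "4B", "4C", "5"]

def is_positive(category, threshold="4A"):
    """A category at or above the threshold is treated as a positive call."""
    return BIRADS_ORDER.index(category) >= BIRADS_ORDER.index(threshold)

def sensitivity_specificity(categories, malignant, threshold="4A"):
    """Return (sensitivity, specificity) against pathology labels."""
    tp = fn = tn = fp = 0
    for category, truth in zip(categories, malignant):
        positive = is_positive(category, threshold)
        if truth and positive:
            tp += 1
        elif truth:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical toy data: one BI-RADS category and one pathology label per lesion.
cats = ["3", "4A", "4B", "5", "2", "4A", "3", "4C"]
malig = [False, False, True, True, False, True, True, False]

sens, spec = sensitivity_specificity(cats, malig)
print(sens, spec)
```

In this framing, false positives at the 4A threshold correspond to benign lesions that would be referred for biopsy, which is why a gain in specificity is tied to fewer unnecessary biopsies.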
Conclusion: AI-assisted ultrasound is useful for the diagnosis of breast lesions, particularly for junior radiologists and lesions ≤2 cm. The use of the second reading mode can achieve excellent diagnostic performance.
Limitations: The sample size of breast lesions >2 cm was relatively small.
Funding for this study: This work was supported by the National Natural Science Foundation of China (No. 82071925), the Military Health Project (No. 22BJZ23), and the Equipment Comprehensive Research Project (No. LB20211A010011).
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: This prospective study was approved by the Ethics Committee of the Chinese PLA General Hospital (No. S2021-683-01), and informed consent was obtained from all patients.