Research Presentation Session: Imaging Informatics and Artificial Intelligence

RPS 705 - Artificial intelligence in breast imaging

February 27, 08:00 - 09:30 CET

  • ACV - Research Stage 3
  • ECR 2025
  • 12 Lectures
  • 90 Minutes
  • 12 Speakers

Description

7 min
Optimal utilization of an AI diagnostic software in a mammography screening program in Switzerland
Marcel Blum, St.Gallen / Switzerland
Author Block: M. Blum, A. Geissler, D. Ehlig, J. Vogel, J. Subelack, R. Morant; St.Gallen/CH
Purpose: The goal of this study is to evaluate Profound AI® (pAI) in the screening process of the organized mammography screening program (MSP) “donna”. We aim to identify the optimal utilization of pAI in the MSP regarding its effectiveness (sensitivity and specificity) and its influence on required resources.
Methods or Background: In this retrospective study, we analyse all mammographies from one screening round, i.e., the years of 2022 and 2023, of the MSP “donna” in the Swiss canton of St.Gallen (approximately 27,600 mammographies) using pAI by iCAD, which will assign each mammography a case and predictive risk score. We use optimization models, such as the receiver operating characteristics curve, to find the optimal threshold for case discussion in a consensus conference. We simulate multiple AI implementation scenarios within the MSP, including AI as a substitute for one of the two radiologists and AI as a preselection tool to identify mammographies for double reading.
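The abstract does not say which criterion will be used to pick the threshold on the ROC curve. As a minimal sketch, assuming Youden's J statistic (sensitivity + specificity - 1) as the criterion and a rule that a case is discussed when its score is at or above the threshold, the selection could look as follows. The function name and the toy scores and labels are hypothetical and are not study data.

```python
def youden_threshold(scores, labels):
    """Return (threshold, sensitivity, specificity) maximising Youden's J.

    Assumption: a case is called positive when score >= threshold.
    labels are 1 for cancer and 0 for no cancer.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")

    best = None
    # Every distinct observed score is a candidate operating point.
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
        sens = tp / positives
        spec = tn / negatives
        j = sens + spec - 1
        # Strict ">" keeps the lowest threshold when several tie on J.
        if best is None or j > best[0]:
            best = (j, threshold, sens, spec)
    return best[1], best[2], best[3]


# Hypothetical toy data, not taken from the study.
scores = [5, 12, 20, 35, 40, 55, 70, 90]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
threshold, sens, spec = youden_threshold(scores, labels)
print(threshold, sens, spec)
```

Other criteria, such as fixing a minimum sensitivity or weighting false positives by consensus-conference workload, would slot into the same loop.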
Results or Findings: First results of this study are expected in early 2025. We anticipate determining an optimal threshold at which a mammography should be discussed further in a consensus conference. This threshold is expected to increase effectiveness by increasing the breast cancer detection rate. In the simulated scenarios, we expect that the workload of radiologists can be reduced significantly, thus increasing the efficiency of the MSP, without loss of effectiveness.
Conclusion: Our study will contribute to identifying the optimal implementation of AI in the screening process of an MSP, optimizing its effectiveness (i.e., increasing the cancer detection rate) and its efficiency, and initiating a discussion about the future of organized screening.
Limitations: This study’s limitation lies in its retrospective design and the initial omission of interval carcinomas.
Funding for this study: This study is partly funded by the Cancer League of Eastern Switzerland.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: This study has been approved by the Ethics Committee of Eastern Switzerland (EKOS) under the project ID 2024-01310.
7 min
Artificial intelligence mammography interpretation systems are affected more by mammographic image quality issues than radiologists are
Sarah Delaja Verboom, Nijmegen / Netherlands
Author Block: S. D. Verboom, J. M. D. S. Boita, M. Broeders, I. Sechopoulos; Nijmegen/NL
Purpose: To determine how common image quality issues in mammograms affect the performance of artificial intelligence (AI)-based mammography interpretation systems compared to expert breast radiologists.
Methods or Background: Five common image quality issues were simulated on 80 digital screening mammograms (40:20:20, cancer:benign:normal). Each issue was simulated at two levels: the lowest quality that was acceptable to radiologists, and a realistic quality that was not acceptable.

Thirteen expert breast radiologists from five countries and two commercial AI systems assessed all mammograms and scored the mammograms with a probability of malignancy (PoM) and a recall decision. The AI recall decision was obtained by matching the specificity on standard quality images to that of the radiologists. The area under the receiver operating characteristics curve (AUC) and recall decisions of radiologists and AI for the two lower quality levels were compared to those for the standard quality images.
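Matching the AI operating point to the radiologists' specificity can be sketched as below. This is an illustration under stated assumptions, not the authors' procedure: a case is recalled when its score is at or above the threshold, the threshold is the smallest one that reaches the target specificity on the non-cancer cases, and the function name and toy scores are hypothetical.

```python
def threshold_for_specificity(negative_scores, target_specificity):
    """Smallest threshold whose specificity on the negatives is >= target.

    Assumption: a case is recalled when score >= threshold, so specificity
    is the fraction of non-cancer cases scoring strictly below the threshold.
    """
    if not negative_scores:
        raise ValueError("need at least one negative case")
    if not 0.0 <= target_specificity <= 1.0:
        raise ValueError("target specificity must lie in [0, 1]")

    n = len(negative_scores)
    # Add one value above the maximum so a specificity of 1.0 is reachable.
    candidates = sorted(set(negative_scores)) + [max(negative_scores) + 1]
    for threshold in candidates:
        specificity = sum(1 for s in negative_scores if s < threshold) / n
        if specificity >= target_specificity:
            return threshold, specificity
    raise RuntimeError("unreachable: the last candidate gives specificity 1.0")


# Hypothetical AI scores for ten non-cancer cases on standard-quality images.
negatives = [3, 8, 15, 22, 30, 41, 47, 52, 60, 75]
threshold, specificity = threshold_for_specificity(negatives, 0.8)
print(threshold, specificity)
```

The threshold found on the standard-quality images would then be held fixed when scoring the lower-quality versions, so that changes in recall decisions reflect image quality rather than a moving operating point.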
Results or Findings: The radiologists’ original mean AUC of 0.76 (95%CI 0.68-0.84) was not affected by the lower image quality (p=0.77, 0.46). The AUCs of AI system A were 0.72 (0.60-0.83) on the original quality, 0.68 (0.55-0.80) (p=0.47) for the lower-acceptable quality, and 0.61 (0.49-0.74) (p=0.06) for the unacceptable quality. For system B, the AUC decreased from 0.95 (0.90-1.0) to 0.91 (0.84-0.96) (p=0.25) and to 0.87 (0.78-0.95) (p=0.02), respectively.

Radiologists gave the same recall decision in 83% and 82% of the cases for each quality level. Meanwhile, system A gave the same recall decision in 75% (p=0.06) and 68% (p=0.001) of the cases and system B in 80% (p=0.47) and 78% (p=0.27) of the cases.
Conclusion: Image quality can affect AI performance and recall decision more than radiologists’, even when radiologists’ performance is not affected.
Limitations: Retrospective study with limited sample size.
Funding for this study: aiREAD financed by the Dutch Research Council (NWO), Dutch Cancer Society (KWF), Health Holland (HH).
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: Approval of the ethics committee was not applicable due to the retrospective nature of this study with anonymized data that was previously approved for retrospective use.
7 min
Evaluation of a Digital Breast Tomosynthesis Cancer Detection AI Algorithm Using the Personal Performance in Mammographic Screening Scheme (PERFORMS)
Yan Chen, Nottingham / United Kingdom
Author Block: G. Partridge1, P. Phillips2, J. James1, N. Sharma3, K. Satchithananda4, R. Butler5, J. Lewin5, M. Michell4, Y. Chen1; 1Nottingham/UK, 2Lancaster/UK, 3Leeds/UK, 4London/UK, 5New Haven, CT/US
Purpose: To compare the performance of a Digital Breast Tomosynthesis (DBT) Artificial Intelligence (AI) model as a standalone reader to that of a large cohort of breast imaging readers, using the Personal Performance in Mammographic Screening (PERFORMS) scheme. The performance of a subset of readers, assisted by the DBT AI during image interpretation, will also be reported.
Methods or Background: 75 challenging combined DBT and Synthetic 2D mammography (S2D) screening cases were collated into a PERFORMS test-set. Test-set images were analysed by a prototype server allowing batch-processing of a commercial AI model (Hologic Genius AI Detection [GAID] v2.0). The set was also distributed to 156 readers from 8 UK National Health Service (NHS) hospitals that use DBT in screening as part of the PROSPECTS trial, and to 6 readers from 1 US institution that employs DBT in routine screening. The AI performance will be benchmarked against the performance of this reader cohort.
The US readers will additionally re-review the test-set with AI-markup available for decision support, following a 6-8 week washout period. Performance with and without AI-support will be investigated and compared to the AI as a standalone reader.
Results or Findings: The AI model achieved an Area Under the Receiver Operating Characteristic Curve (AUC) of 0.935, and a sensitivity of 89.5% and specificity of 85.7% at the optimal threshold (=33). Human readers are currently undertaking the case review, but their data will be reported at the conference.
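For readers less familiar with the reported metrics, the sketch below computes an AUC as the probability that a cancer case receives a higher score than a non-cancer case (the Mann-Whitney formulation), together with sensitivity and specificity at a fixed threshold. The function names and the toy scores are hypothetical; the threshold of 33 is reused only as an example value, and the rule "positive when score >= threshold" is an assumption.

```python
def auc(scores, labels):
    """AUC as the chance that a cancer case outscores a non-cancer case."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # tied pairs count as half a win
    return wins / (len(positives) * len(negatives))


def sensitivity_specificity(scores, labels, threshold):
    """Assumption: a case is called positive when score >= threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    return tp / n_pos, tn / n_neg


# Hypothetical toy data, not taken from the PERFORMS test-set.
scores = [10, 25, 33, 40, 33, 60, 80, 15]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
area = auc(scores, labels)
sens, spec = sensitivity_specificity(scores, labels, threshold=33)
print(area, sens, spec)
```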
Conclusion: This international, Multiple Reader Multiple Case (MRMC) study enables the comparison of a very large cohort of breast imaging readers to a DBT AI model, as well as investigating the effect of reading DBT with AI-support.
Limitations: The test-set is enriched with malignant cases which may influence human reader decisions.
Funding for this study: Funding was acquired from Hologic Inc.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: This study is classed as a clinical audit for quality assurance for improvement of the breast screening programme. Ethics Reference No: 88-1223.
7 min
Evaluation of an AI System for Cancer Detection in Abbreviated Breast MRI
Koen Eppenhof, Nijmegen / Netherlands
Author Block: K. Eppenhof1, A. Rodriguez Ruiz1, W. B. Veldhuis2, C. Van Gils2, A. M. Rosanò3, R. Yang4, D. E. Lehrer5, L. Çelik6, R. Mann1; 1Nijmegen/NL, 2Utrecht/NL, 3Sion/CH, 4East Brunswick, NJ/US, 5Buenos Aires/AR, 6Istanbul/TR
Purpose: To investigate the performance of an AI system for breast cancer detection in abbreviated DCE-MRI.
Methods or Background: A combination of high-risk screening and diagnostic DCE-MRI exams from five hospital groups and a public data set (Duke-Breast-Cancer-MRI) were acquired. Each MRI exam was processed by an AI system, which takes as input the pre-contrast and a single post-contrast T1 image (abbreviated breast MRI), detects suspicious regions, and outputs a malignancy score per breast between 1 and 10. Additionally, the AI system was evaluated on an enriched screening dataset from the DENSE trial.
Results or Findings: Area under the Receiver Operating Characteristic curve (AUROC) was computed for classifying exam malignancy for exams from four hospital groups located in Argentina (41 of 780 exams containing biopsy-proven cancer, AUROC 0.891 (95% CI=0.828-0.944)), Switzerland (98/3499, 0.863(0.824-0.896)), Turkey (33/164, 0.955(0.898-0.998)), and the US (153/1096, 0.904(0.877-0.929)). The consistency in AUROCs indicates robustness across populations, protocols, and scanners.
Because Duke-Breast-Cancer-MRI exams all contain cancer, a breast-level analysis was done where breasts without cancer were used as the negative class (904/1808 breasts containing cancer). The AUROC (0.965(0.957-0.972)) is similar to an earlier published AI that used two post contrast images.
For exams that had a BIRADS assessment, the agreement between the AI (score >= 9) and the radiologist interpretation (BIRADS 1 or 2 vs. 4 or 5) was found to be moderate (Cohen kappa=0.502(0.449-0.555)).
The performance on screening-only data was measured in exams from the fifth hospital located in the Netherlands (66/2920 exams containing cancer, AUROC 0.812(0.753, 0.868)), and exams from the DENSE trial (83/517, AUROC 0.803(0.747-0.856)).
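The agreement statistic reported above, Cohen's kappa, compares observed agreement with the agreement expected by chance. A minimal sketch for a square contingency table follows, assuming rows hold the binarized AI call (score >= 9 versus lower) and columns the binarized radiologist call (BIRADS 4 or 5 versus 1 or 2). The function name and the counts are hypothetical and are not study data.

```python
def cohen_kappa(table):
    """Cohen's kappa for a square contingency table (list of rows)."""
    size = len(table)
    total = sum(sum(row) for row in table)
    if total == 0:
        raise ValueError("table must contain at least one case")

    # Observed agreement: share of cases on the diagonal.
    observed = sum(table[i][i] for i in range(size)) / total

    # Chance agreement: product of the marginal proportions per category.
    expected = 0.0
    for i in range(size):
        row_share = sum(table[i]) / total
        col_share = sum(row[i] for row in table) / total
        expected += row_share * col_share

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical counts: rows = AI (positive, negative),
# columns = radiologist (positive, negative).
table = [[40, 10],
         [20, 30]]
kappa = cohen_kappa(table)
print(round(kappa, 3))
```

In this toy table the observed agreement is 0.70 and the chance agreement is 0.50, which gives a kappa of 0.4.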
Conclusion: A first evaluation of an AI system for abbreviated DCE-MRI shows potential for decision support in detecting breast cancer.
Limitations: The study has a retrospective design.
Funding for this study: Not applicable
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: Not applicable
7 min
Validating the impact of real-world live use of AI as an additional reader in breast cancer screening (BCS)
Annie Ng, London / United Kingdom
Author Block: A. Ng1, E. Ambrozay2, E. Szabó2, B. Glocker1, P. Kecskemethy1; 1London/UK, 2Budapest/HU
Purpose: To validate that the measured impact of deploying AI as an additional reader is a result of the AI intervention and not purely the result of additional reading.
Methods or Background: Live use of an AI-system as an additional reader (XR), flagging for additional review the cases that it suggested recalling but for which the standard double reading (DR) decision was “no recall”, has been demonstrated to result in a 0.8/1000 increase in cancer detection rate (CDR) and a 0.1% increase in positive predictive value when 6.0% of cases are additionally reviewed, compared to DR.

To validate that the increased effectiveness in early cancer detection of the XR AI-workflow is not purely from additional reading, the maximum CDR increase opportunity due to third-human-reading a random 6.0% of cases was simulated. It was calculated as the interval cancer rate (ICR), times the portion of cases to be third-human-read (6.0%), times the portion of interval cancers (ICs) that are human-detectable, i.e. visible on priors. A range of ICR of 0.84-2.11/1000 was used (DOI:10.1038/s41523-017-0014-x). Studies have measured that 22% of ICs are human-detectable (DOI:10.1007/s00330-020-07130-y); however, a wider range of 22-100% was used. Third-human-reading was assumed to have an unrealistic 100% sensitivity among human-detectable ICs.
Results or Findings: For the lower and upper end of assumptions, respectively, the maximum CDR increase opportunity calculated for third-human-reading is 0.01 and 0.13/1000, which is 98.9% and 87.2% less than the XR AI-workflow, suggesting that the increased CDR impact of XR is 6-70 times more effective than third-human-reading without AI.
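The upper-bound calculation described in the Methods is plain arithmetic and can be reproduced as a short sketch. The input values are those stated in the abstract; the function and variable names are hypothetical.

```python
def max_cdr_gain_per_1000(icr_per_1000, fraction_reread,
                          fraction_detectable, sensitivity=1.0):
    """Upper bound on the CDR gain from third-human-reading random cases.

    gain = ICR x share of cases re-read x share of ICs that are
    human-detectable x reader sensitivity (assumed to be 100%).
    """
    return icr_per_1000 * fraction_reread * fraction_detectable * sensitivity


XR_GAIN_PER_1000 = 0.8   # CDR increase reported for the XR AI-workflow
FRACTION_REREAD = 0.06   # 6.0% of cases receive a third human read

low = max_cdr_gain_per_1000(0.84, FRACTION_REREAD, 0.22)
high = max_cdr_gain_per_1000(2.11, FRACTION_REREAD, 1.00)

print(round(low, 2), round(high, 2))            # 0.01 0.13
print(XR_GAIN_PER_1000 / high, XR_GAIN_PER_1000 / low)
```

The two ratios printed on the last line correspond to the lower and upper ends of the range by which the XR gain exceeds the simulated gain from random third reading.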
Conclusion: Simple simulations show that CDR improvements from third-human-reading a random set of cases would be marginal, validating that the substantial CDR increase demonstrated by XR is a direct effect from using AI to flag cases for additional review.
Limitations: Single AI assessed
Funding for this study: Kheiron Medical Technologies
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: Not required
7 min
Adding artificial intelligence (AI) case scoring in a breast screening programme to optimize reading workflow and workload: a retrospective study
Andrea Nitrosi, Reggio Emilia / Italy
Author Block: A. Nitrosi, R. Vacondio, L. Verzellesi, M. Creola, M. Bertolini, P. Giorgi Rossi, V. Iotti, P. Pattacini, C. Campari; Reggio Emilia/IT
Purpose: The objective of this study was to retrospectively evaluate a strategy to optimize reading workflow and readers’ workload based on the iCAD Case Malignancy Score (CMS).
Methods or Background: We analyzed 122,216 2D mammography screening reading times (RT) corresponding to 61,108 exams, including 244 proven tumours, consecutively acquired in the Reggio Emilia Breast Screening Program (BSP) from January 2023 to June 2024 and processed by the iCAD Inc. ProFound AI 2D system. The iCAD Case Malignancy Score (CMS) represents the relative confidence that a case is malignant on a scale of 0% to 100%. A pool of radiologists performs blinded double reading plus arbitration, organized in work-shifts. Packs are assigned to a reader respecting a numerical criterion of maximum readings per work-shift. We analyzed the Spearman correlations of the RT of individual readers (normalized to each reader’s personal median) with the CMS and with breast density (D). The analysis was repeated considering only the women recalled / not recalled / true positive (TP).
Results or Findings: A positive correlation was demonstrated between CMS and RT (R = 0.76), and a weaker one between D and RT (R = 0.52), overall and in recalled and non-recalled women separately.
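The correlation analysis can be illustrated with a small sketch: reading times are divided by the reader's own median, and Spearman's coefficient is computed as the Pearson correlation of the ranks, with tied values sharing their average rank. The function names and the toy values are hypothetical and are not study data.

```python
from statistics import median


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical reading times (seconds) of one reader and the CMS per exam.
reading_times = [20, 25, 30, 45, 60]
cms = [5, 10, 8, 40, 90]

personal_median = median(reading_times)
normalized_rt = [t / personal_median for t in reading_times]
rho = spearman(normalized_rt, cms)
print(round(rho, 2))
```

Dividing by a positive constant does not change the ranks within one reader, so the normalization matters only when reading times from several readers are pooled into a single analysis.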
Using CMS, packs could be optimized based on individual reader characteristics to maximize the number of exams for each reader’s pack with constant recall rate (and TP): first simulations show up to 14% increase in the number of exams read over 4 hours effective reading period.
Conclusion: This scenario would optimize resources without undermining the screening reading workflow, and it would not introduce cognitive bias among readers, since each session would have a similar expected recall and detection rate.
Limitations: Cases refer only to Reggio Emilia BSP, limiting this study.
Funding for this study: This study was partially supported by the Italian Ministry of Health - Ricerca Corrente
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: Compliance with Ethical Standards Institutional Review Board approval was not required because it is a Clinical Audit about a technical development. This study was conducted in accordance with the routine quality assurance procedures established by the Local Health Authority for its screening programmes. The Reggio Emilia Cancer Registry, which routinely collects the screening history of each case of breast cancer, has been approved by the Provincial Ethics Committee.
7 min
Assessment of an AI-system in indicating breast laterality for screen-detected and interval cancers in breast screening in a large-scale retrospective study
Peter Kecskemethy, London / United Kingdom
Author Block: A. Ng1, C. Oberije1, G. Fox1, R. Currie2, A. Redman3, A. Leaver3, W. Teh4, B. Glocker1, P. Kecskemethy1; 1London/UK, 2Exeter/UK, 3Gateshead/UK, 4Harrow/UK
Purpose: Assess the utility of an AI-system in indicating breast laterality in breast screening.
Methods or Background: Employing a commercially available AI-system as an independent reader, utilising its case-wise recall suggestions, within double reading has previously been shown to maintain/improve screening performance, while providing substantial workload savings. This has been demonstrated in a large-scale retrospective clinical study (306,839 cases from 236,739 participants between 2017 and 2021), involving three Hologic sites across the UK’s major genetic clusters (South-East/West/North), including more diverse ethnicities in London. To further assess the AI-system’s utility in supporting follow-up investigations for recalls, its breast laterality recommendations for screen-detected cancers (SDCs) and interval cancers (ICs) were compared to pathology information.
Results or Findings: The study included 2592 SDCs and 379 ICs. The AI-system correctly recalled 2304 SDCs (88.9% sensitivity) and 152 ICs (40.1% IC detection rate). Among the correctly recalled SDCs, the AI-system: A) indicated pathology-agreeing laterality in 84.5% (83.5% unilateral/1.1% bilateral), B) recalled unilateral cases as bilateral in 13.9%, C) recalled one side in bilateral cancer cases in 0.4%, and D) indicated the opposite side not assessed in 1.2%. The respective results for ICs were: A) 58.8% (58.1% unilateral/0.7% bilateral), B) 18.4%, C) 2.2%, and D) 20.6%. For category D, it is unknown if an early abnormality could be present as the AI-indicated side was not assessed by biopsy nor by additional diagnostic imaging.

The AI-system provides screening utility in scenarios A-C, which together comprise 98.8% for SDCs, 79.4% for ICs, 88.1% for ICs diagnosed within 1 year, and 97.1% for false negative ICs (FNICs).
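The four laterality categories can be expressed as set comparisons between the sides indicated by the AI and the sides confirmed by pathology. The sketch below is one hypothetical encoding ("L" and "R" as side labels, a function name of our choosing) and is not the authors' implementation; the final lines only re-add the percentages reported above for categories A to C.

```python
def laterality_category(ai_sides, cancer_sides):
    """Classify a correctly recalled cancer case into category A, B, C or D.

    ai_sides: sides indicated by the AI; cancer_sides: sides with
    pathology-confirmed cancer. Sides are encoded as "L" and "R".
    """
    ai, cancer = set(ai_sides), set(cancer_sides)
    if not ai or not cancer:
        raise ValueError("both arguments need at least one side")
    if not (ai | cancer) <= {"L", "R"}:
        raise ValueError('sides must be "L" or "R"')

    if ai == cancer:
        return "A"  # AI laterality agrees with pathology
    if cancer < ai:
        return "B"  # unilateral cancer recalled as bilateral
    if ai < cancer:
        return "C"  # bilateral cancer, only one side recalled
    return "D"      # AI indicated the opposite, unassessed side


print(laterality_category(["L", "R"], ["R"]))  # B
print(laterality_category(["R"], ["L"]))       # D

# Share of cases in categories A-C, from the percentages reported above.
sdc_utility = round(84.5 + 13.9 + 0.4, 1)  # screen-detected cancers
ic_utility = round(58.8 + 18.4 + 2.2, 1)   # interval cancers
print(sdc_utility, ic_utility)             # 98.8 79.4
```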
Conclusion: The AI-system’s laterality detection demonstrated utility in almost all SDCs/FNICs, and the large majority of ICs, showing it can support the clinical workflow with laterality information for follow-up assessments.
Limitations: Single AI assessed
Funding for this study: NIHR AI in Health and Care Award
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: UK HRA REC reference: 21/HRA/4830
7 min
Repurposed AI-Based Mammography Interpretation in Diverse Clinical Scenarios
Helen Ngo, Freiburg Im Breisgau / Germany
Author Block: H. Ngo1, J. Neubauer1, A. L. Palacios Acedo2, M. Windfuhr-Blum1, E. Kotter1, F. Bamberg1, J. Weiß1; 1Freiburg/DE, 2Marseille/FR
Purpose: This study evaluates the diagnostic performance of an artificial intelligence (AI) tool originally developed for screening mammography, now repurposed for use in various clinical scenarios, including diagnostic mammograms in 1) asymptomatic women, 2) symptomatic women and 3) patients with a personal history of breast cancer (PHBC).
Methods or Background: A total of 601 women were retrospectively included and categorized into three subgroups: diagnostic mammograms of 1) asymptomatic women (n=423), 2) symptomatic women (palpable abnormality, suspicious sonography, n=66) and 3) patients with PHBC (n=112). The AI-tool provided continuous scores (1 to 100) for potential malignancy, with histopathological confirmation and/or follow-up ≥2 years as reference standard.
Results or Findings: The AI-tool showed high performance across all three cohorts, with areas under the curve (AUC) for diagnostic mammograms of 1) asymptomatic women: 0.75 (95% CI: 0.51-0.98), 2) symptomatic women: 0.92 (95% CI: 0.81-1.0), and 3) patients with PHBC: 0.71 (95% CI: 0.52-0.90). Excluding women with extremely dense breasts (ACR D) increased the AUC for diagnostic mammograms of 1) asymptomatic women to 0.79 (95% CI: 0.41-1.0), 2) symptomatic women: 0.92 (95% CI: 0.81-1.0), and 3) patients with PHBC: 0.73 (95% CI: 0.51-0.95).
Using a threshold of the highest 10% AI-scores to binarize the continuous AI-output resulted in sensitivity 0.92 and specificity 0.50 for subgroup 1); 0.96 and 0.77 for 2) and 0.81 and 0.67 for 3), respectively.
Conclusion: Repurposed AI-tools can enhance malignancy detection across diverse patient groups, especially in less dense breasts. Optimizing thresholds for specified populations, such as asymptomatic and symptomatic cohorts, may further improve AI's diagnostic effectiveness.
Limitations: Varying breast densities, particularly extremely dense breast, can pose detection challenges, and the sample size of 601 may influence the generalizability of the findings.
Funding for this study: Unrestricted research grant from Lunit.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: Approved by local IRB.
7 min
Patient perceptions towards the use of artificial intelligence (AI) in breast cancer imaging
Tamara Suaris, London / United Kingdom
Author Block: D. Velazquez-Pimentel, S. Khan, T. Falco, S. Hickman, S. Dani, T. Suaris; London/UK
Purpose: The aim of this study is to evaluate patient perceptions towards the use of artificial intelligence (AI) in breast cancer imaging.
Methods or Background: Women presenting to a single breast cancer unit in East London were invited to participate in a prospective survey. Baseline knowledge and attitude towards technology in daily living and attitude towards the use of AI in mammography screening was measured using a 4-point Likert scale. Demographic data including age, ethnicity, education was collected.
Results or Findings: 944 responses were analysed. Of these, 90% (n=853/944) expressed a preference for combined computer-physician reading, with more women expressing confidence in the accuracy of combined computer-physician reading (93%, n=882/944) than of computer reading alone (54%, n=513/944).

Lower self-reported understanding of technology was associated with a higher level of concern: 46% of patients with limited understanding expressed concern with regard to the accuracy of computer-read mammograms, compared to 38% of patients with expert understanding. Level of concern was not significantly associated with age, ethnicity or education level (p > 0.05).

Regardless of level of concern, the majority of respondents expressed a positive opinion on the impact computer-read mammograms can have on improving both efficiency (85%, n=798/944) and pick-up rate (84%, n=797/944).
Conclusion: Despite confidence in the ability of AI to improve efficiency and pick up rate there is a strong preference expressed by patients towards combined computer-physician read mammograms. This study demonstrates that this remains true regardless of age, ethnicity or level of education. Level of concern is associated with self-reported understanding of technology; targeted patient education programs may support implementation of AI workflow in breast screening programs.
Limitations: Survey responses are subject to bias.
Funding for this study: No funding was received for this study.
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: Patient Survey - local research lead confirmed no formal ethics application necessary
7 min
ADMEDVOICE – The Pathway to Polish Language Automatic Structured Reporting in Breast Ultrasound using Voice Recognition and Large Language Models
Maciej Bobowicz, Gdańsk / Poland
Author Block: M. Bobowicz, D. Szplit, A. Dąbkowska, J. Bogdan, K. Gwozdziewicz, J. Omernik, B. Graff, A. Czyżewski, K. Narkiewicz; Gdansk/PL
Purpose: Breast ultrasound (BUS) equipped with the ACR BI-RADS lexicon is a well-described diagnostic procedure with mandatory fields and a relatively closed vocabulary. This study aims to generate BUS-structured reports automatically using voice recognition and topic modelling in Polish.
Methods or Background: A dataset of 6269 BUS radiology reports from the University Clinical Center’s Hospital Information System covering 2013-2023 was obtained. The reports were created by more than ten experienced breast radiologists and multiple residents. They covered various clinical scenarios, including diagnosis, treatment, and follow-up tests in breast cancer, benign disease, mutation carriers, and studies without pathology.
Results or Findings: From 6269 reports, 48721 text fragments were obtained, representing specific parts of the BUS report used as training data. We identified specific ‘topics’ relating to ‘ontologies’ in these fragments. Topics represented parts of the radiologist’s report that could be structured into subsections: 1) reference letter information, 2) tissue composition, 3) pathology descriptors (masses and calcifications), 4) associated features, 5) axillary and intramammary lymph node descriptors, 6) other special cases, 7) conclusions, 8) recommendations, 9) final remarks. For automatic text recognition, BERTopic was explored. As a next step, we invited 25 specialist radiologists, residents, medical students and other HCPs to record 3328 separate sentences for training voice recognition algorithms.
Conclusion: The presented research, which involved topic modelling, is a first step towards creating Polish language automatic structured BUS reporting using voice recognition and LLMs. The resulting database with voice samples at three quality levels will be released soon. It will allow AI training to reduce the radiology reporting burden with more natural voice commands being transferred to structured reports.
Limitations: The single-centre design, restriction to the Polish language, and lack of external validation.
Funding for this study: Funding for the ADMEDVOICE Project was provided by the Polish National Centre for Research and Development; Infostrateg IV action; grant number: INFOSTRATEG-IV/003/2022.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: The study was approved by the Bioethics Committee for Scientific Research of Medical University of Gdansk.
7 min
Contrastive Learning in Breast MRI: MLIP as the Base Foundation Model
Nika Rasoolzadeh, Nijmegen / Netherlands
Author Block: N. Rasoolzadeh1, T. Zhang2, R. Mann1; 1Nijmegen/NL, 2Amsterdam/NL
Purpose: To explore the potential of utilizing a contrastive language image pretraining approach for 3D breast MRI images.
Methods or Background: A dataset of 15005 pairs of dynamic contrast-enhanced (DCE) and subtraction MRI images with corresponding radiological reports from the Netherlands Cancer Institute was used to train a model to find the most similar image-text pairs by contrasting positive (similar) pairs against negative (dissimilar) pairs. Full MRI images and complete Dutch reports were utilized. The image and text embeddings were obtained using a 3D ResNet50 architecture and RadioLOGIC as the image and text encoders, respectively. Two inference scenarios were tested: image retrieval by text queries and BI-RADS prediction. The area under the curve (AUC) was used to evaluate the model's performance. The developed Multimodal Breast MRI Language-Image Pretrained (MLIP) model was first used for the zero-shot BI-RADS prediction task and was later fine-tuned.
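The abstract does not give the exact training objective. A common choice for contrastive language-image pretraining is a symmetric cross-entropy over cosine similarities, in which the matching image-report pair in a batch is the positive and every other pairing is a negative. The sketch below illustrates that generic objective in plain Python; the function names, the temperature of 0.07 and the two-dimensional toy embeddings are assumptions and do not describe the MLIP implementation.

```python
import math


def normalize(vector):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class of row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        peak = max(row)  # subtract the maximum for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def contrastive_loss(image_embeddings, text_embeddings, temperature=0.07):
    """Symmetric image-to-text and text-to-image contrastive loss."""
    images = [normalize(v) for v in image_embeddings]
    texts = [normalize(v) for v in text_embeddings]
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]
    logits_transposed = [list(column) for column in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits)
                  + cross_entropy_diagonal(logits_transposed))


# Toy batch of two image-report pairs with hypothetical 2-D embeddings.
images = [[1.0, 0.0], [0.0, 1.0]]
matched = contrastive_loss(images, [[1.0, 0.0], [0.0, 1.0]])
mismatched = contrastive_loss(images, [[0.0, 1.0], [1.0, 0.0]])
print(matched < mismatched)  # True: aligned pairs give the lower loss
```

In practice the embeddings would come from the image and text encoders, and the loss would be minimized with an automatic-differentiation framework rather than evaluated on fixed vectors.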
Results or Findings: The preliminary results show an AUC of 0.717 (95% CI: 0.604, 0.824) for BI-RADS 4/5 abnormal MRI images retrieval, 0.640 (95% CI: 0.538, 0.740) for dense breast retrieval, and 0.601 (95% CI: 0.505, 0.698) for low background parenchymal enhancement (BPE) retrieval. In the second inference, the performance of MLIP was compared to that of a fine-tuned model. The fine-tuned model demonstrated improved accuracy, with a reduction in the number of originally benign cases misclassified as malignant.
Conclusion: In this study, a multi-modal breast MRI pretrained model was developed. The preliminary results suggest MLIP can be adjusted to perform diagnostic tasks and radiology report generation, holding the potential to serve as a foundation model for breast MRI analysis.
Limitations: The model needs to be validated on larger datasets and across more downstream tasks.
Funding for this study: Funding was provided by the ODELIA project (from the European Union’s Horizon Europe research and innovation programme under grant agreement, No 101057091)
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: This study did not require formal ethics committee approval, as it exclusively used fully anonymized MRI images and reports. No identifiable personal data was collected or used in the analysis. All MRI data was anonymized prior to access, ensuring that no individual participants can be identified from the data
7 min
Generating virtual T2w-fat-saturated breast MRI acquisition using neural-networks
Andrzej Liebert, Erlangen / Germany
Author Block: A. Liebert1, D. Hadler1, C. M. Ehring1, H. Schreiter1, F. B. Laun1, M. Uder1, E. Wenkel2, S. Ohlmeyer1, S. Bickelhaupt1; 1Erlangen/DE, 2Munich/DE
Purpose: Multi-parametric breast MRI protocols typically include T2-weighted fat-saturated (T2w-FS) sequences, which are used for tissue characterization. However, their acquisition can significantly increase scan time. This study aims to evaluate whether a 2D-U-Net neural-network can generate virtual T2w-FS images (VirtuT2) from other acquisitions of a routine multiparametric breast MRI protocol.
Methods or Background: This IRB-approved, retrospective study included n=914 breast MRI examinations performed between January 2017 and June 2020 at University Hospital Erlangen. The dataset was divided into training (n=665), validation (n=74), and test (n=175) sets. A 2D-U-Net was trained on T1w, DWI, and DCE sequences to generate VirtuT2. Quantitative metrics and a qualitative multi-reader assessment by two radiologists were used to evaluate the VirtuT2 images. For the qualitative readings, the radiologists were asked to identify whether an image was an original T2w-FS or a VirtuT2 image, to evaluate the diagnostic image quality (DIQ), and to state whether they could identify the presence of edema around mass lesions.
Results or Findings: VirtuT2 images demonstrated high structural similarity (SSIM=0.87) and peak signal-to-noise ratio (PSNR=24.90) compared to original T2w-FS images. A high value of the high-frequency error norm (HFEN=0.87) indicates strong blurring in the VirtuT2 images, which was also confirmed in the qualitative reading. Radiologists correctly identified VirtuT2 images with 92.3% and 94.2% accuracy, respectively. No significant difference in DIQ was noted for one reader (p=0.21), while the other reported significantly lower DIQ for VirtuT2 (p<=0.001). Moderate inter-reader agreement was observed for edema detection on T2w-FS images (κ=0.43), decreasing to fair on VirtuT2 images (κ=0.36).
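Of the quantitative metrics named above, the peak signal-to-noise ratio is the simplest to state: it is ten times the base-10 logarithm of the squared maximum intensity divided by the mean squared error between the two images. The sketch below assumes images stored as nested lists with intensities scaled to [0, 1]; the function name and the 2x2 toy images are hypothetical, and the intensity scaling used in the study is not stated in the abstract.

```python
import math


def psnr(reference, generated, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2-D images."""
    ref_pixels = [p for row in reference for p in row]
    gen_pixels = [p for row in generated for p in row]
    if len(ref_pixels) != len(gen_pixels) or not ref_pixels:
        raise ValueError("images must be non-empty and of equal size")

    mse = sum((a - b) ** 2
              for a, b in zip(ref_pixels, gen_pixels)) / len(ref_pixels)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


# Hypothetical 2x2 images with intensities in [0, 1].
original = [[0.0, 0.5],
            [1.0, 0.5]]
virtual = [[0.1, 0.5],
           [0.9, 0.5]]
value = psnr(original, virtual)
print(round(value, 2))
```

PSNR averages errors over all pixels, so it can remain high for a blurred image; this is one reason a blur-sensitive metric such as HFEN is reported alongside it.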
Conclusion: Neural-networks can technically generate VirtuT2 images with high similarity to real T2w-FS images using T1w, DWI and DCE acquisitions; however, blurring remains a limitation. Future investigations with different architectures and using larger datasets are needed to improve clinical applicability.
Limitations: Limited dataset from a single site was used. Qualitative reading was performed on just n=52 cases.
Funding for this study: This project is funded by the Bavarian State Ministry of Science and the Arts in the framework of the bidt Graduate Center for Postdocs.
L.B. is funded by the DFG Grant No: 518689644
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: The study protocol was approved by the ethics committee of the Friedrich-Alexander Universität Erlangen-Nürnberg. The ethics committee waived the need for informed consent.

Notice

This session will not be streamed, nor will it be available on-demand!

CME Information

This session is accredited with 1.5 CME credits.

Moderators

  • Jonas Teuwen

    Amsterdam / Netherlands

Speakers

  • Marcel Blum

    St.Gallen / Switzerland
  • Sarah Delaja Verboom

    Nijmegen / Netherlands
  • Yan Chen

    Nottingham / United Kingdom
  • Koen Eppenhof

    Nijmegen / Netherlands
  • Annie Ng

    London / United Kingdom
  • Andrea Nitrosi

    Reggio Emilia / Italy
  • Peter Kecskemethy

    London / United Kingdom
  • Helen Ngo

    Freiburg Im Breisgau / Germany
  • Tamara Suaris

    London / United Kingdom
  • Maciej Bobowicz

    Gdańsk / Poland