Research Presentation Session: Artificial Intelligence and Imaging Informatics

RPS 1105 - AI-radiomics and clinical data convergence across the breast cancer continuum

March 5, 16:30 - 18:00 CET

6 min
Same studies, different scores: how RQS, RQS2, and METRICS shape radiomics quality assessment
Maciej Bobowicz, Gdańsk / Poland
Author Block: M. Bobowicz, M. Kosno, K. P. Brzozowski, E. Szurowska; Gdańsk/PL
Purpose: Radiomics has emerged as a powerful method for extracting quantitative insights from medical imaging, with the potential to enhance diagnostic and prognostic precision. We aimed to critically evaluate RQS, RQS2, and METRICS tools to assess their robustness, reliability, and practical utility, thereby guiding reproducible and high-quality radiomics research.
Methods or Background: A comprehensive search of PubMed, Embase, Scopus, Web of Science, and IEEE Xplore databases identified studies that used MRI radiomics to predict pathological complete response (pCR) in breast cancer patients receiving neoadjuvant therapy. Study quality was assessed using RQS, RQS2, and METRICS. We compared the tools across seven overarching categories: study design and protocol, imaging protocol quality, image preparation and processing, segmentation and ROI definition, feature extraction and selection, model building and validation, and reporting, transparency, and open science. Inter-reader agreement was evaluated with Cohen’s κ, and overall score reliability was determined using the intraclass correlation coefficient (ICC).
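As an illustration of the inter-reader agreement statistic named above, the following is a minimal sketch of unweighted Cohen's κ for two readers. The function name and the toy ratings are hypothetical; the abstract does not say whether a weighted or unweighted κ was used, so the unweighted form is an assumption.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given identical labels.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_observed - p_chance) / (1.0 - p_chance)

# Hypothetical item-level ratings (1 = criterion met, 0 = not met); not study data.
reader_1 = [1, 1, 0, 0, 1, 0, 1, 1]
reader_2 = [1, 0, 0, 0, 1, 0, 1, 1]
kappa = cohen_kappa(reader_1, reader_2)
```

In this toy example the readers agree on seven of eight items, while agreement expected by chance is 0.5, so κ is lower than the raw agreement rate.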
Results or Findings: RQS and RQS2 emphasise clinical aspects, while METRICS provides a more holistic perspective. This difference is reflected in the higher median score achieved by METRICS compared with RQS and RQS2. The correlation between total scores was weak to moderate: RQS vs. RQS2, ρ = 0.312 (p ≈ 0.068); RQS vs. METRICS, ρ = 0.180 (p ≈ 0.302); and RQS2 vs. METRICS, ρ = 0.412 (p = 0.014).
Conclusion: The METRICS tool is the most equitable choice, as each supercategory addresses multiple facets of the issue. Each research problem has unique characteristics, and the effectiveness of RQS, RQS2, or METRICS may differ. Therefore, when evaluating a model's quality, at least two of these tools should be used.
Limitations: None
Funding for this study: This project has received funding from the Digital Europe Programme under grant agreement No. 101100633 (EUCAIM); the European Union’s Horizon Europe and Horizon 2020 research and innovation programme under grant agreement No. 101057699 (RadioVal) and No. 952103 (EuCanImage)
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information:
6 min
Prediction of Axillary Pathologic Complete Response after Neoadjuvant Chemotherapy in Breast Cancer Using Multiparametric MRI Radiomics and Deep Learning
Weiyue Chen, Lishui / China
Author Block: W. Chen, G. Lin, M. Chen, J. Ji; Lishui/CN
Purpose: To assess whether multiparametric MRI radiomics combined with deep learning can predict axillary pathologic complete response (pCR) following neoadjuvant chemotherapy (NAC) in breast cancer.
Methods or Background: This retrospective two-center study included 213 patients with breast cancer who underwent NAC and axillary surgery (center 1: n=144; center 2: n=69) in the training, validation, and test cohorts. Patients were stratified into axillary pCR and non-pCR groups based on pathology. Tumor regions were segmented on T2-weighted, diffusion-weighted, and dynamic contrast-enhanced MRI. Radiomics and deep learning features were extracted; feature reduction was performed using minimum redundancy maximum relevance, and least absolute shrinkage and selection operator. Radiomics and deep learning scores were derived. Logistic regression identified independent predictors and constructed clinical, radiomics, deep learning, and combined models. Discrimination, calibration, and decision curve analysis were used to assess performance.
Results or Findings: Twelve radiomics and fourteen deep learning features were selected. The clinical N stage, radiomics score, and deep learning score were independent predictors. The combined model achieved the highest AUCs (validation cohort: 0.948; test cohort: 0.891), significantly higher than the clinical model (validation cohort: 0.675; test cohort: 0.761) and radiomics model (validation cohort: 0.838; test cohort: 0.827) (all P < 0.05). Calibration showed good agreement, and decision curve analysis demonstrated the greatest net clinical benefit for the combined model.
Conclusion: Multiparametric MRI models integrating radiomics, deep learning, and clinical factors enable accurate prediction of axillary pCR after NAC in breast cancer, supporting personalized treatment strategies.
Limitations: First, as a retrospective study, it is inevitably subject to selection bias and inherent errors. Second, all ROIs in our study were delineated manually, rendering inter-operator variability unavoidable, which may affect the reproducibility of features.
Funding for this study: This work was supported by the Key Project of Joint Construction by Provincial and Ministerial Authorities (WKJ-ZJ-2452 to Minjiang Chen); Public Welfare Technology Application Research Project of Lishui City (2024GYX45 to Weiyue Chen).
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: This study was approved by the Institutional Review Board and Human Ethics Committee of the Fifth Affiliated Hospital of Wenzhou Medical University and the Sixth Affiliated Hospital of Wenzhou Medical University.
6 min
Diagnostic Performance of Virtual vs. True Non-Contrast Dual-Energy CT for Differentiating Hepatic Cysts from Liver Metastases in Breast Cancer
Aynur Gökduman, Frankfurt / Germany
Author Block: A. Gökduman, I. Yel, P. Reschke, J. Gotta, S. Mahmoudi, V. Koch, M. Dimitrova, T. Vogl, C. Booz; Frankfurt/DE
Purpose: To assess the diagnostic performance and Hounsfield unit (HU) correlation between virtual non-contrast (VNC) and true non-contrast (TNC) dual-energy CT (DECT) for differentiating hepatic cysts from liver metastases in patients with breast cancer.
Methods or Background: This retrospective study included 668 liver lesions (334 cysts, 334 metastases) in patients with histologically confirmed breast cancer who underwent DECT between January 2020 and December 2022. HU values were measured in lesion-specific regions of interest (ROIs) from both TNC and VNC images. Receiver operating characteristic (ROC) analysis was performed to identify optimal cut-off values using the Youden index. Diagnostic performance was assessed using AUC, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and overall accuracy. Correlation between TNC and VNC HU values was analyzed. The reference standard for lesion classification was either contrast-enhanced MRI or image-guided biopsy.
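The cut-off selection described above can be sketched as follows. This is a minimal illustration of choosing a threshold by the Youden index (J = sensitivity + specificity − 1), not the authors' implementation. The function name, the toy HU values, and the convention that higher attenuation indicates metastasis are assumptions.

```python
def youden_cutoff(values, labels):
    """Return (cutoff, sensitivity, specificity) maximising J = sens + spec - 1.

    A lesion is called positive when its value is >= cutoff.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")
    best = None
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if y == 1 and v >= cutoff)
        tn = sum(1 for v, y in zip(values, labels) if y == 0 and v < cutoff)
        sens, spec = tp / positives, tn / negatives
        j = sens + spec - 1
        # Strict comparison keeps the lowest cutoff when several tie on J.
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec)
    return best[1], best[2], best[3]

# Hypothetical HU values; 1 = metastasis, 0 = cyst (toy data, not from the study).
hu = [5, 8, 12, 15, 22, 30, 35, 41]
is_metastasis = [0, 0, 0, 1, 0, 1, 1, 1]
cutoff, sens, spec = youden_cutoff(hu, is_metastasis)
```

Ties on J are resolved here in favour of the lowest threshold; the abstract does not state a tie-breaking rule.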
Results or Findings: TNC achieved an area under the curve (AUC) of 0.869, while VNC demonstrated a comparable AUC of 0.866; both results were statistically significant (p < 0.001). TNC showed a sensitivity of 87.7%, specificity of 63.5%, and accuracy of 75.6%. VNC yielded a sensitivity of 85.0%, specificity of 61.4%, and accuracy of 73.2%. A strong positive correlation was observed between TNC and VNC HU values (r = 0.977; p < 0.001). In this cohort, omission of the true non-contrast phase and exclusive use of VNC would have resulted in a documented CTDI reduction of approximately 40–50%.
Conclusion: VNC imaging demonstrated diagnostic performance nearly equivalent to that of TNC, with minimal clinically relevant differences. Given the strong HU correlation, replacing TNC with VNC could significantly reduce radiation exposure in oncological liver staging, streamline imaging workflows, and maintain high diagnostic performance.
Limitations: Minor attenuation discrepancies could influence lesion classification in borderline cases.
Funding for this study: No funding.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: The local ethics committee approved the study protocol.
6 min
Implementation of AI in a National Breast Cancer Screening System
Osmo Tervonen, Oulu / Finland
Author Block: O. Tervonen; Oulu/FI
Purpose: To identify the mandatory steps that must be completed to implement AI as part of a national breast cancer screening program.
Methods or Background: The study consisted of the following steps: 1. Finding AI software ready to be implemented; 2. Health care service provider commitment; 3. Compliance with the hospital IT system; 4. Compliance with hospital data security guidelines; 5. Readiness of the ministry for a legislation change to the national screening process; 6. National Radiology Society participation; 7. Defining the new screening process with the health care professionals; 8. Compliance with national and EU GDPR regulation.
Results or Findings: Seven of the eight required steps were achieved. We defined an implementation process in which the AI process runs in parallel with the present, well-defined two-radiologist screening process; after experience has been gained, the new national screening process is to be defined. Multiple AI software products were considered applicable, and the one with an independent reader process model was selected. The health care provider was committed to cooperating based on proven short- and long-term benefits. The national legislator indicated readiness to start a legislation change based on the results. However, as part of GDPR regulation, the National Office of the Data Protection Ombudsman, which is the highest national authority, stated that all screening data must be removed from the archives of the AI vendor after the process, a requirement with which the vendor was unable to comply.
Conclusion: The current GDPR regulation, as implemented nationally, does not allow AI to be adopted as part of a national screening program.
Limitations: The data used for the evaluation were based on a review of the literature and assessment by the board members of the study.
Funding for this study: This study has not received external funding.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: The ethics committee of the hospital has approved the study.
6 min
Automated Classification of Malignant versus Benign Breast Lesions on DCE-MRI: A Comparative Study of 3D Deep Learning Architectures and Input Preprocessing Strategies
Mustafa Arda Kukal, Istanbul / Turkey
Author Block: M. A. Kukal, M. Gitmez, T. Orhan, B. Uzunoğlu, A. Hamamci, F. Celebi; Istanbul/TR
Purpose: To systematically evaluate different deep learning (DL) architectures and input preprocessing strategies for automated differentiation of malignant from benign breast lesions on dynamic contrast-enhanced MRI (DCE-MRI).
Methods or Background: This retrospective study analyzed breast DCE-MRI examinations acquired on 1.5T and 3T scanners. Post-contrast second phase dynamic T1-weighted sequences were processed using three input strategies: full image volumes, segmentation masks, and lesion-centered patches (96×96×64 voxels). Data were partitioned into training (80%) and independent test (20%) sets, with patient-stratified 5-fold cross-validation used for model selection. Five 3D convolutional neural networks were evaluated: ResNet3D-18, R(2+1)D-18, MC3-18, DenseNet3D-121, and X3D-S. All architectures were adapted for single-channel volumetric input with binary classification output. Class imbalance was addressed through stratified sampling, focal loss, and data augmentation. Performance was assessed using precision, recall, F1-score, area under the precision-recall curve (AUC-PR), and accuracy.
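The abstract names focal loss among the class-imbalance countermeasures but gives neither its form nor its hyperparameters. The sketch below shows the commonly used binary focal loss for a single sample; the function name and the defaults gamma=2.0 and alpha=0.25 are assumptions, not values reported by the authors.

```python
import math

def binary_focal_loss(prob_malignant, label, gamma=2.0, alpha=0.25):
    """Focal loss for one sample; label is 1 (malignant) or 0 (benign).

    gamma and alpha are illustrative defaults, not taken from the abstract.
    """
    eps = 1e-12  # guards against log(0)
    # p_t is the probability the model assigned to the true class.
    p_t = prob_malignant if label == 1 else 1.0 - prob_malignant
    alpha_t = alpha if label == 1 else 1.0 - alpha
    # (1 - p_t)^gamma shrinks the loss of already well-classified samples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))

easy = binary_focal_loss(0.95, 1)  # confident and correct prediction
hard = binary_focal_loss(0.30, 1)  # true class under-predicted
```

With gamma set to 0 the expression reduces to class-weighted cross-entropy, which is a convenient sanity check.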
Results or Findings: The cohort included 163 patients (54 benign, 109 malignant lesions) with median age 44 years (IQR: 40-51) and median lesion size 12 mm (IQR: 8-20). All classifications were validated against histopathological standards. Patch-based ResNet3D-18 demonstrated the most balanced and robust performance on the independent test set: precision 0.85, recall 0.81, F1-score 0.83, AUC-PR 0.83, and accuracy 0.80. Patch-based preprocessing consistently yielded the most robust performance across all architectures.
Conclusion: Patch-based input preprocessing demonstrated superior performance compared to full volumes and segmentation masks for automated breast lesion classification on DCE-MRI. The achieved diagnostic accuracy meets clinically acceptable standards. While improved performance is expected with larger datasets, current findings support clinical feasibility. Multi-institutional validation studies are warranted to confirm generalizability.
Limitations: Not Applicable
Funding for this study: No funding was provided for this study.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: Yeditepe University, IRB #E.83321821-805.02.03-460
6 min
A Physics-Informed Neural Network for Robust Pharmacokinetic Parameter Estimation from Ultrafast Breast MRI Time-Intensity Profiles with Varying Temporal Resolution
Gökhan Ertaş, Istanbul / Turkey
Author Block: G. Ertaş, F. Celebi, D. Yildirim; Istanbul/TR
Purpose: Ultrafast dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) achieves high temporal resolution for breast cancer assessment but yields sparsely sampled time-intensity profiles (TIPs). This study develops and validates a physics-informed neural network (PINN) that simultaneously reconstructs continuous TIPs and estimates pharmacokinetic (PK) parameters from sparse ultrafast DCE-MRI data, comparing its accuracy with conventional curve-fitting methods.
Methods or Background: The PINN comprised four fully connected layers with hyperbolic tangent activations, trained to minimize a composite loss combining data-fidelity (mean-squared error) and physics-informed terms that penalize residuals of the simplified Tofts ordinary differential equation. Ktrans and Vp were optimized jointly with network weights. Conventional nonlinear least squares (NLLS) fitting using the Levenberg-Marquardt algorithm served as the comparator. Both methods were evaluated on synthetic datasets with known ground-truth values: benign lesions (median Ktrans=0.013, range 0.0026–0.17; Vp=0.0022, range 0.00025–0.017) and malignant lesions (Ktrans=0.073, range 0.013–0.14; Vp=0.021, range 0.002–0.099) across temporal resolutions from 5 to 13 seconds at 1-second intervals. Performance was quantified using adjusted R2 (adjR2) for goodness-of-fit and median percentage absolute error (MdPAE) for parameter accuracy.
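The two evaluation metrics named above can be sketched as follows, using their textbook definitions. The function names and toy values are hypothetical, and the number of fitted parameters passed to the adjusted R2 is an assumption, since the abstract does not state it.

```python
from statistics import median

def adjusted_r2(observed, fitted, n_params):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    n = len(observed)
    if n - n_params - 1 <= 0:
        raise ValueError("too few samples for the number of parameters")
    mean_obs = sum(observed) / n
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_params - 1)

def mdpae(estimates, truths):
    """Median percentage absolute error of estimates against known ground truth."""
    return median(abs(e - t) / abs(t) * 100.0 for e, t in zip(estimates, truths))

# Toy values only; not taken from the study.
true_ktrans = [0.1, 0.2, 0.4]
estimated_ktrans = [0.11, 0.19, 0.4]
error_pct = mdpae(estimated_ktrans, true_ktrans)
perfect_fit = adjusted_r2([1, 2, 3, 4, 5], [1, 2, 3, 4, 5], n_params=2)
```

Because the two metrics measure different things, a method can fit the sampled curve more closely (higher adjR2) while recovering the underlying parameters less accurately (higher MdPAE).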
Results or Findings: Across 1000 synthetic sparse TIPs, PINN generated smooth continuous reconstructions with excellent agreement (adjR2=0.993) and low parameter errors (MdPAE: Ktrans=0.5%, Vp=1.4%). NLLS achieved superior curve fitting (adjR2=0.999) but substantially higher parameter errors (MdPAE: Ktrans=3.1%, Vp=24.7%). PINN consistently delivered more accurate PK estimates than NLLS, with greatest improvement for Vp estimation.
Conclusion: Integrating tracer-kinetic equations within neural networks enables robust TIP reconstruction and more accurate PK parameter estimates from sparse ultrafast DCE-MRI, enhancing quantitative biomarker mapping reliability in breast MRI.
Limitations: Not applicable.
Funding for this study: No funding was provided for this study.
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information:
6 min
Can large language models replace traditional textbooks in breast radiology education for medical students?
Umitcan Yildiz, Istanbul / Turkey
Author Block: U. Yildiz, O. Adiguzel Karaoysal, A. Karadayi Buyukozsoy, N. Voyvoda; Istanbul/TR
Purpose: This study aimed to systematically assess the content validity and educational adequacy of lecture notes on breast imaging for medical students, produced by five prominent Large Language Models (LLMs), in comparison to a standard radiology textbook.
Methods or Background: Five large language models (LLMs)—GPT-4o, Claude 4 Sonnet, Gemini 2.5 Pro, DeepSeek V3, and Grok 3—were tasked with generating lecture notes on breast imaging. A rubric consisting of 14 themes and 43 subthemes was constructed based on a reference textbook. Three radiologists, each with distinct professional backgrounds, independently evaluated the outputs of these models for accuracy, scope, clarity, and educational value using the Content Validity Index (CVI) metrics.
Results or Findings: None of the LLMs achieved the recommended CVI thresholds, which are indicative of high content validity. The most effective model addressed only 34.9% of the subthemes and 78.6% of the themes, whereas the other models demonstrated inferior performance. Notably, there was significant inter-rater variability among the expert evaluators, underscoring subjective differences in assessment.
Conclusion: Current general-purpose LLMs are insufficient as standalone resources for specialised medical education purposes. While they can address some topics, performance variability necessitates domain-specific fine-tuning, advanced prompting strategies, and robust evaluation frameworks. They are best used as supplementary tools within structured, instructor-led learning environments.
Limitations: The limitations of the study are its use of a single, simple prompt that may not reflect the models' full potential and an evaluation that lacked student input. The study was also limited by its focus on a single topic, its restriction to the English language, and the uncontrolled design variations among the LLMs.
Funding for this study: No funding was received for this study.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: The ethics committee notification can be found under the number UID 2025/0l0.99/l8/3.
6 min
Performance evaluation of mammography AI detection software on symptomatic breast cohort
Nisha Sharma, Leeds / United Kingdom
Author Block: N. Sharma1, M. M. M. McMahon1, M. Fletcher1, D. Manuel1, S. Rajan1, A. Kshirsagar2, F. Eskandari1, J. Simpson1; 1Leeds/UK, 2Santa Clara, CA/US
Purpose: Several studies have reported standalone performance of artificial intelligence (AI) software in the screening population. The purpose of this study is to assess the performance of commercially available AI software in a symptomatic population.
Methods or Background: This prospective study included a total of 308 subjects (79 with dense breasts and 229 with fatty breasts) imaged at a diagnostic clinic from August 2024 to March 2025. All women with clinically suspected cancer had bilateral DBT, and those recalled for a mammographic abnormality on 2D mammogram had synthetic 2D and 3D imaging of the affected breast only. The cohort contained 87 malignant cases confirmed by biopsy, 46 benign cases confirmed by biopsy, and 175 cases deemed benign after follow-up imaging. Results of Genius AI® Detection 2.0 were available for each subject and were reviewed by a breast imaging expert to determine TP, TN, FP, and FN by AI for each patient based on the location of AI marks and the pathological outcome of the subject. Sensitivity, specificity, PPV, and NPV for AI were calculated, and ROC analysis was performed.
Results or Findings: Sensitivity of the AI software for detecting biopsy-proven cancers was 93.1% (81/87), with a 95% CI of 85.0%–97.2%. Specificity of the AI software for correctly identifying benign cases was 52.9% (117/221), with a 95% CI of 46.2%–59.7%. The positive predictive value of AI for this cohort was 43.8% (81/185), with a 95% CI of 36.6%–51.3%. The negative predictive value of AI for this cohort was 95.1% (117/123), with a 95% CI of 89.2%–98.0%. The AI software achieved an area under the ROC curve (AUC) of 0.89, with a 95% CI of 0.85 to 0.93.
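The point estimates above follow directly from the 2×2 counts implied by the reported fractions. A minimal sketch that reproduces them (the function name is hypothetical; confidence intervals are omitted because the abstract does not state which interval method was used):

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard 2x2 diagnostic accuracy measures as fractions."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Counts implied by the reported fractions:
# 81 of 87 cancers were flagged, 117 of 221 benign cases were not flagged.
m = diagnostic_metrics(tp=81, fp=221 - 117, tn=117, fn=87 - 81)
percentages = {name: round(value * 100, 1) for name, value in m.items()}
```

The implied 104 false positives and 6 false negatives also recover the reported PPV denominator (81 + 104 = 185) and NPV denominator (117 + 6 = 123).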
Conclusion: The AI detection software demonstrated high sensitivity and high specificity in a symptomatic population.
Limitations: Single centre study
Funding for this study: None
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information:
6 min
How Fair Are We? A Systematic Review of Bias in MRI Radiomics Models for Predicting Breast Cancer Treatment Response
Maciej Bobowicz, Gdańsk / Poland
Author Block: M. Bobowicz1, D. A. Kessler2, M. Kosno1, S. Joshi2, K. Marias3, O. Diaz2, K. Lekadir2; 1Gdańsk/PL, 2Barcelona/ES, 3Heraklion/GR
Purpose: Radiomics is a widely employed technique for predicting pathological complete response (pCR) to neoadjuvant therapy in breast cancer. Despite its popularity, several problems remain related to fairness, generalizability, and methodological quality. In this review, a thorough analysis was conducted, thereby exposing the potential sources of bias and limitations in the existing studies.
Methods or Background: A comprehensive search was conducted in major bibliographic databases, including PubMed, Embase, Scopus, Web of Science, and IEEE Xplore, to identify studies that utilised MRI-based radiomics for predicting pCR in breast cancer patients undergoing neoadjuvant therapy. The methodological quality of the included studies was assessed using the METhodological RadiomICs Score (METRICS), while a custom framework adapted from the QUADAS-2 tool was used to determine fairness. Additionally, a GUI-based sample size calculator was developed, integrating the events per variable (EPV) rule and the Cox-Snell R² approach to evaluate the adequacy of study power.
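To make the events-per-variable (EPV) check concrete, here is a minimal sketch. It is not the authors' GUI calculator: the function names and the toy cohort are hypothetical, the threshold of 10 is the common rule of thumb rather than a value stated in the abstract, and the Cox-Snell R² approach is not covered.

```python
def events_per_variable(n_events, n_nonevents, n_candidate_predictors):
    """EPV: size of the smaller outcome class per candidate predictor."""
    if n_candidate_predictors <= 0:
        raise ValueError("need at least one candidate predictor")
    return min(n_events, n_nonevents) / n_candidate_predictors

def meets_epv_rule(n_events, n_nonevents, n_candidate_predictors, min_epv=10.0):
    """True when EPV reaches the chosen threshold (10 is an assumed default)."""
    return events_per_variable(n_events, n_nonevents, n_candidate_predictors) >= min_epv

# Hypothetical cohort: 40 pCR, 110 non-pCR, 12 candidate radiomic features.
epv = events_per_variable(40, 110, 12)
adequate = meets_epv_rule(40, 110, 12)
```

Note that the count that matters is the number of candidate predictors considered during model development, not only those retained in the final model.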
Results or Findings: A total of 35 studies met the inclusion criteria and were thus deemed eligible for further analysis. The majority of these studies (80%) were retrospective and based on single-centre cohorts, with limited demographic detail and infrequent external validation (14%). A mere five studies (14%) assessed the model's performance across demographic or clinical subgroups. The sample sizes of the studies were often small, and 75% of the studies were underpowered based on established EPV and R² criteria.
Conclusion: This analysis enables us to provide practical recommendations for enhancing the reliability, generalizability, and equity of radiomics-based models for predicting pCR and potentially beyond. This is achieved by underscoring standard practices and pinpointing potential sources of bias, along with developing sample size calculators based on dual methodologies: the EPV rule and the Cox-Snell R² approach.
Limitations: Not specified in the abstract
Funding for this study: Digital Europe Programme grant No. 101100633 (EUCAIM); Horizon Europe and Horizon 2020 grant No. 101057699 (RadioVal) and No. 952103 (EuCanImage)
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information:
6 min
Is Digital Breast Tomosynthesis (DBT)-based Artificial Intelligence (AI) Better than 2D Mammography-based AI?
Yan Chen, Nottingham / United Kingdom
Author Block: G. J. W. Partridge1, H. Jupp1, J. James1, M. Michell2, Y. Chen1; 1Nottingham/UK, 2London/UK
Purpose: Digital Breast Tomosynthesis (DBT) has been shown to improve sensitivity and specificity for cancer detection compared with 2D digital mammography (2DDM) when interpreted by human readers. It is therefore expected that AI trained on DBT would also outperform AI based on 2DDM. In this study, we directly compare the performance of DBT-AI and 2DDM-AI using a dataset of paired DBT+2DDM cases from the UK DBT screening PROSPECTS Trial.
Methods or Background: In the UK, the multi-centre PROSPECTS randomised trial has been conducted to investigate the cost-effectiveness of replacing 2DDM with DBT in screening. Between January 2019 and October 2022, 14,479 women were randomised into the intervention arm of PROSPECTS and screened with 2-view combo DBT+2DDM. Malignant cases are pathology proven, and normal cases will have a negative 3-year follow-up screening episode. A DBT-based and a 2DDM-based AI algorithm from the same commercial AI vendor (Lunit) will analyse the DBT and 2DDM images from this cohort, respectively. AI case scores will be compared between modalities using linear correlation and Bland-Altman plots, and ROC analysis will compare overall model performance.
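The numbers behind a Bland-Altman plot are the mean difference (bias) and the limits of agreement. A minimal sketch, assuming the conventional limits of bias ± 1.96 standard deviations of the paired differences; the function name and the toy scores are hypothetical and are not trial data.

```python
from statistics import mean, stdev

def bland_altman(scores_a, scores_b):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - spread, bias + spread

# Hypothetical paired case scores from a DBT-based and a 2DDM-based algorithm.
dbt_scores = [10.0, 20.0, 30.0, 40.0]
ddm_scores = [8.0, 21.0, 27.0, 40.0]
bias, lower, upper = bland_altman(dbt_scores, ddm_scores)
```

On the plot itself, each difference is drawn against the mean of the paired scores, with horizontal lines at the bias and at both limits.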
Results or Findings: The eligible cases from the PROSPECTS dataset are currently being analysed by the AI algorithms. Data will be presented at ECR if accepted.
Conclusion: As the popularity of DBT-based screening grows, it’s important to know how the performance of DBT-based cancer detection AI systems compares with that of 2DDM-based counterparts. If DBT-based AI performance is superior to 2DDM, this may also support wider adoption of DBT in screening.
Limitations: Cases from the PROSPECTS Trial are from one vendor (Hologic) which may limit the generalisability of the findings. Relatively small size of evaluation test-set; low cancer prevalence (but screening setting).
Funding for this study: Lunit
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information:
6 min
Development of an Integrated Radiopathomics Signature for Oncotype DX Prediction in Breast Cancer: a pilot study
Valentina Giannini, Torino / Italy
Author Block: V. Giannini, G. Nicoletti, A. Defeudis, E. Battista, I. Castellano, M. Durando; Torino/IT
Purpose: To evaluate the feasibility of developing an Artificial Intelligence (AI)-based integrated biomarker to predict Oncotype DX using magnetic resonance imaging (MRI) and H&E-stained histopathological Whole Slide Images (WSI).
Methods or Background: A total of 120 patients (94 from the Duke-Breast-Cancer-MRI dataset and 26 from an internal cohort) were used for the radiomics signature, while 55 were included in the pathomics analysis; 19 were common to both. In the pathomics pipeline, H&E WSI were divided into 224×224-pixel patches and color normalized with the Macenko method. First-order and texture features were extracted with PyRadiomics and aggregated at patient level using descriptive statistics. Redundant features (Pearson correlation ≥ 0.8) were removed. The most relevant features were retained (correlation > 0.3 with the output) and used to train multiple classifiers with leave-one-out cross-validation (CV). For radiomics, images were Z-score standardized, tumors were manually segmented, and first-order, texture, and shape features were extracted. Feature selection was performed with the mRMR algorithm, and classifiers were trained with stratified CV on the Duke dataset and tested on the internal cohort.
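The two correlation filters in the pathomics pipeline can be sketched as below. This is an illustration under stated assumptions, not the authors' code: the abstract gives only the thresholds, so the greedy keep-first strategy, the use of absolute correlation, and all function and feature names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def drop_redundant(features, threshold=0.8):
    """Greedily keep a feature only if |r| < threshold with every kept feature."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

def select_relevant(features, outcome, names, min_corr=0.3):
    """Keep features whose |r| with the outcome exceeds min_corr."""
    return [n for n in names if abs(pearson(features[n], outcome)) > min_corr]

# Toy patient-level features and a binary outcome; values are illustrative only.
toy = {
    "mean": [1, 2, 3, 4, 5],
    "median": [1.1, 2.0, 3.2, 3.9, 5.1],
    "entropy": [2, 1, 4, 3, 0],
}
outcome = [0, 0, 1, 1, 1]
non_redundant = drop_redundant(toy)
selected = select_relevant(toy, outcome, non_redundant)
```

Which member of a correlated pair survives depends on feature order in this greedy scheme, so a fixed feature ordering is needed for reproducible results.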
Results or Findings: In pathomics, two features were retained, with ElasticNet yielding the best performance. In radiomics, four features were selected, with the Decision Tree performing best. On the 19 patients common to both datasets, each model reached 74% accuracy. The pathomics model misclassified five patients (two as low risk, three as high risk), while the radiomics model misclassified five (one as low risk, four as high risk). When combined into a radiopathomics signature, accuracy increased to 84%, with three misclassifications (one as low risk, two as high risk).
Conclusion: This pilot study supports the feasibility of a radiopathomics signature for Oncotype DX prediction.
Limitations: The small sample size and monocentric design of the pathomics cohort limit generalizability, highlighting the need for larger, multicentric validation.
Funding for this study: Not Applicable
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information:
6 min
Artificial intelligence scores as a biomarker for ductal carcinoma in situ upgrade risk
Manisha Bahl, Cambridge / United States
Author Block: M. Bahl, A. Kniss, H. Kim, K-S. Kim, S. Do, L. Lamb; Boston, MA/US
Purpose: Artificial intelligence (AI) is increasingly used for mammography to enhance breast cancer detection. The purpose of this study was to evaluate whether AI scores for screening-detected ductal carcinoma in situ (DCIS) are associated with upgrade to invasive carcinoma at surgery.
Methods or Background: This retrospective study included consecutive screening digital breast tomosynthesis (DBT) examinations performed between 2014 and 2021 at a single academic institution. All cases were radiologist-initiated screening recalls with biopsy-confirmed pure DCIS. A commercially available, FDA-cleared AI algorithm (Genius AI® Detection 2.0; Hologic, Inc.) assigned a score (0-100) to each examination; scores ≥22 were classified as positive per vendor recommendation. Upgrade status to microinvasive (≤1 mm) or invasive (>1 mm) carcinoma was determined from surgical pathology reports. Associations between AI scores and upgrade risk were evaluated using standard statistical tests, with p < 0.05 considered significant.
Results or Findings: The study cohort included 344 women (mean age 62 ± 12 years) with biopsy-proven DCIS. Positive AI scores (≥22) were observed in 94.8% (326/344) of cases. At surgery, 15.1% (52/344) were upgraded to invasive carcinoma, including 25.0% (13/52) with microinvasive disease. Upgraded cases had a significantly higher mean AI score compared with nonupgraded cases (78 vs. 60, p < 0.001). No significant difference in AI scores was observed between upgrades to invasive carcinoma versus microinvasive disease (80 vs. 69, p = 0.11).
Conclusion: Higher AI scores on screening DBT were associated with DCIS cases upgraded to invasive carcinoma. These findings suggest that AI-derived scores may serve as a potential biomarker of biologically aggressive DCIS and could inform risk stratification and management strategies.
Limitations: The limitations of the study are single-center, retrospective design and use of one vendor/algorithm with a vendor-defined threshold, which may limit generalizability.
Funding for this study: Funding was provided by Hologic, Inc. The authors, none of whom are employees of Hologic, maintained full control over the data and the submitted information.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: This retrospective Health Insurance Portability and Accountability Act (HIPAA)-compliant study was granted an exemption from the requirement for written informed consent by the institutional review board at the Massachusetts General Hospital (Protocol #: 2023P003130).
6 min
Performance of Ultrasound-Based Machine Learning Models in Predicting Neoadjuvant Chemotherapy Response in Breast Cancer: A Systematic Review and Meta-Analysis
Parya Valizadeh, Tehran / Iran
Author Block: P. Valizadeh1, P. Jannatdoust1, N. Moradi1, S. Yaghoobpoor1, S. Toofani2, N. Rafiei3, H. Ghorani1, A. Arian1; 1Tehran/IR, 2Bojnourd/IR, 3Isfahan/IR
Purpose: Breast cancer is the most prevalent malignancy in women worldwide. Neoadjuvant chemotherapy (NAC) is widely employed before surgery to improve outcomes. Machine learning (ML) models using ultrasound-derived radiomic features may facilitate early prediction of NAC response, supporting personalized clinical decision-making. This study evaluated the diagnostic performance of ultrasound-based ML and deep learning (DL) approaches for NAC response prediction in breast cancer through a meta-analysis.
Methods or Background: A literature search was performed in PubMed, Embase, Scopus, and Web of Science according to PRISMA-DTA recommendations. Studies developing ultrasound-based radiomics or DL models for predicting NAC response were included. Separate meta-analyses were conducted for models predicting complete and partial responses.
Results or Findings: Twenty-two studies met the inclusion criteria. For complete response prediction, pooled sensitivity, specificity, and AUC for internally validated models were 85.1% (95% CI: 79.2–89.6%), 85.8% (95% CI: 76.7–91.8%), and 86% (95% CI: 82–94%), respectively. Externally validated models achieved 82.9% sensitivity (95% CI: 76.2–88.1%), 89.4% specificity (95% CI: 84.7–92.9%), and 93% AUC (95% CI: 82–94%). For partial response, meta-analysis of internal validation results yielded 87.5% sensitivity (95% CI: 85.1–89.6%), 82.3% specificity (95% CI: 75.6–87.5%), and an AUC of 88% (95% CI: 85–92%).
Conclusion: Ultrasound-based ML techniques demonstrate promising diagnostic accuracy in predicting NAC response among breast cancer patients. Delta-radiomic approaches may further improve performance. Prospective studies are required to establish generalizability for clinical use.
Limitations: The METRICS assessment showed wide variation in study design, with many lacking external validation and transparency. Moreover, some studies included overlapping cohorts, limiting generalizability. Heterogeneity was notable, especially in partial response models, likely due to inconsistent definitions. Furthermore, differences in model development contributed to subgroup variation. However, the small number of included studies prevented multivariable subgroup analyses to account for confounding factors.
Funding for this study: None
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information:
6 min
Influence of Mammographic Positioning Quality, BI-RADS Density, and Mammography Image Type on the Discriminatory Performance of an AI Short-Term Breast Cancer Risk Model
Georgia Spear, Park Ridge / United States
Author Block: G. Spear1, M. Abdolell2, L. R. Margolies3, K. Yao4; 1Park Ridge, IL/US, 2Halifax, NS/CA, 3New York, NY/US, 4Evanston, IL/US
Purpose: To evaluate how mammographic positioning quality, breast density, and image type influence the discriminatory performance of an image-based AI model for short-term breast cancer risk, with implications for deployment in clinical practice.
Methods or Background: A retrospective cohort study of 59,352 screening mammograms (1,822 cancers; 57,530 controls) acquired between May 2017 and April 2023 was performed at an urban health system. Eligible exams comprised standard four-view sets without implants; controls required a negative follow-up ≥1 year after the index exam. AI algorithms categorized positioning quality using a 12-level PGMI scale, breast density using ACR BI-RADS 5th edition categories (A–D), and Hologic image type (FFDM, C-view synthetic 2D, Intelligent 2D). Short-term risk predictions were generated with the Mirai model and evaluated by 2-year area under the ROC curve (AUC). Stratified analyses compared performance across positioning, density, and image type strata; p<0.05 was considered statistically significant.
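A stratified AUC comparison of this kind can be sketched as follows, computing AUC as the probability that a randomly chosen cancer case scores higher than a randomly chosen control. The function names and toy data are hypothetical, and the significance testing between strata reported in the study is not reproduced here.

```python
def auc(scores, labels):
    """AUC via pairwise comparison of cases and controls (ties count 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("each stratum needs at least one case and one control")
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))

def auc_by_stratum(scores, labels, strata):
    """Group exams by stratum label and compute one AUC per group."""
    groups = {}
    for s, y, g in zip(scores, labels, strata):
        bucket = groups.setdefault(g, ([], []))
        bucket[0].append(s)
        bucket[1].append(y)
    return {g: auc(sc, lb) for g, (sc, lb) in groups.items()}

# Toy risk scores; 1 = cancer within follow-up, 0 = control. Not study data.
risk = [0.9, 0.2, 0.8, 0.3, 0.6, 0.5, 0.4, 0.7]
cancer = [1, 0, 1, 0, 1, 0, 1, 0]
image_type = ["FFDM"] * 4 + ["synthetic"] * 4
per_type = auc_by_stratum(risk, cancer, image_type)
```

The pairwise form is quadratic in the number of exams, which is fine for a sketch but would call for a rank-based computation on a cohort of this size.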
Results or Findings: Mirai overall 2-year model performance was AUC=0.73 but varied significantly across strata. Exams rated P+/P achieved higher discrimination (AUC=0.80) than those rated P–/G+/G/G–/M+/M/M–/I+/I/I– (AUC=0.72), p=0.02. BI-RADS A mammograms achieved higher discrimination (AUC=0.76) than denser categories B–D (AUC=0.72), p=0.05. FFDM images achieved higher accuracy (AUC=0.80) relative to synthetic 2D formats (AUC=0.72), p<0.001. Positioning quality, breast density, and image type exerted independent and measurable effects on model performance.
Conclusion: Suboptimal positioning, higher breast density, and synthetic 2D images reduce the discriminatory performance of an AI-based short-term breast cancer risk model. Ensuring optimal positioning quality, accounting for density, and recognizing modality type are critical for robust AI-driven risk assessment and successful clinical implementation.
Limitations: Single health system retrospective design; potential residual confounding. Future work will involve external validation across multiple vendors and sites to confirm generalizability and assess clinical utility.
Funding for this study: NorthShore University HealthSystem
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: Approved by the NorthShore University HealthSystem Institutional Review Board (EH22-163)