Research Presentation Session: Imaging Informatics and Artificial Intelligence

RPS 1905 - Artificial intelligence in abdominal and oncological imaging

March 1, 12:30 - 13:30 CET

7 min
Total Segmentator: Integration and validation into PACS workstation for abdominal CT scans. A feasibility study
Georgios Lappas, Athens / Greece
Author Block: G. Lappas1, N. Patlakas1, P. Giannikopoulos1, M. Triantafyllou2, G. I. Kalaitzakis3, M. Klontzas2, K. Petropoulos1; 1Athens/GR, 2Crete/GR, 3Heraklion/GR
Purpose: This study aims to validate the performance of the open-source Total Segmentator for the segmentation of abdominal organs in CT scans and its integration into the existing PACS workstation.
Methods or Background: The model's segmentation capability was quantified using the Dice Similarity Coefficient (DSC) and Normalized Surface Distance (NSD) across five datasets (N=1243). Data variability was assessed with statistical and radiomics analyses. Grad-CAM and Monte Carlo analysis were used to study the model's decision-making and robustness, respectively. One assistant professor of radiology and one senior radiology resident rated the quality of the predicted segmentations on CT scans from Greek hospitals (N=100), as well as their integration into the clinical routine.
Results or Findings: The model segmented most organs with high accuracy (e.g., DSC CI: 0.85-0.97) but performed worse on the gallbladder, pancreas and prostate (e.g., DSC CI: 0.71-0.85). These results indicate that the model is robust across the five datasets, which show high variability in normalized volume (μ±σ: 0.24±0.18) and normalized intensity (μ±σ: 0.39±0.21), in agreement with the radiomics findings. Grad-CAM and Monte Carlo noise results provided insights into the model's decision-making process and highlighted areas for potential improvement. The clinical feasibility study indicates a promising tool that generates high-quality segmentations, 97% of which required only minimal manual adaptation.
Conclusion: Total Segmentator showed robust performance in segmenting the major abdominal organs, with poorer performance for the gallbladder, pancreas and prostate. The integration into the PACS workstation demonstrates the model's potential for routine use in Greek hospitals, marking a significant step towards a more efficient radiological workflow.
Limitations: The model had lower performance for the gallbladder, pancreas and prostate, and 7% of the validated cases had volume disagreements larger than 50%. The number of CT scans rated by the radiologists was limited.
Funding for this study: None
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: Not applicable
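The Dice Similarity Coefficient used in the abstract above measures voxel overlap between a predicted mask P and a reference mask R as DSC = 2|P ∩ R| / (|P| + |R|). The following is a minimal pure-Python sketch on flattened binary masks; the function name and the convention of returning 1.0 when both masks are empty are assumptions, not part of the study's pipeline. NSD is not shown, since it additionally requires surface extraction and a distance tolerance.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], ref: Sequence[int]) -> float:
    """Dice similarity coefficient between two flattened binary masks.

    Any truthy value counts as foreground. Returns 1.0 when both masks are
    empty (an assumed convention; some toolkits return NaN instead).
    """
    if len(pred) != len(ref):
        raise ValueError("masks must contain the same number of voxels")
    intersection = sum(1 for p, r in zip(pred, ref) if p and r)
    size_sum = sum(1 for p in pred if p) + sum(1 for r in ref if r)
    if size_sum == 0:
        return 1.0
    return 2.0 * intersection / size_sum


# Hypothetical 2x3 slice, flattened row by row.
predicted = [1, 1, 0,
             0, 1, 0]
reference = [1, 0, 0,
             0, 1, 1]
print(dice_coefficient(predicted, reference))  # prints 0.6666666666666666 (2 * 2 / 6)
```

For a 3D CT volume, the organ masks would first be flattened into a single sequence of voxels. The overlap definition stays the same.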
7 min
Results of the ULS23 Challenge on automatic 3D universal lesion segmentation in computed tomography
Alessa Hering, Nijmegen / Netherlands
Author Block: M. J. J. De Grauw1, E. Scholten1, E. J. Smit1, M. J. Rutten2, B. Van Ginneken1, M. Prokop1, A. Hering1; 1Nijmegen/NL, 2's-Hertogenbosch/NL
Purpose: Generalizable automatic segmentation methods are well-suited for application in the diverse clinical contexts encountered during tumour follow-up in CT. The ULS23 challenge establishes the state of the art in automatic 3D lesion segmentation quality, measurement accuracy, and prediction robustness.
Methods or Background: Current benchmarks often focus on organ-specific lesion segmentation, yet diverse clinical cases require fast, generalist models. The ULS23 challenge focuses on universal 3D lesion segmentation across chest-abdomen-pelvis CT, with 38,693 diverse lesions in the training dataset. The evaluation dataset contains 775 clinically relevant lesions from 284 patients across two Dutch tertiary care centers. We developed a strong baseline method based on nnU-Net and invited the research community to submit their solutions to the challenge. Post-challenge, we conducted experiments to explore the influence of lesion type, uncertainty, and robustness.
Results or Findings: The ULS23 challenge encouraged over 50 international researchers to develop solutions for automatic 3D lesion segmentation. During the official challenge period, 153 submissions were recorded in the development phase, and seven teams submitted to the final leaderboard. The U-Mamba framework achieved the highest challenge score, excelling in segmentation quality (70.8% ± 23.5% Dice), axial measurement accuracy (Symmetric Mean Absolute Percentage Error: long axis 10.3%, short axis 11.8%), and robustness when evaluated using repeated segmentation (79.7% ± 24.2% Dice). Bone, pancreas and colon lesions remain challenging, and although segmentation consistency improves with performance, the models still showed significant variability, which affected measurement accuracy.
Conclusion: The results of the ULS23 challenge demonstrate the potential of 3D universal lesion segmentation using large, aggregated datasets as a viable alternative to organ-specific models. However, significant variability in segmentation performance, particularly for bone, pancreas and colon lesions, indicates the need for further improvements. Addressing these performance inconsistencies could help reduce variance and enhance overall clinical applicability.
Limitations: Not applicable.
Funding for this study: No funding was provided for this study.
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: N/A
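The measurement accuracy in the abstract above is reported as a Symmetric Mean Absolute Percentage Error (SMAPE) between predicted and reference long- and short-axis diameters. SMAPE has several variants, and the abstract does not state which one ULS23 uses. The sketch below assumes the common form 100/n * sum(|p - r| / ((|p| + |r|) / 2)). The function name and the example diameters are hypothetical.

```python
from typing import Sequence


def smape(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Symmetric mean absolute percentage error, in percent.

    Assumed variant: 100/n * sum(|p - r| / ((|p| + |r|) / 2)).
    A pair where both values are 0 contributes zero error (assumed convention).
    """
    if not predicted or len(predicted) != len(reference):
        raise ValueError("need two non-empty sequences of equal length")
    total = 0.0
    for p, r in zip(predicted, reference):
        mean_magnitude = (abs(p) + abs(r)) / 2.0
        if mean_magnitude > 0:
            total += abs(p - r) / mean_magnitude
    return 100.0 * total / len(predicted)


# Hypothetical long-axis diameters in mm for three lesions.
predicted_mm = [22.0, 15.5, 40.0]
reference_mm = [20.0, 16.0, 41.0]
print(f"{smape(predicted_mm, reference_mm):.1f}%")  # prints 5.1%
```

Because the denominator uses both values, over- and under-estimation of a diameter are penalised more symmetrically than with a plain percentage error relative to the reference alone.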
7 min
Automated Matching of Lesions in Cancer Follow-Up Using 3D Siamese Neural Networks and CT
Alejandro Vergara, Valencia / Spain
Author Block: A. Vergara, A. Jimenez-Pastor, A. Alberich-Bayarri; Valencia/ES
Purpose: There is a lack of reliable tools to automatically track lesion changes over time and to streamline treatment response reporting with criteria such as RECIST 1.1. The primary goal of this work was to automatically match lesions in the thoracoabdominal region across timepoints using 3D Siamese Neural Networks (SNNs).
Methods or Background: A retrospective dataset of 253 longitudinal CT exams from metastatic NSCLC patients was used. It showed high variability in lesion location, including lymph nodes/lungs/liver/adrenal glands (43.87%/34.02%/11.82%/6.17%, respectively), and in lesion volume (0.1-1000 cm3). A 3D-SNN architecture was used to compute the similarity between lesion pairs and to match the same lesion across consecutive scans. The final model resulted from a two-step training process: 144 hyperparameter settings were first evaluated, and the top-performing ones were then retrained. Configurations combined different input information (CT image; CT image and lesion segmentation mask; CT image with several HU windows), learning rates (LR), LR schedulers, convolutional block complexity (1 or 2 convolutional layers), and loss functions (contrastive and BCE). An 80/20 training/test split ensured comparable heterogeneity of lesion size and location in both sets.
Results or Findings: The best model used the CT image as input, an LR of 10e-4, a step LR scheduler, BCE loss, and a 1-convolutional-layer block, achieving 92.80%/91.10%/92.4% accuracy/precision/recall on the test set. Accuracy by location was 93.79%/88.32%/92.50%/94.35% for lymph nodes, lung, liver and adrenal gland, respectively. By lesion volume, accuracy was 90.57%/93.35%/94.48% for small (<10 cm3), medium (10-100 cm3), and large (>100 cm3) lesions.
Conclusion: 3D SNNs are a promising and reliable technique to accurately match tumor lesions over consecutive timepoints, regardless of lesion size and region.
Limitations: The current work matches lesions individually rather than at the whole-patient level; patient-level matching is left for future work.
Funding for this study: None
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: N/A
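The abstract above compares contrastive and BCE losses for the Siamese network. The sketch below shows the per-pair form of each on plain Python lists. The first is the classic margin-based contrastive loss on the Euclidean distance between two lesion embeddings, with same=1 meaning the same lesion and an assumed margin of 1.0. The second is a numerically stable BCE on a single "same lesion" logit. The embedding network itself, the margin, and the 0.5 decision threshold in the hypothetical `is_match` helper are assumptions, since the abstract does not describe the network head or loss settings.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Distance between two lesion embeddings from the shared encoder."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a: Sequence[float], emb_b: Sequence[float],
                     same: int, margin: float = 1.0) -> float:
    """Margin-based contrastive loss for one pair (same=1: same lesion)."""
    d = euclidean(emb_a, emb_b)
    if same:
        return 0.5 * d ** 2                    # pull matching lesions together
    return 0.5 * max(0.0, margin - d) ** 2     # push non-matches past the margin


def bce_with_logit(logit: float, same: int) -> float:
    """BCE on a 'same lesion' logit, written in a numerically stable form:
    -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))]
      = max(z, 0) - y*z + log(1 + exp(-|z|))."""
    return max(logit, 0.0) - logit * same + math.log1p(math.exp(-abs(logit)))


def is_match(logit: float, threshold: float = 0.5) -> bool:
    """Hypothetical decision rule, equivalent to sigmoid(logit) >= threshold
    for 0 < threshold < 1, without computing exp() of large values."""
    return logit >= math.log(threshold / (1.0 - threshold))


# Invented example: one lesion seen at baseline and follow-up.
print(contrastive_loss([0.1, 0.3], [0.2, 0.1], same=1))
print(bce_with_logit(2.0, same=1), is_match(2.0))
```

The contrastive form works on distances between embeddings. The BCE form instead needs the network to output a single score per pair, which matches the abstract's description of comparing the similarity between lesion pairs.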
7 min
Comparative Analysis of Language Models for Automated Interpretation of Longitudinal TACE Reports in Hepatocellular Carcinoma
Elif Can, Freiburg Im Breisgau / Germany
Author Block: E. Can1, E. Kotter1, K. Vogt1, M. Brönnimann2, A. Elkilany3, W. Uller1, K. Bressem4, L. C. Adams4; 1Freiburg/DE, 2Bern/CH, 3Leipzig/DE, 4Munich/DE
Purpose: The increasing complexity of radiology data in hepatocellular carcinoma (HCC) requires innovative solutions to ensure consistent and efficient interpretation. This study aims to evaluate the performance of four leading language models (GPT, Gemini, Llama, and Llama405b) in extracting and interpreting key clinical data from longitudinal TACE reports. By automating this process, we seek to reduce the burden on radiologists, enhance decision-making accuracy, and improve workflow efficiency in interventional radiology.
Methods or Background: We analyzed the performance of each model on 50 anonymized TACE reports. The models were assessed for accuracy in extracting clinical data (diagnosis, BCLC staging, mRECIST assessment), as well as vascular involvement and metastases identification. A detailed error analysis was conducted to evaluate consistency and precision across tasks. Performance metrics included diagnosis accuracy, liver segment identification, and error severity.
Results or Findings: All models demonstrated high accuracy (90-100%) in basic tasks such as diagnosis identification and procedure date extraction. However, in complex tasks like mRECIST assessment and lymph node status evaluation, performance varied significantly. Gemini performed best in segment identification (4.6/5) and vascular involvement (2.9/5), while all models struggled with mRECIST accuracy (0-10%). Llama and Llama405b showed slightly higher error rates (7.7 and 8.0, respectively) than GPT (6.5) and Gemini (6.6).
Conclusion: While language models show promise in automating TACE report interpretation, challenges remain in specialized medical tasks. Gemini showed the best overall performance, but significant improvements are needed in mRECIST assessment and anatomical reporting. Further research is required to refine these models for clinical application, potentially reducing radiologists’ workload and enhancing decision-making processes.
Limitations: The generalizability of the models to other areas of interventional radiology remains to be validated.
Funding for this study: No funding provided.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: Approved by the ethics committee of the University Medical Center Freiburg.
7 min
CT-based Foundation Model Enhanced Preoperative Prediction of Microvascular Invasion in Hepatocellular Carcinoma: a multicenter study
Lin Deng, Shanghai / China
Author Block: L. Deng, W. Xia, J. Xia, W. Dai, F. Yan, R. Li; Shanghai/CN
Purpose: To develop a CT-based foundation model and a multi-instance learning (MIL) framework to preoperatively predict microvascular invasion (MVI) in hepatocellular carcinoma (HCC).
Methods or Background: CT-based foundation models were developed through self-supervised learning on public CT image datasets. Patients with pathologically proven HCC were included from two centers, and non-contrast, arterial phase and portal venous phase CT image sequences were acquired. The foundation models extracted features from slices of the HCC tumor region, and MIL aggregated these features to predict MVI status. The predictions of all sequences were combined into a final prediction by logistic regression. The performance of the proposed method was evaluated by the area under the receiver operating characteristic curve (AUC).
Results or Findings: The CT image patches of 36,811 lesions from DeepLesion, LiTS and 3D-IRCADb were used for foundation model development. A total of 617 HCC patients (median age, 69 years; IQR, 52-67 years; men 510) were included and divided into a training set (center 1, n=493) and an independent test set (center 2, n=124). Using only a few slices of the HCC tumor region (median number, 3; IQR, 2-4), the proposed method achieved AUCs of 0.80, 0.78, 0.74, and 0.83 for non-contrast, arterial phase, portal venous phase, and all sequences combined, respectively. The proposed method significantly outperformed the method without the foundation model (AUC=0.72, 0.67, 0.71 and 0.64 for each sequence, P<.05) and the radiomics model (AUC=0.72, P<.05).
Conclusion: By leveraging a few representative CT slices and avoiding the need for full tumor delineation, the foundation model-based method achieves improved performance compared to previous methods, demonstrating the crucial role of the foundation model for accurate MVI prediction and facilitating more efficient clinical decision-making.
Limitations: Not applicable.
Funding for this study: Not applicable.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: Not applicable.
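Two steps in the abstract above can be illustrated with a short sketch. The first is combining the per-sequence MVI predictions with logistic regression. The second is scoring the result with the AUC. Below, `fuse` applies a logistic model whose weights and bias are placeholders; in practice they would be fitted on the training centre. `auc` computes the ROC AUC as the probability that a randomly chosen MVI-positive patient scores higher than a randomly chosen MVI-negative one, counting ties as one half. Both function names and all numbers are hypothetical. The foundation-model feature extraction and MIL aggregation are not shown.

```python
import math
from typing import Sequence


def fuse(sequence_probs: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Logistic late fusion of per-sequence MVI probabilities (placeholder coefficients)."""
    z = bias + sum(w * p for w, p in zip(weights, sequence_probs))
    if z >= 0:                      # numerically stable sigmoid
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via pairwise comparison (Mann-Whitney form); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical probabilities: non-contrast, arterial, portal venous.
patients = [[0.8, 0.7, 0.6], [0.2, 0.4, 0.3], [0.6, 0.5, 0.7], [0.3, 0.2, 0.4]]
mvi = [1, 0, 1, 0]
weights, bias = [1.5, 1.0, 1.2], -1.8   # placeholders, not the study's coefficients
fused = [fuse(p, weights, bias) for p in patients]
print(auc(fused, mvi))
```

The pairwise loop is quadratic in the number of patients. This is fine for a few hundred test cases, such as the 124-patient test set. Rank-based formulas give the same value faster.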
7 min
Artificial intelligence for evaluation of magnetic resonance imaging-detected extramural vascular invasion in rectal cancer
Haitao Huang, Guangzhou / China
Author Block: H. Huang, K. Zhao, Z. Liu, C. Liang; Guangzhou/CN
Purpose: Extramural vascular invasion (EMVI) is a detectable magnetic resonance imaging (MRI) marker that reflects both the invasive and the metastatic potential of the tumor. Since EMVI+ is seen as an independent indicator of worse prognosis, clinicians may pursue more intensive treatment strategies for affected patients. However, EMVI assessment is influenced by observer experience and subjective factors, which limits its practical effectiveness. This study aims to develop and validate an interpretable deep learning-based approach for automated mrEMVI identification through voxel-level segmentation, providing an objective and consistent detection method.
Methods or Background: This multicenter study included a total of 2,501 rectal cancer patients, with 1,830 in the training cohort and 671 in the two validation cohorts. The Dice similarity coefficient was used to measure segmentation performance, while inter-reader agreement for mrEMVI was calculated using Cohen’s kappa (κ). The prognostic value of the mrEMVI status identified by the artificial intelligence (AI) model was evaluated with Kaplan–Meier curves and the Cox model.
Results or Findings: Our model performed well in identifying mrEMVI, achieving accuracies of 81.54% and 84.72% in the two validation cohorts. The model showed substantial agreement with senior radiologists in identifying mrEMVI status (κ: 0.713–0.736). AI-mrEMVI+ patients had significantly lower 3-year disease-free survival (DFS) and 5-year overall survival (OS) rates than AI-mrEMVI− patients (DFS: 62.23% vs 84.91%, HR=2.67, 95% CI: 1.95–3.66, P<0.001; OS: 68.71% vs 87.14%, HR=2.64, 95% CI: 1.75–3.97, P<0.001).
Conclusion: We provide a more objective and consistent approach for the detection of mrEMVI, demonstrating potential in prognostic prediction, and offering promising contributions to optimizing the treatment of rectal cancer patients.
Limitations: The model produced some false positives, primarily because it mistook larger blood vessels in the mesorectal area for mrEMVI.
Funding for this study: National Science Foundation for Young Scientists of China (82202267).
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: This study has obtained approval from the ethics review committees of all participating hospitals. Considering the retrospective design of the study, the requirement for written informed consent from patients was waived.
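The abstract above reports agreement between the AI-derived mrEMVI status and the senior radiologists as Cohen's kappa. Kappa corrects the observed agreement p_o for the agreement p_e expected by chance from each rater's label frequencies: κ = (p_o - p_e) / (1 - p_e). The following is a minimal pure-Python sketch for two raters. The function name and the labels in the usage example are invented for illustration.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters labelling the same cases."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("need two non-empty label sequences of equal length")
    p_observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the two raters' marginal counts, summed over labels.
    p_expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical label")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Invented example: AI vs radiologist mrEMVI calls for eight patients.
ai = ["+", "+", "-", "-", "-", "+", "-", "-"]
reader = ["+", "-", "-", "-", "-", "+", "-", "+"]
print(round(cohens_kappa(ai, reader), 3))
```

On the widely used Landis and Koch scale, kappa values of 0.61-0.80 are described as substantial agreement. This range covers the 0.713–0.736 reported above.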
7 min
PROVIZ Proof-of-Technology: Performance of a machine learning software for detection of clinically significant prostate cancer on biparametric MRI in a prospective clinical study
Rebecca Segre, Trondheim / Norway
Author Block: R. Segre, M. Sunoqrot, G. Nketiah, P. Davik, S. Langorgen, M. Elschot, T. Frost Bathen; Trondheim/NO
Purpose: PROVIZ is a machine learning software designed to detect clinically significant prostate cancer (csPCa, defined as GGG > 1) on MRI as a reference for targeted biopsy. The aim of this study is to evaluate feasibility (technical issues in < 10% of processed cases), safety (absence of Serious Adverse Device Effects, SADEs), and performance of PROVIZ.
Methods or Background: Prospective, proof-of-technology study on 80 consenting men. Inclusion criteria: biopsy-naive men undergoing MRI for suspicion of prostate cancer. MR images were first delineated according to PI-RADS v. 2.1 by an experienced radiologist. Subsequently, automated detection (up to three suspicious areas) was performed using PROVIZ. Delineations of all lesions were used as a reference for targeted biopsies, providing the ground truth.
Results or Findings: To date, 73/80 patients have completed the study. Regarding feasibility, one technical issue occurred among the participants. In terms of safety, no SADEs were observed. As to performance, PROVIZ achieved a high AUROC (92.3%) and can be retrospectively tuned to reach the same patient-level sensitivity as the radiologist at PI-RADS 3 (94.6%) while improving specificity (72.2% vs 52.8%). At this operating point, PROVIZ would have referred 7 fewer patients to biopsy than the radiologist (45/73 vs 52/73). At the lesion level, PROVIZ showed a slightly lower sensitivity than the radiologist (42.9% vs 45.5%) but fewer false positives per case (30.1% vs 41.1%).
Conclusion: Preliminary results of this prospective study indicate that PROVIZ is feasible and safe to use, with a performance for detection of csPCa comparable to an experienced radiologist. As a support tool for the radiologist, PROVIZ shows potential for reducing false positive predictions, thereby reducing unnecessary biopsies.
Limitations: Incomplete data collection. Single-center study.
Funding for this study: Norwegian University of Science and Technology (NTNU), Research Council of Norway (Grant Number 295013), The Liaison Committee between the Central Norway Regional Health Authority and the Norwegian University of Science and Technology (Grant Numbers 983005100, 982992100 and 90368401), St. Olavs Hospital - Trondheim University Hospital, Central Norway Regional Health Authority.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: REK (Regionale komiteer for medisinsk og helsefaglig forskningsetikk) approval no. 479272.
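The abstract above states that PROVIZ can be retrospectively tuned to match the radiologist's patient-level sensitivity and then compared on specificity and biopsy referrals. One generic way to pick such an operating point from patient-level scores is to scan candidate thresholds from high to low and keep the first (highest) threshold whose sensitivity reaches the target. The sketch below does this. The score scale, the "refer if score >= threshold" rule, and the hypothetical `match_sensitivity` name are assumptions; the abstract does not describe how PROVIZ was actually tuned.

```python
from typing import List, Sequence, Tuple


def match_sensitivity(scores: Sequence[float], labels: Sequence[int],
                      target_sensitivity: float) -> Tuple[float, float, float, int]:
    """Return (threshold, sensitivity, specificity, n_referred) for the highest
    threshold whose patient-level sensitivity reaches target_sensitivity.

    A patient is referred to biopsy when score >= threshold (assumed rule);
    labels are 1 for csPCa and 0 otherwise.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both csPCa and non-csPCa patients")
    for threshold in sorted(set(scores), reverse=True):
        referred: List[bool] = [s >= threshold for s in scores]
        tp = sum(1 for r, y in zip(referred, labels) if r and y == 1)
        sensitivity = tp / positives
        if sensitivity >= target_sensitivity:
            tn = sum(1 for r, y in zip(referred, labels) if not r and y == 0)
            return threshold, sensitivity, tn / negatives, sum(referred)
    raise ValueError("target sensitivity must not exceed 1.0")


# Invented patient-level scores and biopsy outcomes.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
print(match_sensitivity(scores, labels, target_sensitivity=1.0))
```

Taking the highest qualifying threshold keeps the target sensitivity while referring as few patients as possible. This is the trade-off behind the 45/73 vs 52/73 referral comparison.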
7 min
Diagnostic performance of a fully automated AI algorithm for lesion detection and PI-RADS classification in patients with suspected prostate cancer
Hannes Engel, Freiburg Im Breisgau / Germany
Author Block: H. Engel1, A. Nedelcu1, R. Grimm2, H. Von Busch2, A. Sigle3, T. Krauß1, J. Weiß1, M. Benndorf4, B. Oerther1; 1Freiburg im Breisgau/DE, 2Forchheim/DE, 3Freiburg/DE, 4Detmold/DE
Purpose: To evaluate the diagnostic performance of a fully automated AI algorithm with lesion detection and PI-RADS classification in a cohort of consecutive patients verified by targeted and extensive systematic biopsies.
Methods or Background: This retrospective, single-centre study included consecutive patients who underwent 3T multiparametric prostate magnetic resonance imaging (MRI) performed between 05/2017 and 05/2020, followed by targeted transperineal ultrasound-fusion guided and systematic biopsy. The AI algorithm (syngo.via Prostate MR, VB60S HF01, Siemens Healthineers) was described in previous publications and is based on axial T2- and diffusion-weighted imaging sequences. The results of the AI algorithm were compared with those of human readers and the diagnostic performance was determined.
Results or Findings: The evaluation of 272 patients resulted in 436 target lesions. 135 patients (49.5%) had clinically significant prostate cancer (csPCa), 35 (12.8%) had clinically insignificant prostate cancer (ISUP=1) and 102 (37.5%) were benign. Patient-level cancer detection rates (CDRs) of csPCa for AI versus human reading were 11%/18% for PI-RADS ≤2, 24%/11% for PI-RADS 3, 54%/41% for PI-RADS 4, and 74%/92% for PI-RADS 5. The accuracy of the AI was significantly higher than that of human reading (0.74 versus 0.63 at a threshold of PI-RADS ≥4, p<0.01). 62 patients rated PI-RADS ≥3 by human readers were correctly classified as negative by the AI.
Conclusion: The AI algorithm proved to be a reliable and robust tool for lesion detection and classification. Furthermore, the CDRs and distribution of PI-RADS assessment categories of the AI are consistent with the results of recent meta-analyses, indicating precise risk stratification.
Limitations: The limitations of our study are mainly its retrospective and monocentric design. Additionally, the study design based on histopathological verification implies an under-representation of negative MRI scans and a cohort that is not fully representative of the wider patient population.
Funding for this study: The licence of the AI algorithm was part of an unrestricted collaboration agreement between Siemens Healthineers and the Department of Radiology, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg. While Siemens provided technical support, the study conception and design, as well as the analysis and interpretation of the data, were conducted independently. August Sigle received research support within the Berta-Ottenstein-Programme. Other than that, the authors declare that no funds, grants, or other support were received during the preparation of this manuscript.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: Approval was granted by the Ethics Committee of the University of Freiburg (No. 20-1256).
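In the abstract above, a patient-level cancer detection rate (CDR) is the fraction of patients in each PI-RADS group who have csPCa. Accuracy at a PI-RADS ≥4 threshold treats categories 4 and 5 as test-positive. The following is a minimal sketch. It assumes one PI-RADS category per patient (for example, the highest across lesions) and a csPCa label from biopsy. The function names and the data in the usage example are invented.

```python
from collections import defaultdict
from typing import Dict, Sequence


def detection_rates(categories: Sequence[int], cspca: Sequence[int]) -> Dict[str, float]:
    """Fraction of patients with csPCa per PI-RADS group, pooling 1-2 as '<=2'."""
    totals: Dict[str, int] = defaultdict(int)
    positives: Dict[str, int] = defaultdict(int)
    for category, has_cspca in zip(categories, cspca):
        group = "<=2" if category <= 2 else str(category)
        totals[group] += 1
        positives[group] += 1 if has_cspca else 0
    return {group: positives[group] / totals[group] for group in totals}


def accuracy_at_threshold(categories: Sequence[int], cspca: Sequence[int],
                          threshold: int = 4) -> float:
    """Share of patients correctly classified when PI-RADS >= threshold is test-positive."""
    if not categories or len(categories) != len(cspca):
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(1 for category, has_cspca in zip(categories, cspca)
                  if (category >= threshold) == bool(has_cspca))
    return correct / len(categories)


# Invented cohort: highest PI-RADS category and biopsy csPCa status per patient.
categories = [1, 2, 3, 3, 4, 5]
cspca = [0, 1, 0, 1, 1, 1]
print(detection_rates(categories, cspca))
print(accuracy_at_threshold(categories, cspca))
```

Because the CDR for each category is computed only over patients placed in that category, two readers can have similar overall accuracy but very different CDRs per category. This pattern appears in the AI versus human comparison above.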