Research Presentation Session: Abdominal and Gastrointestinal

RPS 1301 - AI in liver and pancreatic oncology

March 6, 09:30 - 11:00 CET

6 min
Moderator's introduction
Christoforos Stoupis, Forch / Switzerland
6 min
A Multimodal Deep Learning Model for Prediction of Early Progression in Patients with Advanced Hepatocellular carcinoma Treated with Atezolizumab-Bevacizumab
Gaia Crescimanno, Palermo / Italy
Author Block: G. Crescimanno, R. Cannella, C. Celsa, S. Contino, L. Cruciata, G. Cirrincione, R. Pirrone, G. Brancatelli, C. Cammà; Palermo/IT
Purpose: Atezolizumab-Bevacizumab is recommended as first-line treatment for advanced/unresectable hepatocellular carcinoma (HCC). However, validated clinical or radiological systems able to predict early treatment response or identify non-responsive patients at risk of early therapeutic failure are currently lacking. This study developed a multimodal AI model to predict 6-month progression-free survival (PFS).
Methods or Background: 51 patients (mean age 78.2±9.2 years, 78.4% male) with advanced/unresectable HCC treated with Atezolizumab-Bevacizumab as first-line systemic treatment between 2021 and 2024 were retrospectively included at a single tertiary referral centre. We designed a multimodal neural architecture using late fusion strategy: (1) convolutional neural network (CNN) for feature extraction from pre-treatment contrast-enhanced CT images using the arterial phase, and (2) multilayer perceptron (MLP) for conventional clinical-laboratory data. Feature vectors were concatenated and fed into a final classifier. Primary outcome was 6-month PFS. Performance metrics included accuracy, precision, recall, F1-score, and area under the curve (AUC).
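The fusion step described above can be sketched in plain Python. This is a toy illustration only: the feature values, the weights, and the function names `fuse` and `classify` are hypothetical, and the single logistic unit merely stands in for the final classifier, whose actual design is not given in the abstract.

```python
import math

def fuse(image_features, clinical_features):
    """Concatenate the image-branch and clinical-branch feature vectors."""
    return list(image_features) + list(clinical_features)

def classify(fused, weights, bias):
    """Toy final classifier: one logistic unit over the fused vector."""
    if len(fused) != len(weights):
        raise ValueError("weights must match the fused vector length")
    z = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical branch outputs for one patient (not study data).
image_features = [0.2, -1.1, 0.7]   # stand-in for CNN features from arterial-phase CT
clinical_features = [1.5, 0.3]      # stand-in for MLP features from clinical-laboratory data

fused = fuse(image_features, clinical_features)
probability = classify(fused, weights=[0.4, -0.2, 0.1, 0.3, -0.5], bias=0.0)
print(len(fused), round(probability, 3))
```

In a trained model the weights would be learned jointly with both branches; here they are fixed by hand purely to show the data flow.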
Results or Findings: During a median follow-up of 14.5 months, median overall survival was 23.6 months (95%CI:15.1-38.0) and median PFS was 14 months (95%CI:9.9-23.6), with 19.6% of patients experiencing radiological progression or death within 6 months. Conventional statistical analysis failed to identify significant predictors of 6-month PFS. The multimodal AI model demonstrated excellent performance for 6-month PFS prediction, with accuracy of 96.25%, precision 97.82%, recall 70.66%, F1-score 82.05%, and AUC 0.95.
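As a quick consistency check, the F1-score is the harmonic mean of precision and recall. The short sketch below recomputes it from the two reported values; the function name `f1_score` is only illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

f1 = f1_score(0.9782, 0.7066)
print(f"{f1:.2%}")  # prints 82.05%
```

The result matches the reported F1-score and makes explicit that it is pulled down by the recall, not the precision.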
Conclusion: A multimodal AI framework can successfully address the critical gap in early prediction of treatment failure for HCC patients on Atezolizumab-Bevacizumab. The high precision could make it a reliable clinical decision-support tool for identifying patients at risk of early therapeutic failure, improving personalized treatment.
Limitations: Single-center study with a small number of patients.
Funding for this study: No
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: The study has been approved.
6 min
Ensemble Deep Learning Models on Multi-Sequence MRI for Enhanced Prediction of Microvascular Invasion in Hepatocellular Carcinoma
Yifan Pan, Fuzhou / China
Author Block: Y. Pan; Fuzhou/CN
Purpose: To evaluate and compare three ensemble strategies—soft voting, hard voting, and stacking—in deep learning models integrating multi-sequence MRI data for predicting microvascular invasion (MVI) in hepatocellular carcinoma (HCC).
Methods or Background: Retrospective study included 299 postoperatively pathologically confirmed HCC patients from two centers. Six MRI sequences (T2WI, T1WI, arterial/portal venous/delayed/hepatobiliary phases) were manually annotated with bounding boxes to fully cover tumors. Six 3D ResNet-18 single-sequence models were built, with 5-fold cross-validation to ensure training robustness. The three ensemble strategies integrated model output probabilities; stacking used a support vector machine (SVM) as the meta-model for further training. Performance was assessed via receiver operating characteristic (ROC) curves.
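The difference between the two voting strategies can be illustrated with a minimal sketch. The six probabilities below are hypothetical, the 0.5 threshold and the strict-majority tie rule are assumptions, and stacking is omitted because it requires training a meta-model (an SVM in this study) on the single-sequence outputs.

```python
def soft_vote(probabilities, threshold=0.5):
    """Average the model probabilities, then threshold the mean."""
    mean_p = sum(probabilities) / len(probabilities)
    return mean_p, mean_p >= threshold

def hard_vote(probabilities, threshold=0.5):
    """Threshold each model first, then take a strict majority of the votes."""
    votes = sum(p >= threshold for p in probabilities)
    return votes, votes * 2 > len(probabilities)

# Hypothetical MVI probabilities from six single-sequence models (not study data).
probs = [0.90, 0.45, 0.40, 0.48, 0.85, 0.42]

mean_p, soft_label = soft_vote(probs)
votes, hard_label = hard_vote(probs)
print(round(mean_p, 3), soft_label)
print(votes, hard_label)
```

With these values the two rules disagree: two confident models lift the mean above the threshold, while only two of six individual votes are positive.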
Results or Findings: In the validation set, five-fold cross-validation showed the following average AUCs for single-sequence models: T2WI (0.685), T1WI (0.690), AP (0.689), PVP (0.712), DP (0.694), and HBP (0.666). For fusion strategies: soft voting aggregated MVI risk scores to reach AUC 0.775 and accuracy 0.721; hard voting yielded AUC 0.733 and accuracy 0.738. Among the three, stacking performed best (AUC 0.791, accuracy 0.740), outperforming the other two in integrating multi-sequence MRI for more accurate MVI prediction.
Conclusion: Multi-sequence MRI models consistently outperform single-sequence counterparts in HCC MVI prediction, as they integrate complementary information (e.g., the soft-tissue contrast of T2WI and the vascular patterns of the enhanced phases) that single-sequence models miss. Among ensembles, stacking excels by using an SVM meta-model to refine cross-sequence feature fusion, avoiding the flaws of voting methods, and delivers the highest performance. This supports more reliable MVI stratification, aiding clinicians in optimizing surgical plans and postoperative follow-up for precise HCC management.
Limitations: The sample size needs expansion, and further multi-center studies are required to improve the generalizability of the findings.
Funding for this study: None
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: The data involved in this study has been approved by the Ethics Committee of the First Affiliated Hospital of Fujian Medical University
6 min
Interpretable Hepatocellular Carcinoma Risk Stratification Model in Cirrhotic Patients: Integrating MRI-based Radiomics Deep Learning and Body Composition
Yanjin Qin, Guangzhou / China
Author Block: Y. Qin, D. Xu, X. Zhou, S-T. Feng; Guangzhou/CN
Purpose: Hepatocellular carcinoma (HCC) poses a significant global health challenge, underscoring the critical importance of accurate risk stratification in cirrhosis, which currently remains limited. This study aims to develop and validate an interpretable risk stratification prediction (called IRSP) model that integrates MRI-based radiomics, deep learning, body composition, and clinical features to enhance early HCC prediction in cirrhotic patients.
Methods or Background: This analysis included 479 cirrhotic patients from three primary cohorts who had undergone gadoxetic acid-enhanced MRI between January 2015 and December 2020. Radiomic and deep learning features were extracted from liver regions of interest at multi-MRI sequences (unenhanced, late arterial, and hepatobiliary phase). Unenhanced MRI-quantified body composition was measured. Using features mentioned above, the IRSP model was developed in the discovery cohort (n = 302), and then validated in an internal validation cohort (n = 73), and an external validation cohort from 2 external centers (n = 104).
Results or Findings: The IRSP model effectively predicted short-term HCC development in cirrhotic patients with an area under the curve (AUC) of 0.924 (95% confidence interval 0.876-0.9721) in the discovery cohort, 0.895 (0.835-0.955) in the internal validation cohort, and 0.915 (0.882-0.948) in the external validation cohort. By applying optimal thresholds of 0.31 and 0.67, the high-risk (n = 121, 16.0%) and medium-risk (n = 233, 30.7%) groups, which covered 92.6% (88/95) of the patients who developed HCC, had significantly higher rates of HCC occurrence compared to the low-risk group (n = 404, 53.3%) (17.3% vs 4.8% vs 0.31%, P < 0.001).
Conclusion: The novel IRSP model provides reliable estimates of HCC development for cirrhotic patients and may have the potential to improve the precision in clinical decision-making and early initiation of HCC treatments.
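The two reported thresholds define three risk tiers. A minimal sketch of that mapping follows, assuming the model outputs a score between 0 and 1 and that a score exactly on a threshold falls into the higher tier; neither detail is stated in the abstract, and `risk_group` is a hypothetical name.

```python
def risk_group(score, low_cut=0.31, high_cut=0.67):
    """Map a model score in [0, 1] to a three-tier risk label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    if score >= high_cut:       # assumption: boundary values go to the higher tier
        return "high"
    if score >= low_cut:
        return "medium"
    return "low"

# Hypothetical scores, for illustration only.
for s in (0.10, 0.31, 0.50, 0.67, 0.90):
    print(s, risk_group(s))
```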
Limitations: Not applicable.
Funding for this study: This work was funded by the National Natural Science Foundation of China (82471948, 82271958) and the Natural Science Foundation of Guangdong Province (2024A1515012149).
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: Yes, this study has been reviewed and approved by the Ethics Committee of the First Affiliated Hospital of Sun Yat-Sen University, with approval number [2021]124. All research procedures strictly adhere to relevant ethical guidelines and regulations.
6 min
A CT radiomic-integrated model to predict the risk of Hepatocellular Carcinoma in Cirrhotic patients
Silvia Schirò, Parma / Italy
Author Block: S. Schirò1, L. Leo1, G. Besutti2, C. Marrocchio1, M. V. Bazzocchi1, D. Stefanelli1, E. Gjataj2, G. Missale1, N. Sverzellati1; 1Parma/IT, 2Modena/IT
Purpose: To develop and externally validate a clinico-radiomic model for predicting the risk of hepatocellular carcinoma (HCC) development in cirrhotic patients using non-contrast-enhanced CT scans.
Methods or Background: In this retrospective bicentric study, 141 cirrhotic patients were included and divided into a discovery cohort (n=98) and an independent test cohort (n=43). All subjects underwent at least one non-contrast abdominal CT prior to HCC onset (HCC cohort) or during follow-up (non-HCC cohort). The whole liver parenchyma was manually segmented from non-contrast CTs. In total, 851 radiomic features were extracted and filtered for reproducibility and redundancy. A radiomics-only model (RAD) and a clinico-radiomic model (INT) integrating radiomics with clinical variables (alpha-fetoprotein, FIB-4 index, Child-Pugh score) were trained on the discovery cohort and validated on the test set.
Results or Findings: The RAD model, made of three selected features and a decision tree classifier, achieved an AUC of 0.696 in the external test set, with 0.905 sensitivity and 0.500 specificity. The INT model, incorporating RAD-score and clinical parameters via stochastic gradient descent, improved specificity to 0.681 while maintaining good sensitivity (0.809), yielding an AUC of 0.703. Decision curve analysis showed a higher net clinical benefit of both models compared to default strategies across a range of decision thresholds.
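For readers unfamiliar with decision curve analysis, net benefit at a threshold probability p_t is commonly computed as TP/n - (FP/n) * p_t / (1 - p_t) and compared against the default "treat all" and "treat none" strategies. The sketch below uses that standard formula with hypothetical counts; it does not reproduce the study's data.

```python
def net_benefit(tp, fp, n, threshold):
    """Net benefit of a strategy at a given threshold probability."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    return tp / n - fp / n * threshold / (1.0 - threshold)

def net_benefit_treat_all(events, n, threshold):
    """Default strategy: everyone is classified as high risk."""
    return net_benefit(tp=events, fp=n - events, n=n, threshold=threshold)

# Hypothetical cohort: 100 patients, 30 events; the model flags 25 true and 20 false positives.
model_nb = net_benefit(tp=25, fp=20, n=100, threshold=0.2)
treat_all_nb = net_benefit_treat_all(events=30, n=100, threshold=0.2)
treat_none_nb = 0.0  # no one is flagged, so no benefit and no harm

print(round(model_nb, 3), round(treat_all_nb, 3), treat_none_nb)
```

A model shows net clinical benefit at a given threshold when its value exceeds both default strategies, as in this toy example.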
Conclusion: Radiomic features extracted from whole liver on non-contrast CT, especially when integrated with routine clinical data, can stratify the risk of HCC development in cirrhotic patients.
Limitations: Firstly, the retrospective study design may have introduced bias on subjects’ selection. Secondly, the size of the study population was small, although this was partly compensated by using an external validation cohort to test model performance. Further validation on larger and prospective cohorts is required to address these limitations and confirm the clinical utility of our model.
Funding for this study: No funding was received for this study
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: This retrospective bicentric study has been approved by the ethics committee (protocol number 930/2022/OSS/UNIPR)
6 min
Deep learning–accelerated Dixon MRI enables rapid and reliable liver fat quantification
Stephan Rau, Freiburg Im Breisgau / Germany
Author Block: S. Rau1, A. Fink1, R. Strecker2, D. Nickel2, L. Michel1, D. I. Klemm1, F. Bamberg1, J. Weiß1, M. Russe1; 1Freiburg Im Breisgau/DE, 2Erlangen/DE
Purpose: To assess the accuracy and clinical feasibility of deep learning–accelerated T1-weighted VIBE Dixon sequences for whole-liver fat-signal fraction (FSF) quantification compared with standard protocols.
Methods or Background: In this prospective single-centre study, 60 patients (mean age 64 years, 55% female) underwent abdominal MRI at 1.5 T, including a standard VIBE Dixon sequence and two accelerated protocols (“fast” and “ultra-fast”) reconstructed with deep learning. Acquisition times were 15, 10 and 6 seconds, respectively. Whole-liver FSF was calculated using a validated automated convolutional neural network–based segmentation. Agreement between accelerated and standard sequences was evaluated using mean absolute error (MAE) and Spearman’s correlation.
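The two agreement measures named above can be sketched with the standard library alone. The fat-fraction values are hypothetical, and the helper names are illustrative; Spearman's correlation is computed here as Pearson's correlation on average ranks, which is the usual definition when ties are present.

```python
import math

def mean_absolute_error(a, b):
    """Mean of the absolute paired differences."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    denom = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    if denom == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / denom

def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical whole-liver fat-signal fractions in percent (not study data).
standard = [2.0, 5.5, 12.0, 3.1, 8.4]
fast = [2.4, 5.0, 12.9, 3.0, 9.1]

mae = mean_absolute_error(standard, fast)
rho = spearman(standard, fast)
print(round(mae, 2), round(rho, 3))
```

Note that a high rank correlation only shows that patients are ordered similarly; the MAE is what quantifies how far the accelerated values sit from the standard ones.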
Results or Findings: Liver volumes obtained from accelerated protocols showed excellent correlation with the standard sequence (ρ = 0.975–0.988, p < 0.001). Median liver fat fractions were 2.25% (standard), 2.61% (fast), and 2.35% (ultra-fast). The MAE from the standard was 0.57% for fast and 0.52% for ultra-fast. Correlations for FSF remained high (ρ = 0.923–0.936, p < 0.001), with no systematic bias across protocols.
Conclusion: Deep learning–accelerated Dixon MRI allows reliable and fully automated liver fat quantification with substantial reduction of breath-hold duration, supporting its use as an interchangeable alternative to standard protocols.
Limitations: The limitations of the study are its single-vendor setting and use of a two-point Dixon reference rather than a multi-echo or histological gold standard.
Funding for this study: No funding was received for this study.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: Ethics Committee of the University Medical Center Freiburg, case number 22-1185
6 min
Radiomics of Hepatocellular Carcinoma: Identifying Predictors of Microvascular Invasion Using Multi-Phase CT
Caterina Vitale, Verona / Italy
Author Block: F. Spoto, N. Cardobi, C. Vitale, B. Mascarin, L. Ordofendi, F. Apolloni, R. De Robertis Lombardi, M. D'Onofrio; Verona/IT
Purpose: To explore radiomic texture features from multi-phase contrast-enhanced CT as potential predictors of microvascular invasion (MVI) in hepatocellular carcinoma (HCC).
Methods or Background: This exploratory single-center study retrospectively analyzed 49 patients (54 HCC lesions) who underwent liver resection between 2018 and 2022. Radiomic analysis extracted 642 features across arterial, venous, and delayed phases using original and 5mm-expanded tumor margins.
Results or Findings: The 20-50mm lesion subgroup (n=37) provided the most reliable results, with arterial phase texture homogeneity features achieving AUC 0.772. Features from lesions <20mm (n=14, 4 MVI+) showed clear evidence of overfitting and were excluded from primary analyses. Delayed phase features showed preliminary associations (AUC 0.8) in a small LR-3/4 subset (n=20).
Conclusion: Multi-phase CT radiomic analysis shows potential for MVI prediction in intermediate-sized HCC lesions, though external validation in larger cohorts is essential before clinical application.
Limitations: This hypothesis-generating study has significant limitations including small sample size, single-center design, and lack of correction for multiple comparisons.
Funding for this study: None
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information:
6 min
Reducing Breath-Hold Time in Liver MRI: Clinical Performance of Deep Learning-Accelerated Post-Contrast T1 VIBE
Stephan Rau, Freiburg Im Breisgau / Germany
Author Block: S. Rau1, A. Fink1, V. Sacalean1, K. Kästingschäfer1, R. Strecker2, D. Nickel2, F. Bamberg1, J. Weiß1, M. Russe1; 1Freiburg Im Breisgau/DE, 2Erlangen/DE
Purpose: To assess whether deep learning-accelerated post-contrast liver MRI can shorten breath-holds while maintaining diagnostic image quality.
Methods or Background: In this prospective study, ninety-nine patients (mean age 61.0 ± 15.4 years; 49.5% female) underwent three T1-weighted two-point Dixon gradient-echo sequences on a 1.5 T system: a standard protocol (18 seconds) and two deep learning-accelerated protocols (10 and 6 seconds). Three blinded radiologists rated overall image quality, motion artefacts, other artefacts, anatomical differentiability, and lesion conspicuity on five-point Likert scales. Per-patient consensus was the median across readers. Global differences were tested with the Friedman test followed by Holm-adjusted Wilcoxon signed-rank tests. Non-inferiority of diagnostic acceptability (Likert score at least 3) for accelerated versus standard sequences was tested with a non-inferiority margin of minus five percentage points.
Results or Findings: The standard sequence yielded higher ratings for anatomical differentiability (median 5 vs 4 and 4) and lesion conspicuity (5 vs 4 and 4; both p<0.001) and slightly higher overall image quality (4 vs 4 and 4; p<0.001). Motion-artefact ratings did not differ across sequences. Diagnostic acceptability met non-inferiority for both accelerated sequences across all items.
Conclusion: Deep learning reconstruction enabled substantial acceleration of post-contrast liver MRI, reducing breath-holds by 44% (10 seconds) and 67% (6 seconds) without loss of diagnostic acceptability. Faster, motion-robust acquisitions may benefit patients with limited breath-hold capacity.
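The quoted reductions follow directly from the acquisition times given in the methods (18 seconds for the standard protocol, 10 and 6 seconds for the accelerated ones). A short check, with `percent_reduction` as an illustrative helper name:

```python
def percent_reduction(baseline_s, accelerated_s):
    """Relative shortening of the breath-hold, in percent of the baseline."""
    return (baseline_s - accelerated_s) / baseline_s * 100

for accelerated in (10, 6):
    print(accelerated, round(percent_reduction(18, accelerated)))
# prints:
# 10 44
# 6 67
```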
Limitations: Single-centre study with minor image-quality trade-offs relative to the standard sequence.
Funding for this study: No funding was received for this study.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: Ethics committee of the University Medical Center Freiburg; Nr. 22-1185.
6 min
Novel methods using dynamic-gadoxetate enhanced MRI to identify inhibitors of drug metabolism by the liver
Sam Jenkins, Leeds / United Kingdom
Author Block: S. Jenkins, O. Spear, E. Checkley, S. Sourbron, B. Rea; Sheffield/UK
Purpose: To demonstrate that dynamic gadoxetate enhanced MRI (DGE-MRI) can distinguish inhibitors from non-inhibitors of liver transporter function by measuring the effect of ciclosporin (known inhibitor) and metformin (no evidence of inhibition) on hepatocellular gadoxetate uptake and excretion.
Methods or Background: The identification of drugs at risk of drug-drug interactions (DDIs) early in the drug development life cycle is key to avoiding late-stage drug development failures. DGE-MRI can potentially detect liver-mediated DDIs by assessing drug effects on liver transporter function, but it is currently unclear whether it is sufficiently sensitive to distinguish levels of inhibition.

This prospective study recruited 12 healthy volunteers, split evenly between ciclosporin and metformin groups. Each participant attended two visits and at each visit underwent two contrast-enhanced (1/4 dose of gadoxetate) MRI liver scans one hour apart. The second scan ensured biliary excretion could be accurately evaluated. At Visit B, a one-off clinical dose of metformin or ciclosporin was administered prior to imaging. Liver uptake and excretion rates were derived by MRI signal modelling.
Results or Findings: Ciclosporin reduced hepatocellular uptake by 67% (p<0.001, 95%CI 61-72) and biliary excretion rate by 50% (p=0.027), although more variable (95%CI 21-80). Metformin did not affect hepatocellular uptake rate (average -7.3%, p=0.25, 95%CI -2.3-17) or biliary excretion of gadoxetate (average 11%, p=0.35, 95%CI -8.2-31).
Conclusion: DGE-MRI can distinguish weak from strong inhibition of uptake and excretion. In future, the method may be of use in drug safety assessment to help predict DDI risk.
Limitations: Further assessment is required to develop DGE-MRI as a biomarker, including administration of drugs specific to individual transporters and testing in patients with impaired liver function.
Funding for this study: The research leading to these results received funding from the Innovative Medicines Initiatives 2 Joint Undertaking under grant agreement No 116106. This Joint Undertaking receives support from the European Union’s Horizon 2020 research and innovation programme and EFPIA.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: HRA and Health and Care Research Wales (HCRW) Approval.
6 min
MRI-Based Deep Learning Quantification of Liver 3D-PDFF and Upper Abdominal Composition for Prognostic Prediction in HCC Treated with TACE plus Systemic Therapy
Shuwei Zhou, Nanjing / China
Author Block: S. Zhou1, Y-C. Wang1, Y. Song2; 1Nanjing/CN, 2Shanghai/CN
Purpose: To investigate the prognostic value of MRI-derived three-dimensional proton density fat fraction (3D-PDFF) and upper abdominal composition parameters for survival prediction in hepatocellular carcinoma (HCC) patients receiving transarterial chemoembolization (TACE) combined with systemic therapy.
Methods or Background: This retrospective single-center study (December 2022–December 2024) included consecutive BCLC stage B-C HCC patients who underwent TACE plus systemic therapy with pre-treatment contrast-enhanced MRI. Fully automated deep learning-based volumetric analysis quantified liver 3D-PDFF, spleen and liver volumes, and skeletal muscle, visceral adipose tissue (VAT), and subcutaneous adipose tissue areas at L1-L2 vertebral levels. Continuous variables were Z-score normalized for hazard ratio comparability. Cox regression analysis identified independent overall survival predictors.
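Z-score normalization rescales each continuous variable to mean 0 and standard deviation 1, so that a Cox hazard ratio is expressed per one standard deviation increase and becomes comparable across variables with different units. A minimal sketch follows; the spleen volumes are hypothetical, and the use of the sample standard deviation is an assumption, since the abstract does not say which was used.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize values to mean 0 and (sample) standard deviation 1."""
    m = mean(values)
    s = stdev(values)  # assumption: sample SD (n - 1 denominator)
    if s == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(v - m) / s for v in values]

# Hypothetical spleen volumes in mL (not study data).
spleen_volumes_ml = [310.0, 450.0, 620.0, 280.0, 540.0]
standardized = z_scores(spleen_volumes_ml)
print([round(z, 2) for z in standardized])
```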
Results or Findings: A total of 125 patients (median age, 62 years; 107 men) were analyzed. Univariate analysis showed that high 3D-PDFF (> 5%), increased spleen volume, and reduced skeletal muscle and adipose tissue areas at L1 and L2 were significantly associated with shorter OS (all P < 0.1). In multivariate analysis, 3D-PDFF (HR: 2.23; 95% CI: 1.19–4.18), spleen volume (HR: 1.32; 95% CI: 1.03–1.70), and VAT area at L1 (HR: 0.66; 95% CI: 0.50–0.89) remained independent prognostic factors.
Conclusion: MRI-derived 3D-PDFF, spleen volume, and L1 VAT area are independent imaging biomarkers for predicting survival in HCC patients treated with TACE plus systemic therapy. The use of fully automated, AI-based analysis further highlights the potential for integration of these imaging metrics into clinical workflows.
Limitations: This study utilized L1-L2 level measurements instead of conventional L3-based body composition analysis, as standard upper abdominal MRI protocols typically do not include L3 levels. While this limits direct comparison with L3-based literature, our approach enhances clinical practicality and real-world applicability for routine implementation.
Funding for this study: NSFC, No. 82271978
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: This study was approved by Institutional Review Board/Ethics Committee (Identifier: 2017ZDSYLL022-P01)
6 min
Next generation spectral AI-based reconstruction for abdominal spectral detector dual energy CT: superior quality for liver imaging
Lukas Fortmann, Cologne / Germany
Author Block: L. Fortmann1, J. Lueckel1, L. Hieronymi1, S. Skornitzke2, D. Maintz1, N. Große Hokamp1; 1Cologne/DE, 2Hamburg/DE
Purpose: Spectral detector DECT (sdDECT) offers advantages in liver imaging, like virtual monoenergetic images. A new prototype deep learning–based spectral reconstruction algorithm (SAI, Philips) was evaluated for conventional reconstructions (SAI) and virtual monoenergetic images (SAI-VMI) to optimize reconstruction settings for abdominal image quality compared to existing fully-iterative reconstruction (FI-R) and hybrid-iterative reconstruction (HI-R/HI-VMI).
Methods or Background: For 20 patients undergoing abdominal sdDECT, conventional images were reconstructed with FI-R, HI-R, and five SAI settings: SAI-Sharper, SAI-Sharp, SAI-Standard, SAI-Smooth, and SAI-Smoother. For 55keV-VMI, we compared HI-VMI with the five SAI-VMI settings. Quantitative analysis with eight liver ROIs included signal-to-noise (SNR) and contrast-to-noise ratio (CNR). Image quality was assessed by two radiologists using a two-alternative forced-choice design.
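The abstract does not state how SNR and CNR were defined. One common convention, assumed here, takes noise as the standard deviation of attenuation within an ROI, SNR as mean attenuation divided by noise, and CNR as the absolute attenuation difference between two ROIs divided by noise. The pixel values and the reference tissue below are hypothetical.

```python
from statistics import mean, stdev

def snr(roi_hu):
    """Signal-to-noise ratio: mean attenuation over the SD within the ROI."""
    return mean(roi_hu) / stdev(roi_hu)

def cnr(roi_hu, reference_hu):
    """Contrast-to-noise ratio against a reference ROI, using the ROI's own SD as noise."""
    return abs(mean(roi_hu) - mean(reference_hu)) / stdev(roi_hu)

# Hypothetical pixel attenuations in HU (not study data).
liver_roi = [100, 104, 98, 102, 96]
reference_roi = [60, 62, 58, 61, 59]   # assumed reference tissue, e.g. muscle

print(round(snr(liver_roi), 2), round(cnr(liver_roi, reference_roi), 2))
```

Under this convention, a reconstruction that lowers noise without shifting mean attenuation raises both ratios, which is the pattern the quantitative comparison is designed to detect.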
Results or Findings: For conventional images, mean attenuation was comparable between FI-R, HI-R, and SAI (103.31 HU ± 18.93 HU; p>.05). Noise was lowest for FI-R and SAI-Smoother (4.12 HU ± 0.85 HU and 5.05 HU ± 1.11 HU; p≤.05 vs. remaining; HI-R: 15.08 HU ± 3.91 HU). FI-R achieved highest SNR (25.72 ± 6.64; p≤.05) and CNR (51.91 ± 7.59; p≤.05) followed by SAI-Smoother (SNR: 21.13 ± 5.79, CNR: 43.82 ± 8.19; both p≤.05 vs. remaining; HI-R SNR: 7.23 ± 2.33 and CNR: 15.29 ± 3.31). Regarding subjective quality, radiologists showed a significant preference for SAI-Smoother (81.67% ± 9.60%) and SAI-Smooth (75.83% ± 7.60%) compared to FI-R (65.00% ± 13.13%), HI-R (19.58% ± 8.67%), and other reconstructions (p≤.05). For 55keV-VMI, there was no significant difference in mean attenuation between HI-VMI and SAI-VMI (146.94 HU ± 28.14 HU). Noise was significantly lower with SAI-VMI-Smoother (4.97 HU ± 2.68 HU, HI-VMI: 12.68 HU ± 3.27 HU; p≤.05), resulting in significantly higher SNR (30.40 ± 7.04, HI-VMI: 12.10 ± 3.88; p≤.05) and CNR (55.16 ± 10.43, HI-VMI: 23.35 ± 4.92; p≤.05). Radiologists significantly preferred SAI-VMI-Smooth (81.50% ± 7.45%) and SAI-VMI-Smoother (81.00% ± 15.53%) compared to HI-VMI (38.50% ± 7.45%) and other reconstructions (p≤.05).
Conclusion: For conventional sdDECT, novel SAI-Smooth and SAI-Smoother are preferred by radiologists, despite having lower quantitative SNR and CNR than FI-R. Additionally, SAI-VMI-Smooth and SAI-VMI-Smoother also yield better quantitative and qualitative results than HI-VMI.
Limitations: The study is limited by its retrospective design.
Funding for this study: This work was funded by Philips Healthcare. The funding source had no involvement in study design, collection or interpretation of data.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: IRB-approved
6 min
CECT-Based Deep Learning for Pancreatic Lesion Diagnosis and Three-Tier Management: Multicenter Development, External Validation, and Reader Study
Yueyi Zhang, Shanghai / China
Author Block: C. Ma, Y. Zhang, L. Lin, K. Zhang, N. Zhang, K. Cao; Shanghai/CN
Purpose: To develop and validate a contrast-enhanced CT (CECT)–based deep learning model for accurate differential diagnosis and three-tier clinical management of pancreatic lesions.
Methods or Background: Retrospective cross-sectional study of 7,748 cases across 10 categories from a tertiary center (2015-2023). Ten categories included: pancreatic ductal adenocarcinoma, pancreatic neuroendocrine tumors, solid pseudopapillary neoplasms, intraductal papillary mucinous neoplasms (IPMN), mucinous cystic neoplasms (MCN), serous cystadenomas, periampullary carcinomas (PAC), chronic pancreatitis, acute pancreatitis, and normal pancreas (confirmed by two-year follow-up). Model training used 6343 cases, with 1405 internal and 2361 external tests. A hybrid CNN–Transformer performed 10-class diagnosis, dysplasia grading in IPMN/MCN, and three-tier management (discharge/surveillance/intervention). Performance was evaluated using AUC, Top-1 accuracy, and balanced accuracy (BA). A 12-radiologist reader study assessed assistive value.
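Top-1 accuracy and balanced accuracy can diverge when classes are imbalanced, because balanced accuracy averages the per-class recall and so gives rare categories the same weight as common ones. The sketch below illustrates this with a hypothetical two-class toy sample; the labels are placeholders, not study data.

```python
from collections import defaultdict

def top1_accuracy(y_true, y_pred):
    """Fraction of cases whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)

# Hypothetical imbalanced sample: 8 common-class cases, 2 rare-class cases.
y_true = ["PDAC"] * 8 + ["IPMN"] * 2
y_pred = ["PDAC"] * 8 + ["PDAC", "IPMN"]   # one rare-class case is missed

print(top1_accuracy(y_true, y_pred), balanced_accuracy(y_true, y_pred))
```

Here a single missed rare-class case barely moves Top-1 accuracy but halves that class's recall, which is why balanced accuracy is the more conservative summary for a ten-category problem.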
Results or Findings: Internal test set (ten-class model): sensitivity 97.5% (95% CI, 96.4–98.3), specificity 99.6% (95% CI, 98.5–100), AUC 99.8% (95% CI, 99.7–99.9), Top-1 accuracy 88.8% (95% CI, 87.0–90.3), and BA 83.0% (95% CI, 79.7–86.1). The model improved mean diagnostic accuracy versus original radiology reports by 7.42% (79.9% vs 72.4%; 95% CI, 1.5–13.7; p=0.004). Dysplasia grading: AUC 84.8% (95% CI, 78.5–90.4), BA 76.8% (95% CI, 70.6–83.1). Clinical management: BA 84.3% (95% CI, 77.2–88.6). External test set: BA 72.4% (95% CI, 69.5–75.2) for diagnosis, 66.6% (95% CI, 60.2–72.9) for dysplasia grading, and 80.0% (95% CI, 77.1–82.7) for clinical management. In the reader study, AI assistance increased specificity by 9.8% (94.4% vs 84.7%; p=0.0003), diagnostic BA by 10.8% (75.1% vs 64.4%; p<0.001), and clinical management BA by 6.9% (70.2% vs 63.2%; p<0.001).
Conclusion: A CECT-based deep learning model achieves high diagnostic performance and significantly enhances clinical management decisions for pancreatic lesions, including in multi-center evaluation and reader-assisted settings.
Limitations: No prospective validation, some benign lesions excluded, PAC not subtyped.
Funding for this study: 1. National Natural Science Foundation of China, No. 82372045
2. Shanghai Natural Science Foundation, No. 23ZR1478400
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: Shanghai Changhai Hospital: CHEC2022-069
6 min
Deep Learning-based segmentation of pancreatic neuroendocrine neoplasms: development and validation of a hierarchical model
Francesco Prato, Milan / Italy
Author Block: A. Belardo, M. Fois, E. Guccinelli, F. Prato, L. Tonelli, D. Palumbo, M. G. Ubeira-Gabellini, C. Fiorino, F. De Cobelli; Milan/IT
Purpose: This study aims to train and validate a neural network for pancreatic neuroendocrine neoplasms (PanNENs) segmentation. The enrolled patients belong to a retrospective cohort used in a previous study where baseline radiomic analyses were performed [https://doi.org/10.1007/s00330-022-09351-9].
Methods or Background: The training dataset consisted of CT scans of 107 patients who underwent surgery for PanNENs between January 2015 and December 2021. For each patient, the previously segmented ROI was transferred onto the arterial phase images. The median volume was 5.71cc (IQR=[1.63, 17.55]cc). First, the segmentation of the pancreas was performed using TotalSegmentator. All propagated segmentations were reviewed and verified by an experienced radiologist. Model training was then performed using a nnUNet region class order strategy, in which the network learns to segment anatomical structures following a hierarchical order to improve consistency across labels, restricting the search to the segmented pancreas. The test phase was conducted on 30 additional PanNEN patients who underwent surgery between February 2017 and March 2025. The median volume in this cohort was 5.29cc (IQR=[3.87, 8.56]cc).
Results or Findings: For 10 out of 107 patients, no prediction was obtained due to failure of the model in finding reasonable contours. For the remaining 97 patients, the model achieved an average Dice value of 0.85 (IQR=[0.84, 0.92]). On the test population, the average Dice score was 0.60 (IQR=[0.54, 0.78]). Given the prevalently small volumes, results indicate good segmentation accuracy, although the significantly worse performance in the test cohort needs further investigation.
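The Dice score used above measures the overlap between a predicted and a reference segmentation as twice the size of their intersection divided by the sum of their sizes. A minimal sketch, representing each mask as a set of voxel coordinates; the coordinates are hypothetical and the convention for two empty masks is an assumption.

```python
def dice(predicted, reference):
    """Dice coefficient between two masks given as collections of voxel coordinates."""
    predicted, reference = set(predicted), set(reference)
    if not predicted and not reference:
        return 1.0  # assumption: two empty masks count as perfect agreement
    return 2 * len(predicted & reference) / (len(predicted) + len(reference))

# Hypothetical 4-voxel lesion and a prediction that overlaps 3 of its voxels.
reference_mask = {(10, 12, 5), (10, 13, 5), (11, 12, 5), (11, 13, 5)}
predicted_mask = {(10, 13, 5), (11, 12, 5), (11, 13, 5), (12, 13, 5)}

print(dice(predicted_mask, reference_mask))   # prints 0.75
print(dice(set(), reference_mask))            # prints 0.0
```

The toy example also shows why small lesions are demanding: with only four voxels, a single misplaced voxel already lowers the score to 0.75, and an empty prediction scores 0.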
Conclusion: AI-based segmentation of PanNEN is feasible with performances consistent with inter-observer variability of manual segmentation [https://doi.org/10.1016/j.ejmp.2018.12.005].
Limitations: In cases of unclear contours (about 10% of cases), position at the pancreas borders or poor enhancement, the model fails in recognizing PanNEN presence or position.
Funding for this study: Not available now.
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: