Research Presentation Session: Artificial Intelligence and Imaging Informatics

RPS 1505 - Mining the image: AI-driven diagnosis and prognosis in abdominal imaging

March 6, 14:00 - 15:30 CET

6 min
Automated CT Body Composition Analysis for Perioperative Risk Stratification
Judith Kohnke, Essen / Germany
Author Block: J. Kohnke, K. A. Borys, C. Bojahr, Y. Wen, S. Warmer, C. S. Schmidt, F. Nensa, R. Hosch; Essen/DE
Purpose: To investigate whether body composition parameters derived from CT scans are associated with 90-day survival in patients undergoing surgery within 14 days before or after imaging.
Methods or Background: Body composition has emerged as a potential predictor of perioperative risk. Sarcopenia, myosteatosis, and cachexia may indicate vulnerability to poor postoperative outcomes.
Thoracic and abdominal CT scans of 1575 cancer patients (844 female, 731 male) who underwent surgery within ±14 days of imaging were retrospectively analyzed, and an automated in-house body composition analysis was performed. Patients were classified into two groups based on 90-day survival (1333 survived, 242 deceased). Differences in sarcopenia, myosteatosis, and cachexia (abdomen and thorax) between survivors and non-survivors were assessed using the Mann–Whitney U test. Correlations between body composition parameters and survival were examined using Spearman’s rank correlation.
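The two statistics named above can be illustrated with a minimal, dependency-free sketch. It computes only the Mann–Whitney U statistic and Spearman's rho, taken as the Pearson correlation of average ranks so that ties are handled. P-values are omitted and would normally come from a statistics library such as SciPy. All function names are hypothetical, and this is not the authors' analysis code.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared_rank
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> float:
    """U statistic of sample x versus sample y (no p-value is computed here)."""
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[: len(x)])
    return rank_sum_x - len(x) * (len(x) + 1) / 2


def _pearson(a: Sequence[float], b: Sequence[float]) -> float:
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5


def spearman_rho(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the average ranks."""
    return _pearson(average_ranks(a), average_ranks(b))
```

If 90-day death is coded as 1 and survival as 0, `spearman_rho` relates a continuous body composition parameter to mortality. This is one way a signed correlation with mortality, as reported in the results, can be obtained.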
Results or Findings: The Mann–Whitney U test revealed significant differences between 90-day survivors and non-survivors for sarcopenia (abdomen: p<0.001; thorax: p<0.001) and myosteatosis (abdomen: p<0.001; thorax: p<0.001), but not for cachexia (abdomen: p=0.203; thorax: p=0.072). Spearman correlation analysis showed positive correlations of sarcopenia with mortality (abdomen: r=0.283, p<0.001; thorax: r=0.295, p<0.001), negative correlations of myosteatosis with mortality (abdomen: r=-0.331, p<0.001; thorax: r=-0.273, p<0.001), and no significant correlations for cachexia (abdomen: r=0.009, p=0.728; thorax: r=0.025, p=0.314).
Conclusion: Sarcopenia and myosteatosis are significantly associated with 90-day postoperative survival, suggesting that automated CT-based body composition analysis provides valuable prognostic information to aid perioperative risk stratification.
Limitations: This is a single-center study. The inclusion was limited to cancer patients, and potential confounding factors were not fully accounted for.
Funding for this study: No funding was provided for this study.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: Informed consent was waived by the ethics committee due to the retrospective setting.
6 min
LymphoGenAI: A regulatory-compliant, interoperable AI platform for multimodal lymph node staging in oncology
Ruben Eduardo Pacios Blanco, Madrid / Spain
Author Block: R. E. Pacios Blanco, J. Vega, C. Martínez Cabezali, L. A. Herrera Galvez; Madrid/ES
Purpose: Lymph node status is a critical determinant of prognosis and treatment planning in oncology. Current diagnostic pathways rely on invasive procedures and fragmented data interpretation, creating variability and delays. LymphoGenAI addresses this gap by developing a multimodal, generative AI-based decision support system (AI-DSS) for predicting lymph node involvement in breast and head & neck cancers, fully aligned with European digital health and AI regulatory frameworks.
Methods or Background: The consortium platform integrates radiological imaging (CT, MRI, PET), digital pathology, structured clinical data and unstructured reports into a harmonized, GDPR-compliant dataset across multiple European hospitals. Built on interoperability standards (DICOM, HL7 FHIR, SNOMED), the system employs advanced vision–language architectures and radiomics pipelines to deliver explainable predictions. Development follows EU Medical Device Regulation (MDR 2017/745) and incorporates transparency, bias monitoring and human oversight in line with the EU AI Act. Clinical validation will be conducted through retrospective and prospective multicenter studies.
Results or Findings: The project is expected to deliver a CE-mark-ready AI-DSS that enables accurate, non-invasive lymph node staging, reducing the need for unnecessary biopsies and surgical procedures. By integrating multimodal data and explainable outputs, the system will improve diagnostic consistency across institutions and accelerate time-to-treatment. Its modular, interoperable architecture will allow seamless integration into diverse hospital IT environments and ensure compliance with the European Health Data Space framework. Additionally, it will provide a long-term resource for research in oncology AI.
Conclusion: By embedding regulatory compliance, interoperability and explainability from inception, LymphoGenAI sets a benchmark for trustworthy AI in oncology. Its modular design supports scalability to other cancer types, reinforcing Europe’s leadership in ethical, human-centric AI for health.
Limitations: Compliance with evolving regulatory frameworks such as GDPR, MDR and the EU AI Act introduces complexity and may delay deployment.
Funding for this study: Presented to the Horizon Europe Programme – HLTH-2025-01-CARE-01, GenAI4EU.
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information:
6 min
Reproducibility of Manual, Semi-automated, and Fully Automated L3 CT Body-Composition Metrics in Metastatic Renal Cell Carcinoma
Francesco Farioli, San Cesario Sul Panaro / Italy
Author Block: F. Farioli, L. Casarini, F. Fiocchi, M. Dominici, R. Sabbatini, G. Ligabue, P. Torricelli, C. Baldessari; Modena/IT
Purpose: To compare the reproducibility of three CT-based body-composition workflows—manual HU thresholding, semi-automated assisted thresholding, and fully automated open-source segmentation—for skeletal muscle and adipose compartments at L3 in patients with metastatic renal cell carcinoma (mRCC).
Methods or Background: We retrospectively included 68 patients with mRCC who underwent non-contrast abdominal CT at diagnosis. At the L3 level, cross-sectional area and mean attenuation (Hounsfield units, HU) were obtained for skeletal muscle (MT), subcutaneous adipose tissue (SAT), visceral adipose tissue (VAT), and intramuscular adipose tissue (IMAT). The manual reference applied fixed HU thresholds: adipose −190 to −30 HU; muscle −29 to +150 HU. Semi-automated software generated threshold suggestions with manual slice selection and contour adjustment (areas only). A fully automated open-source pipeline yielded areas and densities for all compartments. Agreement was assessed using two-way mixed-effects ICC(3,1) with 95% confidence intervals; density ICCs were computed only for manual vs automated methods (the semi-automated workflow lacked densities).
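The manual reference workflow applies fixed HU windows. A minimal sketch of that step follows, assuming the thresholds are applied to the pixels inside one manually delineated compartment (for example the muscle or subcutaneous contour) on the L3 slice. The function name and the flat list of HU values are hypothetical simplifications of a real 2D slice.

```python
# Fixed HU windows from the manual reference workflow (inclusive bounds).
ADIPOSE_HU = (-190, -30)
MUSCLE_HU = (-29, 150)


def threshold_metrics(hu_values, pixel_area_mm2, hu_range):
    """Return (area in cm^2, mean HU) of pixels whose HU lies inside hu_range.

    hu_values: HU of the pixels inside one delineated compartment on the slice.
    pixel_area_mm2: in-plane pixel area (row spacing x column spacing).
    """
    low, high = hu_range
    selected = [v for v in hu_values if low <= v <= high]
    area_cm2 = len(selected) * pixel_area_mm2 / 100.0  # 100 mm^2 = 1 cm^2
    mean_hu = sum(selected) / len(selected) if selected else float("nan")
    return area_cm2, mean_hu
```

Because the adipose window ends at −30 HU and the muscle window starts at −29 HU, integer HU values fall into at most one tissue class. Separating SAT from VAT still depends on the compartment contour, not on the thresholds.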
Results or Findings: Area agreement was excellent for SAT (ICC 0.944; 95% CI 0.918–0.963) and VAT (0.945; 0.920–0.964), and good for muscle (0.847; 0.783–0.897). For densities, reproducibility was moderate for muscle (0.574; 0.391–0.714) and SAT (0.811; 0.710–0.879), but poor for VAT (0.328; 0.099–0.524). IMAT showed low agreement for area (0.319; 0.089–0.517) and density (0.103; −0.138 to 0.331).
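The ICC values above come from a two-way model. A minimal sketch of the point estimate of ICC(3,1) under the Shrout and Fleiss definition follows. That definition measures consistency, so a constant offset between methods does not lower it. The function name is hypothetical, and the 95% confidence interval, which requires F-distribution quantiles, is omitted.

```python
def icc_3_1(ratings):
    """ICC(3,1): two-way mixed effects, consistency, single measurement.

    ratings: one row per subject, each row holding k measurements
    (one per method or rater).
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Two-way ANOVA sums of squares: subjects (rows), methods (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# Classic six-subject, four-judge example from Shrout and Fleiss.
example = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
           [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
print(round(icc_3_1(example), 2))  # 0.71
```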
Conclusion: Across methods, SAT area, VAT area, and muscle area show high reproducibility, supporting interchangeability in research pipelines and potential clinical use. Density metrics are less robust—particularly for VAT and IMAT—and may require manual verification or method-specific calibration (including explicit threshold specifications) in semi-automated and fully automated pipelines. IMAT remains challenging for automated quantification and warrants algorithmic refinement.
Limitations: Single-center retrospective design; non-contrast CT; single-slice L3; semi-automated workflow lacked densities; modest sample; variability in slice selection and software parameters; no outcomes correlation.
Funding for this study: None
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: SIRER ID 7486 - Studio CO-CO-RE
6 min
Computed tomography-based Delta radiomics model for predicting response to antiangiogenic therapy in colorectal liver metastases
Long Yuan, Lanzhou / China
Author Block: L. Yuan, J. Zhou; Lanzhou/CN
Purpose: To develop a delta radiomics model based on computed tomography (CT) for predicting the efficacy of bevacizumab in patients with colorectal liver metastases (CRLM).
Methods or Background: This multicenter retrospective study included 90 patients with CRLM (255 liver metastases) treated with bevacizumab. Center 1 was allocated to training and internal validation, and centers 2 and 3 to external validation. Baseline texture features of the liver metastases were extracted from pre- and post-treatment CT images, and temporal texture features (Ratio, Delta, and Delta ABS) were calculated. Eight models were constructed with a logistic regression classifier from the statistically significant clinical and texture features. The performance of the eight models was evaluated for predicting both the therapeutic response of individual liver metastases and patient-level efficacy.
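The abstract names the temporal features but does not define them. A minimal sketch follows under assumed definitions that are common in the delta-radiomics literature: Ratio = post/pre, Delta = relative change, and Delta ABS = absolute change. Other studies define these differently. The function and feature names are hypothetical.

```python
def temporal_features(pre, post):
    """Derive temporal features from matching pre/post feature dictionaries.

    Assumed definitions (not taken from the abstract):
      Ratio    = post / pre
      Delta    = (post - pre) / pre   (relative change)
      DeltaABS = post - pre           (absolute change)
    """
    out = {}
    for name, pre_val in pre.items():
        post_val = post[name]
        out[f"{name}_DeltaABS"] = post_val - pre_val
        if pre_val == 0:
            # Ratio-type features are undefined when the baseline value is zero.
            out[f"{name}_Ratio"] = float("nan")
            out[f"{name}_Delta"] = float("nan")
        else:
            out[f"{name}_Ratio"] = post_val / pre_val
            out[f"{name}_Delta"] = (post_val - pre_val) / pre_val
    return out


# Hypothetical GLCM feature measured on one metastasis before and after therapy.
print(temporal_features({"glcm_contrast": 2.0}, {"glcm_contrast": 3.0}))
```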
Results or Findings: Among the eight models, the Ratio, Delta, and COMB models demonstrated the highest predictive performance for therapeutic response to liver metastases and patient efficacy. In the training cohort, AUC values for predicting liver metastasis treatment response and patient efficacy ranged from 0.858 to 0.956 and 0.806 to 0.963, respectively. In the internal validation cohort, the ranges were 0.891–0.899 and 1.000. In the external validation cohort, they were 0.833–0.891 and 0.858–0.896.
Conclusion: The Ratio, Delta, and COMB models, built on clinical features and pre- and post-treatment CT texture features, showed good performance for predicting the efficacy of bevacizumab therapy in patients with CRLM.
Limitations: First, although designed as a multicenter retrospective analysis, the sample size remained relatively modest. Future investigations should expand cohort sizes to enhance the prediction performance. Second, manual segmentation of liver metastasis introduces potential variability; implementing semi-automated or fully automated segmentation methods would improve efficiency and reproducibility.
Funding for this study: This work was supported by grants of the National Natural Science Foundation of China (No. 82371914) and the Cuiying Scientific and Technological Innovation Program of Lanzhou University Second Hospital (CY2021-ZD-01).
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: The study was approved by the three institutional ethical review boards (2023A-379, 2022-254, and P-SL-202219).
6 min
Multi-task Deep Learning Model to Predict Grade and Lymph Node Metastasis in non-functional pancreatic neuroendocrine tumors (NF-PNETs)
Wei Tang, Shanghai / China
Author Block: W. Tang; Shanghai/CN
Purpose: To develop and validate a multi-modality deep learning pipeline that first achieves accurate automated lesion segmentation, then integrates imaging features and clinical parameters to predict tumor grade and LNM status in NF-PNETs.
Methods or Background: Accurate preoperative assessment of tumor grade and lymph node metastasis (LNM) in NF-PNETs is critical for optimizing surgical strategies and patient outcomes. This multicenter retrospective prognostic study analyzed 931 patients with pathologically confirmed NF-PNETs who underwent preoperative contrast-enhanced CT (CE-CT) between October 2010 and December 2023. Data were collected from 3 tertiary medical centers and divided into internal training (n=425), testing (n=141), and validation (n=174) cohorts, plus two external cohorts (Beijing, n=84; Guangzhou, n=107). The primary outcomes were the area under the receiver operating characteristic curve (AUC) for tumor grade and LNM prediction. The Dice coefficient quantified segmentation accuracy. Predictive performance was compared against established clinical guidelines and single-modality models.
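The Dice coefficient used above is defined as Dice = 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a reference mask B. A minimal sketch on flattened binary masks follows. The function name is hypothetical, and returning 1.0 when both masks are empty is a convention assumed here.

```python
def dice(pred, truth):
    """Dice coefficient of two flattened binary masks (truthy = foreground)."""
    pred, truth = list(pred), list(truth)
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2 * intersection / total


print(dice([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
```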
Results or Findings: In this multicenter retrospective study of 931 NF-PNET cases (740 internal; 191 external), the deep learning pipeline achieved automated segmentation with a Dice coefficient of 0.74 and, building on this, delivered robust grading and LNM prediction across internal and external cohorts. For grade classification, the area under the receiver operating characteristic curve was 0.809 in the internal sets and 0.820 and 0.779 in the two external sets. For LNM prediction, it was 0.892 in the internal sets and 0.710 and 0.872 in the two external sets.
Conclusion: This study demonstrates that a CE-CT-based deep learning pipeline provides robust, generalizable performance for noninvasive prediction of NF-PNET grade and nodal involvement. This imaging biomarker could enable personalized surgical planning by identifying patients suitable for organ-preserving procedures versus those requiring extended lymphadenectomy, potentially improving surgical outcomes while reducing unnecessary morbidity.
Limitations: No
Funding for this study: No
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: Fudan University Shanghai Cancer Center
6 min
Age- and Sex-Specific Trajectories of Muscle and Fat Depots: A CT Study of 25,092 Adults
Philipp Reschke, Frankfurt / Germany
Author Block: P. Reschke, K. Eichler, T. Vogl, L. D. Grünewald; Frankfurt/DE
Purpose: Age- and sex-related changes in muscle and adipose tissue are key determinants of frailty, sarcopenia, and metabolic disease, but large-scale CT-based reference data are scarce.
Methods or Background: We retrospectively analyzed CT scans of 25,092 adults (8,949 females, 16,143 males; 18–100 years). Automated segmentation quantified skeletal muscle percentage, volumes of major muscle groups, intramuscular fat (IMF), muscle attenuation, and adipose depots (total [TAT], subcutaneous [SAT], visceral [VAT], epicardial [EAT], paracardial [PAT]) at thoracic and abdominal levels. Nonlinear regression characterized age- and sex-specific trajectories.
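The abstract does not state which nonlinear model was fitted. As a simplified, swapped-in illustration (not the authors' method), an annual rate such as the reported %/year values can be read as the least-squares slope of a measure against age, optionally within an age window where the trajectory declines most steeply. The slope is in units of the measure per year, for example percentage points per year for skeletal muscle percentage. The function name is hypothetical.

```python
def annual_rate(ages, values, age_min=None, age_max=None):
    """Ordinary least-squares slope of values on age (units of value per year)."""
    pairs = [
        (a, v) for a, v in zip(ages, values)
        if (age_min is None or a >= age_min) and (age_max is None or a <= age_max)
    ]
    if len(pairs) < 2:
        raise ValueError("need at least two observations in the age window")
    mean_a = sum(a for a, _ in pairs) / len(pairs)
    mean_v = sum(v for _, v in pairs) / len(pairs)
    cov = sum((a - mean_a) * (v - mean_v) for a, v in pairs)
    var = sum((a - mean_a) ** 2 for a, _ in pairs)
    if var == 0:
        raise ValueError("ages in the window must vary")
    return cov / var


# Hypothetical data: muscle percentage falling 1 point per decade up to age 40.
print(annual_rate([20, 30, 40, 60], [32.0, 31.0, 30.0, 10.0], age_max=40))
```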
Results or Findings: Skeletal muscle percentage was higher in males than females (thorax: 31.3% vs. 25.9%; abdomen: 30.1% vs. 26.1%; p < 0.001) and declined faster in the thorax. Annual losses were −0.49%/year (thorax) and −0.31%/year (abdomen) in males versus −0.27%/year and −0.13%/year in females (p < 0.001). The decline peaked earlier and was steeper in males (24–38 years) than in females (18–46 years). Muscle volumes were consistently larger in males, including iliopsoas (669 vs. 412 mL), back muscles (978 vs. 676 mL), and gluteus maximus (1,086 vs. 832 mL). IMF was higher in females (p < 0.001) and increased with age. Muscle attenuation declined significantly, with quality best preserved in the iliopsoas and poorest in autochthonous muscles. Females had significantly higher SAT and TAT (p < 0.001), while males had significantly higher VAT, EAT, and PAT (p < 0.001). VAT and especially PAT showed the steepest relative increases with age.
Conclusion: This large-scale CT study defines normative age- and sex-specific trajectories of muscle quantity, quality, and fat depots, supporting opportunistic screening, sarcopenia phenotyping, and individualized risk stratification.
Limitations: Lack of lifestyle and ethnicity information.
Funding for this study: None
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: This study was approved by the ethics committee.
6 min
Deep learning-based pancreatic age estimation from MRI predicts incident diabetes beyond known risk factors in the general population
Matthias Jung, Freiburg Im Breisgau / Germany
Author Block: M. Jung, R. T. Schirrmeister, M. Reisert, S. Rospleszcz, Z. Berkarda, C. L. Schlett, F. Bamberg, J. Weiß; Freiburg Im Breisgau/DE
Purpose: Individuals and organs age at different rates. Chronological age poorly reflects this variability, whereas organ-specific biological age measures could improve risk assessment. We propose a deep learning (DL) framework (MRI-PancAge) for estimating pancreas age from MRI and investigate its value for predicting incident diabetes.
Methods or Background: MRI-PancAge was developed using data from 30,389 individuals (20-75 years) from the German National Cohort. A pancreas segmentation model on abdominal MRI was followed by a second model that takes the 3D-pancreas-segmentation-mask as input and outputs MRI-PancAge in years. MRI-PancAge was converted to a z-score, where z-score>0 refers to older and z-score<0 to younger MRI-PancAge compared to chronological age. Validation was performed in 33,559 UK Biobank (UKB) participants free of diabetes at imaging. Primary outcome was incident diabetes. Associations between MRI-PancAge categories (younger:z<-1; reference:-1–1; older:z>1) and incident diabetes were tested using Cox regression adjusted for age, sex, BMI, race, pancreas volume and fat fraction, hypertension, alcohol consumption, and smoking status. Incremental predictive value of MRI-PancAge was assessed with Harrell’s C-index.
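The abstract states that MRI-PancAge was converted to a z-score but not how. A minimal sketch follows, assuming the z-score standardizes the gap between predicted and chronological age across the cohort. The categories described above are then applied (younger: z < −1; reference: −1 to 1; older: z > 1). Function names are hypothetical. Some organ-age studies also regress the gap on chronological age before standardizing, which this sketch does not do.

```python
from statistics import mean, stdev


def age_gap_zscores(predicted_ages, chronological_ages):
    """Standardize (predicted - chronological) age gaps across the cohort."""
    gaps = [p - c for p, c in zip(predicted_ages, chronological_ages)]
    mu, sd = mean(gaps), stdev(gaps)
    return [(g - mu) / sd for g in gaps]


def categorize(z):
    """Map a z-score to the categories used in the Cox models."""
    if z < -1:
        return "younger"
    if z > 1:
        return "older"
    return "reference"


z = age_gap_zscores([50, 60, 70], [50, 50, 50])
print([categorize(v) for v in z])
```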
Results or Findings: Among 33,559 UKB participants (64.8±7.7 years, 50.9% female; 1.9% incident diabetes; median follow-up 4.7 years), diabetes incidence was higher in participants with older and lower in those with younger MRI-PancAge (log-rank p<0.001). Multivariable Cox regression revealed an independent association between older MRI-PancAge and incident diabetes (aHR: 1.35, 95%CI[1.10-1.65], p=0.005) after adjustment for all covariates. Adding MRI-PancAge to a baseline model with traditional risk factors produced a small but statistically significant improvement in discrimination for incident diabetes (C-index 0.780 to 0.781, p=0.009).
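Harrell's C-index is the fraction of comparable pairs in which the subject with the higher predicted risk has the event earlier. A pair is comparable when the subject with the shorter follow-up had the event. In the minimal sketch below, risk ties count as one half and pairs with tied times are skipped for simplicity. The function name is hypothetical. Testing a difference such as 0.780 vs. 0.781 also needs a variance estimate, which is not shown here.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times: follow-up times; events: 1 if the event occurred, 0 if censored;
    risk_scores: higher means higher predicted risk (earlier event).
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


print(harrell_c_index([1, 2, 3], [1, 1, 0], [3, 2, 1]))  # 1.0
```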
Conclusion: DL can estimate pancreas age from MRI and predict incident diabetes in the general population beyond chronological age, pancreas volume and fat fraction, and cardiometabolic risk factors.
Limitations: Results may not be generalizable to non-White populations or people outside the United Kingdom.
Funding for this study: None
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: Informed consent was obtained from all participants in the UK Biobank and the German National Cohort study. In addition, we received local IRB approval (IRB of the University of Freiburg: 23-1316-S1-retro and 24-1099-S1-retro).
6 min
Volumetric vs. Slice-Based CT Body Composition Analysis: Systematic Discordance and Prognostic Impact
Katarzyna Anna Borys, Essen / Germany
Author Block: K. A. Borys, K. Arzideh, J. Kohnke, Y. Wen, C. Bojahr, L. Umutlu, J. Haubold, F. Nensa, R. Hosch; Essen/DE
Purpose: To investigate sex-specific and ECOG status-stratified differences in CT-derived body composition analysis (BCA) and to assess concordance between volumetric and slice-based BCA measures.
Methods or Background: Baseline whole-body CTs from 23,685 cancer patients (44% female) were automatically segmented to quantify skeletal muscle and adipose compartments both volumetrically (thoracic and abdominal regions) and slice-based (mid-slice at L3 and T4 levels). Prognostic value for overall survival was evaluated using Kaplan-Meier and Accelerated Failure Time models, stratified by sex, ECOG performance status, and metastatic status. Concordance between volumetric and slice-based BCA measures was assessed with Cohen’s kappa.
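Cohen's kappa compares observed agreement between two classifications with the agreement expected by chance from each classification's label frequencies. A minimal sketch follows, assuming each patient has been labeled high- or low-risk once from the volumetric index and once from the slice-based index (for example, lowest tertile vs. higher tertiles). The labels and function name are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("need two non-empty sequences of equal length")
    observed = sum(1 for a, b in zip(labels_a, labels_b) if a == b) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: product of the marginal frequencies, summed over labels.
    expected = sum(count_a[k] * count_b[k] for k in set(count_a) | set(count_b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


volumetric = ["low", "low", "high", "high"]
slice_based = ["low", "high", "high", "high"]
print(cohens_kappa(volumetric, slice_based))  # 0.5
```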
Results or Findings: Volumetric muscle index (MI) provided substantially greater prognostic separation than slice-based measures, with a median survival difference of 94 months between lowest and higher tertiles in females (47 vs. 141 months) and 69 months in males (27 vs. 96), compared with ≤56 months for slice-based muscle area (MA). Subcutaneous adipose tissue also showed stronger volumetric effects, while visceral fat offered limited prognostic value. Increasing physical impairment (ECOG) was associated with muscle and fat loss. Accelerated Failure Time models confirmed consistently higher time ratios (TR) for volumetric vs. slice-based measures (abdominal MI TR=1.89, P<0.001 vs. abdominal MA TR=1.41, P<0.001 in males). Concordance analyses revealed substantial disagreement (~35%) between volumetric and slice-based muscle classifications of high- and low-risk patients, indicating systematic discordance.
Conclusion: Volumetric CT analysis provides superior prognostic value compared to slice-based markers, particularly across ECOG strata, highlighting its potential for individualized risk stratification.
Limitations: Despite adjusting for key clinical factors, unmeasured confounders such as treatment regimens, comorbidities, and nutritional habits were not assessed. Moreover, the results require external validation.
Funding for this study: None
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: Approval for this retrospective study was obtained from the Ethics Committee of the University Hospital Essen (approval number 21-10204-BO). Due to the study's retrospective nature, the requirement of written informed consent was waived by the Ethics Committee. All data were fully anonymized.
6 min
Algorithmic Fairness in Radiology: A Practical Deep Dive into AI Challenges
Bahram Mohajer, Philadelphia / United States
Author Block: B. Mohajer1, A. Zain2, H. Zhang2, Z. Hu3, R. Ball4, J. Gichoya5, A. E. Flanders1, M. Ghassemi2, E. Colak3; 1Philadelphia, PA/US, 2Boston, MA/US, 3Toronto, ON/CA, 4Bar Harbor, ME/US, 5Atlanta, GA/US
Purpose: Machine learning (ML) models have demonstrated expert-level performance in radiology; however, concerns about fairness persist, potentially reinforcing healthcare inequities. AI competitions have advanced the field by providing the most diverse publicly available datasets and hosting AI challenges. However, even in these controlled settings, concerns remain about the fairness of ML models. This study assessed fairness in top-performing ML models from the Radiological Society of North America (RSNA) Cervical Spine Fracture and Abdominal Trauma Detection AI Challenges, focusing on demographic performance differences.
Methods or Background: Predictions from the 9 top-performing models were evaluated using private test sets stratified by age group, sex, and geographical location. Performance metrics, including false positive rate (FPR), false negative rate (FNR), area under the receiver operating characteristic curve (AUC), and expected calibration error (ECE), were compared across subgroups.
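The subgroup comparison above relies on per-group error rates and a calibration measure. A minimal sketch follows, assuming binary labels, thresholded predictions, and predicted probabilities. The ECE shown is the common binary variant with equal-width bins: the size-weighted gap between the observed positive rate and the mean predicted probability in each bin. The bin count, group labels, and function names are hypothetical, and the challenge organizers' exact ECE definition may differ.

```python
from collections import defaultdict


def subgroup_error_rates(y_true, y_pred, groups):
    """FPR and FNR per subgroup (NaN when a subgroup has no negatives/positives)."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        key = ("tp" if p else "fn") if t else ("fp" if p else "tn")
        counts[g][key] += 1
    rates = {}
    for g, c in counts.items():
        negatives, positives = c["fp"] + c["tn"], c["tp"] + c["fn"]
        rates[g] = {
            "FPR": c["fp"] / negatives if negatives else float("nan"),
            "FNR": c["fn"] / positives if positives else float("nan"),
        }
    return rates


def expected_calibration_error(y_true, probs, n_bins=10):
    """Binned ECE for binary outcomes using equal-width probability bins."""
    bins = [[] for _ in range(n_bins)]
    for t, p in zip(y_true, probs):
        bins[min(int(p * n_bins), n_bins - 1)].append((t, p))
    ece = 0.0
    for b in bins:
        if b:
            observed = sum(t for t, _ in b) / len(b)
            confidence = sum(p for _, p in b) / len(b)
            ece += len(b) / len(probs) * abs(observed - confidence)
    return ece
```

Comparing FPR across age or region groups then amounts to calling `subgroup_error_rates` with the relevant group label per case.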
Results or Findings: The study included 788 participants from the Cervical Spine (64% male, mean age 54.8 years) and 709 participants from the Abdominal Trauma (69% male, mean age 48.7 years) challenges. No significant AUC or FNR differences were observed across subgroups or between sexes. However, age- and region-specific FPR disparities emerged. For cervical spine fractures, older adults (≥61 years) had higher FPRs (9.7% vs. 2.6%, p<0.05). In abdominal trauma detection, older adults also showed elevated FPRs (11.6%, p=0.003). Geographic variation was notable: Asian patients had higher FPRs (28.0%), while Oceanian patients had lower rates (5.6%, p<0.05).
Conclusion: Although the models were trained on the most diverse datasets available, subgroup-specific differences in FPR, particularly across age groups, persisted. These findings highlight that even diverse training data may not entirely eliminate disparities. Continued efforts to improve demographic representation and integrate fairness-aware approaches into ML development are essential.
Limitations: Limited subgroup sample sizes—especially from Africa and South America—may affect robustness of fairness estimates and limit generalizability.
Funding for this study: This study received no direct funding. For the preparation of the datasets and AI challenges, funding was received from the Radiological Society of North America (RSNA).
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: Approved by
6 min
Preoperative Radiomics of Fat Stranding Enhancing the 2-Year Recurrence Prediction in Colorectal Cancer Patients
Joanna Zofia Urbaniec-Stompor, Olsztyn / Poland
Author Block: J. Z. Urbaniec-Stompor1, B. K. Budai2, C. Zerbato3, V. Damerell2, S. Hardikar4, C. Kahlert2, C. Ulrich4, H-U. Kauczor2, B. Gigic2; 1Olsztyn/PL, 2Heidelberg/DE, 3Padua/IT, 4Utah, UT/US
Purpose: Preoperative computed tomography (CT) in colorectal cancer (CRC) is used for TNM staging at the initial diagnosis. Our study aimed to identify CT-based radiomics features of the tumor and peritumoral fat tissue that could be independent predictors of recurrence.
Methods or Background: This single-center analysis of a prospective cohort included 273 patients with venous phase preoperative CT scans from the Heidelberg site of the ColoCare Study recruited between 2010 and 2024. Single-slice tumor segmentation was performed, followed by the segmentation of peritumoral fat and radiomics feature extraction. Patients were divided into training (n=148) and test (n=126) datasets. Clinical and combined models were built to predict 2-year recurrence. Logistic regression was used for radiomics feature selection and model building. Receiver operating characteristic curve (ROC) analysis with area under the curve (AUC), likelihood ratio tests (LRT), Net Reclassification Index (NRI), and Brier scores were used for evaluation.
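Two of the evaluation measures above can be sketched briefly. The Brier score is the mean squared difference between the predicted probability and the observed outcome, so lower is better. The abstract does not state whether a categorical or category-free NRI was used. The sketch below shows the category-free (continuous) version, which counts upward and downward changes in predicted risk when moving from the clinical to the combined model. Function names are hypothetical.

```python
def brier_score(y_true, probs):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(probs)


def continuous_nri(y_true, p_old, p_new):
    """Category-free net reclassification improvement of p_new over p_old.

    NRI = [P(up | event) - P(down | event)] + [P(down | non-event) - P(up | non-event)]
    """
    events = [(o, n) for y, o, n in zip(y_true, p_old, p_new) if y == 1]
    non_events = [(o, n) for y, o, n in zip(y_true, p_old, p_new) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    up_ev = sum(n > o for o, n in events) / len(events)
    down_ev = sum(n < o for o, n in events) / len(events)
    up_ne = sum(n > o for o, n in non_events) / len(non_events)
    down_ne = sum(n < o for o, n in non_events) / len(non_events)
    return (up_ev - down_ev) + (down_ne - up_ne)
```

A positive NRI means the new model more often moves risk estimates in the correct direction, upward for patients who recurred and downward for those who did not.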
Results or Findings: This study evaluated 274 patients (96 females and 178 males; mean age 62.64 ± 11.78 years). TNM classification and T-stage were used as proxy labels guiding radiomics feature selection. The “GLCM-Autocorrelation” texture feature of peritumoral fat was a significant independent predictor of 2-year recurrence in the adjusted regression model (OR: 0.23 [0.07–0.88], p=0.031). The combined model, including tumor and peritumoral fat radiomics features, outperformed the clinical models with an AUC of 0.912, an accuracy of 81.7%, and a specificity of 80.8% at a sensitivity of 85.2%. An NRI of 0.111 and a lower Brier score (0.074 vs. 0.119, p < 0.001) also confirmed its superiority.
Conclusion: Radiomics features of peritumoral fat tissue significantly improve the prediction of 2-year disease recurrence when combined with conventional clinical and pathological factors.
Limitations: Single-center study design.
Funding for this study: This study was supported by National Institutes of Health (NIH)/National Cancer Institute (NCI) grants (U01206110, R01 CA189184, R01 CA207371, R01 CA211705, T32 HG008962, KL2TR002539, K07 CA222060, R03AG067994), German Federal Ministry of Education and Research (BMBF) project PerMiCCion (01KD2101D), and Stiftung LebensBlicke. B.K.B. was supported by the Medical Data Scientist Program of the Medical Faculty of Heidelberg University.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: Institutional Ethics Board (approval number: S-134/2016).
6 min
Validation of AI-Assisted RECIST Lesion Measurements in Follow-Up CT: A Multi-Center Reader Study
Alessa Hering, Nijmegen / Netherlands
Author Block: M. J. J. De Grauw1, R. Weber1, M. Westphal2, T. Lossau2, J. H. Moltz2, B. Van Ginneken1, M. Prokop1, E. J. Smit1, A. Hering1; 1Nijmegen/NL, 2Bremen/DE
Purpose: To evaluate the effect of AI assistance on reading time, inter-reader variability, and RECIST 1.1 response in follow-up CT.
Methods or Background: In a retrospective, multi-center reader study, 23 readers (15 radiologists, 8 residents) evaluated follow-up chest–abdomen–pelvis CT of 212 oncology patients with in total 539 lesions under three conditions: unassisted, AI-assisted, and expert-assisted (using a prior radiologist’s measurement as a strong reference condition). To prevent bias, readers were informed that all support originated from AI. Reading time and inter-observer measurement variability were assessed, and outcomes were analyzed with a Bayesian generalized linear mixed model.
Results or Findings: AI assistance significantly reduced reading time per patient compared to unassisted reading (–35.96 s; 95% CI: –52.95, –22.07). At the lesion level, inter-reader variability relative to the consensus standard increased slightly with AI assistance (1.32 mm; 95% CI: 0.83, 1.91). In contrast, at the clinically relevant patient level, variability decreased substantially: differences in the Sum of Longest Diameters (SLD) of more than 20% between radiologists were observed in 43.4% of unassisted cases, 28.3% of AI-assisted cases, and 17.4% of expert-assisted cases. These effects were consistent across experience levels.
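The patient-level effect above concerns the Sum of Longest Diameters (SLD) and the response category derived from it. A minimal sketch follows. The reader-discrepancy measure assumes the difference is taken relative to the two readers' mean, since the abstract does not state the reference. The response function applies the RECIST 1.1 target-lesion thresholds: partial response (PR) is a decrease of at least 30% from baseline, and progressive disease (PD) is an increase of at least 20% and at least 5 mm over the nadir. It deliberately ignores new lesions, non-target lesions, and the lymph-node short-axis rule for complete response (CR). Function names are hypothetical.

```python
def relative_sld_difference(sld_a_mm, sld_b_mm):
    """Absolute SLD difference between two readers relative to their mean (assumed)."""
    mean_sld = (sld_a_mm + sld_b_mm) / 2
    return abs(sld_a_mm - sld_b_mm) / mean_sld if mean_sld > 0 else 0.0


def recist_target_response(baseline_sld_mm, nadir_sld_mm, current_sld_mm):
    """Simplified RECIST 1.1 target-lesion response from SLD values in mm.

    nadir_sld_mm is the smallest SLD on study, including baseline.
    """
    increase = current_sld_mm - nadir_sld_mm
    # PD (checked first): >= 20% over nadir and >= 5 mm absolute increase.
    if increase >= 5 and increase >= 0.20 * nadir_sld_mm:
        return "PD"
    if current_sld_mm == 0:
        return "CR"
    # PR: >= 30% decrease relative to the baseline SLD.
    if baseline_sld_mm > 0 and baseline_sld_mm - current_sld_mm >= 0.30 * baseline_sld_mm:
        return "PR"
    return "SD"


# Two readers measuring the same follow-up scan (hypothetical values, mm).
print(relative_sld_difference(100, 80) > 0.20)  # True: a >20% discrepancy
print(recist_target_response(100, 100, 80), recist_target_response(100, 100, 60))  # SD PR
```

The example shows how a 20 mm disagreement between readers can flip the response category, which is why patient-level SLD variability matters clinically.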
Conclusion: AI assistance reduces patient-level inter-observer variability in RECIST measurements, leading to fewer discrepancies in predicted clinical response, despite a slight increase in lesion-level variability. Combined with the significantly reduced reading time, AI systems have the potential to standardize follow-up assessments, minimize discordant treatment decisions, and increase efficiency in clinical practice. The expert-assisted condition illustrates that even higher consistency is achievable, highlighting both the progress made and potential for further refinement of AI tools.
Limitations: The use of national-level data may limit the generalizability of these findings to international populations. Further evaluation across imaging vendors and more diverse patient cohorts is warranted.
Funding for this study: This research is funded by the Dutch Research Council (NWO) under grant number Veni-21121 for Applied and Engineering Sciences (Spotting the Differences: AI-based change detection in medical images).
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information:
6 min
Dual-Habitat Multimodal Imaging with 3D Deep Learning for Predicting Lung Adenocarcinoma Invasiveness: A Radiogenomic Analysis
XiaoYan Han, Xian City / China
Author Block: X. Han1, L. XIAO2, C. Zhang1, C. Han1; 1Xian City/CN, 2Shanghai/CN
Purpose: To develop a novel deep learning framework integrating CT-defined anatomical and PET/CT-defined functional habitats to predict lung adenocarcinoma invasiveness and explore radiogenomic correlations.
Methods or Background: This retrospective study included 292 patients with separate preoperative diagnostic CT and 18F-FDG PET/CT scans, which were spatially aligned using deep learning-based deformable image registration. A dual-habitat analysis was then performed: tumors were partitioned into distinct anatomical habitats via two-stage clustering on CT data, and into functional habitats using PET/CT data. A multi-stream 3D residual network with a multi-stage intermediate fusion architecture was developed to integrate all imaging and resulting habitat maps. The model was trained to predict invasive (IAC) vs. minimally invasive (MIA) status, with performance evaluated by AUC and external validation on the TCIA TCGA-LUAD cohort.
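Habitat partitioning is commonly done by clustering voxel-level values. The abstract does not specify the features or algorithm of its two-stage clustering. The sketch below therefore swaps in plain one-dimensional k-means (Lloyd's algorithm) on CT attenuation alone, as a hypothetical illustration of splitting a tumor into intensity-defined subregions such as ground-glass and solid components. It is not the authors' pipeline, and the function names are hypothetical.

```python
def _nearest(value, centroids):
    """Index of the centroid closest to value (first index wins ties)."""
    return min(range(len(centroids)), key=lambda c: abs(value - centroids[c]))


def kmeans_1d(values, k, max_iter=100):
    """Cluster scalar voxel values into k groups with Lloyd's algorithm.

    Centroids start at evenly spaced quantiles so results are deterministic.
    Returns (labels, centroids); labels[i] indexes into centroids.
    """
    if k < 1 or k > len(values):
        raise ValueError("k must be between 1 and the number of values")
    ordered = sorted(values)
    centroids = [float(ordered[int((i + 0.5) * len(ordered) / k)]) for i in range(k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for v in values:
            clusters[_nearest(v, centroids)].append(v)
        # Empty clusters keep their previous centroid.
        updated = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    labels = [_nearest(v, centroids) for v in values]
    return labels, centroids


# Hypothetical HU values: ground-glass, soft-tissue and enhancing voxels.
hu = [-800, -790, -810, -50, -40, -60, 30, 40, 50]
print(kmeans_1d(hu, 3))
```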
Results or Findings: The analysis identified four distinct anatomical habitats (e.g., solid-core, ground-glass) and three functional habitats (e.g., metabolically active, necrotic). The dual-habitat deep learning model achieved significantly superior performance (internal validation AUC: 0.92) compared to single-habitat (AUC: 0.90) and whole-tumor models (AUC: 0.88). External validation on 69 TCIA cases yielded a robust AUC of 0.85, demonstrating strong generalizability. Explainable AI (XAI) maps visually confirmed the model's focus on the critical interface between high-density anatomical and metabolically active functional habitats. Furthermore, the volume of the metabolically active habitat correlated significantly with PD-L1 expression (p<0.05), establishing a key radiogenomic link.
Conclusion: Integrating anatomical and functional tumor habitats via a multi-stream deep learning model provides a comprehensive characterization of tumor heterogeneity. This dual-habitat approach significantly improves non-invasive prediction of lung adenocarcinoma invasiveness, offering a promising pathway toward a virtual "imaging biopsy".
Limitations: The limited sample size and single-source training data may restrict the model's generalizability and the broader applicability of the results.
Funding for this study: Not applicable
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: