Research Presentation Session: Artificial Intelligence and Imaging Informatics

RPS 205 - Improving diagnosis and prognosis through AI in CNS diseases

March 4, 10:00 - 11:00 CET

6 min
Assessment of the stability of intracranial aneurysms using a deep learning model based on computed tomography angiography
Lu Zeng, Chongqing / China
Author Block: L. Zeng; Chongqing/CN
Purpose: The aim of this study was to construct a deep learning model (DLM) to identify unstable aneurysms on computed tomography angiography (CTA) images.
Methods or Background: Clinical data from 1041 patients with 1227 aneurysms, collected from August 2011 to May 2021, were retrospectively analyzed. Patients with aneurysms were divided into unstable (ruptured, evolving and symptomatic aneurysms) and stable (fortuitous, nonevolving and asymptomatic aneurysms) groups and randomly assigned to training (833 patients with 991 aneurysms) and internal validation (208 patients with 236 aneurysms) sets. One hundred and ninety-seven patients with 229 aneurysms from another hospital were included in the external validation set. Six models based on a convolutional neural network (CNN) or logistic regression were constructed on the basis of clinical, morphological and deep learning (DL) features. The area under the curve (AUC), accuracy, sensitivity and specificity were calculated to evaluate the discriminating ability of the models.
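As an illustration of the evaluation metrics named above, the following minimal sketch computes AUC, accuracy, sensitivity and specificity from model scores. It is not the authors' code: the function names, the 0.5 decision threshold and the toy scores are assumptions made purely for illustration.

```python
def auc(scores, labels):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion_metrics(scores, labels, threshold=0.5):
    """Accuracy, sensitivity and specificity at a fixed threshold.
    Assumes both classes are present in `labels`."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(pred, labels) if p == 1 and y == 1)
    tn = sum(1 for p, y in zip(pred, labels) if p == 0 and y == 0)
    fp = sum(1 for p, y in zip(pred, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(pred, labels) if p == 0 and y == 1)
    return {
        "accuracy": (tp + tn) / len(labels),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical toy data: 1 = unstable aneurysm, 0 = stable aneurysm.
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]

toy_auc = auc(toy_scores, toy_labels)
toy_metrics = confusion_metrics(toy_scores, toy_labels)
print(toy_auc, toy_metrics)
```

The pairwise definition is quadratic in the number of cases, which is acceptable at the cohort sizes reported here; a rank-based implementation would be preferable for much larger datasets.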
Results or Findings: The AUCs of Models A (clinical), B (morphological) and C (DL features from the CTA image) in the external validation set were 0.5706, 0.9665 and 0.8453, respectively. The AUCs of Model D (clinical and DL features), Model E (clinical and morphological features) and Model F (clinical, morphological and DL features) in the external validation set were 0.8395, 0.9597 and 0.9696, respectively.
Conclusion: The CNN-based DLM, which integrates clinical, morphological and DL features, outperforms other models in predicting intracranial aneurysm (IA) stability. The DLM has the potential to assess IA stability and support clinical decision-making.
Limitations: The maximum slices of aneurysm images were used to construct 2D models, and some important factors that may affect aneurysm stability may have been missed, possibly leading to bias. More advanced DL algorithms are needed.
Funding for this study: This study was supported by the Science and Technology Commission of Chongqing City, China (CSTB2023NSCQ-MSX0668), the Joint Project of Science and Health of Chongqing City, China (2023MSXM022) and the Science and Technology Research Program of Chongqing Municipal Education Commission (KJQN202200407).
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: The local ethics committee (Banan Hospital, 2021015; Xinqiao Hospital, 202248201) approved this retrospective study and agreed to the informed consent waiver of patients.
6 min
Glioblastoma Response Prediction and Tumor and Organ-at-Risk Segmentation with Radiomics and Deep Learning
Alejandro Mora Rubio, Valencia / Spain
Author Block: A. Mora Rubio1, C. Bravo Vergara1, M. Beser-Robles1, G. Ribas1, P. Garcia Verdu1, I. Popp2, A. L. Grosu2, L. Marti-Bonmati1, M. Carles Fariña1; 1Valencia/ES, 2Freiburg/DE
Purpose: The current standard treatment for Glioblastoma Multiforme (GBM) involves radiation therapy (RT) and requires manual tumour segmentation, which is labour-intensive and susceptible to inter-observer variability. Additionally, given the high recurrence rate of GBM patients, accurate response prediction methods can help to improve patient prognosis stratification and optimize treatment plans. The aim of this study is to develop and evaluate automatic segmentation methods based on deep learning and assess the ability of mathematical models employing clinical and radiomics features (RF) to improve the accuracy of response prediction.
Methods or Background: The study included 253 patients with primary/recurrent GBM, recruited prospectively (185) or retrospectively at 13 institutions. The open-source BraTS2021 cohort of primary glioma patients was also used. The nnU-Net open-source framework was used to develop segmentation models for Enhancing Tumour, Peritumoral Edema, RT Planning Target Volume and six Organs-at-Risk, based on segmentations performed by a neuroradiologist and a radiation oncologist. For response prediction in recurrent GBM in the prospective cohort, Cox proportional hazards and logistic regression models were applied to predict overall survival, time to progression, and early recurrence.
Results or Findings: The segmentation models achieved good to excellent performance, with an average Dice similarity coefficient (DSC) in the test set of 0.79 (0.69-0.98). Clinical features and MR-RF showed significant discrimination between patients with early and late progression on validation and test sets (p < 0.05 in Kaplan-Meier curves). Wavelet transform RF and clinical features such as age, methylation status, and tumour localisation were notably significant predictors.
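For readers unfamiliar with the DSC, the sketch below shows how the Dice similarity coefficient between a predicted and a reference segmentation can be computed. Representing masks as sets of voxel coordinates, the function name and the toy masks are assumptions for illustration only; they do not reflect the nnU-Net evaluation code.

```python
def dice(mask_a, mask_b):
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|).
    Masks are given as collections of voxel coordinates."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2.0 * len(a & b) / (len(a) + len(b))


# Hypothetical 2D toy masks (row, column) for illustration.
reference = {(0, 0), (0, 1), (1, 0), (1, 1)}
predicted = {(0, 1), (1, 1), (2, 1)}

toy_dsc = dice(reference, predicted)  # 2 * 2 / (4 + 3)
print(f"DSC = {toy_dsc:.3f}")
```

A DSC of 1 indicates identical masks and 0 indicates no overlap, so the reported values of 0.69 to 0.98 all correspond to substantial overlap with the manual segmentation.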
Conclusion: The good performance of the automatic segmentation models supports their use in clinical workflows to simplify procedures, reduce time investment, and increase robustness. Radiomics analysis suggested that the MR-RF model has potential for predicting time to progression.
Limitations: Ongoing work addresses the complementary information provided by FET-PET.
Funding for this study: This work was supported by the MATTO-GBM Project under the European TRANSCAN-3 ERA-NET 2022 Program, funded by ISCIII (AC23_1/00012), FAECC (TRANSCAN2022-784-104), and the European Union through the Next Generation EU Funds. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: All patients gave written informed consent according to institutional and federal guidelines. All study protocols were approved by the corresponding ethics committee.
6 min
Diagnostic accuracy of a commercial AI tool for intracranial large- and medium-vessel occlusion detection in a multicenter emergency-CTA cohort
Henrik Andersson, Lund / Sweden
Author Block: H. Andersson, B. Hansen, J. Wassélius; Lund/SE
Purpose: To evaluate the diagnostic accuracy of an AI tool (AIDOC VO), the first commercial AI tool designed to detect both large (LVO) and medium vessel occlusions (MeVO) on head‑and‑neck CT angiography (CTA), in a region-wide multicenter emergency setting.
Methods or Background: Prospective diagnostic‑accuracy study of consecutive emergency CTAs from 3 031 adults (mean age 67 years; 51 % women), acquired 1 March - 8 July 2024 across a ten-hospital healthcare region. The AI tool analysed each scan; the routine radiology report served as comparator. Examinations flagged positive or doubtful by either test were reread by interventional neuroradiologists to establish the reference standard. Sensitivity, specificity, predictive values, and accuracy were calculated for the primary VO analysis, with prespecified sub‑analyses for LVO and MeVO. Paired differences were tested with McNemar’s test.
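McNemar's test compares two readers on the same examinations using only the discordant pairs. The sketch below implements the exact (binomial) two-sided variant; the abstract does not state whether the exact or the chi-square variant was used, and the discordant counts in the example are hypothetical, since they are not reported.

```python
from math import comb


def mcnemar_exact(b, c):
    """Exact two-sided McNemar p-value.

    b: cases correct by test 1 only, c: cases correct by test 2 only.
    Under the null hypothesis, b follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Hypothetical discordant counts, for illustration only.
p_balanced = mcnemar_exact(5, 5)    # equally many disagreements each way
p_one_sided = mcnemar_exact(1, 9)   # disagreements mostly favour one test
print(p_balanced, p_one_sided)
```

Because concordant pairs do not enter the statistic, two tests with nearly identical sensitivity can still disagree on many individual cases, which is why paired counts matter when judging complementary use.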
Results or Findings: Of 3 031 CTAs, 2 804 (92.5 %) yielded valid AI output, among which VO was identified in 224 (8 %) examinations. VO sensitivity/specificity were 81.7%/99.6% for AI versus 81.2%/99.3% for the clinical radiology report (p = 0.91/p = 0.12). LVO sensitivity was 92.8% versus 87.0% (p=0.42); MeVO 76.1% versus 79.2% (p=0.55). Paired overall accuracy showed no significant differences (VO p=0.38; LVO p=0.06; MeVO p=0.76). AI identified VO in 42 examinations missed by radiologists (18.8% enhanced detection; 15 per 1,000) and generated 11 false alerts (3.9 per 1,000).
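The rates quoted above can be reproduced from the reported counts. The sketch below assumes, since the abstract does not state the denominators explicitly, that the percentage of enhanced detection is relative to the 224 VO examinations and that the per-1,000 rates are relative to the 2 804 examinations with valid AI output.

```python
# Counts taken from the Results paragraph above.
TOTAL_CTAS = 3031
VALID_AI_OUTPUT = 2804
VO_CASES = 224
AI_DETECTIONS_MISSED_BY_REPORT = 42
AI_FALSE_ALERTS = 11


def percent(part, whole):
    return 100.0 * part / whole


def per_thousand(count, exams):
    return 1000.0 * count / exams


valid_pct = percent(VALID_AI_OUTPUT, TOTAL_CTAS)
prevalence_pct = percent(VO_CASES, VALID_AI_OUTPUT)
# Assumed denominator: all VO examinations.
enhanced_pct = percent(AI_DETECTIONS_MISSED_BY_REPORT, VO_CASES)
# Assumed denominator: examinations with valid AI output.
enhanced_per_1000 = per_thousand(AI_DETECTIONS_MISSED_BY_REPORT, VALID_AI_OUTPUT)
false_alerts_per_1000 = per_thousand(AI_FALSE_ALERTS, VALID_AI_OUTPUT)

print(f"valid AI output: {valid_pct:.1f} %")
print(f"VO prevalence: {prevalence_pct:.1f} %")
print(f"enhanced detection: {enhanced_pct:.2f} % ({enhanced_per_1000:.1f} per 1,000)")
print(f"false alerts: {false_alerts_per_1000:.1f} per 1,000")
```

Under these assumptions the figures agree with the abstract after rounding, and the 42 additional detections are also consistent with the 81.2% sensitivity of the clinical report (roughly 18.8% of 224 VO examinations missed).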
Conclusion: Stand-alone AI matched radiologist performance, with a similar number of enhanced detections and few false alerts, supporting complementary use in emergency stroke workflows.
Limitations: About 7.5% of CTAs failed quality control, only positive/doubtful cases were reread so some false negatives may have been missed, AI heat-map localization was not verified, and effects of AI assistance on readers were not assessed.
Funding for this study: Funding was provided by the Crafoord Foundation, VINNOVA, and SUS Stiftelser & Fonder; sponsors had no role in the study.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: The Swedish Ethical Review Authority approved the study (#2023-00387-01) and waived informed consent.
6 min
Development of predictive models to identify the intracranial aneurysm responsible for subarachnoid hemorrhage in patients with multiple saccular aneurysms
Guangxian Wang, Chongqing / China
Author Block: G. Wang; Chongqing/CN
Purpose: To develop and test machine learning (ML) models using CTA to identify the intracranial aneurysm (IA) responsible for subarachnoid hemorrhage (SAH) accurately in patients with multiple saccular IAs and to determine whether these models outperform traditional predictive markers.
Methods or Background: A total of 207 SAH patients with 460 IAs from four hospitals were included and randomly divided into training (80%) and internal validation (20%) sets. Additionally, an external validation set comprising 65 patients with 147 IAs from four other hospitals was used. The predictive models were developed using ML methods that integrated the morphological features of IAs (e.g., size and shape) to identify the responsible IA. These models were then compared with traditional predictive markers that rely on hemorrhage patterns and the maximum IA size.
Results or Findings: The areas under the curves (AUCs) for the hemorrhage patterns and the maximum IA size were 0.496–0.505, 0.502–0.523, and 0.488–0.498 in the training, internal validation, and external validation sets, respectively. Among the 13 ML models, the best-performing models were the Gaussian process, logistic regression, and quadratic discriminant analysis models, with AUCs of 0.912, 0.894, and 0.890, respectively, for the training set; 0.869, 0.872, and 0.853, respectively, for the internal validation set; and 0.898, 0.892, and 0.897, respectively, for the external validation set. DeLong tests revealed no significant differences among these models, but all the models outperformed traditional predictive markers (P<0.001).
Conclusion: ML models that integrate multiple morphological features can predict the IA responsible for SAH accurately in patients with multiple IAs. These models outperform traditional predictive markers in identifying the responsible IA, thereby facilitating prompt and effective treatment.
Limitations: The results may not be applicable for predicting the rupture risk of unruptured IAs or for patients with a single IA.
Funding for this study: This study was supported by the Science and Technology Commission of Chongqing City, China (CSTB2023NSCQ-MSX0668) and the Science and Technology Research Program of Chongqing Municipal Education Commission (KJQN202200407).
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: This multicenter study received approval from the ethics committees of our hospitals. Since this was a retrospective study, informed consent from patients was not needed.
6 min
Explainable machine learning model based on pretreatment MRI radiomic features for differentiating brain metastasis from glioblastoma
Xiaoping Yi, Chongqing / China
Author Block: X. Yi1, B. T. Chen2; 1Chongqing/CN, 2Duarte, CA/US
Purpose: Machine learning prediction models of brain metastasis (BM) and glioblastoma (GBM) are mostly not explainable, which hinders their clinical application. This study aimed to develop an explainable machine learning model utilizing brain magnetic resonance imaging (MRI) radiomics and clinical data.
Methods or Background: This retrospective study consisted of 596 patients from two independent institutions. Clinical demographic information and MRI data were collected. A three-stage feature selection process (Lasso + Boruta + multicollinearity removal) was employed to identify the most significant predictors from the clinical and radiomic datasets. An explainable Extra Tree model was constructed, and the SHAP (SHapley Additive exPlanations) method was utilized to enhance model explanation.
Results or Findings: Twelve features derived from clinical and radiomic data were selected to build the explainable machine learning Extra Tree model. The model achieved area under the curve (AUC) values of 0.9804, 0.9733, and 0.9542 in the training, internal validation, and independent external validation cohorts, respectively. The Extra Tree model exhibited superior classification performance compared to ten other promising models in both internal and external validation cohorts. The explainable Extra Tree model, integrating clinical data, T1 radiomic features and T2 radiomic features, outperformed single-feature models (e.g., clinical-only, T1-only, or T2-only model) as well as pairwise combined models (e.g., clinical+T1, clinical+T2, or T1+T2 model). The Extra Tree model enhanced explanation through four key steps: local explanation, global explanation, feature effect analysis, and sensitivity analysis.
Conclusion: The explainable machine learning Extra Tree model effectively and accurately distinguished BM from GBM, which may facilitate its clinical implementation and assist clinical decision-making.
Limitations: The retrospective nature of this study may lead to inevitable case selection bias, and the model may be subject to overfitting.
Funding for this study: None
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: This multicenter retrospective study was approved by the institutional review board.
6 min
Diffusion-based deep learning model for automated detection of focal cortical dysplasia using multi-modality PET-MRI
Maha Alshammari, London / United Kingdom
Author Block: M. Alshammari1, M. Yakubu1, E. Guedj2, N. Girard2, J. Cardoso1, A. Hammers1; 1London/UK, 2Marseille/FR
Purpose: Co-registration of FDG PET and MRI has been shown to improve detectability of epileptogenic lesions in focal cortical dysplasia (FCD). In previous work, we generated pseudo-normal PET using a diffusion model, which achieved high sensitivity but resulted in an average of 22 false positives per case. Here, we integrate multi-modality data (co-registered PET and MRI) for the generation of pseudo-normal PET.
Methods or Background: A weakly-supervised 2D-UNet-based diffusion model with dual input channels (co-registered PET and MRI) was trained on 280 slices from 35 healthy controls, with synthetic FCD lesions in 140 slices (50%). The model was trained using 5-fold cross-validation, yielding 5 independent models. Each test case was processed through all 5 cross-validated models at 2 different noise levels with 5 samples each, generating 50 pseudo-normal reconstructions. Deviations between original and pseudo-normal images were quantified as voxel-wise robust Z-scores, enhanced using Probabilistic Threshold-Free Cluster Enhancement (pTFCE), and thresholded at an FWER-corrected value. Performance was evaluated on 10 independent synthetically lesioned test cases.
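The sketch below illustrates one common way to compute a robust Z-score for a single voxel against an ensemble of pseudo-normal reconstructions. The abstract does not define the score, so the median/MAD formulation, the 1.4826 scaling constant, the function name and the toy values are all assumptions; the pTFCE enhancement and FWER thresholding steps are not reproduced here.

```python
from statistics import median

# Ensemble size described in the Methods paragraph above.
N_MODELS, N_NOISE_LEVELS, N_SAMPLES = 5, 2, 5
N_RECONSTRUCTIONS = N_MODELS * N_NOISE_LEVELS * N_SAMPLES

# Scales the MAD to match the standard deviation of a normal distribution.
MAD_TO_SIGMA = 1.4826


def robust_z(original, reconstructions):
    """Robust Z-score of one voxel value against its reconstructions,
    using the median and the median absolute deviation (MAD)."""
    values = list(reconstructions)
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; robust Z-score is undefined")
    return (original - med) / (MAD_TO_SIGMA * mad)


# Hypothetical voxel intensities for illustration (not study data).
toy_reconstructions = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_z = robust_z(6.0, toy_reconstructions)
print(N_RECONSTRUCTIONS, round(toy_z, 3))
```

Using the median and MAD rather than the mean and standard deviation keeps the score stable when a few of the reconstructions are outliers, which is plausible for samples drawn at different noise levels.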
Results or Findings: The multi-modality model achieved 80% sensitivity (8/10 lesions detected) with 2.0±2.72 false-positive detections per case. This represents an 11-fold reduction in false positives per case compared to our previous PET-only approach, while maintaining comparable detection sensitivity.
Conclusion: The multi-modality diffusion model successfully detects FCD lesions with dramatically improved specificity. The reduction from 22 to 2 false positives per case demonstrates significant clinical potential for automated epileptogenic zone localisation in presurgical epilepsy evaluation.
Limitations: Validation was performed on synthetically lesioned cases only. Further evaluation with clinical FCD cases and larger diverse datasets is required to establish robust real-world performance.
Funding for this study: The School of Biomedical Engineering and Imaging Sciences is supported by the Wellcome EPSRC Centre for Medical Engineering at King’s College London (WT 203148/Z/16/Z) and the Department of Health via the National Institute for Health Research (NIHR) comprehensive Biomedical Research Centre award to Guy’s & St Thomas’ NHS Foundation Trust in partnership with King’s College London and King’s College Hospital NHS Foundation Trust. MA is supported by the Saudi Arabia Cultural Bureau in London under the Saudi scholarship program.
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information:
6 min
Unmasking Bias in BrainAGE: The Role of Scanner Manufacturer, Field Strength and Sequence Parameters
Vivien Lorena Ivan, Düsseldorf / Germany
Author Block: V. L. Ivan1, M. Vach1, J. Caspers1, D. Weiß1, D. M. Hedderich2, C. Rubbert1; 1Düsseldorf/DE, 2Munich/DE
Purpose: Brain Age Gap Estimation (BrainAGE) has emerged as a promising biomarker of brain health and disease. Yet, systematic bias may arise not only from study design but also from technical factors such as scanner manufacturer, field strength, or sequence parameters. This study aimed to quantify their influence on BrainAGE in healthy subjects.
Methods or Background: We included 2,414 cognitively normal participants from four population-based studies (ADNI, HCPA, OASIS3, PPMI). BrainAGE was computed using a standardized pipeline (CAT12/SPM12 preprocessing, PCA, Gaussian process regression) and a model trained on 2,953 independent controls. Differences in BrainAGE were assessed across cohorts, scanner manufacturers, field strengths (1.5T vs 3T), and T1-sequence acceleration using Welch’s t-tests or ANOVA with Tukey post-hoc tests. Mean absolute error (MAE) was also calculated.
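To make the two reported quantities concrete, the sketch below computes the mean BrainAGE and the MAE for a small group. It assumes the usual definition of BrainAGE as predicted brain age minus chronological age; the function names and the toy ages are hypothetical and unrelated to the CAT12/SPM12 pipeline used in the study.

```python
def brain_age_gaps(predicted_ages, chronological_ages):
    """BrainAGE per subject: predicted brain age minus chronological age."""
    return [p - c for p, c in zip(predicted_ages, chronological_ages)]


def mean(values):
    return sum(values) / len(values)


def mean_absolute_error(predicted_ages, chronological_ages):
    """MAE: average magnitude of the gap, ignoring its sign."""
    gaps = brain_age_gaps(predicted_ages, chronological_ages)
    return mean([abs(g) for g in gaps])


# Hypothetical ages in years, for illustration only.
toy_predicted = [60.0, 70.0, 65.0]
toy_chronological = [65.0, 72.0, 63.0]

toy_gaps = brain_age_gaps(toy_predicted, toy_chronological)
toy_mean_gap = mean(toy_gaps)
toy_mae = mean_absolute_error(toy_predicted, toy_chronological)
print(toy_gaps, toy_mean_gap, toy_mae)
```

The mean BrainAGE captures systematic offset (bias), while the MAE captures overall error size, so a cohort can show a shift in one without a matching shift in the other.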
Results or Findings: Mean BrainAGE differed significantly between cohorts (ADNI: –5.9±5.5 years, HCPA: –4.1±6.2, OASIS3: –4.8±5.4, PPMI: –3.0±5.8; p<0.0001). Field strength had a strong impact: 1.5T scans yielded smaller BrainAGE (–2.2±4.8 years, MAE ~4.2) compared to 3T scans (–5.1±5.6, MAE ~6.7; p<0.05 across cohorts). Scanner manufacturer also mattered: in ADNI, Philips scanners produced a significantly smaller BrainAGE (–4.9±5.1, MAE 6.0) compared to GE (–6.3±6.2, MAE 7.3; p=0.008), while in PPMI Siemens differed from GE (p=0.008). By contrast, accelerated vs. unaccelerated 3T T1 acquisitions showed no significant effect (p=0.47).
Conclusion: Scanner manufacturer and field strength systematically bias BrainAGE in healthy cohorts, while sequence acceleration does not. These findings highlight that BrainAGE is not scanner-independent and emphasize caution when comparing results across studies or pooling multi-cohort data.
Limitations: The retrospective use of heterogeneous cohort data with only limited scanner parameter detail may confound effects of acquisition protocols with cohort-specific factors.
Funding for this study: None
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: Positive ethical approval
6 min
Brain metabolic imaging with 18F-PET-CT and machine-learning clustering analysis reveal divergent metabolic phenotypes in patients with amyotrophic lateral sclerosis
Xiaoping Yi, Chongqing / China
Author Block: X. Yi1, B. T. Chen2; 1Chongqing/CN, 2Duarte, CA/US
Purpose: This study aimed to identify distinct ALS phenotypes by integrating brain 18F-fluorodeoxyglucose positron emission tomography-computed tomography (18F-FDG PET-CT) metabolic imaging with consensus clustering analysis.
Methods or Background: Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disorder characterized by significant clinicopathologic heterogeneity. This study prospectively enrolled 127 patients with ALS and 128 healthy controls. All participants underwent brain 18F-FDG PET-CT metabolic imaging, psychological questionnaires, and functional screening. K-means consensus clustering was applied to define neuroimaging-based phenotypes. Survival analyses were also performed. Whole exome sequencing (WES) was utilized to detect ALS-related genetic mutations, followed by GO/KEGG pathway enrichment and imaging-transcriptome analysis based on the brain metabolic activity on 18F-FDG PET-CT imaging.
Results or Findings: Consensus clustering identified two metabolic phenotypes, i.e., the metabolic attenuation phenotype and the metabolic non-attenuation phenotype, according to their glucose metabolic activity pattern. The metabolic attenuation phenotype was associated with worse survival (p = 0.022), poorer physical function (p = 0.005), more severe depression (p = 0.026) and greater anxiety level (p = 0.05). WES testing and neuroimaging-transcriptome analysis identified specific gene mutations and molecular pathways associated with each phenotype.
Conclusion: We identified two distinct ALS phenotypes with varying clinicopathologic features, indicating that unsupervised machine learning applied to PET imaging may effectively classify metabolic subtypes of ALS. These findings contribute novel insights into the heterogeneous pathophysiology of ALS, which should inform personalized therapeutic strategies for patients with ALS.
Limitations: The sample size was relatively small, and this was a cross-sectional study without follow-up neuroimaging data to assess brain functional changes over time and their correlations with clinical outcome.
Funding for this study: None
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: This study was approved by the Ethics Committee and Institutional Review Board of Xiangya Hospital, and written informed consent was obtained from all study participants.
6 min
Synergistic A.I.: Unique Approach of Dual Algorithm Integration Transforms Trauma Imaging Workflow
Carolien Margot Toxopeus, Amsterdam / Netherlands
Author Block: C. M. Toxopeus, D. Duyndam, M. Gorzeman, A. Driessen; Amsterdam/NL
Purpose: Increasing CT scan demand, particularly for brain and cervical spine trauma imaging, places significant pressure on emergency departments (ED). OLVG Hospital began evaluating algorithms from AIDOC for these scans in 2021. Previous studies in our hospital demonstrated that AIDOC implementation significantly reduced trauma patient throughput times for CT brain in the context of trauma. Since these scans are often combined with CT cervical spine, we investigated the potential for further ED throughput time reduction through a cervical spine fracture algorithm. Algorithm accuracy was thoroughly tested before release to emergency physicians.
Methods or Background: We analyzed 2,564 cases (June 2022 - June 2023) to assess algorithm accuracy. Training modules were developed for emergency physicians. Given lower sensitivity for cervical fractures in severe arthrosis, algorithm use was restricted to patients <65 years, excluding high-energy trauma or neurological deficits. Emergency physicians could consult radiologists for excluded categories. During pilot testing (December 2023 - April 2024), emergency physicians used the cervical algorithm during shifts.
Results or Findings: The accuracy study yielded a positive predictive value >50% and a negative predictive value >99.45%, and the algorithm was deemed safe within the context of our Level II trauma center. Emergency physician evaluation of cervical spine algorithm use was consistently positive, leading to definitive integration into our collaborative ED workflow.
Conclusion: This study demonstrates that combined application of AIDOC algorithms for CT brain and cervical spine in a Level II trauma center, together with emergency physician training: 1) is safe, 2) results in significant workload reduction for radiologists during shifts, 3) increases efficiency of patient flow in the ED, and 4) has a positive effect on collaboration between radiologists and clinicians.
Limitations: This study was conducted at a Level II trauma center, which may limit generalizability to other trauma centers.
Funding for this study: Innovation Fund of OLVG Hospital.
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: