Research Presentation Session: Imaging Informatics and Artificial Intelligence

RPS 805 - Artificial intelligence in neuroimaging

February 27, 10:00 - 11:00 CET

7 min
Impact of data quality variations caused by dose and image reconstruction on AI assessment of intracranial aneurysms
Paul Jahnke, Berlin / Germany
Author Block: L. Gölz, A. Laudani, U. Genske, M. Scheel, G. Bohner, H-C. Bauknecht, S. Mutze, P. Jahnke; Berlin/DE
Purpose: To assess the performance of a commercial AI algorithm in detecting intracranial aneurysms when scan data quality variations occur due to changes in dose and image reconstruction.
Methods or Background: Consistency testing of AI performance was performed using a realistic head CT phantom designed for AI evaluation. The phantom simulated a patient with three intracranial aneurysms located in the anterior communicating artery (ACoA), middle cerebral artery (MCA), and basilar artery (BA). The phantom was repeatedly examined at 21 dose levels (0.47 to 20.09 mGy) using iterative reconstruction and filtered back projection. Aneurysm labeling by an FDA-approved and CE-marked AI solution was analyzed. In addition, five neuroradiologists rated aneurysm visibility in all examinations.
Results or Findings: AI detection rates varied by aneurysm type, with detection rates of 74.6% for the ACoA, 92.9% for the MCA, and 2.4% for the BA aneurysm across all examinations. The AI response was inconsistent at doses below 8 mGy with iterative reconstruction and at doses below 7 mGy and above 14 mGy with filtered back projection. In contrast, readers consistently reported 100% visibility for all aneurysms at doses above 2 mGy regardless of image reconstruction.
Conclusion: AI approved for managing intracranial aneurysms shows performance issues due to variations in data quality and requires different data quality standards than neuroradiologists.
Limitations: This prospective study was limited to a single AI application, a single scanner system, and three intracranial aneurysms.
Funding for this study: This work has not received any funding.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: Ethics committee of the Charité
7 min
Improving diagnostic precision: a deep learning system for differentiating multiple sclerosis from small vessel disease using standard non-enhanced brain MRI scans
Mehran Arab Ahmadi, Tehran / Iran
Author Block: K. Firouznia, M. Arab Ahmadi, H. Hashemi, M. Boroomand-Saboor, R. Ghavami Modegh, M. Akhlaghpasand, H. Dashti, M. Gity, M. Mohammadzadeh; Tehran/IR
Purpose: The diagnosis of multiple sclerosis (MS) primarily depends on clinical evaluation, supported by magnetic resonance imaging (MRI) interpreted by skilled radiologists. However, the typical imaging characteristics of MS can resemble those of other central nervous system disorders. One such condition is cerebral small vessel disease (SVD), which can complicate the radiologist's ability to make a diagnosis. This differential diagnosis can be particularly challenging in the early stages of the disease. The objective of this study is to create and assess a computer-aided diagnosis (CAD) system that uses brain MRI images to differentiate between MS and SVD.
Methods or Background: Brain MRI scans were obtained from a 3 Tesla scanner for patients diagnosed with MS during acute attacks and silent phases, alongside individuals diagnosed with SVD based on cardiovascular risk factors. MRI sequences included FLAIR, T1, and T2. An expert neuroradiologist identified white matter lesions, which were segmented using artificial intelligence software. The dataset was divided into 80% for training, 10% for validation, and 10% for testing. A neuroradiologist then evaluated the AI results against established clinical and imaging criteria.
Results or Findings: The study included 80 MS patients with 265 lesions compared to 67 SVD patients with 218 lesions. The AI tool achieved a sensitivity of 78.57% and specificity of 93.33% (P-value < 0.05). It also demonstrated a positive predictive value (PPV) of 91.67%, a negative predictive value (NPV) of 82.35%, balanced accuracy of 85.95%, and an area under the curve (AUC) of 78.71%.
Conclusion: The findings suggest that artificial intelligence can effectively differentiate MRI images of MS from those of SVD using routine sequences. Implementing AI in distinguishing between MS and SVD lesions could enhance diagnostic accuracy and improve patient management in clinical practice.
Limitations: Limitations include the sample size and the single-center design.
Funding for this study: None
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: This study was approved by an institutional ethics committee.
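The balanced accuracy reported in the results above is the arithmetic mean of sensitivity and specificity, so it can be checked directly from the two reported values. The following is a minimal sketch in Python; the function name is illustrative and not part of the study.

```python
def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Return the mean of sensitivity and specificity (both in percent)."""
    return (sensitivity + specificity) / 2.0


# Values as reported in the abstract, in percent.
reported_sensitivity = 78.57
reported_specificity = 93.33

result = balanced_accuracy(reported_sensitivity, reported_specificity)
print(f"Balanced accuracy: {result:.2f}%")
```

The result agrees with the reported balanced accuracy of 85.95%.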
7 min
Previously proposed radiomics features for ruptured intracranial aneurysm classification: Overview, auto-segmentation, and external validation
Dongqin Zhu, Wenzhou / China
Author Block: D. Zhu, Y. Yang; Wenzhou/CN
Purpose: To automatically segment and extract radiomics features of intracranial aneurysms (IAs), validate existing radiomics predictors for ruptured IAs, and construct machine learning (ML) and deep learning (DL) models for classifying ruptured IAs.
Methods or Background: In this retrospective study, we used data from the MIRACLE Cohort, registered with the Chinese Clinical Trial Registry (ChiCTR2400084601). IAs were segmented automatically using the DGIS method. We systematically reviewed studies reporting radiomics predictors for ruptured IAs and externally validated those predictors. We developed five ML and DL models for classifying ruptured IAs, employing the SHapley Additive exPlanations (SHAP) method to enhance model interpretability.
Results or Findings: The study included 632 patients with 668 aneurysms, divided into training (n=593) and external testing (n=75) datasets. The DGIS method achieved Dice coefficients of 0.98 and 0.75 in the source and target domains, respectively. When comparing radiomics features derived from manual and automatic segmentations, the original_shape_VoxelVolume, MeshVolume, and SurfaceArea features showed the highest stability (all ICCs > 0.9). Upon external validation of radiomics predictors from 12 studies, the AUCs ranged from 0.59 to 0.71 in the training dataset and 0.48 to 0.65 in the external testing dataset. The original_shape_Elongation feature emerged as the most frequently utilized predictor. The Gradient Boosting and DRE models performed well in classifying ruptured IAs, with AUCs reaching 0.995 and 0.95 in the training dataset, and 0.85 and 0.80 in the external testing dataset, respectively.
Conclusion: This study presents a comprehensive workflow for automatic IA rupture risk analysis and an overview of existing radiomics predictors. After external validation, certain original shape features demonstrated significant stability, utility, and predictive power. The ML and DL models offer a promising tool for risk stratification of IAs.
Limitations: Not applicable.
Funding for this study: This study was supported by the Wenzhou Major Program of Science and Technology Innovation (Grant No. ZY2020012) and Key Laboratory of Novel Nuclide Technologies on Precision Diagnosis and Treatment & clinical Transformation of Wenzhou City (Grant No. 2023HZSY0012).
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: Ethics Committee of the First Affiliated Hospital of Wenzhou Medical University
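The segmentation results above are reported as Dice coefficients, defined as twice the overlap of two masks divided by the sum of their sizes. The following is a minimal sketch of that definition in Python. The flat toy masks are invented for illustration and are not study data, and the DGIS method itself is not reproduced here.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for two flat binary masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labelled as foreground in both masks.
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    if total == 0:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2.0 * intersection / total


# Hypothetical toy masks (1 = aneurysm voxel, 0 = background).
manual = [0, 1, 1, 1, 0, 0, 1, 0]
automatic = [0, 1, 1, 0, 0, 1, 1, 0]

score = dice_coefficient(manual, automatic)
print(f"Dice coefficient: {score:.2f}")
```

In this toy case the masks share three of four foreground voxels each, giving a Dice coefficient of 0.75.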
7 min
Diagnostic Performance of Neural Network Algorithms in Skull Fracture Detection in CT Scans: A Systematic Review and Meta-Analysis
Ramtin Hajibeygi, Tehran / Iran
Author Block: R. Hajibeygi1, G. Sharifi1, M. Fathi1, A. Bahrami2, R. Eshraghi2, I. Dixe De Oliveira Santo3, A. Mirjafari4, J. Chan4, L. Tu3; 1Tehran/IR, 2Kashan/IR, 3New Haven, CT/US, 4Los Angeles, CA/US
Purpose: The potential intricacy of skull fractures as well as the complexity of underlying anatomy poses diagnostic hurdles for radiologists evaluating CT scans. The necessity for automated diagnostic tools has been brought to light by the shortage of radiologists and the growing demand for rapid and accurate fracture diagnosis. Convolutional Neural Networks (CNNs) are a potential new class of medical imaging technologies that use deep learning (DL) to improve diagnosis accuracy. The objective of this systematic review and meta-analysis is to assess how well CNN models diagnose skull fractures on CT images.
Methods or Background: PubMed, Scopus, and Web of Science were searched for studies before February 2024 that used CNN models to detect skull fractures on CT scans. Meta-analyses were conducted for area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and accuracy. Egger's and Begg's tests were used to assess publication bias.
Results or Findings: Meta-analysis was performed for 11 studies with 20798 patients. The pooled average AUC for CNN models that used pre-training for transfer learning within their architecture was 0.96 ± 0.02. The pooled averages of the studies' sensitivity and specificity were 1.0 and 0.93, respectively. The pooled accuracy was 0.92 ± 0.04. Studies showed heterogeneity, which was explained by differences in model topologies, training models, and validation techniques. No significant publication bias was detected.
Conclusion: CNN models perform well in identifying skull fractures on CT scans. The results suggest that CNNs have the potential to improve diagnostic accuracy in the imaging of acute skull trauma. To further enhance these models' practical applicability, future studies could concentrate on the utility of DL models in prospective clinical trials.
Limitations: One limitation is the lack of homogeneity in CT image quality across studies.
Funding for this study: None
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: None
7 min
AI as a second reader in post-traumatic head CT at Oktoberfest 2024: A prospective performance monitoring study
Maria Barbara Steinberger, Munich / Germany
Author Block: M. B. Steinberger, M. Bock, A. S. Duque, B. F. Hoppe, J. P. Rudolph, Y. Dikhtyar, P. Reidler, W. Flatz, D. Hinzmann, V. Bogner-Flatz, J. Ricke, C. C. Cyran; Munich/DE
Purpose: To prospectively assess the impact of an AI algorithm on radiologists’ diagnostic confidence in detecting intracranial haemorrhage (ICH) in post-traumatic head CT at Oktoberfest 2024.
Methods or Background: A mobile CT scanner (Somatom go.Top, Siemens Healthineers) was operated on-site for triaging patients with mild to moderate traumatic head injuries. This prospective study included n=219 patients who underwent head CT. Instant AI analysis was provided via auto-routing to a fully PACS-integrated, GDPR-compliant clinical AI platform. Initially, one of 15 board-certified radiologists, alternating in shifts, read the head CT unassisted, rating ICH likelihood on a 5-point Likert scale (-2, “very low”; 2, “very high”). After submitting this evaluation, algorithm results were made available for reassessment of ICH likelihood. Performance monitoring of the AI tool was implemented in PACS (Visage Imaging) via Fast Healthcare Interoperability Resources (FHIR) pop-up forms.
Results or Findings: AI support was utilised in 66% (146/222 scans) of the readings, varying significantly between readers (43%-100%). At a probability threshold of 0.1, the AI tool correctly identified 6 out of 7 ICH, 139 true negatives, no false positives (sens=0.857, spec=1.000, acc=0.993, ppv=1.000, npv=0.993). AI assistance increased radiologists' confidence in ruling out ICH in 19 cases (-1 to -2) and confirming ICH in two cases (1 to 2). In two borderline cases, AI aided in excluding ICH (0 to -1). Overall, diagnostic confidence was significantly higher with AI support (p<0.001).
Conclusion: AI assistance significantly improved diagnostic confidence of radiologists reading trauma head CT at Oktoberfest 2024, serving as a virtual second reader in this emergency setting. PACS-integrated FHIR forms set a framework for seamless monitoring of AI performance and its impact on diagnostic workflow.
Limitations: In 40 cases, no AI analysis was performed due to incorrect specifications or failed auto-routing.
Funding for this study: No funding was provided for this study.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: The ethics committee notification can be found under the number UID 24-0813. Written informed consent was obtained from all participants and the study was registered in the German Clinical Trial Register (DRKS00034969).
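The performance figures in the results above follow from the reported counts (6 of 7 ICH detected, 139 true negatives, no false positives). The following is a minimal sketch of that arithmetic in Python. The class and method names are illustrative, and the single false negative is inferred from "6 out of 7".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts from a binary diagnostic test against a reference standard."""
    tp: int
    fp: int
    fn: int
    tn: int

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)


# Counts taken from the abstract; fn=1 is inferred from "6 out of 7 ICH".
ich = ConfusionMatrix(tp=6, fp=0, fn=1, tn=139)

for name in ("sensitivity", "specificity", "accuracy", "ppv", "npv"):
    print(f"{name}: {getattr(ich, name)():.3f}")
```

The four counts sum to 146, which matches the number of AI-supported readings stated in the results.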
7 min
Impact of Defacing Procedures on Brain Age Gap Estimation
Vivien Lorena Ivan, Düsseldorf / Germany
Author Block: V. L. Ivan1, J. Caspers1, M. Vach1, D. M. Hedderich2, D. Weiß1, C. Rubbert1; 1Düsseldorf/DE, 2Munich/DE
Purpose: Removal of facial features from MRI brain scans (“Defacing”) is mandatory from a data privacy perspective. We investigated the impact of defacing on Brain Age Gap Estimation (BrainAGE), an imaging biomarker used in various research areas such as atypical aging.
Methods or Background: A total of 364 Alzheimer’s disease (AD) patients and 717 cognitively normal (CN) participants were analyzed including unaccelerated (AD:n=290; CN:n=386) and accelerated 3DT1 imaging (AD:n=203; CN:n=500). BrainAGE was computed after defacing using either afni_refacer, fsl_deface, mri_deface, mri_reface, PyDeface, or spm_deface and without defacing. For BrainAGE, gray matter features were extracted using CAT12 for SPM12. BrainAGE was calculated as predicted age minus chronological age. A subset of participants (AD:n=74, CN:n=84) had within-session repeat imaging available and were processed without defacing, serving as a benchmark for BrainAGE differences. Mean absolute error (MAE) and mean squared error (MSE) were calculated. Outliers due to defacing were identified using Grubbs’s tests.
Results or Findings: Benchmark analysis found MAE of 1.15 and MSE of 2.25 for BrainAGE differences between initial and repeat scans without defacing in CN, and an MAE of 1.43 and MSE of 3.29 for AD. Among defacing methods, PyDeface exhibited the best performance with an overall MAE of 0.33 and MSE of 0.27, showing a mean BrainAGE difference of 0.08±0.52. PyDeface also had the fewest outliers (n=99) based on the benchmark criteria. Grubbs’s test identified 23 outliers after PyDeface, with 11 found after mri_reface and 20 after spm_deface.
Conclusion: Defacing can be employed for data privacy without significantly affecting the reliability of BrainAGE as an imaging biomarker. PyDeface is recommended.
Limitations: BrainAGE may be affected by defacing; however, for most approaches this influence is smaller than the variability observed in BrainAGE in repeat non-defaced imaging.
Funding for this study: No
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: No
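The methods above define BrainAGE as predicted age minus chronological age and summarise the effect of defacing with MAE and MSE. The following is a minimal sketch of those calculations in Python. All ages are invented toy values rather than study data, and the function names are illustrative.

```python
def brain_age_gap(predicted_age: float, chronological_age: float) -> float:
    """BrainAGE = predicted age minus chronological age (in years)."""
    return predicted_age - chronological_age


def mean_absolute_error(differences):
    return sum(abs(d) for d in differences) / len(differences)


def mean_squared_error(differences):
    return sum(d * d for d in differences) / len(differences)


# Hypothetical participants: chronological age and two predicted ages each.
chronological = [70.0, 65.0, 80.0]
predicted_original = [72.0, 64.0, 83.0]  # without defacing
predicted_defaced = [72.5, 63.5, 83.0]   # after defacing

gap_original = [brain_age_gap(p, c) for p, c in zip(predicted_original, chronological)]
gap_defaced = [brain_age_gap(p, c) for p, c in zip(predicted_defaced, chronological)]

# Per-participant BrainAGE difference attributable to defacing.
differences = [d - o for d, o in zip(gap_defaced, gap_original)]

mae = mean_absolute_error(differences)
mse = mean_squared_error(differences)
print(f"MAE: {mae:.2f} years, MSE: {mse:.2f}")
```

Because chronological age is the same for both scans of a participant, the BrainAGE difference equals the difference in predicted age.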
7 min
Identification of depression subtypes in Parkinson's disease patients via structural MRI whole-brain radiomics: an unsupervised machine learning study
Zhenyu Shu, Zhejiang / China
Author Block: Z. Shu; Zhejiang, Hangzhou/CN
Purpose: Unsupervised machine learning methods based on whole-brain radiomic analysis were used to identify subtypes of depression that occur during the progression of Parkinson's disease (PD).
Methods or Background: Data from 272 PD patients in the PPMI database were used, among which 81 experienced depression in Parkinson's Disease (DPD) during a 5-year follow-up period. Quantitative radiomic features were extracted from the whole-brain magnetic resonance structural images of each patient, and principal component analysis (PCA) was used for feature dimensionality reduction. All of the cases were classified into different subtypes by unsupervised cluster analysis (UCA). The high-risk subtypes were selected through comparative analysis. The high-risk subtype data were divided into training subgroups and testing subgroups at a 7:3 ratio. On the basis of the clinical characteristics of the training subgroups, multiple logistic regression analysis was performed to confirm the risk factors for DPD subtypes. The DPD subtypes were subsequently identified on the basis of the risk factors. A prediction model was constructed via decision trees, and the diagnostic accuracy of the model was evaluated via receiver operating characteristic (ROC) curves.
Results or Findings: Logistic regression analysis based on high-risk subtype groups revealed that rem, updrs1_score, updrs2_score, and ptau were independent predictors of DPD. The prediction model based on high-risk subgroups had AUC values of 0.853 and 0.81 in the training and testing subgroups, sensitivities of 0.765 and 0.786, and specificities of 0.771 and 0.815, respectively. The AUC, sensitivity, and specificity in the non-high-risk subgroup were 0.859, 0.654, and 0.852, respectively.
Conclusion: A UCA based on structural MRI features can identify high-risk subtypes of DPD, and the constructed model can also predict the progression of DPD well.
Limitations: This study was designed as a retrospective analysis.
Funding for this study: The work was supported by the Natural Science Foundation of Zhejiang Province of China (LGF22H090021)
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: The case data used in this study came from the Parkinson's Progression Markers Initiative (PPMI) (http://www.PPMI-info.org) database, and data collection was approved by the institutional review board. For ethical review information on the data, please refer to the website.
7 min
A Machine-Learning Model Based on US Radiomics to Classify Benign and Malignant Thyroid Nodules
Vincenzo Dolcetti, Rome / Italy
Author Block: A. Guerrisi1, V. Dolcetti1, L. Miseo1, A. Valenti1, F. Elia1, G. Del Gaudio1, F. Raponi2, E. David2, V. Cantisani1; 1Rome/IT, 2Catania/IT
Purpose: The aim of this work was to develop a machine learning model based on thyroid ultrasound images in order to classify nodules into benign and malignant classes. Ultrasound and fine needle biopsy are the most reliable diagnostic methods to date, but they have some limitations. Radiomics and machine learning could be useful to improve diagnosis while reducing invasive procedures.
Methods or Background: Ultrasound images from 142 subjects were collected: 40 patients belonged to the "malignant" class and 102 to the "benign" class, according to histological diagnosis (fine-needle aspiration). Those images were used to train, cross-validate and internally test three different machine learning models, using the “Trace for Research” software. A robust radiomic approach was applied, and the models (random forests, SVM and k-NN classifiers) were evaluated. Finally, the best model was externally tested on an additional cohort of 21 patients.
Results or Findings: The best model (ensemble of random forest) showed ROC-AUC (%) of 85 (majority vote), 83.7** (mean) [80.2-87.2], accuracy (%) of 83, 81.2** [77.1-85.2], sensitivity (%) of 70, 67.5** [64.3-70.7], specificity (%) of 88, 86.5** [82-91], PPV (%) of 70, 66.5** [57.9-75.1], and NPV (%) of 88, 87.1** [85.5-88.8] (*p<0.05, **p<0.005) in the internal test cohort. This model was then externally tested, achieving an accuracy of 90.5%, a sensitivity of 100%, a specificity of 86.7%, a PPV of 75% and an NPV of 100%.
Conclusion: The best model successfully identified all malignant nodules and the large majority of benign nodules in the external testing cohort. Further investigations could be conducted by testing the model with images of nodules from different centers.
Limitations: Additional external tests should be performed, with images from different ultrasound machines and different healthcare centers to increase variability of target population.
Funding for this study: This research was supported by the Italian Ministry of Health.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: This study was performed in line with the principles of the Declaration of Helsinki. Approval was granted by the Ethics Committee of IRCCS IFO-Fondazione GB Bietti (Date: 23/01/2023, N: 1820/23)
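The external test results above are given only as percentages for a cohort of 21 patients. One way to check their internal consistency is to search for integer confusion-matrix counts that reproduce all five reported values. The following is a minimal sketch of such a search in Python. The function names and the 0.05-point rounding tolerance are assumptions, and the counts it finds are inferred rather than reported by the authors.

```python
def pct(numerator, denominator):
    """Percentage, or None when the denominator is zero."""
    return 100.0 * numerator / denominator if denominator else None


def consistent_tables(n, reported, tol=0.05):
    """List all (tp, fp, fn, tn) with n cases matching the reported metrics."""
    matches = []
    for tp in range(n + 1):
        for fp in range(n + 1 - tp):
            for fn in range(n + 1 - tp - fp):
                tn = n - tp - fp - fn
                computed = {
                    "accuracy": pct(tp + tn, n),
                    "sensitivity": pct(tp, tp + fn),
                    "specificity": pct(tn, tn + fp),
                    "ppv": pct(tp, tp + fp),
                    "npv": pct(tn, tn + fn),
                }
                # Keep the table only if every metric is defined and
                # agrees with the reported value to within the tolerance.
                if all(
                    value is not None and abs(value - reported[key]) <= tol
                    for key, value in computed.items()
                ):
                    matches.append((tp, fp, fn, tn))
    return matches


# Percentages as reported for the external test cohort of 21 patients.
reported = {
    "accuracy": 90.5,
    "sensitivity": 100.0,
    "specificity": 86.7,
    "ppv": 75.0,
    "npv": 100.0,
}

matches = consistent_tables(21, reported)
print(matches)
```

Under these assumptions the search yields a single table: 6 true positives, 2 false positives, 0 false negatives, and 13 true negatives, which would correspond to 6 malignant and 15 benign nodules in the external cohort.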