Research Presentation Session: Artificial Intelligence & Machine Learning & Imaging Informatics

RPS 2305 - Cutting edge advances in AI for radiological imaging

March 3, 09:30 - 11:00 CET

7 min
Automatic detection of bone fragility in radiographic images using deep-learning with multicentre cohort datasets
Guillaume Gatineau, Lausanne / Switzerland
Author Block: G. Gatineau1, G. Nguyen2, M. De Gruttola2, K. Hind2, M. Kuzma3, J. Payer3, G. Guglielmi4, A. Fahrleitner-Pammer5, D. Hans1; 1Lausanne/CH, 2Geneva/CH, 3Bratislava/SK, 4Foggia/IT, 5Graz/AT
Purpose: This study aimed to assess the accuracy and efficacy of a novel AI-driven radiographic processing tool designed to opportunistically identify individuals predisposed to very high bone fragility risk, addressing a prevailing clinical challenge in the field.
Methods or Background: From four multinational cohorts, 4,764 paired lumbar-spine X-ray DICOM and DXA scans (GE and Hologic systems) were acquired within 6 months. A total of 3,369 cases from three cohorts were allocated for training and validation of a new AI bone-fragility detection tool (Medimaps Group, Switzerland). Three hundred cases were designated as an internal test set. Two hundred and seventy-one cases from the fourth cohort acted as an external test set. Very high fracture risk was defined using DXA parameters as the ground truth: BMD T-score≤-2.5 and a trabecular bone score (TBS)<1.23.
Results or Findings: The mean age and BMI of the sample (5.2% male) were 66.1±10.8 y and 26.4±5.0 kg/m2 respectively. Using the combination of DXA-derived BMD and TBS, 17.5% were identified at very high fracture risk. Uncertainties were obtained with a 95% confidence interval (CI) using binomial distribution approximations. The accuracy of the AI tool for the internal test set was 0.85 (95% CI: 0.76-0.94), specificity 0.91 (0.80-0.99), and sensitivity 0.69 (0.53-0.84). For external validation, the accuracy, specificity, and sensitivity were 0.80 (0.69-0.87), 0.88 (0.77-0.99), and 0.62 (0.47-0.77) respectively.
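The abstract does not say which binomial approximation was used for the confidence intervals. As a minimal sketch, assuming the common normal-approximation (Wald) interval, a 95% CI for a proportion could be computed as below. The function name and the sample size of 40 are hypothetical and are not taken from the study.

```python
import math


def wald_interval(p_hat, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a binomial proportion.

    p_hat: observed proportion (e.g. sensitivity), n: number of cases it is based on,
    z: standard normal quantile (1.96 for a two-sided 95% interval).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p_hat <= 1.0:
        raise ValueError("p_hat must lie in [0, 1]")
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    # Clip to [0, 1], since the approximation can overshoot near the boundaries.
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


# Hypothetical illustration: a sensitivity of 0.69 estimated from 40 positive cases.
low, high = wald_interval(0.69, 40)
print(f"sensitivity 0.69, 95% CI {low:.2f}-{high:.2f}")
```

The Wald interval is known to behave poorly for small samples or proportions close to 0 or 1; a Wilson or exact interval may be preferable in those cases.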
Conclusion: This AI-enhanced radiographic tool exhibits potential in accurately detecting individuals at very high risk of bone fragility. Its robust specificity underscores its capacity to reduce false-positive rates, emphasising its clinical utility for efficient patient screening.
Limitations: While this study demonstrates promise, further development and validation will be beneficial, using larger and more diverse samples.
Funding for this study: This study was funded by the Fond National Suisse 32473B_156978 and 320030_188886.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: This study was ethically approved by the WMA declaration of Helsinki. Ethical Principles For Medical Research Involving Human Subjects
7 min
Spectral CT fingerprinting: AI-based clustering of tissue signatures: proof of concept
Nils Große Hokamp, Cologne / Germany
Author Block: N. Große Hokamp, M. M. Thakur, D. Maintz, M. Schöneck, L. Caldeira; Cologne/DE
Purpose: Spectral computed tomography (CT) allows for material decomposition by assessing attenuation caused by the photoelectric effect and Compton scattering separately. With dedicated reconstruction algorithms, spectral results can be generated that highlight different material characteristics. We hypothesise that combining these quantitative characteristics will result in a characteristic CT signature or fingerprint of tissues and that this clustering can be automated using artificial intelligence. This study served as proof-of-concept of this approach in a phantom setup.
Methods or Background: A 3D-printed phantom filled with different solutions containing iodine (2 different concentrations), gadolinium, iron oxide and water was scanned using a dual layer spectral CT system. Conventional images, iodine maps and low energy virtual monoenergetic images were reconstructed. An unsupervised, high dimensional, k-means clustering algorithm was developed and used to automatically provide clusterings. These clusters were then forward-projected into the image and visually checked for agreement.
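The abstract gives no implementation details of the clustering algorithm. As a rough sketch only, plain k-means (Lloyd's algorithm) over three-dimensional voxel feature vectors could look like the following. The feature values, the initial centroids, and the function name are hypothetical.

```python
import math


def kmeans(points, initial_centroids, max_iterations=20):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = [0] * len(points)
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels, centroids


# Hypothetical voxels: (conventional HU, iodine map value, low-keV monoenergetic HU).
voxels = [(0.0, 0.0, 0.0), (2.0, 0.1, 3.0), (100.0, 4.0, 300.0), (104.0, 4.2, 310.0)]
labels, centroids = kmeans(voxels, initial_centroids=[voxels[0], voxels[2]])
print(labels)
```

In practice the features would usually be rescaled before clustering, because Euclidean distance otherwise favours the feature with the largest numeric range.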
Results or Findings: The aforementioned high-dimensional, AI-based clustering allowed for cluster formation (mean RMSE = 0.68, range [0.06-1.63]). Forward projection revealed that the clusters matched the actual phantom compartments very closely, as indicated by Dice coefficients (all >0.8).
Conclusion: CT fingerprinting, i.e. the AI-based derivation of a tissue signature, is feasible from a technology standpoint. Future research needs to focus on increasing the dimensionality of the clustering algorithm and on clinical translation.
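For reference, the Dice coefficient between a forward-projected cluster and a phantom compartment is twice the overlap divided by the sum of the two mask sizes. The sketch below assumes both are available as flat binary masks of equal length; the example masks are made up for illustration.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice = 2 * |A and B| / (|A| + |B|) for two equal-length binary masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    size_sum = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    if size_sum == 0:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2.0 * intersection / size_sum


# Hypothetical flattened masks: cluster membership vs. true compartment.
cluster_mask = [0, 1, 1, 1, 1, 0, 0, 0]
compartment_mask = [0, 0, 1, 1, 1, 1, 0, 0]
print(dice_coefficient(cluster_mask, compartment_mask))  # 2 * 3 / (4 + 4) = 0.75
```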
Limitations: The study has been carried out in a phantom only, as it should serve as proof-of-concept. Only 3 dimensions have been introduced in the clustering approach to date.
Funding for this study: This study was funded by the Advanced Clinician Scientist Program (AdCCSP), University of Cologne.
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: This was a phantom study and hence, no ethical approval was sought.
7 min
Pre-operative prediction of axillary lymph node status using ML applied to breast US: a multicentre study
Martina Caruso, Naples / Italy
Author Block: M. Caruso, A. De Giorgio, R. Cuocolo, L. R. La Rocca, M. Ferrante, A. Stanzione, S. Maurea, V. Romeo, A. Brunetti; Naples/IT
Purpose: To assess whether a machine-learning (ML) algorithm could improve the ability of US to preoperatively define axillary lymph node (ALN) status in breast cancer (BC) using a multicentric dataset.
Methods or Background: Patients with at least one histologically proven BC lesion, who underwent preoperative breast US, were retrospectively selected in three different institutions. BC lesions were segmented on US images by three different operators and radiomics features (first, second, and higher order) were extracted. Multi-step feature selection was performed using Intraclass Correlation Coefficient (ICC) analysis and principal component analysis (PCA). Thereafter, a Random Forest (RF) ML classifier was applied to the dataset to predict the ALN status (positive/negative for metastasis) and its performance was assessed through the Matthews Correlation Coefficient (MCC).
Results or Findings: A total of 293 BCs (ALN negative: 176; ALN positive: 117) were included in the study. Three datasets were identified as follows: 1) training set, composed of 235 BCs (ALN-: 141; ALN+: 94); 2) validation set including 30 BCs (ALN-: 17; ALN+: 13); and 3) test set made of 30 BCs (ALN-: 18; ALN+: 12). A total of 549 radiomics features were extracted from US images; of these, 280 were discarded according to ICC analysis, with a total of 5 features finally selected by PCA. The RF classifier showed an MCC of 0.97, 0.11 and 0.08 in the training, validation, and test set, respectively.
Conclusion: ML applied to a multicentric dataset showed promise in the preoperative assessment of ALN status in BC. However, further efforts are necessary to improve the generalisability of the model when applied to external datasets.
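To make the MCC values easier to interpret, the coefficient can be computed directly from a binary confusion matrix. It ranges from -1 to 1, with 0 corresponding to chance-level prediction. The counts in the example are hypothetical: they only respect the reported test-set class sizes (18 negative, 12 positive) and are not the study's actual confusion matrix.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0


# Hypothetical counts for a 30-case test set (12 positive, 18 negative).
mcc = matthews_corrcoef(tp=5, tn=12, fp=6, fn=7)
print(f"MCC = {mcc:.2f}")

# A perfect classifier on the same class sizes reaches the maximum of 1.
print(matthews_corrcoef(tp=12, tn=18, fp=0, fn=0))
```

A value near 0.1, as in the hypothetical example, indicates predictions only marginally better than chance, whatever the raw accuracy looks like.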
Limitations: Limited sample size
Funding for this study: No funding was obtained for this study.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: The study was approved by local ethics committee.
7 min
Prediction of patient survival with residual convolutional neural network (ResNet) in unresectable synchronous liver-only metastatic colorectal cancer treated with bevacizumab-based chemotherapy
Sung-Hua Chiu, Taipei / Taiwan, Chinese Taipei
Author Block: S-H. Chiu, C-C. Wu, W-C. Chang, P-Y. Chang; Taipei/TW
Purpose: To verify the prediction of survival with residual convolutional neural network (ResNet)-determined morphological response (ResNet-MR) in patients with unresectable synchronous liver-only metastatic colorectal cancer (mCRC) treated with bevacizumab-based chemotherapy (BBC).
Methods or Background: A retrospective review of liver-only mCRC patients treated with BBC from December 2011 to April 2021 was performed. Patients with metachronous liver metastases or receiving locoregional treatment before initiation of BBC were excluded. Downstaging to curative treatment and overall survival (OS) were recorded. Two abdominal radiologists evaluated CT images based on the morphological criteria and divided images into groups 1, 2, and 3. These divided images established the radiologists-determined morphological response (RD-MR), which classified patients into responders and non-responders based on the morphological change 3 months after initiation of BBC. Then, group 1 and 3 images divided by radiologists were input into ResNet as the training dataset. The trained ResNet redivided group 2 images into groups 1 and 3. ResNet-MR was determined by these redivided images and the initial group 1 and 3 images determined by radiologists.
Results or Findings: Eighty-four patients were enrolled (53 male; median age 60.0 years). The follow-up time ranged from 10 to 86 months. A total of 407 group 1 and 3 images were input into ResNet as the training dataset. Both RD-MR and ResNet-MR correlated with OS (p-value = 0.0167 and 0.0225, respectively). RD-MR classified 28 patients (33.3%) as responders, and ResNet-MR classified an additional 16 patients (19.0%) as responders; these 16 patients showed longer OS than the remaining RD-MR non-responders (27.49 versus 21.20 months, p-value=0.043), and a higher proportion of them reached downstaging (37.5% versus 17.5%, p-value=0.1610).
Conclusion: ResNet showed its ability to predict the therapeutic effect of BBC in mCRC patients, which will optimise individualised cancer treatment.
Limitations: A single-centre retrospective study.
Funding for this study: Funding was obtained for this study with the funding number: Nsc 109-2314-B-016-012.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: The study was approved with the approval protocol number: TSGHIRB No. 2-108-05-153, under the protocol title "Development of artificial intelligence approaches for Detection and Characterization of Hepatic Lesion and Evaluation of HCC treatment".
7 min
Predicting histological response in pancreatic adenocarcinoma treated with neoadjuvant chemotherapy: conventional clinicoradiological variables and added value of radiomics
Junaid Mushtaq, Milan / Italy
Author Block: J. Mushtaq, D. Palumbo, F. Prato, M. Mori, S. Crippa, M. Falconi, C. Fiorino, F. De Cobelli; Milan/IT
Purpose: Pancreatic cancer prognosis remains abysmal due to late-stage diagnosis and limited treatment options. Neoadjuvant chemotherapy (NAT) is increasingly used to improve outcomes. However, it complicates radiological assessment, making it difficult to accurately determine tumor status. This study aims to explore radiomics as a non-invasive tool for predicting important prognostic factors and compares its performance with traditional radiological assessments.
Methods or Background: This retrospective single-centre study included patients with pancreatic adenocarcinoma who underwent NAT and pancreaticoduodenectomy between January 2015 and December 2021. Clinical, radiological and pathological data were collected, and endpoints identified: disease recurrence, N2, Tumor Regression Grade (TRG). Radiomic features were extracted from pre-NAT and post-NAT CT images. Machine learning approaches, including bootstrapping, were used to develop predictive models. Two models were built: a purely radiomic model and a combined model including clinical and radiological data. The population was divided into training and validation cohorts. Balanced groups ensured robust analysis following established guidelines.
Results or Findings: The study included 156 patients (training n=103, validation n=53). Radiological and clinical variables, except for delta CA19.9 levels, failed to predict the endpoints. Radiomics, using delta values to capture changes in the tumor microenvironment, showed promise. Delta radiomic models were successfully validated for the N2 and TRG endpoints and performed moderately well with AUCs of 0.749 (p=0.031) and 0.710 (p=0.046) respectively. Excellent negative predictive values, 82% and 79% respectively for N2 and TRG, indicate the model's ability to identify low-risk patients. A combined model with delta CA19.9 did not significantly improve performance.
Conclusion: Radiomics may hold potential for improved patient selection in pancreatic cancer treatment.
Limitations: The study is limited by its retrospective nature and single-center design. External validation is necessary for confirming the results.
Funding for this study: No funding was obtained for this study.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: This retrospective study was conducted at the San Raffaele Scientific Institute (Milan, Italy) and was carried out within the framework of an approved Ethics Committee study (28/INT/2015).
7 min
Monitoring over time of pathological complete response to neoadjuvant chemotherapy in breast cancer patients through an ensemble vision transformers-based model
Maria Colomba Comes, Bari / Italy
Author Block: M. C. Comes, S. Bove, A. Fanizzi, R. Massafra; Bari/IT
Purpose: Morphological and vascular peculiarities of breast cancer can change during neoadjuvant chemotherapy (NAC). Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) acquired pre- and mid-treatment quantitatively capture information about tumour heterogeneity as potential earlier indicators of pathological complete response (pCR) to NAC in breast cancer. This study aimed to develop an ensemble deep learning-based model, exploiting a Vision Transformer (ViT) architecture, which merges features automatically extracted from five segmented slices of both pre- and mid-treatment exams containing the maximum tumour area, to predict and monitor pCR to NAC.
Methods or Background: Imaging data analysed in this study referred to a cohort of 86 breast cancer patients, randomly split into training and test cohorts at a ratio of 8:2, who underwent NAC and for which information regarding the pCR achievement was available (37.2% of patients achieved pCR). As far as we know, our research is the first proposal using ViTs on DCE-MRI exams to monitor pCR over time during NAC.
Results or Findings: The performances of the proposed model were assessed using standard evaluation metrics and promising results were achieved: AUC value of 91.4%, accuracy value of 82.4%, a specificity value of 80.0%, a sensitivity value of 85.7%, precision value of 75.0%, F-score value of 80.0%, G-mean value of 82.8%.
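The threshold-based metrics listed above all follow from a single confusion matrix. The abstract does not report the counts; the values used below (TP=6, FN=1, TN=8, FP=2) are an assumption that happens to reproduce the reported percentages for a test cohort of 17 patients (roughly 20% of 86). The sketch shows how each metric is derived.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Derive common binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # recall on the positive (pCR) class
    specificity = tn / (tn + fp)            # recall on the negative (non-pCR) class
    precision = tp / (tp + fp)              # positive predictive value
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    g_mean = math.sqrt(sensitivity * specificity)
    return {
        "accuracy": accuracy,
        "specificity": specificity,
        "sensitivity": sensitivity,
        "precision": precision,
        "f_score": f_score,
        "g_mean": g_mean,
    }


# Assumed counts (not reported in the abstract), chosen to match the stated metrics.
metrics = classification_metrics(tp=6, fn=1, tn=8, fp=2)
for name, value in metrics.items():
    print(f"{name}: {value * 100:.1f}%")
```

With a test cohort this small, a single reclassified patient shifts sensitivity by about 14 percentage points, which is worth keeping in mind when reading the point estimates.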
Conclusion: Finally, the heterogeneity changes in DCE-MRI at pre- and mid-treatment could affect the accuracy of pCR prediction to NAC.
Limitations: The study needs to be validated in a larger cohort.
Funding for this study: The funding for this study was obtained from the Ministry of Health, Italy.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: The study was approved by the Ethics Committee of Istituto Tumori ‘Giovanni Paolo II’, Bari, Italy.
7 min
Retrospective evaluation of interval breast cancer: can the number of interval carcinomas be reduced utilising AI diagnostic software?
Jonas Subelack, St. Gallen / Switzerland
Author Block: J. Subelack1, A. Geissler1, J. Vogel1, M. Blum1, A. Eichenberger1, R. Morant1, A. Gräwingholt2, D. Kuklinski1; 1St. Gallen/CH, 2Paderborn/DE
Purpose: We investigate whether an artificial intelligence (AI) powered mammography screening software can support radiologists and screening programs to reduce interval carcinomas (ICs).
Methods or Background: Combining data from the cancer registry of eastern Switzerland and the “donna” screening program, we include data from 151,245 screening mammograms between 2010 and 2019. Among these, 264 ICs were identified, defined as carcinomas detected opportunistically within 24 months after screening. The mammograms of the 264 ICs, which had been considered normal in the screening round prior to detection, and those of 90 randomly selected true normals were reviewed retrospectively by three independent radiologists and an AI-diagnostic software (ProFound-AI, iCAD). The software calculates three measures: (1) lesion-score: probability of a marked lesion in a mammogram to be a cancer; (2) case-score: probability that the set of mammograms of one woman contains a cancer; (3) risk-category: woman’s risk to be detected with breast cancer within two years. We evaluate which measure (or combination of measures) best finds signs of later detected ICs in the prior mammograms that were read as normal and also compare it to the review of the three radiologists. We further estimate the accuracy of the software depending on thresholds.
Results or Findings: We expect that both retrospective assessments have an improved detection rate of ICs compared to the initial screening. Detection probability of ICs and the number of false-positives is highly dependent on the thresholds set for the measure(s). Using predefined thresholds, we expect that the software is superior to the retrospective assessments of radiologists in detecting ICs.
Conclusion: If the software detects significantly more cancer signs than the radiologists, it needs to be discussed how best to integrate the software into the mammography screening process to reduce ICs.
Limitations: Only ICs are analysed in depth, limiting the scope of this study.
Funding for this study: The cancer league of eastern Switzerland funds this cooperative research project.
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: A declaration of non-responsibility is at hand.
7 min
Exploring the potential of ChatGPT and a context-aware ChatGPT in gastrointestinal radiology: a differential diagnostic accuracy assessment
Stephan Rau, Freiburg / Germany
Author Block: S. Rau, A. Rau, J. Nattenmüller, A. Fink, F. Bamberg, M. Russe; Freiburg im Breisgau/DE
Purpose: The growing volume of imaging studies necessitates tools to assist radiologists in delivering accurate and effective diagnoses. We aimed to investigate the potential of ChatGPT by OpenAI and a context-aware chatbot based on ChatGPT in providing differential diagnoses.
Methods or Background: Utilizing the LlamaIndex framework, which integrates datasets into large language models, ChatGPT 4 was enhanced using the 96 documents from the Radiographics Top 10 Reading list on gastrointestinal imaging, creating a gastrointestinal imaging-aware chatbot (giaGPT). To assess the differential diagnostic capability, a set of fifty case-files on abdominal pathologies was created, comprising radiological findings in fluoroscopy, MRI and CT. We compared the giaGPT to the generic ChatGPT 4 (gGPT) in terms of offering the top three differential diagnoses using interpretations from senior level radiologists as ground truth. Additionally, the trustworthiness of the giaGPT was evaluated by investigating the utilized source document as provided by the embedded context-retrieval mechanism.
Results or Findings: Within the evaluated dataset, the giaGPT demonstrated a high capability in identifying the most appropriate differential diagnosis in a vast majority of cases with a sensitivity of 78%, significantly surpassing the gGPT with 54%. Notably, giaGPT offered the primary differential in the top 3 differential diagnoses in 90% of the cases, gGPT in only 74%.
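The "top 3" figure corresponds to a top-k hit rate: the share of cases in which the reference diagnosis appears among the first k ranked suggestions. The abstract does not describe how matches were scored. The sketch below assumes simple case-insensitive string matching, and the case data are invented for illustration.

```python
def top_k_hit_rate(ranked_predictions, references, k=3):
    """Fraction of cases whose reference diagnosis is among the first k predictions."""
    if len(ranked_predictions) != len(references):
        raise ValueError("one ranked list is required per reference diagnosis")
    if not references:
        raise ValueError("at least one case is required")
    hits = 0
    for predictions, reference in zip(ranked_predictions, references):
        # Assumption: a match is a case-insensitive exact string match.
        top_k = [p.strip().lower() for p in predictions[:k]]
        if reference.strip().lower() in top_k:
            hits += 1
    return hits / len(references)


# Hypothetical cases: ranked differentials from a chatbot vs. the reference diagnosis.
predictions = [
    ["appendicitis", "diverticulitis", "crohn disease"],
    ["achalasia", "oesophageal carcinoma", "scleroderma"],
    ["pancreatitis", "cholecystitis", "gastric ulcer"],
    ["volvulus", "ileus", "intussusception"],
]
references = ["Diverticulitis", "achalasia", "mesenteric ischaemia", "intussusception"]

print(top_k_hit_rate(predictions, references, k=1))  # only the first-ranked diagnosis counts
print(top_k_hit_rate(predictions, references, k=3))
```

In a real evaluation, matching free-text diagnoses would more plausibly rely on expert adjudication than on string comparison.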
Conclusion: Context-aware ChatGPT-based algorithms may provide a promising tool for supporting radiologists in the task of differential diagnostics. They offer accurate differentials and direct access to the employed source documents, giving insight into the decision-making process and enabling trustworthy and evidence-based clinical decision-support.
Limitations: Although the findings are promising, they call for a more extensive evaluation using datasets from clinical routine covering a more diverse spectrum of pathologies. Additionally, future research should focus on the impacts of diagnostic confidence and time efficiencies.
Funding for this study: This study was financially supported by internal funds.
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: No ethics committee decision was needed in the study because no patient data were used.
7 min
CT effective dose in intensive care unit patients: comparison between deep learning image reconstruction, filtered back projection and iterative algorithms
Elena Agostini, Padua / Italy
Author Block: E. Agostini, E. K. Lanza De Cristoforis, C. Zanon, E. Quaia; Padua/IT
Purpose: The aim of this study was to assess the effective dose reduction and image quality improvement provided by deep learning image reconstruction (DLIR) in comparison to filtered back projection (FBP) and iterative reconstruction (IR) algorithms.
Methods or Background: Eighty-three critical care patients who underwent CT imaging of the chest, abdomen and trunk (chest + abdomen) within a period of 30 days using both DLIR (TrueFidelity) and FBP or IR hybrid (AIDR3D) and model-based IR algorithm (ADMIRE) were included. All examinations were performed using automatic exposure control (AEC) which modulates the tube current and hence radiation exposure according to the algorithm applied. Radiation dose was assessed using CT dose index volume (CTDI volume), dose-length product (DLP) and Effective Dose. For the quantification of image quality, noise and signal to noise ratio (SNR) were used. All parameters were compared across the different reconstruction methods for each patient using both parametric (t test) and non-parametric (Wilcoxon) tests. In cases of contrast-enhanced CT (CECT), all parameters were retrieved for every acquisition phase (direct, arterial, venous or delayed) as well as for their total value as stated in the patient protocol.
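The abstract does not state how effective dose was derived. A widely used estimate multiplies the DLP by a region-specific conversion coefficient k, and the sketch below assumes that approach. The coefficient of 0.015 mSv/(mGy·cm) is a commonly tabulated adult value for abdominal scans and is used here only as an assumption, not as the study's method.

```python
def effective_dose_msv(dlp_mgy_cm, k_msv_per_mgy_cm):
    """Estimate effective dose (mSv) as DLP (mGy*cm) times a conversion coefficient k."""
    if dlp_mgy_cm < 0:
        raise ValueError("DLP must be non-negative")
    if k_msv_per_mgy_cm <= 0:
        raise ValueError("conversion coefficient must be positive")
    return dlp_mgy_cm * k_msv_per_mgy_cm


# Assumed coefficient for an adult abdominal scan; not taken from the abstract.
ASSUMED_K_ABDOMEN = 0.015

# Hypothetical example: a scan with a DLP of 1000 mGy*cm.
dose = effective_dose_msv(1000.0, ASSUMED_K_ABDOMEN)
print(f"estimated effective dose: {dose:.1f} mSv")
```

Because the estimate is linear in DLP, any relative reduction in DLP carries over unchanged to the estimated effective dose for a fixed body region.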
Results or Findings: DLIR vs FBP improved CTDIvol (9.56 ± 5.86 vs 24.67 ± 61.01), DLP (1085.33 ± 626.30 vs 1350.62 ± 1191.68), and effective dose (16.13 ± 9.55 vs 20.19 ± 17.9), while DLIR vs FBP and vs IR improved both image noise (8.45 ± 3.24 vs 28.85 ± 32.77 vs 14.85 ± 2.73 HU) and SNR (3.99 ± 1.23 HU vs 11.53 ± 9.28 vs 4.84 ± 2.74 HU).
Conclusion: DLIR provides benefits in terms of effective dose and image quality over the traditional FBP. It also outperforms IR methods for image quality, but not for effective dose.
Limitations: No limitations were identified.
Funding for this study: No funding was received for this study.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: The study was approved by the institutional review board.
7 min
Insights from a commercial AI tool’s performance in detecting common acute and chronic chest radiograph abnormalities in a multi-ethnic Singapore dataset using AimSG’s automated auditing tools
Timothy Shao Ern Tan, Singapore / Singapore
Author Block: T. S. E. Tan, G. Han Leong, J. Zou, J. Y. C. J. Liew; Singapore/SG
Purpose: To clinically validate the performance of a machine learning-based algorithm tool in identifying abnormal findings on chest radiographs (CXRs).
Methods or Background: Two consultant radiologists with between 5 and 15 years of experience read 300 adult CXRs performed in 2021 at a high-volume tertiary Singapore acute hospital to establish ground truth. The Lunit INSIGHT CXR (South Korea) artificial intelligence (AI) algorithm was integrated into the hospital RISPACS on a national vendor-neutral AI imaging platform (AimSG, Singapore), which contains an integrated data stack and performance monitoring data tools (Carpl.AI, New Delhi). Audit of the AI algorithm's performance on the CXRs was compared with the radiologists' analyses on AimSG. The area under the receiver operating characteristic curve (AUROC) was calculated for ten specific abnormalities.
Results or Findings: Three hundred adults (median age 75.7, range +/- 15.6; 182 males, 61%) comprising 183 Chinese (61%), 96 Malay (32%), 16 Indian (5%), and 5 of other ethnicities (2%) had CXRs performed for acute respiratory and thoracic symptoms. A total of 281 (93.7%) of 300 CXRs demonstrated at least one acute or chronic abnormality. The algorithm achieved an AUROC of between 0.95 and 0.99 on this dataset at a threshold of 50%. The F1 score ranged from 0.3 to 0.76 for chronic findings such as lung nodules, calcifications, and fibrosis, and from 0.86 to 0.93 for acute/life-threatening findings such as pleural effusion, pneumothorax, and consolidation, with the highest positive predictive value of 0.97 for consolidation.
Conclusion: The AI algorithm performed better in identifying acute or life-threatening findings compared to chronic findings on CXRs and can be a useful triage tool in acute clinical settings in multi-ethnic South-East Asian cohorts. This ongoing study also demonstrates the utility of algorithmic monitoring of AI model performance through AimSG.
Limitations: Small sample size.
Funding for this study: No funding was obtained for this study.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: This study was ethically approved by the Singhealth CIRB Ref: 2021/2337.

This session will not be streamed, nor will it be available on-demand!