Research Presentation Session: Imaging Informatics and Artificial Intelligence

RPS 1405 - Artificial intelligence in musculoskeletal imaging

February 28, 12:30 - 13:30 CET

7 min
Development and validation of a deep learning model for screening low bone mineral density using chest radiographs: A multicentre, multinational study
Jeongmin Song, Seoul / Korea, Republic of
Author Block: J. Song, M. Kim, G. Lee, J. Jeong, S. J. Bae, J-M. Koh, N. Kim; Seoul/KR
Purpose: This study aimed to develop and validate a deep learning model for screening of patients with low bone mineral density (BMD) using chest radiographs (CXRs).
Methods or Background: We retrospectively collected CXR data paired with DXA results from patients aged 50 and above from five different sources. Each patient's BMD was classified using a T-score threshold of -1.0: scores of -1.0 or above were defined as ‘normal’ and scores below -1.0 as ‘low BMD’. Of the 57,589 CXRs from Hospital A, 55,600 were used for training and 1,989 for internal validation. For external validation, 3,338, 938, and 295 CXRs were collected from Hospital B, Hospital C, and Platform D, respectively, representing diverse patient demographics and clinical backgrounds. A deep learning model was developed to classify each patient's BMD as either normal or low from their CXR.
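The labelling rule above can be written as a one-line function. This is a minimal sketch that assumes the T-score is available as a plain number per patient. The function name `label_bmd` is hypothetical and does not come from the study. It makes the boundary explicit: a T-score of exactly -1.0 counts as normal.

```python
def label_bmd(t_score: float) -> str:
    """Binary BMD label using the -1.0 T-score threshold described in the abstract.

    A T-score of exactly -1.0 is 'normal'; only scores strictly below -1.0 are 'low BMD'.
    (Hypothetical helper for illustration, not the authors' code.)
    """
    return "normal" if t_score >= -1.0 else "low BMD"


# Hypothetical example T-scores.
labels = [label_bmd(t) for t in (-0.5, -1.0, -1.1, -2.7)]
```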
Results or Findings: On the internal validation set (Hospital A), the model achieved an AUC of 0.95, with a sensitivity of 0.97, a specificity of 0.65, and an F1 score of 0.90. On datasets B, C, and D, the model achieved AUCs of 0.91, 0.89, and 0.82; sensitivities of 0.87, 0.88, and 0.64; specificities of 0.77, 0.71, and 0.85; and F1 scores of 0.80, 0.88, and 0.76, respectively.
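The reported sensitivity, specificity and F1 score are all derived from a confusion matrix. The sketch below treats low BMD as the positive class; the abstract does not state this explicitly, so it is an assumption. The sketch also shows a property of F1 that sensitivity and specificity do not share: F1 depends on the prevalence of low BMD in each dataset. For that reason, F1 values across datasets A to D should be compared with care. The function names are illustrative.

```python
from typing import Tuple


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, F1), with 'low BMD' as the positive class."""
    sensitivity = tp / (tp + fn)  # recall on low-BMD patients
    specificity = tn / (tn + fp)  # recall on normal-BMD patients
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, specificity, f1


def f1_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """F1 implied by a sensitivity, a specificity and a prevalence of the positive class."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    return 2 * tp / (2 * tp + fp + fn)
```

For example, `confusion_metrics(8, 4, 6, 2)` gives a sensitivity of 0.8, a specificity of 0.6 and an F1 of about 0.727. `f1_from_rates(0.8, 0.6, 0.5)` reproduces the same F1. Raising the prevalence to 0.8 while keeping sensitivity and specificity fixed increases F1 to about 0.84.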
Conclusion: The proposed low-BMD screening system achieved AUCs above 0.8 on all external datasets, highlighting its robustness. Notably, it showed promising performance even on dataset D, which comprised individuals of entirely different racial backgrounds. This suggests that patients with low BMD could be promptly identified from CXRs, the most widely used imaging modality globally.
Limitations: First, this is a retrospective study. Second, a substantial proportion of the dataset comprises a single national population.
Funding for this study: No funding was provided for this study.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: Ethics approval was obtained from the Ethics Committee of Asan Medical Center (No. 2019-1226), the Public institutional Bioethics Committee (No.2024-0256-001) and the Ethics Committee of Korea VHS Medical Center (No.2022-10-003-001).
7 min
Evaluating the Impact of Artificial Intelligence on Fracture Detection: A Multinational Randomized Crossover Study on diagnostic thinking efficacy
Bastiaan Johannes Van Der Zwart, Den Haag / Netherlands
Author Block: B. Van Der Zwart1, H. C. Ruitenbeek1, M. Boesen2, M. W. Brejnebol2, G. Gunes3, K-G. A. Hermann4, K. Ziegeler4, E. Oei1, J. J. Visser1; 1Rotterdam/NL, 2Copenhagen/DK, 3Ortaca/TR, 4Berlin/DE
Purpose: To assess the impact of AI assistance on fracture detection on conventional radiographs by conducting a multi-country, multicenter randomised crossover study.
Methods or Background: Radiography data from 1,500 consecutive adult cases with suspected posttraumatic fractures were gathered along with relevant clinical information, with 500 cases from each of three European sites. All cases were read by senior and junior radiologists and orthopedic surgeons both without and with AI assistance in two sessions separated by at least four weeks. A reference standard was established by expert radiologists with the help of clinical data and follow-up imaging. The mean change in diagnostic accuracy was measured both per case (sensitivity and specificity) and per fracture (sensitivity).
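Per-case and per-fracture sensitivity, and their change between unaided and AI-aided reads, can be computed as sketched below. Two simplifications are assumptions for illustration. First, each fracture is identified by a hypothetical anatomical label. Second, a fracture counts as found when the reader names that label. The study's actual matching criteria are not described in the abstract.

A case with two fractures counts as a case-level true positive even if the reader finds only one of them. This is why the per-case and per-fracture figures can differ.

```python
from typing import Dict, Set, Tuple


def case_level(truth: Dict[str, bool], calls: Dict[str, bool]) -> Tuple[float, float]:
    """Per-case (sensitivity, specificity); both dicts map case ID -> 'fracture present'."""
    tp = sum(1 for c, t in truth.items() if t and calls[c])
    fn = sum(1 for c, t in truth.items() if t and not calls[c])
    tn = sum(1 for c, t in truth.items() if not t and not calls[c])
    fp = sum(1 for c, t in truth.items() if not t and calls[c])
    return tp / (tp + fn), tn / (tn + fp)


def fracture_level_sensitivity(true_fx: Dict[str, Set[str]],
                               found_fx: Dict[str, Set[str]]) -> float:
    """Per-fracture sensitivity: share of reference-standard fractures the reader identified."""
    total = sum(len(fx) for fx in true_fx.values())
    hit = sum(len(fx & found_fx.get(c, set())) for c, fx in true_fx.items())
    return hit / total


# Hypothetical reader, four cases, read without and then with AI assistance.
truth = {"c1": True, "c2": True, "c3": False, "c4": False}
unaided = {"c1": True, "c2": False, "c3": False, "c4": False}
aided = {"c1": True, "c2": True, "c3": True, "c4": False}
sens0, spec0 = case_level(truth, unaided)
sens1, spec1 = case_level(truth, aided)
delta_sens, delta_spec = sens1 - sens0, spec1 - spec0  # change attributable to AI assistance
```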
Results or Findings: Case-level sensitivity increased with AI assistance for all reader groups: +0.074 for senior radiologists, +0.181 for junior radiologists, and +0.095 for both senior and junior orthopedic surgeons. Specificity decreased with AI assistance for radiologists, with changes of -0.010 and -0.027 for senior and junior radiologists, respectively. Specificity increased for orthopedic surgeons, by +0.016 and +0.024 for senior and junior surgeons, respectively. The changes in per-fracture sensitivity with AI assistance were +0.089 and +0.168 for senior and junior radiologists and +0.082 and +0.111 for senior and junior surgeons, respectively.
Conclusion: Our study demonstrates that AI assistance enhances fracture detection on conventional radiographs, improving patient-wise sensitivity across participating centers. Specificity increased with AI assistance for orthopedic surgeons but decreased for radiologists.
Limitations: The study was conducted in a simulated setting, potentially impacting reader performance. Additionally, the Hawthorne effect may have influenced reader behaviour, as participants were aware they were being observed.
Funding for this study: This project has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under grant agreement no. 954221. The results presented in this work reflect only the views of the authors. The Commission is not responsible for any use that may be made of the information it contains.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: This study was approved by the institutional review board of the Erasmus MC, Rotterdam (Study ID: MEC-2021-0430) and the need to obtain informed consent was waived by the institutional review boards of Charité Universitätsmedizin–Berlin (no. EA4/079/22) and the Danish Patient Safety Authority.
7 min
Post-implementation validation - False Positives and Negatives in AI Fracture Detection in Clinical Workflow: A Deep Dive
Ramprabananth Sivanandan, Asker / Norway
Author Block: R. Sivanandan1, J. Vardal2; 1Sandvika/NO, 2Drammen/NO
Purpose: This study evaluates post-implementation monitoring of an AI fracture detection algorithm, focusing on false positives and false negatives, their clinical implications, and mitigation strategies. It also assesses the algorithm's impact on workflow and patient outcomes and identifies opportunities for further research.
Methods or Background: The algorithm was implemented sequentially across five hospitals in Vestre Viken Health Trust, Norway, with a follow-up validation conducted at the primary hospital using 1,284 cases. AI results were negative (60.8%), positive (37.9%), and doubtful (1.3%). In the new workflow after AI implementation, AI-negative patients could be discharged, while AI-positive cases were referred to clinicians. Radiographers initially reviewed AI results and guided patient flow, while radiologists reported all cases. Orthopedic residents and consultants subsequently examined these images together with the AI results.
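The triage step described above can be summarised as a small routing rule. This is a sketch of the described workflow, not the hospital's software, and the names are hypothetical. The abstract does not say how "doubtful" AI results were handled. Sending them to a clinician is a conservative placeholder assumption, flagged in the code. Radiologist reporting of all cases happens regardless of the route and is not modelled.

```python
from enum import Enum


class AIResult(Enum):
    NEGATIVE = "negative"
    POSITIVE = "positive"
    DOUBTFUL = "doubtful"


def route_patient(result: AIResult) -> str:
    """Next step after the radiographer's review of the AI output (hypothetical helper)."""
    if result is AIResult.NEGATIVE:
        return "eligible for discharge"
    if result is AIResult.POSITIVE:
        return "refer to clinician"
    # ASSUMPTION: handling of doubtful results is not described in the abstract;
    # referring them to a clinician is a conservative placeholder.
    return "refer to clinician"
```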
Results or Findings: Validation revealed 86% true negatives, 2% false negatives, 7% true positives, 4% false positives, and 4% doubtful cases. These results were consistent with the preliminary external validation. False positives among AI-positive patients (5.6%) were primarily attributed to old fractures, skin folds, and heterotopic calcifications, and had minimal clinical significance. Among patients sent home with an AI-negative result, false negatives (2.3%) mainly comprised minimal knee effusion in adults, benign bone lesions, and small avulsion fractures requiring conservative treatment. Only one patient, with an avulsion fracture and drop finger, was erroneously discharged and was recalled for clinical re-evaluation.
Conclusion: The validation results aligned with pre-implementation findings, with false positives and false negatives having minimal clinical impact. Further research is planned to evaluate whether radiologist reporting is necessary for AI-positive cases, given that clinicians already review these images.
Limitations: Some patients’ clinical outcomes could not be analyzed because of restricted access to patient records, and post-imaging status was missing for a few patients.
Funding for this study: No funding
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: Information was collected as per the Ethics
7 min
Assessing the Generalisability of a Paediatric Wrist Fracture Detection AI Model Using a Novel Dataset
Cato Pauling, London / United Kingdom
Author Block: C. Pauling, O. Arthurs, B. Kanber, S. C. Shelmerdine; London/UK
Purpose: The purpose of this study was to assess the generalisability of an artificial intelligence (AI) model, trained on open-source data, for the detection of fractures and other abnormalities in paediatric wrist radiographs using a novel, external, and multi-centric dataset.
Methods or Background: A novel retrospective case dataset was curated from two paediatric trauma centres in London, England. The dataset comprises 865 images with a mean patient age of 10.4 ± 3.5 [standard deviation] years. Ground truth annotations for the external test dataset were established by consensus opinion of at least two paediatric radiologists. To imitate real-world prospective data, no pre-processing was applied to the external data and only invalid scans were excluded.

A YOLOv7-X model was trained on GRAZPEDWRI-DX, an open-source paediatric wrist trauma radiograph dataset. After achieving an optimal performance on the test split of data, the model was used to perform inference on the novel external data and the performance metrics were compared.
Results or Findings: The model's sensitivity for fracture detection was 89.0% on the test split of the open-source data. On the novel external data, the sensitivity decreased by 32.6%. The reduction in performance across all detection classes was less severe, with a change in mean Average Precision (mAP) of […] mAP@0.5 (-0.067 mAP@[0.5:0.95]).
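The detection metrics above work as follows. At mAP@0.5, a predicted box counts as correct when its intersection-over-union (IoU) with an as-yet-unmatched ground-truth box is at least 0.5. mAP@[0.5:0.95] averages AP over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05, so it also rewards tight localisation.

The sketch below is a simplified, single-class illustration in the spirit of the COCO-style metric. It uses greedy matching in order of confidence and all-point interpolated AP. It is not the YOLOv7 evaluation code. It also omits the per-class averaging that supplies the "mean" in mAP.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: List[Tuple[str, float, Box]],
                      gts: Dict[str, List[Box]], thr: float) -> float:
    """AP for one class at one IoU threshold; preds are (image_id, score, box)."""
    n_gt = sum(len(boxes) for boxes in gts.values())
    if n_gt == 0:
        return 0.0
    used = {img: [False] * len(boxes) for img, boxes in gts.items()}
    tp = fp = 0
    precisions: List[float] = []
    recalls: List[float] = []
    # Visit predictions from most to least confident.
    for img, _score, box in sorted(preds, key=lambda p: p[1], reverse=True):
        best, best_j = 0.0, -1
        for j, gt in enumerate(gts.get(img, [])):
            overlap = iou(box, gt)
            if not used[img][j] and overlap > best:
                best, best_j = overlap, j
        if best_j >= 0 and best >= thr:
            used[img][best_j] = True  # each ground-truth box may be matched only once
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / n_gt)
    # All-point interpolation: precision at recall r is the max precision at any recall >= r.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def ap_50_95(preds: List[Tuple[str, float, Box]], gts: Dict[str, List[Box]]) -> float:
    """Mean AP over IoU thresholds 0.50, 0.55, ..., 0.95 (single class)."""
    thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]
    return sum(average_precision(preds, gts, t) for t in thresholds) / len(thresholds)
```

A box that overlaps its target with an IoU of exactly 0.5 scores full AP at the 0.5 threshold but zero at every stricter one. Its AP averaged over [0.5:0.95] is therefore only 0.1. This illustrates why the two summary numbers can move by different amounts.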
Conclusion: The model failed to adequately generalise to an external dataset evidenced by a notable decline in fracture detection sensitivity. It is of critical importance to ensure that AI models intended for use in a prospective clinical setting are externally validated. Additionally, data quality and pre-processing procedures can significantly impact model performance.
Limitations: The open-source training dataset contains annotations for additional pathologies which are not included in the external test dataset.
Funding for this study: CP is funded by the Great Ormond Street Hospital Children’s Charity (GOSHCC) (Award Number: VS0618).
OJA is funded by an NIHR Career Development Fellowship (NIHR-CDF-2017-10-037).
SCS is funded by an NIHR Advanced Fellowship Award (NIHR-301322).
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: Ethical approval was provided by the National Health Service (NHS) Health Research Authority (HRA) (IRAS ID: 274278, REC reference 22/PR/0334)
7 min
AI in radiological imaging of soft-tissue and bone tumours: a systematic review evaluating against CLAIM and FUTURE-AI guidelines
Stefan Klein, Rotterdam / Netherlands
Author Block: D. J. Spaanderman1, M. Marzetti2, X. Wan1, A. Scarsbrook2, E. Oei1, D. Grünhagen1, S. Klein1, M. P. A. Starmans1; 1Rotterdam/NL, 2Leeds/UK
Purpose: Soft-tissue and bone tumours (STBT) are rare, diagnostically challenging lesions with variable clinical behaviours and treatment approaches. This systematic review aims to provide an overview of Artificial Intelligence (AI) methods using radiological imaging for diagnosis and prognosis of STBT, highlighting challenges in clinical translation, and evaluating study alignment with the Checklist for AI in Medical Imaging (CLAIM) and the FUTURE-AI international consensus guidelines for trustworthy and deployable AI to promote clinical translation of AI methods.
Methods or Background: The systematic review identified literature from several bibliographic databases, covering papers published before 17/07/2024. Original research published in peer-reviewed journals, focused on radiology-based AI for diagnosis or prognosis of primary STBT was included. Exclusion criteria were animal, cadaveric, or laboratory studies, and non-English papers. Abstracts were screened by two of three independent reviewers to determine eligibility. Included papers were assessed against the two guidelines by one of three independent reviewers. (PROSPERO Registration: CRD42023467970)
Results or Findings: The search identified 15,015 abstracts, and 325 articles were included for evaluation. Studies performed moderately on CLAIM, averaging a score of 28.9 ± 7.5 out of 53, but poorly on FUTURE-AI, averaging 5.1 ± 2.1 out of 30.
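To put the two checklists on a common scale, the mean scores can be expressed as fractions of the maximum attainable score. This is simple arithmetic on the reported means. It assumes both scores are counts of satisfied items with the stated maxima, and it is not a figure the authors report.

```latex
\frac{28.9}{53} \approx 0.545 \;\; (\text{CLAIM}), \qquad \frac{5.1}{30} = 0.17 \;\; (\text{FUTURE-AI})
```

On this scale, the average study met roughly half of the CLAIM items but fewer than one in five FUTURE-AI items.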
Conclusion: Imaging-AI tools for STBT remain at the proof-of-concept stage, indicating significant room for improvement. Future efforts by AI developers should focus on design (define unmet clinical need, intended clinical setting and integration), development (build on previous work, training with data reflecting real-world usage, explainability), evaluation (addressing biases, evaluating using best practices), and data reproducibility and availability. Following these recommendations could improve clinical translation of AI methods.
Limitations: Limitations include single-reviewer scoring, owing to the high volume of included literature, and assessment against the as-yet-unpublished FUTURE-AI guidelines. However, the FUTURE-AI guidelines were developed by a large group of international medical AI experts.
Funding for this study: Hanarth Fonds, ICAI Lab, NIHR, EuCanImage
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: Not applicable to the systematic review.
7 min
Multi-Center External Validation of an Automated Method Segmenting and Differentiating Atypical Lipomatous Tumors from Lipomas Using Radiomics and Deep-Learning on MRI
Stefan Klein, Rotterdam / Netherlands
Author Block: D. J. Spaanderman1, S. Hakkesteegt1, D. Hanff1, C. Messiou2, L. Nardo3, D. Grünhagen1, C. Verhoef1, M. P. A. Starmans1, S. Klein1; 1Rotterdam/NL, 2London/UK, 3Sacramento, CA/US
Purpose: Differentiating between lipomas and atypical lipomatous tumors (ALTs) on imaging is challenging, often requiring biopsies. This study aimed to externally and prospectively validate a radiomics model to distinguish between lipomas and ALTs using MRI across three large, multi-center cohorts. Additionally, the model was extended with automatic and minimally interactive segmentation methods to improve clinical applicability.
Methods or Background: Three cohorts were analyzed: two for external validation (US data from 2008–2018 and UK data from 2011–2017), and one for prospective validation (Netherlands, 2020–2021). Patient data, including MDM2 amplification status and MRI scans, were collected. An automatic segmentation method was developed for T1-weighted MRI scans, with interactive segmentation applied in case of poor quality. Radiomics model performance was compared with that of two radiologists.
Results or Findings: The cohorts included 150 (54% ALT), 208 (37% ALT), and 86 patients (28% ALT) from the US, UK, and Netherlands, respectively. Automatic segmentation succeeded in 78% of cases, while 22% required interactive segmentation, with only 3% needing manual adjustments. External validation yielded AUCs of 0.74 (95% CI: 0.66, 0.82) (US) and 0.86 (0.80, 0.92) (UK), and prospective validation achieved an AUC of 0.89 (0.83, 0.96) (Netherlands). The radiomics model performed similarly to radiologists in all cohorts.
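The AUCs above are reported with 95% confidence intervals, but the abstract does not state how the intervals were obtained. A common choice is a percentile bootstrap that resamples patients with replacement. The sketch below shows that approach, computing the AUC as the Mann-Whitney statistic: the probability that a randomly chosen ALT receives a higher model score than a randomly chosen lipoma. The bootstrap is an assumption, not the authors' documented method, and the function names are illustrative.

```python
import random
from typing import List, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the Mann-Whitney probability that a positive (ALT=1) outscores a negative (lipoma=0)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0  # ties count half
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(scores: Sequence[float], labels: Sequence[int], n_boot: int = 2000,
                     alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Approximate percentile bootstrap CI for the AUC, resampling patients with replacement."""
    if not 0 < sum(labels) < len(labels):
        raise ValueError("both classes must be present")
    rng = random.Random(seed)
    idx = range(len(scores))
    stats: List[float] = []
    while len(stats) < n_boot:
        sample = [rng.choice(idx) for _ in idx]
        ys = [labels[i] for i in sample]
        if 0 < sum(ys) < len(ys):  # AUC is undefined unless both classes are resampled
            stats.append(auc([scores[i] for i in sample], ys))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```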
Conclusion: The radiomics model, combined with automated and minimally interactive segmentation methods, effectively differentiated between lipomas and ALTs, matching the performance of expert radiologists and potentially reducing the need for biopsies.
Limitations: First, the segmentation workflow was performed by a single clinician, hence the potential impact of different users on radiomics performance was not assessed. Second, MDM2 amplification status, determined by core needle biopsy or resected specimens, may include false negatives, affecting the accuracy of the ground truth.
Funding for this study: This research was supported by an unrestricted grant of Stichting Hanarth Fonds, The Netherlands. MPAS and SK acknowledge funding from the research project EuCanImage (European Union's Horizon 2020 research and innovation programme under grant agreement Nr. 95210). This study was supported by the National Institute for Health Research (NIHR) Biomedical Research Centre at The Royal Marsden NHS Foundation Trust and The Institute of Cancer Research, London, and by The Royal Marsden Cancer Charity. The work was also supported by the In Vivo Translational Imaging Shared Resources with funds from NCI P30CA093373. The views expressed are those of the author(s) and not necessarily those of the NIHR or the Department of Health and Social Care.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: The study protocol was approved by the local medical ethics review committee (MEC-2020-0175), and performed in accordance with national and international legislation. Informed consent was required and obtained exclusively from participants in the prospective study cohort. For the training and external validation cohorts, approval by the local medical ethics review committee and the waiver of informed consent were previously reported.
7 min
Radiopsy with quantitative WB-MRI ADC and fat fraction sequences for discriminating smouldering multiple myeloma from multiple myeloma: a prospective observational study
Giacomo Feliciani, Bologna / Italy
Author Block: G. Feliciani1, A. Rossi1, C. Cerchione1, E. Loi1, E. Antognoni1, A. Cattabriga2, M. Marchesini1, D. Barone1, A. Sarnelli1; 1Meldola/IT, 2Bologna/IT
Purpose: To distinguish between Multiple Myeloma and High-Risk Smouldering Myeloma at staging using image-based biomarkers obtained from Whole Body-MRI (WB-MRI) Apparent Diffusion Coefficient (ADC) and Fat Fraction (FF) sequences.
Methods or Background: From January 2021 to March 2024, we enrolled consecutive myeloma patients at staging into a prospective trial and divided them into Smouldering Multiple Myeloma (SMM) and Multiple Myeloma (MM) groups. All patients underwent WB-MRI. We use the term "Radiopsy" to denote the quantification and modelling of image characteristics near the biopsy site to predict patient status. A radiologist placed a cylindrical VOI near the biopsy site and five additional identical VOIs at distant sites, such as the pelvic bone and the D11 and L5 vertebrae. LASSO was used to select the most predictive features and build logistic regression models, which were then validated on the test set. Model performance was assessed with ROC curves.
Results or Findings: The study included 102 patients (46 males, mean age 63 ± 12 [SD]), of whom 60 were diagnosed with MM and 42 with SMM. For each patient, 144 quantitative features were extracted from the biopsy-site VOI on the WB-MRI ADC and FF sequences. The Radiopsy models showed a median AUC of 0.80 (0.75-0.90) in the training phase and a median AUC of 0.70 (0.55-0.80) in the test phase. The best predictive model had AUCs of 0.95 and 0.75 in the training and test phases, respectively. The models used to predict patient status at the biopsy site were also predictive in the distant VOIs.
Conclusion: Radiopsy models can distinguish between MM and SMM with good performance near the biopsy site. Radiopsy can be used to predict disease invasion at distant sites where biopsy is not possible or not feasible.
Limitations: Single center
Funding for this study: None
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: protocol name: AccuMRI code: IRST 100.15
7 min
Feasibility of generating sagittal radiographs from coronal images using deep learning in adolescent idiopathic scoliosis
Maria Elena Pellegrino, Milan / Italy
Author Block: M. E. Pellegrino1, T. Bassani1, A. Cina2, F. Galbusera2, A. Cazzato1, D. Albano1, L. M. Sconfienza1; 1Milan/IT, 2Zurich/CH
Purpose: Minimizing radiation exposure is crucial in clinical monitoring of adolescent idiopathic scoliosis (AIS). Generative adversarial networks (GANs) have gained prominence in medical imaging due to their ability to learn complex patterns and generate high-quality synthetic images by transforming one type of image into another. This study explores GANs to generate synthetic sagittal radiographs from coronal views in AIS patients.
Methods or Background: A retrospective dataset of 3,935 AIS patients with mild-to-moderate scoliosis (Cobb angle <45°) was analyzed. The subjects underwent radiographic spine and pelvis examination using the EOS system, which acquires coronal and sagittal images simultaneously. The dataset was split into training (85%, n=3,356) and validation (15%, n=579) sets. A pix2pix-based GAN model was trained to generate sagittal images from coronal views, with the real sagittal views as targets. To evaluate accuracy, 100 subjects from the validation set were randomly selected, and two radiologists manually measured lumbar lordosis (LL), sacral slope (SS), pelvic incidence (PI), and sagittal vertical axis (SVA) in both the synthetic and real images.
Results or Findings: Of the 100 synthetic images, 69 were deemed assessable. Intraclass correlation coefficients ranged from 0.93 to 0.99 for measurements in real images and from 0.83 to 0.88 for synthetic images. Correlations between parameters in real and synthetic images (using the mean of the two raters' values) were 0.52 (LL), 0.17 (SS), 0.18 (PI), and 0.74 (SVA). Errors in parameters showed minimal correlation with Cobb angle. The mean ± SD absolute errors were 7 ± 7° (LL), 9 ± 7° (SS), 9 ± 8° (PI), and 1.1 ± 0.8 cm (SVA).
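The real-versus-synthetic agreement figures can be computed in three steps. First, average the two raters' measurements per image. Then compute a correlation coefficient and the mean ± SD absolute error between real and synthetic values. The abstract does not say which correlation coefficient was used. Pearson is an assumption in the sketch below, and the ICC computation is not shown. The function names are illustrative.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def rater_mean(r1: Sequence[float], r2: Sequence[float]) -> List[float]:
    """Per-image mean of two raters' measurements."""
    return [(a + b) / 2 for a, b in zip(r1, r2)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between paired measurements (assumed coefficient)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def abs_error_summary(real: Sequence[float], synthetic: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample SD of |synthetic - real|, e.g. in degrees for LL, SS and PI."""
    errors = [abs(s - r) for r, s in zip(real, synthetic)]
    return statistics.mean(errors), statistics.stdev(errors)
```

Reporting the absolute error alongside the correlation matters here. A parameter can correlate moderately with its real value while still being off by several degrees per image. This is the pattern the abstract describes for LL.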
Conclusion: While the model generates sagittal images consistent with reference images, their quality is not sufficient for clinical parameter assessment, except for promising results in SVA, which describes lateral plumb line alignment.
Limitations: The quality of sagittal images is insufficient for assessing clinical parameters, except for SVA.
Funding for this study: Italian Ministry of Health.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: Not applicable.