Artificial intelligence and machine learning for x-ray imaging - ESR Connect

Research Presentation Session

RPS 1205 - Artificial intelligence and machine learning for x-ray imaging

  • 9 Lectures
  • 49 Minutes
  • 9 Speakers

Lectures

1
RPS 1205 - Deep learning-based architecture for detection of tuberculosis in digital chest radiography: our experience in the Indian scenario

05:58 | A. Kharat, Pune / IN

Purpose:

Tuberculosis (TB) has infected 1% of the global population and is a major cause of death in developing countries.

Methods and materials:

Deep learning (DL) was used to classify each digital chest x-ray (DX) into one of 3 predefined classes: abnormal (TB likely), abnormal (TB unlikely), and normal. The model, a convolutional neural network (CNN), was trained on a combination of the publicly available ‘NIH DX dataset’ and private anonymised hospital data. Dataset for modelling: 2,050 images with a train-test ratio of 78:22. The training set comprised 1,600 images (530 abnormal TB, 530 abnormal non-TB, and 540 normal DX); the test set comprised 450 images balanced across the 3 classes, with 150 images each. Steps: 1) DX DICOM images were converted to PNG format, resized to 224 x 224, and fed to the DenseNet-121 model for training, and 2) a learning rate scheduler reduced the learning rate to 1/10th of its initial value after every 10 epochs.
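The learning rate schedule in step 2 can be read as a standard step decay. The sketch below is a minimal illustration under that assumption: the function name, the zero-based epoch convention, and the initial learning rate of 1e-3 are hypothetical and not taken from the abstract.

```python
def step_decay_lr(initial_lr: float, epoch: int,
                  step_size: int = 10, gamma: float = 0.1) -> float:
    """Learning rate for a zero-based epoch under step decay.

    Assumption: the rate is multiplied by `gamma` (1/10) once every
    `step_size` (10) epochs, which is one reading of the abstract.
    """
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    return initial_lr * gamma ** (epoch // step_size)


# Hypothetical initial learning rate; the abstract does not report one.
schedule = [step_decay_lr(1e-3, epoch) for epoch in range(30)]
```

In PyTorch, the same behaviour is available as `torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)`; whether the authors used that scheduler is not stated.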

Results:

Test set: 450 images.

Conclusion:

CNN scores can potentially be used for TB screening in DX with a radiologist-in-the-loop approach. TB screening programs that use DL to assist government hospitals can be a game-changer in curbing TB. Instant alert triage can enable urgent sputum checks and initiation of antitubercular treatment before the patient leaves the hospital premises.

Limitations:

n/a

Ethics committee approval:

n/a

Funding:

No funding was received for this work.

2
RPS 1205 - On the robustness of a deep learning-based algorithm for detecting abnormalities in chest radiographs across different devices and view positions: a retrospective case-control study

05:17 | S. Park, Seoul / KR

Purpose:

To validate the robustness of a deep learning-based algorithm that detects abnormal lesions in chest radiographs for a screening environment.

Methods and materials:

To verify the robustness of a deep learning-based algorithm, we collected datasets from Seoul National University Hospital. To verify robustness across different x-ray devices, 2,673 chest radiographs taken with 27 x-ray devices were collected and divided into two groups: 661 cases (479 abnormal and 182 normal) of computed radiography (CR) and 2,012 cases (1,449 abnormal and 563 normal) of digital radiography (DR). To verify robustness across different view positions, 2,625 (2,028 abnormal and 597 normal) posterior-anterior (PA) images and 1,729 (1,580 abnormal and 149 normal) anterior-posterior (AP) images were collected. These two datasets were reviewed by board-certified radiologists. For the deep learning-based algorithm, we used LUNIT INSIGHT CXR 3 to predict 10 major radiologic findings in chest radiographs. This algorithm was trained with 147,823 images using various augmentation schemes, such as photometric and geometric jittering, to make it more robust to radiographs taken with various settings.

Results:

The algorithm achieved an AUC of 96.54 and 97.09 for CR and DR, respectively. The p-value of DeLong’s test for the difference between the two AUCs was 0.4802. For PA and AP images, it achieved an AUC of 97.29 and 96.67, respectively, with a p-value of 0.3182.
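For readers unfamiliar with the metric, the AUC can be computed as the probability that a randomly chosen abnormal case receives a higher score than a randomly chosen normal case. The sketch below is illustrative only: the function name and the toy scores are hypothetical, and it does not implement DeLong's test, which additionally estimates the variance of the AUC difference.

```python
from typing import Sequence


def auc(scores_abnormal: Sequence[float], scores_normal: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    if not scores_abnormal or not scores_normal:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for a in scores_abnormal:
        for n in scores_normal:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(scores_abnormal) * len(scores_normal))


# Toy scores for illustration; not data from the study.
example_auc = auc([0.9, 0.4], [0.5, 0.1])
```

The pairwise form is quadratic in the number of cases, which is acceptable for a few thousand radiographs; rank-based implementations are preferable for larger sets.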

Conclusion:

Chest radiographs have tremendous heterogeneity due to the variety of devices and acquisition methods. Our experimental results showed that a well-generalised algorithm can be robust in detecting abnormalities on chest radiographs in a screening environment.

Limitations:

More validation of the robustness from various sites could be useful.

Ethics committee approval:

n/a

Funding:

No funding was received for this work.

3
RPS 1205 - Artificial intelligence in standard radiology: automatic x-ray diagnostic algorithm

04:31 | M. Benta, Timisoara / RO

Purpose:

To develop a performant chest x-ray pathology classification machine learning model for the "Pius Brînzeu" Emergency County Hospital, using publicly available datasets (CheXNet and CheXPert) as its backbone.

Methods and materials:

The convolutional neural network algorithm trained on CheXNet and CheXPert was underperforming when tested on cases from the "Pius Brînzeu" Hospital. We extracted 48,000 anonymised radiographs of consenting patients from the hospital's PACS system. We labelled them by transforming their associated radiological reports into labels and retrained the algorithm, using the initial model as pretraining. Three radiologists hand-labelled 2,000 x-rays in the newly obtained dataset and drew class bounding boxes that indicated afflicted areas on the x-rays. We trained a third model with the adjusted dataset.

Results:

AUC was used as the metric to compare model performance. We evaluated 3 models on a gold standard dataset containing 500 radiologist-annotated images. The publicly trained model scored 0.809, the automatically labelled model scored 0.813, and the one that uses hand-labelled radiographs scored 0.825.

Conclusion:

Our results indicate that there is a clear benefit to adding hospital-specific images to the dataset. Hand labelling also increases performance over NLP labelling, which shows that NLP has limitations especially in languages that don’t have a word stemmer, like Romanian.

We conclude that retraining a model that uses public data as its backbone with data specific to a hospital strongly increases the performance of the model in the hospital. The most important further improvements come from the quality of the new dataset.

Limitations:

The hand-labelled dataset is small compared to the hospital dataset.

Ethics committee approval:

The study was approved by the local Ethical Committee.

Funding:

No funding was received for this work.

4
RPS 1205 - Is a deep learning algorithm equivalent to the radiologist in fracture detection on conventional x-rays?

06:02 | G. Reichert, Paris / FR

Purpose:

The increasing need for emergency imaging has led to a multiplication of conventional x-rays, especially in traumatic injury. At the same time, artificial intelligence (AI) programs are in development and deep learning algorithms could help the radiologist and the emergency room (ER) to screen for patients with fractures.

To determine the accuracy of a deep learning algorithm for the detection of fractures on conventional x-rays in ER patients.

Methods and materials:

We used an algorithm (Rayvolve®) developed by Azmed® for fracture detection in the appendicular skeleton. The study was divided into 2 steps.

In step 1, 2,000 x-rays were selected from the ER of our hospital as the training population and were annotated as fractured or not fractured, to train the algorithm.

In step 2, 126 patients were randomly selected by an emergency doctor as a test set. These x-rays were extracted and annotated by both a radiologist and the algorithm. The results of the algorithm were compared to those of the radiologist.

Results:

In step 1, about 15% of the patients had a fracture.

In step 2, 26 x-rays with fractures were identified. Only 21/26 fractures were detected by the algorithm, which gives a sensitivity of 80.7%. Among the 100 patients with no fracture, 35/100 were annotated by the algorithm as fractured, which gives a false-positive rate of 35% (specificity of 65%). The positive predictive value was 37.5% and the negative predictive value was 92.9%.
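The step 2 figures follow from a 2x2 confusion matrix with 21 true positives, 5 false negatives, 35 false positives, and 65 true negatives, as derived from the counts in the abstract. The sketch below recomputes them; the function name is hypothetical.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Standard diagnostic accuracy measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),           # detected fractures / all fractures
        "specificity": tn / (tn + fp),           # correct negatives / all non-fractures
        "false_positive_rate": fp / (fp + tn),   # equals 1 - specificity
        "ppv": tp / (tp + fp),                   # positive predictive value
        "npv": tn / (tn + fn),                   # negative predictive value
    }


# Counts derived from the abstract: 26 fractures (21 detected),
# 100 patients without fracture (35 flagged as fractured).
metrics = diagnostic_metrics(tp=21, fn=5, fp=35, tn=65)
```

With these counts the positive predictive value is 21/56 = 37.5% and the negative predictive value is 65/70, about 92.9%, matching the reported values.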

Conclusion:

This study shows that an algorithm like Rayvolve® could be a valuable computer-aided diagnostic tool for detecting fractures in ER. However, more fractures have to be annotated to improve the accuracy of the algorithm.

Limitations:

n/a

Ethics committee approval:

n/a

Funding:

No funding was received for this work.

5
RPS 1205 - Clinical validation of a deep learning-based bone age software in healthy Korean children

04:50 | W. Lea, Seoul / KR

Purpose:

To evaluate the clinical performance of a deep learning (DL)-based bone age software in healthy Korean children.

Methods and materials:

This retrospective study included 371 healthy children (217 boys, 154 girls) aged between 4 and 17 years who visited the department of paediatrics for growth check-ups between January 2017 and December 2018. A total of 553 left-hand radiographs of these children were assessed using a commercial DL-based bone age software (BoneAge, Vuno, Seoul, Korea). A two-sample t-test, Fisher’s exact test (two-sided), Pearson’s correlation coefficient, root mean squared error (RMSE), concordance rate, and Bland-Altman analysis were used to evaluate the clinical performance of the DL software by comparing its estimates with the children’s chronological age.

Results:

A two-sample t-test (p<0.001) and Fisher’s exact test (p=0.011) showed a significant difference between chronological age and the bone age estimated by the DL software. The two variables were well correlated (r=0.96, p<0.001); however, the RMSE was 15.2 months. With a 12-month cut-off, the concordance rate was 58.8%. The Bland-Altman plot showed that the DL software tended to estimate a bone age younger than the chronological age, especially in children under the age of 10 years.
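The agreement measures reported here can be sketched as follows. This is a minimal illustration with hypothetical function names; in particular, treating "concordance" as the share of cases whose absolute difference is within the cut-off is an assumption, since the abstract does not define it.

```python
import math
import statistics
from typing import Sequence, Tuple


def rmse(estimated: Sequence[float], reference: Sequence[float]) -> float:
    """Root mean squared error between two paired series (same units)."""
    diffs = [e - r for e, r in zip(estimated, reference)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def concordance_rate(estimated: Sequence[float], reference: Sequence[float],
                     cutoff_months: float = 12.0) -> float:
    """Assumed definition: fraction of pairs with |difference| <= cut-off."""
    within = sum(1 for e, r in zip(estimated, reference)
                 if abs(e - r) <= cutoff_months)
    return within / len(estimated)


def bland_altman(estimated: Sequence[float],
                 reference: Sequence[float]) -> Tuple[float, float, float]:
    """Mean difference (bias) and 95% limits of agreement (bias +/- 1.96 SD)."""
    diffs = [e - r for e, r in zip(estimated, reference)]
    bias = statistics.mean(diffs)
    spread = 1.96 * statistics.stdev(diffs)
    return bias, bias - spread, bias + spread
```

With bone age as `estimated` and chronological age as `reference`, both in months, a negative bias corresponds to the software estimating younger than the chronological age.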

Conclusion:

DL-based bone age software showed a low concordance rate and a tendency to estimate the bone age younger than the chronologic age in healthy Korean children.

Limitations:

A retrospective study, single-centre trial, with a small number of patients.

Ethics committee approval:

n/a

Funding:

No funding was received for this work.

6
RPS 1205 - Sensitivity to user input in deep learning-based vertebral segmentation from lateral cervical spine x-rays

06:09 | K. Knapp, Exeter / UK

Purpose:

Cervical spine injuries (CSIs) occur in approximately 4.3% of trauma patients in the UK. We investigate the sensitivity to user input on computer-aided detection software (CAD) designed to aid clinicians with CSI diagnosis.

Methods and materials:

We collected 137 lateral cervical spine radiographs. Injury diagnosis is dependent on the results of a deep learning segmentation algorithm, which requires vertebrae centres as input. The data were expertly annotated to define ground truth outlines of the C3–C7 vertebrae.

We used the U-Net architecture with a modified loss function to account for vertebrae shape, training the network using 124 images. We defined a ‘true centre’ for each of the 13 test images and randomly varied the input centres from this point, measuring the resulting effect on segmentation performance.

Results:

Segmentation accuracy is measured against the ground truth outline; consistency is measured against the result obtained using the ‘true centre’ as input (both using the Jaccard index). With 100 pseudo-random variations of the input centres for 13 test images, the mean accuracy was 0.85 and the mean consistency was 0.92. By radius around the true centre (percentage of tests, mean consistency, mean accuracy):

  • <=1 mm: 34%, 0.97, 0.88
  • (1-2) mm: 26%, 0.94, 0.87
  • (2-3) mm: 15%, 0.90, 0.85
  • (3-4) mm: 8%, 0.85, 0.82

We observed a crucial drop-off in performance outside a 2 mm radius of the true centres.
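Both accuracy and consistency are Jaccard indices; they differ only in the reference mask. A minimal sketch, assuming segmentations are represented as flattened binary masks of equal length (the function name and the convention for two empty masks are assumptions):

```python
from typing import Sequence


def jaccard_index(mask_a: Sequence[int], mask_b: Sequence[int]) -> float:
    """Intersection over union of two flattened binary masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    union = sum(1 for a, b in zip(mask_a, mask_b) if a or b)
    # Assumed convention: two empty masks are treated as identical.
    return 1.0 if union == 0 else intersection / union
```

Under this reading, accuracy is `jaccard_index(prediction, ground_truth)` and consistency is `jaccard_index(prediction_from_perturbed_centre, prediction_from_true_centre)`.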

Conclusion:

Sensitivity to user input is vital in assessing the role of CAD in the clinical pathway for CSIs. The results indicate that guidance for user selection should be to target within 2 mm of the vertebrae centre and clinician training should account for this.

Limitations:

This is a developmental version of CSPINECAD.

Ethics committee approval:

University of Exeter Ethics Committee approval.

Funding:

Funded by the EPSRC and Wellcome Trust.

7
RPS 1205 - Cascading model architecture of convoluted neural networks to improve the performance of pathology detection in digital chest x-rays

05:40 | A. Kharat, Pune / IN

Purpose:

The chest x-ray is the most commonly used diagnostic and screening examination in radiology. Given constrained resources and the image-driven nature of diagnosis, we wanted to assess whether machine learning (ML) can aid in pathology detection.

Methods and materials:

The data source was a leading hospital (LH) of 800+ bed capacity. The dataset was anonymised and not publicly available. The dataset contained 26,000 x-rays and associated radiology reports but without pathology labels. Using findings in radiology reports, pathology labels were generated.

Results:

We trained a DenseNet-121 model. Our objective was to build a customised ML model, trained more effectively on source-specific chest x-ray datasets, for the detection of any of 15 different chest diseases. Our data were imbalanced, which makes training a deep learning (DL) model for these pathologies challenging, and relying solely on balancing techniques is not a reliable way to improve performance. We therefore explored cascading models. To improve performance, alongside the 15-pathology prediction model we trained a model for binary classification, with a team of two validating radiologists, to check whether a chest x-ray has a pathology or not. For this we used a DenseNet-121 model trained without transfer learning. The binary model was combined with the 15-pathology model in a cascading way. We found that the cascading architecture was more performant than using only the 15-pathology model on the LH dataset. We found that, at a small compromise in accuracy, the sensitivity (recall) for pathologies was better using this model.
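The abstract does not specify how the two models are combined. One common way to cascade them, shown here purely as an assumption, is to let the binary model gate the 15-pathology model. All names, the callable interfaces, and the 0.5 threshold are hypothetical.

```python
from typing import Callable, Dict


def cascade_predict(image: object,
                    binary_model: Callable[[object], float],
                    pathology_model: Callable[[object], Dict[str, float]],
                    abnormal_threshold: float = 0.5) -> Dict[str, float]:
    """Two-stage inference: screen for any pathology, then classify.

    Assumption: images the binary model scores below the threshold are
    reported as normal (empty result) and never reach the second stage.
    """
    p_abnormal = binary_model(image)
    if p_abnormal < abnormal_threshold:
        return {}
    return pathology_model(image)


# Stub models standing in for the trained networks (illustration only).
def stub_binary(image: object) -> float:
    return 0.9 if image == "abnormal" else 0.1


def stub_pathology(image: object) -> Dict[str, float]:
    return {"consolidation": 0.7, "effusion": 0.2}
```

In a gated design like this, the threshold of the first stage controls the trade-off between sensitivity and the number of normal studies passed on to the second stage.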

Conclusion:

Using a cascading architecture, we could significantly improve Kappa, sensitivity/recall, and F1 score for pathology detection in digital chest radiographs. Using the cascading model architecture of CNNs, radiologists can efficiently manage and control the turnaround time of the reporting process for digital radiographs by incorporating these techniques into their workflow.

Limitations:

n/a

Ethics committee approval:

n/a

Funding:

No funding was received for this work.

8
RPS 1205 - Development and performance comparison of multi-task deep learning approaches for the severity assessment of radiographic hip osteoarthritis features

06:16 | C. von Schacky, Munich / DE

Purpose:

Radiographic features of hip osteoarthritis include joint space narrowing, osteophytes, subchondral sclerosis, and subchondral cysts. The aim was to develop, validate, and compare the performance of four multi-task deep learning approaches for grading radiographic hip osteoarthritis features.

Methods and materials:

We used 15,364 hip joints (7,738 pelvic radiographs) from subjects of the Osteoarthritis Initiative. Each hip joint was graded for five osteoarthritis features, and each feature was graded as absent, mild, moderate, or severe. The data were split 80%/10%/10% for training/validation/testing. The hip joint was detected with a trained RetinaNet for object detection. Then, four different approaches to solve this multi-task problem were evaluated and compared: single-model alone, single-model with classifier, multi-task learning, and fine-tuned multi-task learning. All models were based on a DenseNet-161. Reliability was assessed with Cohen’s Kappa.

Results:

All multi-task approaches achieved higher reliability compared to single-model approaches. The highest reliability was achieved with a fine-tuned multi-task learning model with a Kappa of 0.58 for femoral osteophytes, 0.41 for acetabular osteophytes, 0.51 for joint space narrowing, 0.44 for subchondral sclerosis, and 0.53 for subchondral cysts.
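Cohen's Kappa measures agreement between model grades and reference grades beyond what chance would produce. The sketch below computes the unweighted form; whether the study used unweighted or weighted Kappa for these ordinal grades is not stated, so this is an assumption, and the function name is hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(grades_a: Sequence[Hashable], grades_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's Kappa for two paired series of categorical grades."""
    n = len(grades_a)
    if n == 0 or n != len(grades_b):
        raise ValueError("inputs must be non-empty and of equal length")
    observed = sum(1 for a, b in zip(grades_a, grades_b) if a == b) / n
    counts_a = Counter(grades_a)
    counts_b = Counter(grades_b)
    # Chance agreement from the marginal grade frequencies of each series.
    expected = sum(counts_a[g] * counts_b[g] for g in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both series use a single identical grade
    return (observed - expected) / (1.0 - expected)
```

For example, with grades `[0, 0, 1, 1]` and `[0, 0, 1, 0]`, observed agreement is 0.75 and chance agreement is 0.5, giving a Kappa of 0.5.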

Conclusion:

This study shows that a variety of multi-task learning approaches can be used for grading radiographic hip osteoarthritis features and that multi-task learning might have advantages over single-model training.

Limitations:

The training set only consisted of cases of one large scale dataset from a longitudinal OA study. It would increase the quality of the training set to add further samples with different demographic and imaging characteristics.

Ethics committee approval:

Written informed consent was obtained from all participants. IRB approval for this HIPAA-compliant study was granted by the four participating US-based centres.

Funding:

NIH-NIAMS: 5R01AR064771-05.

This session is part of ECR 2020 Live.

Speakers

  • Arnaldo Mayer, Ramat Gan, Israel
  • Claudio von Schacky, Munich, Germany
  • Sunggyun Park, Seoul, South Korea
  • Winnah Wu-in Lea, Seoul, South Korea
  • Karen Knapp, Exeter, UK
  • Guillaume Reichert, Paris, France
  • Marius-Mihail Benta, Timisoara, Romania
  • Amit Kharat, Pune, India