Research Presentation Session
07:07 H. Yoo, Seoul / KR
Purpose:
To verify whether lung cancer detection can be improved when radiologists use an artificial intelligence (AI) algorithm in a chest x-ray (CXR) screening setting.
Methods and materials:The analysis was based on data from the ACRIN arm of the National Lung Screening Trial (NLST) (n=5491), a multicentre cohort of current and former heavy smokers. We estimated the performance of radiologists in detecting lung cancer diagnosed at the baseline (T0) or first (T1) study year if the AI algorithm had been used during the initial radiologic evaluation and compared it with the radiologists' unaided performance. The algorithm used in this study was a deep learning-based model (Lunit INSIGHT CXR1) trained on an external dataset, which produced an abnormality score ranging from 0 to 100% and reflecting the probability that the lesion was malignant. A threshold abnormality score of 30% was adopted as the cut-off for classification. The TNM stage, size, location, and overlapping anatomical areas on CXR of the lesions detected by the radiologists and by the algorithm were compared.
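The decision rule and the way a detection rate with its confidence interval can be reported are straightforward to express; the sketch below is our own illustration (the helper names are hypothetical), assuming a simple score cut-off and a normal-approximation interval, not the vendor's implementation.

    from math import sqrt

    def ai_positive(abnormality_score, cutoff=30.0):
        # abnormality_score is the 0-100% output of the algorithm; scores at or
        # above the 30% cut-off are classified as positive.
        return abnormality_score >= cutoff

    def detection_rate_ci(detected, total, z=1.96):
        # Detection rate with a normal-approximation 95% CI,
        # e.g. detection_rate_ci(61, 83) -> (0.735, 0.640, 0.830).
        p = detected / total
        half = z * sqrt(p * (1 - p) / total)
        return p, p - half, p + half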
Results:Radiologists using the AI algorithm showed a significantly higher detection rate than radiologists without it (73.5% (64.0%-83.0%) [61/83] vs. 53.0% (42.3%-63.7%) [44/83]; P<0.001). The lesions missed by radiologists but found by the algorithm mostly had either no annotated abnormalities (6/17) or abnormalities that did not warrant further workup (8/17). There was no significant difference in TNM stage, size, location, or overlapping anatomical areas between lung cancers found with and without the algorithm.
Conclusion:The detection rate of radiologists using AI was significantly higher than that of radiologists reading alone. Using AI during lung cancer screening may lead to earlier diagnosis through the detection of nodules not suspected by radiologists.
Limitations:Because of the retrospective design of our study, the AI analysis was performed after the radiologists' readings.
Ethics committee approval:n/a
Funding:No funding was received for this work.
08:07 T. Weikert, Basel / CH
Purpose:
To develop and evaluate Retina U-net algorithms with different fore- and background balances for the detection of primary lung tumours of all stages and of associated lymphatic and non-lymphatic metastases on FDG-PET/CT.
Methods and materials:We selected 364 patients with histologically confirmed lung cancer who underwent FDG-PET/CT between 01/2010 and 06/2016. The dataset comprised tumours of all stages according to the 8th edition of the TNM classification of lung cancer. To establish a standard of reference, all lung tumours (T), lymphatic metastases (N), and distant metastases (M) were manually segmented as 3D volumes using the transverse, fused whole-body PET/CT series. The dataset was split into training (n=216), validation (n=74), and test (n=74) sets. We trained and validated three Retina-U-net-based algorithms on (A) T-lesions only, (B) T&N-lesions, and (C) T, N & M-lesions. We evaluated the performance of the algorithms for the detection of all lesions at multiple classifier thresholds, calculating sensitivity, specificity, false-positive findings per case (FPC), the area under the ROC curve, and free-response ROC curves.
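The free-response evaluation described above can be summarised as sensitivity and FPC computed over candidate detections at a sweep of thresholds; the following is a generic sketch under that assumption (function and argument names are ours), not the authors' code.

    import numpy as np

    def froc_points(scores, is_tp, n_lesions, n_cases, thresholds):
        # scores / is_tp have one entry per candidate detection; is_tp marks
        # candidates matched to a ground-truth lesion. n_lesions is the total
        # number of ground-truth lesions, n_cases the number of scans.
        scores = np.asarray(scores)
        is_tp = np.asarray(is_tp, dtype=bool)
        points = []
        for t in thresholds:
            kept = scores >= t
            sensitivity = (kept & is_tp).sum() / n_lesions
            fpc = (kept & ~is_tp).sum() / n_cases   # false-positive findings per case
            points.append((t, sensitivity, fpc))
        return points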
Results:We found very good detection rates of more than 90% for T3 and T4 lesions at a low FPC of 1.1 for algorithm (C), trained on T, N & M-lesions. The performance of algorithms (A) and (B) was worse. The detection rate for M-lesions with approach (C) was 72.3%.
Conclusion:The algorithms presented in this study can serve as a foundation for the automated detection of TNM lesions on PET/CT, which may reduce the rate of missed lesions. Training on all T, N & M-lesions yielded better results than training on T-lesions alone or on T&N-lesions.
Limitations:There was no distinction between T-, N-, and M-lesions by the algorithm. The 3D image segmentation used to establish the ground truth was completed by only two readers.
Ethics committee approval:Written informed consent was waived by the Swiss regional ethics committee (project number: 2016‑01649).
Funding:No funding was received for this work.
05:57 S. Thulasi Seetha, Milan / IT
Purpose:
Automatic image segmentation through deep learning currently requires a laborious trial-and-error search for a bespoke network architecture. Our aim was to automate this search using an evolutionary algorithm we named gNEAT (gradient-based NEAT) and test it for lung cancer segmentation.
Methods and materials:gNEAT searches for a suitable neural network starting from a fixed, U-Net-based outer skeleton and evolves the convolutional blocks within it. The computational cost of training each candidate network through gradient descent was offset by using a proxy dataset (a downscaled version of the original images) during the search. Grouping similar networks into species kept diversity intact. gNEAT's incremental growth feature allowed a compact architectural search and removed the need for a customised initialisation. We also evaluated the influence of the model augmentation trick on both the hand-designed model and the model generated by gNEAT.
We tested the approach on the open-source Lung1 CT imaging dataset, which contains manually delineated tumours.
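A NEAT-style search of this kind boils down to a loop of evaluation on a proxy dataset, speciation, and incremental mutation of the best candidates. The toy loop below is our own simplification under those assumptions (all names and the fitness placeholder are hypothetical), not the gNEAT implementation.

    import random

    # A genome is a list of convolutional-block descriptors, e.g. [("conv", 32), ("conv", 64)].
    def fitness(genome):
        # Placeholder: in the real setting, the candidate network would be briefly
        # trained by gradient descent on the downscaled proxy dataset and scored by
        # validation Dice; here, a toy accuracy-vs-size trade-off stands in for that.
        return random.random() - 0.01 * len(genome)

    def mutate(genome):
        # Incremental growth: occasionally append a block or widen an existing one.
        g = list(genome)
        if random.random() < 0.5:
            g.append(("conv", random.choice([16, 32, 64])))
        else:
            i = random.randrange(len(g))
            g[i] = ("conv", g[i][1] * 2)
        return g

    def speciate(population):
        # Group genomes of similar depth so structurally similar candidates compete together.
        species = {}
        for g in population:
            species.setdefault(len(g), []).append(g)
        return species

    population = [[("conv", 16)] for _ in range(10)]   # compact initialisation
    for generation in range(20):
        ranked = sorted(population, key=fitness, reverse=True)
        survivors = [g for group in speciate(ranked[:6]).values() for g in group]
        population = survivors + [mutate(random.choice(survivors)) for _ in range(10 - len(survivors))]
    best = max(population, key=fitness)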
Results:The final model evolved by gNEAT yielded higher Dice scores than a model designed through trial and error using domain knowledge. Moreover, the augmented models achieved better Dice scores on both the validation and test sets.
Conclusion:gNEAT may replace the laborious but necessary task of hand-designing task-specific deep network architectures. The outcome also suggests that the performance of a model can be boosted by augmenting its depth.
Limitations:gNEAT is computationally complex and the generational approach taken can be difficult to parallelise efficiently.
Ethics committee approval:n/a
Funding:Funding from European Program H2020 (PREDICT - ITN - n° 766276).
06:03 Q. Meng, Zhengzhou / CN
05:51 S. Lopez, Nice / FR
Purpose:
To determine whether false-positive candidates can be significantly reduced, while all true-positive cases are still detected, when prior chest CT scans are taken into account, and whether this allows lung cancer to be detected earlier.
Methods and materials:In our retrospective study based on the NLST dataset, we included 1,102 patients with 3 annual chest CT scans (T0, T1, and T2): 104 were diagnosed with cancer at T2 and 998 had no evidence of cancer 3 years after the last CT scan. We developed an AI algorithm based on neural networks that uses these 3 CT scans to predict cancer status.
Results:The algorithm detected all cancers at T2 (100% sensitivity).
Considering the CT scan at T2 only, algorithm specificity was 64% compared with 22% for radiologists (+42 percentage points). When the priors (T0 and T1) were also considered, AI specificity was 82% versus 58% for radiologists (+24 percentage points). Based on these data, the AI would roughly halve the number of unnecessary exams.
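The statement that unnecessary exams would be roughly halved can be checked from the reported specificities and the number of cancer-free patients; the arithmetic below is our own reading of those figures, not the authors' calculation.

    cancer_free = 998
    fp_radiologists = cancer_free * (1 - 0.58)   # specificity 58% with priors -> ~419 false positives
    fp_ai = cancer_free * (1 - 0.82)             # specificity 82% with priors -> ~180 false positives
    reduction = 1 - fp_ai / fp_radiologists      # ~0.57, i.e. roughly half of unnecessary work-ups avoided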
The same AI algorithm executed on the scans at T1 correctly classified 76% of the patients who were cancer-positive at T2, while radiologists screened 70% of them positively (+6 percentage points). AI specificity at T1 was 80% versus 25% for radiologists (+55 percentage points).
Conclusion:Our preliminary results show that radiologists could be provided with reliable AI-based guidance to detect cancer earlier (76% and 64%, respectively, 1 and 2 years in advance).
The data indicate that many patients with negative biopsies went on to develop cancer 1 or 2 years later. The reasons remain unclear, and further research should focus on automated nodule detection and guided biopsies.
Limitations:The NLST dataset is large but geographically limited to a North American screening population.
Ethics committee approval:n/a
Funding:This work was supported by UCAJEDI (French National Research Agency grant) ANR-15-IDEX-01.
05:50 A. Paternain Nuin, Pamplona / ES
Purpose:
To determine the impact of two different reconstruction filters on the performance of commercially available computer-aided detection (CAD) software for lung nodule detection.
Methods and materials:78 patients who underwent a low-dose chest CT scan for lung cancer screening were prospectively recruited. Each scan was read by a radiologist and by CAD software (syngo.CT Lung CAD, Siemens Healthineers) using filter reconstructions for soft tissue (kernel A) and for lung parenchyma (kernel B). Nodules smaller than 3 mm were excluded from the analysis. The detection rate (DR) of both reconstructions was analysed according to nodule size, with a diameter of 6 mm considered clinically relevant. The consensus between the radiologist and the CAD software was set as the reference standard. Features such as nodule composition, location, and relation to anatomical structures were also evaluated. Data analysis was performed with Chi-square and Student's t-tests. A p-value <0.05 was considered statistically significant.
Results:301 lung nodules were detected. Overall, kernel A had a significantly higher DR (65.1%) than kernel B (56.8%) (p<0.01), including when only pulmonary nodules larger than 6 mm were considered (DR of 85.7% vs 77.1%, respectively; p<0.01). There were no significant differences in the average number of false-positive results per scan (1.33 for kernel A vs. 1.00 for kernel B, p=0.21).
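As an illustration of the reported Chi-square comparison of detection rates, the sketch below uses nodule counts reconstructed from the percentages above (the exact contingency table is an assumption on our part), with scipy's standard contingency-table test.

    from scipy.stats import chi2_contingency

    # Detected / missed counts for the 301 nodules, reconstructed from the reported DRs
    # (kernel A: 65.1% of 301 ~ 196; kernel B: 56.8% of 301 ~ 171).
    table = [[196, 105],   # kernel A: detected, missed
             [171, 130]]   # kernel B: detected, missed
    chi2, p_value, dof, expected = chi2_contingency(table)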
Conclusion:In the evaluated CAD system, the use of a soft tissue reconstruction filter allowed for higher detection rates than the lung parenchyma filter with a similar number of false-positive results, even for clinically relevant nodules.
Limitations:A small sample size.
Ethics committee approval:Obtained.
Funding:No funding was received for this work.
07:09 V. Venugopal, New Delhi / IN
Purpose:
To compare the performance of a deep learning algorithm for lung nodule detection against 3 independent radiologists with different experience levels (28, 20, and 14 years) on a retrospective chest CT dataset of 240 patients.
Methods and materials:In a retrospective clinical validation study, three independent radiologists were asked to annotate 240 chest CTs for all visible nodules between 5 and 30 mm in size. The annotations were compared with the outputs of a deep learning algorithm (Predible Health) for nodule detection.
All nodules were sub-classified by consensus level, i.e. marked by 1, 2, or 3 radiologists. The deep learning algorithm was tested separately on the nodule sets at each consensus level. When testing on the 2/3 and 3/3 consensus sets, we marked the 1/3 consensus nodules as irrelevant findings and did not penalise the algorithm for detecting them.
Results:After accounting for overlaps, a total of 146 nodules were marked by the radiologists. The algorithm had a sensitivity of 96% (47/49) on nodules with 3/3 consensus and of 91.1% (72/79) on nodules with 2/3 consensus, both at 1 false positive per scan. The algorithm had an overall FROC score of 0.91 on nodules with 3/3 consensus and 0.86 on nodules with 2/3 consensus. Combining the algorithm's findings with each radiologist's, and using the consensus of the other two as ground truth, we observed an increase in sensitivity ranging from 5% to 20%.
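Conceptually, the reader-plus-algorithm sensitivity reported above is the sensitivity of the union of the two sets of detections against the consensus reference; the toy example below (with hypothetical nodule IDs) illustrates that computation and is not the study's evaluation code.

    # Hypothetical nodule IDs for one scan, for illustration only.
    consensus_gt  = {"n01", "n02", "n03", "n04", "n05"}   # nodules agreed on by the other two readers
    reader_found  = {"n01", "n02", "n04"}
    algo_found    = {"n02", "n03", "n05", "n09"}          # n09 would be a false positive

    sens_reader   = len(reader_found & consensus_gt) / len(consensus_gt)                  # 0.6
    sens_combined = len((reader_found | algo_found) & consensus_gt) / len(consensus_gt)   # 1.0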
Conclusion:This study demonstrates that lung nodule detection algorithms can improve the sensitivity of radiologists by as much as 20%, helping them report more quickly.
Limitations:The study design could be improved by having nodules without full consensus adjudicated by an experienced thoracic radiologist.
Ethics committee approval:Required ethics committee approvals were obtained for the study.
Funding:Funded by Predible Health.
11:30 V. Saxena, New Delhi / IN
Purpose:
To assess the performance of a deep learning-based malignancy likelihood estimation system, trained on screening data from the National Lung Screening Trial (NLST), on a retrospective dataset of biopsy-proven studies from a large tertiary hospital in North India. The retrospective clinical validation dataset consisted of cases with primary cancers (64%), metastatic cancers (13%), and benign conditions (23%).
Methods and materials:The deep learning algorithm was trained on 1,245 scans from the NLST trial with pathologically proven ground truths to determine the malignancy status of a lung nodule. Retrospective data from 123 patients who underwent CT-guided lung biopsy over a 20-month period were selected for the validation study. All patient studies were evaluated against follow-up scans to confirm concordance with the biopsy results.
Results:The AI model showed a sensitivity of 75% (95% CI: 64%-83%) on 95 malignant nodules, with a PPV of 86% (95% CI: 79%-90%). Notably, 6 studies with negative histological findings were correctly predicted as malignant by the AI model, as confirmed by follow-up scans and re-biopsy. The AI model had a specificity of 58% (95% CI: 37%-76%) on 28 benign nodules, with an NPV of 40% (95% CI: 29%-51%).
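For reference, the sketch below reconstructs approximate confusion-matrix counts from the rates above and shows one common way such confidence intervals can be obtained (a Wilson score interval); the counts and the choice of interval are our assumptions, not the authors' analysis.

    from math import sqrt

    def wilson_ci(successes, n, z=1.96):
        # 95% Wilson score interval for a proportion.
        p = successes / n
        denom = 1 + z**2 / n
        centre = (p + z**2 / (2 * n)) / denom
        half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
        return centre - half, centre + half

    tp, fn = 71, 24   # 95 malignant nodules, sensitivity ~75%
    tn, fp = 16, 12   # 28 benign nodules, specificity ~58%
    ppv = tp / (tp + fp)               # ~0.86
    npv = tn / (tn + fn)               # ~0.40
    sens_ci = wilson_ci(tp, tp + fn)   # roughly (0.65, 0.82)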
Conclusion:The algorithm can be positioned as an adjunct to the histology report, which occasionally suffers from errors due to inadequate sampling. Because of their specific inclusion criteria, cancer screening settings provide only a limited context in the training data; further enhancement using a larger and more diverse training dataset is therefore required before the algorithm can be used in routine practice.
Limitations:The mechanisms of failure of the AI algorithm remain to be studied.
Ethics committee approval:Approved.
Funding:Funded by Predible Health.
04:49 M. Colevray, Lyon / FR
Purpose:
Lung volume monitoring is essential to assess the evolution of pulmonary fibrosis in diffuse interstitial lung diseases (DILD); however, current automatic software often fails to segment the lungs accurately.
We aimed to develop a deep learning-based automatic segmentation tool to quantify lung parameters on unenhanced chest CT and to validate the results against pulmonary function test (PFT) parameters.
Methods and materials:A 3D-U-net convolutional neural network was trained on 90 randomly selected unenhanced chest CT scans of patients presenting with DILD and validated on 50 others (sex ratio M/F=34/16, mean age=60±15.2 years). The mean Dice coefficient and root mean squared error (RMSE) were calculated on the validation dataset. The network was then applied to 1,171 CT exams (from 424 adult DILD patients, sex ratio M/F=256/168, mean age=64.6±13.29 years) and the following parameters were calculated: total lung volume (CTvol) and the mean (CTdens), kurtosis (CTkurt), and skewness (CTskew) of lung density. Pearson correlation coefficients (r) and regression analyses were used to assess the relationship between CT and PFT parameters (forced vital capacity (FVC), total lung capacity (TLC), and carbon monoxide diffusing capacity (DLCO)).
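The CT-derived parameters listed above can be computed directly from the segmented lung mask and the Hounsfield-unit histogram; the sketch below is our reading of those definitions (function and variable names are ours), not the authors' code.

    import numpy as np
    from scipy.stats import kurtosis, skew

    def lung_parameters(ct_hu, lung_mask, voxel_volume_l):
        # ct_hu: 3D array of Hounsfield units; lung_mask: boolean 3D array from the
        # segmentation network; voxel_volume_l: volume of one voxel in litres.
        densities = ct_hu[lung_mask]
        return {
            "CTvol":  lung_mask.sum() * voxel_volume_l,   # total segmented lung volume (l)
            "CTdens": densities.mean(),                   # mean lung density (HU)
            "CTkurt": kurtosis(densities),
            "CTskew": skew(densities),
        }

    # Relationship with a PFT parameter across exams, e.g. FVC:
    # r, p = scipy.stats.pearsonr(ctvol_per_exam, fvc_per_exam)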
Results:On the validation dataset, the mean Dice was 0.97±0.01 (range 0.91-0.99) with an RMSE of 0.007 l.
The mean CTvol was 3.33±1.23 l, overestimating FVC (2.51±0.91 l) and underestimating TLC (4.18±1.34 l). Very good correlations were found between CTvol and FVC and TLC (r=0.87 and 0.85, respectively, p<0.0001). DLCO was more poorly correlated with CT parameters, with r-values of -0.31, 0.48, and 0.51 (p<0.0001) for CTdens, CTkurt, and CTskew, respectively.
Conclusion:Our deep learning-based automated segmentation method allows for the accurate quantification of lung volume on chest CT with a good correlation with FVC and TLC.
Limitations:n/a
Ethics committee approval:n/a
Funding:No funding was received for this work.
05:30 M. Rossius, Rotterdam / NL
Purpose:
A shortage of radiologists, combined with the shift of radiological training toward cross-sectional exams, has created a growing gap in the time to chest x-ray (CXR) reporting. An automatic CXR triage tool could mitigate the clinical consequences of these interpretation delays. One common corollary sign of acute disease is the presence of pleural effusion. Our purpose was to validate an AI-based CXR pleural effusion detection algorithm.
Methods and materials:488 consecutive CXRs were included in the analysis. Each study was scored algorithmically for the detection of pleural effusion. A gold standard was established independently by an expert radiologist with 10 years of clinical experience.
Results:105/488 cases showed pleural effusion. The algorithm achieved 92.4% sensitivity and 90.1% specificity. In this series, the corresponding positive predictive value was 71.9% and the negative predictive value was 97.7%.
Of note, 64/105 of the positive cases were accompanied by other findings of acute disease such as consolidation. 13/38 of the false-positive results were, in fact, positive for acute consolidations. All 8 false-negative cases involved small pleural effusions.
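The predictive values reported above follow from the sensitivity, specificity, and prevalence in this series; the arithmetic below is our own cross-check, not part of the study's analysis.

    # Positive and negative predictive values from sensitivity, specificity, and prevalence.
    se, sp = 0.924, 0.901
    prevalence = 105 / 488
    ppv = se * prevalence / (se * prevalence + (1 - sp) * (1 - prevalence))          # ~0.72
    npv = sp * (1 - prevalence) / (sp * (1 - prevalence) + (1 - se) * prevalence)    # ~0.98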
Conclusion:We observed an excellent capability for automatic detection of pleural effusion, which in this random sample was highly correlated with other findings of acute cardiopulmonary disease. Further investigation and development are warranted to achieve a more comprehensive triage capability for chest x-rays.
Limitations:The retrospective nature of the study. A gold standard based on single chest x-ray interpretation.
Ethics committee approval:Institutional Review Board approval was obtained.
Funding:No funding was received for this work.
05:50 S. Primakov, Maastricht / NL
Purpose:
Localising and delineating lung tumours is essential for radiotherapy planning and various quantitative imaging workflows. However, manual contouring is highly laborious and time-consuming, as well as prone to variability and poor reproducibility. To address these issues, we created a fully automated pipeline for detecting and segmenting lung tumours on CT images.
Methods and materials:Multicentric CT images from 1,043 NSCLC patients with expert delineations of the gross tumour volume were used to train, test, and validate our detection and segmentation method. A three-step approach was developed, consisting of data pre-processing, lung isolation, and tumour segmentation. A pre-processing algorithm was developed to account for differences in hardware and acquisition parameters. A 2D U-net-type convolutional neural network with test-time augmentation and volumetric post-processing was trained on 936 CT scans. We evaluated model performance on the remaining 107 scans using the Dice similarity coefficient (DSC), Jaccard index (J), and 95th-percentile Hausdorff distance (H95th). In addition, we implemented RECIST and volumetric-RECIST (VRECIST) functionality.
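The overlap metrics used here have standard definitions on binary masks; the sketch below is a generic illustration of the DSC and Jaccard index (not the authors' implementation; the H95th would additionally require surface-distance computations).

    import numpy as np

    def dice_and_jaccard(pred_mask, gt_mask):
        # Overlap metrics between a predicted and a ground-truth binary tumour mask.
        pred = np.asarray(pred_mask, dtype=bool)
        gt = np.asarray(gt_mask, dtype=bool)
        intersection = np.logical_and(pred, gt).sum()
        dice = 2 * intersection / (pred.sum() + gt.sum())
        jaccard = intersection / np.logical_or(pred, gt).sum()
        return dice, jaccard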
Results:On the external validation dataset, we achieved a median slice-wise detection accuracy of 0.99 (IQR=0.01), a specificity of 0.99, and a sensitivity of 0.90. The generated segmentations achieved an average DSC of 0.81 (median 0.88, IQR=0.12), an average J of 0.71 (median 0.78, IQR=0.19), and an H95th of 6 mm.
Conclusion:The proposed pipeline can potentially provide a low-cost, observer-independent, and reproducible method for the detection and segmentation of lung cancers on CT images. Moreover, it can be used for automated tumour response evaluation using RECIST or VRECIST.
Limitations:The ground truth depended on the quality of the physicians' contours.
Ethics committee approval:n/a
Funding:European Program H2020 (PREDICT-ITN-766276); KWF Kankerbestrijding 12085/2018-2.