AI vs senior radiologists in detecting thoracic abnormalities on chest radiographs compared to CT
Author Block: S. Bennani1, J. Ventre1, V. Marty1, E. Lacave1, D. Hayashi2, A. Kompel3, A. Gupta3, A. Guermazi4, N-E. Regnard1; 1Paris/FR, 2Stony Brook, NY/US, 3Boston, MA/US, 4West Roxbury, MA/US
Purpose: This study aimed to assess the diagnostic performance of an artificial intelligence (AI) software in detecting thoracic abnormalities on chest radiographs compared with senior radiologists.
Methods or Background: We collected 319 chest radiographs of patients older than 22 years who underwent thoracic CT within 72 hours. A senior chest radiologist annotated the radiographs for four abnormality types (pleural abnormality, consolidation, mediastinal-hilar abnormality, nodule) using the CT findings as the ground truth. Three senior radiologists independently analysed the dataset with knowledge of the clinical indications but without access to the CT scans. Discrepancies were resolved by consensus.
The AI software (ChestView, Gleamer), a deep learning algorithm that detects these four abnormality types, was compared with the individual radiologists and with their consensus reading in terms of sensitivity and specificity.
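For reference, sensitivity was defined as TP/(TP+FN) and specificity as TN/(TN+FP) for each abnormality type and each reader (including the AI), with the CT-based annotation as the ground truth. The short Python sketch below illustrates this computation; it is not the authors' analysis code, and the function name and example labels are hypothetical.

def sensitivity_specificity(y_true, y_pred):
    # y_true: CT-based ground truth (1 = abnormality present), y_pred: reader or AI call
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity

# Hypothetical example for one abnormality type (e.g. consolidation):
gt = [1, 1, 0, 0, 1, 0]        # CT-based ground truth
reader = [1, 0, 0, 1, 1, 0]    # one reader's (or the AI's) calls
print(sensitivity_specificity(gt, reader))  # -> approximately (0.67, 0.67)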
Results or Findings: The dataset included 168 radiographs (age: 64±16 years; 91 women): 129 with at least one abnormality and 39 without any abnormality.
For each abnormality type, values are reported in the order AI, readers 1-3, and consensus. For consolidation, sensitivities were 72%, 54%, 80%, 66%, and 71%, with specificities of 92%, 80%, 77%, 85%, and 92%. For mediastinal-hilar abnormalities, sensitivities were 54%, 43%, 27%, 48%, and 54%, with specificities of 95%, 88%, 96%, 93%, and 94%. For nodules, sensitivities were 67%, 57%, 55%, 55%, and 62%, with specificities of 89%, 52%, 88%, 81%, and 86%. For pleural abnormalities, sensitivities were 84%, 89%, 87%, 73%, and 83%, with specificities of 93%, 88%, 92%, 95%, and 94%.
Conclusion: The AI performed on par with senior radiologists in detecting thoracic abnormalities, with sensitivity and specificity close to or above those of the consensus reading for all four abnormality types.
Limitations: The dataset contained relatively few radiographs and had a high pathology prevalence.
Funding for this study: Funding for this study was received from Gleamer.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: The study was approved by WCG (number IRB00000533).