Improving the generalisation of radiographic AI using automated data curation to mitigate shortcut learning
Author Block: I. A. Selby1, E. González Solares1, A. Breger2, M. Roberts1, J. Babar1, F. J. Gilbert1, N. Walton1, C-B. Schönlieb1, J. R. Weir-McCall3; 1Cambridge/UK, 2Vienna/AT, 3London/UK
Purpose: To investigate whether automated data curation pipelines for chest radiographs can improve deep-learning model performance on unseen data.
Methods or Background: Two public datasets, MIDRC-1A and MIDRC-R1, were used to develop diagnostic COVID-19 models with four architectures (DenseNet121, ResNet152V2, VGG16 and EfficientNetB3). Each architecture was trained four times, once per data curation workflow: WF1, raw pixel data with partitioning stratified on dataset and COVID-19 status; WF2, DICOM-cleaned data with look-up tables applied, lateral projections and non-chest radiographs excluded, classes balanced on the Manufacturer and Projection tags, and partitioning additionally stratified on the same metadata; WF3, cases excluded using an open-source data-cleaning pipeline (AutoQC, https://gitlab.developers.cam.ac.uk/maths/cia/covid-19-projects/autoqc), with partitioning stratified on projection and the presence of a pacemaker using AutoQC annotations; and WF4, WF2 and WF3 combined. COVID-19 diagnosis was inferred from laboratory test results, and model performance was assessed on five other public datasets. Generalisation from internal to external data was quantified using ΔAUC.
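The stratified partitioning used in the workflows above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name, the record field names (dataset, covid, projection) and the 20% test fraction are all assumptions chosen for illustration. Each record is assigned to a stratum defined by the combination of its metadata values, and every stratum is split separately, so both partitions keep roughly the same mix.

```python
import random
from collections import defaultdict
from itertools import product


def stratified_split(records, keys, test_fraction=0.2, seed=0):
    """Split records into (train, test), preserving the mix of strata.

    A stratum is the tuple of values a record holds for `keys`.
    Hypothetical helper, for illustration only.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[tuple(rec[k] for k in keys)].append(rec)

    train, test = [], []
    for key in sorted(strata):  # fixed order keeps the split reproducible
        group = strata[key]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Synthetic placeholder records: 2 datasets x 2 labels x 2 projections,
# with 10 records per combination (80 in total).
records = [
    {"id": i, "dataset": d, "covid": c, "projection": p}
    for i, (d, c, p, _) in enumerate(
        product(["MIDRC-1A", "MIDRC-R1"], [0, 1], ["AP", "PA"], range(10))
    )
]

train, test = stratified_split(records, keys=("dataset", "covid", "projection"))
print(len(train), len(test))
```

Adding further keys (for example, a pacemaker flag taken from quality-control annotations) only changes the `keys` tuple. Very small strata may contribute no test cases at all, which a real pipeline would need to handle explicitly.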
Results or Findings: WF1 included 43,176 radiographs, of which 14,328 (33.2%) were COVID-19-positive. The development sets of the other workflows were up to 60% smaller. Similarly, the external test sets ranged from 24,563 to 38,417 patients, depending on workflow. The WF1 models showed the largest fall in performance on external data (mean ΔAUC = -0.15 [95% CI: -0.17, -0.14]), while models trained using AutoQC (WF3 and WF4) performed most consistently, with mean ΔAUCs of -0.04 [95% CI: -0.06, -0.02] and -0.02 [95% CI: -0.04, 0.00], respectively (p<0.05). The WF2 models had a mean ΔAUC of -0.07 [95% CI: -0.09, -0.05].
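As a hedged illustration of how a mean ΔAUC with a confidence interval might be computed, the sketch below assumes ΔAUC = external AUC minus internal AUC. This sign convention is inferred from the negative values reported for a fall in performance. The abstract does not state how the intervals were derived, so the normal-approximation interval here is an assumption, and the AUC pairs are made-up placeholders rather than study results.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def delta_auc(internal_auc, external_auc):
    """Change in AUC from internal to external data (negative = worse externally)."""
    return external_auc - internal_auc


def mean_with_ci(values, level=0.95):
    """Mean with a normal-approximation confidence interval (assumed method)."""
    m = mean(values)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * stdev(values) / sqrt(len(values))
    return m, m - half_width, m + half_width


# Placeholder (internal AUC, external AUC) pairs, one per hypothetical model.
pairs = [(0.90, 0.80), (0.88, 0.76), (0.92, 0.78), (0.86, 0.74)]
deltas = [delta_auc(internal, external) for internal, external in pairs]

m, lo, hi = mean_with_ci(deltas)
print(f"mean dAUC = {m:.2f} [95% CI: {lo:.2f}, {hi:.2f}]")
```

With only a handful of models per workflow, a t-based or bootstrap interval would be wider than this approximation and may be the better choice.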
Conclusion: Automated data curation can improve the generalisation of deep learning models for chest radiographs, facilitating more consistent performance on data from new locations and equipment.
Limitations: Future work should evaluate the tool on multi-class classification tasks and non-COVID-19 datasets. In addition to the current pacemaker detection, tools for a broader range of support apparatus are needed.
Funding for this study: The authors wish to acknowledge support from the EU/EFPIA Innovative Medicines Initiative 2 Joint Undertaking - DRAGON (101005122) (I.S., A.B., M.R., L.E.S., J.B., C.-B.S., E.S., J.W.M., AIX-COVNET); the National Institute for Health and Care Research (NIHR) Cambridge Biomedical Research Centre (BRC-1215-20014) (I.S., L.E.S., J.H.F.R., E.S., J.W.M.); Wellcome Trust (J.H.F.R.), British Heart Foundation (J.H.F.R.); the EPSRC Cambridge Mathematics of Information in Healthcare Hub EP/T017961/1 (M.R., J.H.F.R., C.-B.S.); Cancer Research UK (CRUK) National Cancer Imaging Translational Accelerator (NCITA) [C42780/A27066] (L.E.S.); Cambridge Mathematics of Information in Healthcare (CMIH) Hub EP/T017961/1; Austrian Science Fund (FWF, project T-1307) (A.B.); and the Trinity Challenge BloodCounts! project (M.R., C.-B.S.). The AIX-COVNET collaboration is also grateful to Intel for financial support.
C.-B.S. additionally acknowledges support from the Philip Leverhulme Prize, the Royal Society Wolfson Fellowship, the EPSRC advanced career fellowship EP/V029428/1, EPSRC grants EP/S026045/1, EP/T003553/1, EP/N014588/1 and EP/T017961/1, the Wellcome Innovator Awards 215733/Z/19/Z and 221633/Z/20/Z, the European Union Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 777826 NoMADS, the Cantab Capital Institute for the Mathematics of Information and the Alan Turing Institute.
Please note that the content of this publication reflects the authors’ views and that neither IMI nor the European Union, EFPIA, or the DRAGON consortium are responsible for any use that may be made of the information contained therein.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: The Brent Research Ethics Committee, the Health Research Authority (HRA), and Health and Care Research Wales (HCRW) provided ethical approval for our retrospective study (IRAS ID: 282705, REC No.: 20/HRA/2504, R&D No.: A095585). Informed consent was not required because the data were pseudonymised.