Impact of population size and validation method on the performance of radiomics models: application to COVID lung lesions
Author Block: A. Decoux, L. Duron, A. Arnoux, L. S. Fournier; Paris/FR
Purpose: The purpose of this study was to explore the impact of population size and validation strategy on the estimated performance and reproducibility of radiomics studies.
Methods or Background: Radiomics parameters were extracted from lung lesions segmented by experts on CT in 3,737 COVID-19 patients (STOIC cohort). Of these, 1,121 (30%) patients were set aside to simulate an external validation population (the "generalisation set"). Among the remaining patients, subpopulations of varying sizes were generated to simulate the training/test population of a radiomics study. Prediction models were trained on 100 bootstrapped samples to estimate the variance of the AUC, i.e. model stability. Three validation strategies were tested: a one-time split, cross-validation and nested cross-validation. The mean and variance of the AUCs of each model were calculated on the subpopulation as well as on the generalisation set, and the difference between the two was defined as the generalisation gap.
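The bootstrap procedure and the generalisation gap described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the study's pipeline: the cohort is synthetic (`make_cohort`), the model is a toy difference-of-class-means scorer (`fit`), out-of-bag patients stand in for the internal test set, and the gap is taken as internal mean AUC minus generalisation mean AUC, a sign convention the abstract does not specify.

```python
import random
import statistics

def auc(scores, labels):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def make_cohort(n, rng):
    """Hypothetical stand-in for radiomics data: 3 Gaussian features, binary outcome."""
    rows = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [rng.gauss(0.5 * y, 1.0) for _ in range(3)]
        rows.append((x, y))
    return rows

def fit(rows):
    """Toy linear model (assumption): weights are the difference of class means."""
    dim = len(rows[0][0])
    pos = [x for x, y in rows if y == 1]
    neg = [x for x, y in rows if y == 0]
    return [statistics.fmean(p[j] for p in pos) - statistics.fmean(q[j] for q in neg)
            for j in range(dim)]

def predict(weights, rows):
    """Linear score for each patient."""
    return [sum(w * v for w, v in zip(weights, x)) for x, _ in rows]

def bootstrap_aucs(study, generalisation, n_boot, rng):
    """Train on bootstrap resamples; score out-of-bag patients and the generalisation set."""
    internal, external = [], []
    gen_labels = [y for _, y in generalisation]
    for _ in range(n_boot):
        idx = [rng.randrange(len(study)) for _ in study]
        chosen = set(idx)
        train = [study[i] for i in idx]
        test = [row for i, row in enumerate(study) if i not in chosen]  # out-of-bag
        # Skip the rare resample in which a class is missing from train or test.
        if len({y for _, y in train}) < 2 or len({y for _, y in test}) < 2:
            continue
        weights = fit(train)
        internal.append(auc(predict(weights, test), [y for _, y in test]))
        external.append(auc(predict(weights, generalisation), gen_labels))
    return internal, external

rng = random.Random(0)
generalisation = make_cohort(300, rng)   # held-out surrogate for external validation
study = make_cohort(200, rng)            # one simulated study subpopulation
internal, external = bootstrap_aucs(study, generalisation, 100, rng)

# Generalisation gap (assumed sign): internal mean AUC minus generalisation mean AUC.
gap = statistics.fmean(internal) - statistics.fmean(external)
print("internal AUC mean/variance:", statistics.fmean(internal), statistics.variance(internal))
print("generalisation AUC mean/variance:", statistics.fmean(external), statistics.variance(external))
print("generalisation gap:", gap)
```

Repeating the run with different sizes for `study` gives a rough feel for how the variance and the gap change with population size, although the numbers from synthetic data say nothing about the STOIC results.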
Results or Findings: Increasing the size of the training data sets improved model performance on both the internal validation and generalisation sets, decreased the variance of performance on the validation set and decreased the generalisation bias, thereby increasing overall confidence in the model. These improvements reached a plateau at 400 patients.
Cross-validation helped reduce variance and generalisation bias compared with a one-time split. Nested cross-validation reduced variance, but at the expense of increased generalisation bias.
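The three validation strategies compared here differ mainly in how patients are partitioned. The sketch below shows one plausible way to generate the partitions for each strategy. The function names, the 30% test fraction and the fold counts are illustrative assumptions; the abstract does not report these settings.

```python
import random

def one_time_split(indices, test_fraction, rng):
    """Single shuffle-and-cut split into (train, test)."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def kfold_indices(indices, k, rng):
    """Shuffle indices and deal them into k disjoint, near-equal folds."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

def cross_validation_splits(indices, k, rng):
    """Yield (train, test) pairs; each patient appears in exactly one test fold."""
    folds = kfold_indices(indices, k, rng)
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

def nested_splits(indices, k_outer, k_inner, rng):
    """Yield (outer_train, outer_test, inner_splits).

    The inner splits are drawn from outer_train only, so model selection
    never sees the outer test fold used for the performance estimate.
    """
    for outer_train, outer_test in cross_validation_splits(indices, k_outer, rng):
        inner = list(cross_validation_splits(outer_train, k_inner, rng))
        yield outer_train, outer_test, inner

rng = random.Random(0)
patients = list(range(20))  # hypothetical patient indices

train, test = one_time_split(patients, 0.3, rng)
print("one-time split sizes:", len(train), len(test))

for train, test in cross_validation_splits(patients, 5, rng):
    print("CV fold sizes:", len(train), len(test))

for outer_train, outer_test, inner in nested_splits(patients, 5, 3, rng):
    print("outer test size:", len(outer_test), "inner folds:", len(inner))
```

With a one-time split the performance estimate rests on a single test partition, whereas cross-validation averages over several. Nested cross-validation further separates hyperparameter selection (inner loop) from performance estimation (outer loop).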
Conclusion: As expected, population size has a strong impact on model performance, particularly on the estimated performance variance (stability) of models. This study is the first to estimate the minimum population size needed to improve the generalisability of radiomics studies. However, as it was applied to a single data set, results are expected to vary according to the predictive power of imaging for a given clinical question.
Limitations: Our generalisation set serves only as a surrogate for an external validation set; it does not constitute a true external validation set.
Funding for this study: This work was funded by the French government under management of the Agence Nationale de la Recherche as part of the "Investissements d'avenir" programme, reference ANR19-P3IA-0001 (PRAIRIE 3IA Institute).
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: This is a retrospective study.