Research Presentation Session: Artificial Intelligence and Imaging Informatics

RPS 105 - Exploring the AI-radiomics interface in lung cancer

March 4, 08:00 - 09:30 CET

6 min

Evaluation of the influence of human factors towards a new AI-CADe/x system for lung cancer risk stratification in the radiology workflow

Andrew Scarsbrook, Leeds / United Kingdom

Author Block: C. Santos¹, M. Santos¹, J. H. R. Cairns², A. Johnstone², M. Darby², C. Arteta¹, A. Scarsbrook²; ¹Oxford/UK, ²Leeds/UK
Purpose: Integration of AI-based computer aided detection (CADe) and computer aided diagnosis (CADx) tools could help improve and standardize CT reporting and increase efficiency. However, clinical benefit may in part depend on users’ attitudes towards AI technologies. This study evaluated the effect of human factors on the use of a new AI-CADe/x tool for assisting CT-based lung nodule risk stratification.
Methods or Background: A Multiple Reader Multiple Case (MRMC) study assessed the effect of an AI-CADe/x tool on readers’ estimates of lung cancer likelihood from CT scans. The tool facilitated pulmonary nodule detection, segmentation, and lung cancer risk prediction. Twelve radiologists evaluated 240 cases (95 cancers), with and without AI assistance. Technology acceptance was modelled using a 5-point Likert scale survey assessing Perceived Ease of Use (PEO), Perceived Usefulness (PU), attitudes towards AI, and trust. These measures were correlated with the mean effect size (ES) of AI assistance, determined via the Dorfman-Berbaum-Metz method (z-test used for the difference between ES; 95% confidence intervals reported).
Results or Findings: The AI tool increased users’ reporting confidence (median survey score (interquartile range) 4 (4-4.25)) and was highly rated for PEO (4 (4-5)) and PU (4 (4-5)). Participants expressed greater trust in the CADe (4 (3-4)) than in the CADx (3 (2-4)) component. ES was affected by the level of trust and by automation, aversion, and confirmation biases. Participants without a tendency towards confirmation bias (n=5), i.e. those who considered the AI suggestion even when it challenged their initial assessment, showed a significantly higher ES from AI assistance than those exhibiting this bias (6.25% [-1.95, 8.35] versus 3.02% [2.02, 6.31], respectively; p=0.028).
Conclusion: Attitudes towards AI influence radiologists’ interaction with AI-based CADe/CADx tools. Positive attitudes and greater trust were associated with greater benefit from AI assistance for lung cancer risk stratification.
Limitations: Technology acceptance modelling is preliminary and based on only twelve radiologists.
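The z-test step for comparing two effect sizes can be sketched in Python as follows. This is a minimal illustration, assuming each ES is approximately normal and its 95% CI symmetric, so that a standard error can be back-calculated from the interval width. It does not reproduce the full Dorfman-Berbaum-Metz analysis (an ANOVA on jackknife pseudovalues across readers and cases), and the input values are hypothetical.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% standard normal critical value


def se_from_ci(lower: float, upper: float) -> float:
    """Back-calculate a standard error from a symmetric 95% CI (assumption)."""
    return (upper - lower) / (2 * Z_95)


def z_test_difference(es1: float, ci1: tuple, es2: float, ci2: tuple) -> tuple:
    """Two-sided z-test for the difference between two independent effect sizes.

    Assumes both estimates are approximately normal with symmetric CIs.
    Returns (z, p).
    """
    se1 = se_from_ci(*ci1)
    se2 = se_from_ci(*ci2)
    z = (es1 - es2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Hypothetical effect sizes (percentage points), not the study's data.
z, p = z_test_difference(5.0, (3.04, 6.96), 2.0, (1.02, 2.98))
print(f"z = {z:.2f}, p = {p:.4f}")
```

Because the intervals reported in this abstract are not symmetric around their estimates, this approximation would not necessarily reproduce the reported p-value.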
Funding for this study: The study was jointly funded by the National Institute for Health and Care Research (NIHR) and the Office for Life Sciences (OLS) under project ID NIHR207547.
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information:

6 min

Commercially available AI products for CT-based lung cancer screening: capabilities, clinical evidence, and alignment with international screening frameworks

Noa Antonissen, Nijmegen / Netherlands

Author Block: N. Antonissen¹, S. S. Schalekamp¹, H. Hahn², K. G. Van Leeuwen³, C. Jacobs¹; ¹Nijmegen/NL, ²Bremen/DE, ³Utrecht/NL
Purpose: To evaluate certified capabilities of CE-marked AI products for lung nodule analysis in CT lung cancer screening, assess their alignment with international screening frameworks, and analyze their supporting peer-reviewed evidence.
Methods or Background: Six core clinical tasks (nodule detection, classification, measurement, growth assessment, malignancy risk estimation, structured management) were defined by analyzing four major screening frameworks: the Lung CT Screening Reporting and Data System (Lung-RADS) version 2022, British Thoracic Society (BTS) guidelines, European Union Position Statement (EUPS) recommendations, and European Society of Thoracic Imaging (ESTI) nodule management recommendations. CE-marked AI products were identified through the Health AI Register. Vendors confirmed certified capabilities using a standardized questionnaire, and public documentation was used for non-responding vendors. Scientific evidence was graded using a six-level efficacy framework and assessed for study characteristics.
Results or Findings: In total, 16 products from 16 vendors were included, and 10 vendors completed the questionnaire. Regarding capabilities, 14 products detect and measure solid and subsolid nodules, 12 support growth assessment, and none support endobronchial or cystic lesion evaluation. For risk estimation, 9 products provide malignancy-risk outputs: 5 use the PanCan model and 4 provide proprietary AI-based scores. Six products had no peer-reviewed publications. The remaining 10 products were supported by 60 studies, of which 7% were prospective and 45% vendor-independent, with external testing including multicenter (40%), multinational (7%), and screening cohort (40%) datasets. Overall, evidence was concentrated at lower efficacy levels: 70% assessed standalone diagnostic accuracy, 25% examined effects on diagnostic decision-making, and none reported patient or societal outcomes.
Conclusion: CE-marked AI solutions fulfill core functions for nodule assessment but lack certified capabilities for endobronchial and cystic lesions, and their supporting clinical evidence is limited, with few prospective studies or studies at higher efficacy levels.
Limitations: Product capability assessment relied on publicly available and vendor-reported data.
Funding for this study: This project is co-funded by the SOLACE project, funded under the EU4Health Programme 2021–2027 under grant agreement no. 101101187.
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information:

6 min

Diagnostic performance of Harrison.ai CT chest algorithm for detection of lung nodules using the DUKE Lung Cancer Screening Dataset

Melissa Ryan, Sydney / Australia

Author Block: M. Ryan¹, M. Steele¹, G. Sanderson¹, J. Tang²; ¹Sydney/AU, ²Victoria/AU
Purpose: To evaluate the performance of an artificial intelligence (AI) algorithm for pulmonary nodule detection on low-dose CT chest scans to determine the clinical utility of AI-based nodule detection as a decision support tool in lung cancer screening.
Methods or Background: Low-dose chest CT scans (n=1,060) from the DUKE Lung Cancer Screening Dataset, available at https://doi.org/10.5281/zenodo.10782890, were evaluated using the Harrison.ai CT chest device. The dataset comprises screening CT scans from a high-volume contemporary lung cancer screening center, with associated Lung-RADS scores and, where applicable, the histologic type and stage of subsequently diagnosed lung cancer.

All images were analysed by the device, which automatically flags cases containing nodules and localises each nodule instance within an image series. Patient-level and instance-level sensitivity were evaluated, and performance was also stratified by gender and malignancy.
Results or Findings: Preliminary results for the 1,037 successfully processed cases demonstrated a patient-level sensitivity of 91% and an instance-level sensitivity of 62%, with an average of 1.12 false-positive instances per case. The false-positive rate may be partly due to AI detection of nodules below the dataset’s minimum size threshold (4 mm).
Conclusion: The Harrison.ai CT chest device demonstrated high case-level sensitivity, which supports the potential role of AI in enhancing efficiency and reliability in lung cancer screening programs using low-dose CT.
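A minimal Python sketch of the three reported metrics (patient-level sensitivity, instance-level sensitivity, and false positives per case) is shown below. It assumes each AI mark has already been matched to a reference nodule or labelled a false positive (the matching rule is not described in the abstract), and that a nodule-positive case counts as detected when the device flags it with any mark. The CaseResult type and the example counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    n_reference: int  # reference (ground-truth) nodules in the case
    n_true_pos: int   # AI marks matched to a reference nodule
    n_false_pos: int  # AI marks not matched to any reference nodule


def detection_metrics(cases: list) -> dict:
    """Patient-level sensitivity, instance-level sensitivity and FPs per case."""
    positive = [c for c in cases if c.n_reference > 0]
    # Assumption: a nodule-positive case is "detected" if the AI flags it
    # with at least one mark (matched or not).
    flagged = [c for c in positive if c.n_true_pos + c.n_false_pos > 0]
    return {
        "patient_sensitivity": len(flagged) / len(positive),
        "instance_sensitivity": sum(c.n_true_pos for c in positive)
        / sum(c.n_reference for c in positive),
        "fp_per_case": sum(c.n_false_pos for c in cases) / len(cases),
    }


# Hypothetical per-case counts.
example = [
    CaseResult(2, 1, 0),
    CaseResult(1, 0, 1),
    CaseResult(3, 0, 0),
    CaseResult(0, 0, 2),
]
print(detection_metrics(example))
```

Counting any flag as a patient-level detection, rather than requiring a matched mark, is one reason patient-level sensitivity can sit well above instance-level sensitivity.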
Limitations: The retrospective design and reliance on a single institutional dataset limit generalisability. The algorithm’s performance may vary with differing scanner protocols, populations, and nodule characteristics not fully represented in the DUKE dataset.
Funding for this study: No funding was received for this study
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information:

6 min

AI-Assistance Improving Lung Nodule Risk Assessment in Lung Cancer Screening: Results from RELIVE, A Multi-Reader Multi-Case Study to Evaluate the Performance of an end-to-end AI/ML-based CADe/CADx

Charles Michael Voyton, Valbonne / France

Author Block: C. M. Voyton, P-H. SIOT, B. Renoust, B. Huet, V. Bourdes; Valbonne/FR
Purpose: Evaluation of LDCT exams for lung cancer screening (LCS) is time-intensive, and nodule risk stratification is challenging and prone to false positives. This multi-reader multi-case (MRMC) study evaluated whether an end-to-end AI/ML-based CADe/CADx improves radiological review and nodule risk assessment.
Methods or Background: In this paired split-plot (PSP) MRMC design, 480 retrospective LCS patients (160 cancer, 320 non-cancer) and 16 US board-certified radiologists were split into 4 balanced blocks of 120 patients and 4 radiologists, respectively; each case within each block was read twice (with and without the assistance of the AI/ML-based CADe/CADx), with a washout period of at least 28 days in between. Readers were asked to locate and measure each nodule, assign it a risk score from 1 to 100, and recommend patient management.
Results or Findings: AUROC (AUC) and specificity (sp) increased significantly (p < 0.05) with AI support (AUC: 0.843, sp: 0.700) versus without (AUC: 0.828, sp: 0.648), demonstrating improved diagnostic performance of LCS reporting. Less experienced radiologists (2-6 years of experience) showed the largest increase in AUC (4.77%, p = 0.0308). Inter-reader agreement also improved significantly (p<0.0001) for readers using the device, with the intraclass correlation coefficient (ICC) increasing from 0.707 to 0.830.
Conclusion: The AI system enhanced radiologists’ diagnostic accuracy for pulmonary nodule malignancy risk assessment in an LCS cohort. These findings support its potential role as a key decision-support tool in clinical practice, especially for early-career clinicians.
Limitations: Retrospective study on an enriched cohort that does not fully reflect the real-world prevalence of cancer and nodule characteristics.
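The block allocation in a paired split-plot design can be sketched in Python as follows. The sketch assumes stratified random allocation, so that each case block receives 40 cancer and 80 non-cancer patients, and simple random allocation of readers; the abstract does not state how blocks were randomised, so the helper names, seeds, and IDs are hypothetical.

```python
import random
from datetime import date, timedelta


def deal_into_blocks(items, n_blocks: int, seed: int = 0) -> list:
    """Shuffle items reproducibly and deal them round-robin into n_blocks blocks."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_blocks] for i in range(n_blocks)]


def balanced_case_blocks(cancer_ids, non_cancer_ids, n_blocks: int, seed: int = 0) -> list:
    """Blocks with equal numbers of cancer and non-cancer cases (assumed stratification)."""
    cancer = deal_into_blocks(cancer_ids, n_blocks, seed)
    non_cancer = deal_into_blocks(non_cancer_ids, n_blocks, seed + 1)
    return [cancer[i] + non_cancer[i] for i in range(n_blocks)]


def washout_ok(first_read: date, second_read: date, min_days: int = 28) -> bool:
    """True if the second read of a case respects the minimum washout period."""
    return second_read - first_read >= timedelta(days=min_days)


# Hypothetical IDs: 0-159 cancer, 160-479 non-cancer; readers 0-15.
case_blocks = balanced_case_blocks(range(160), range(160, 480), n_blocks=4)
reader_blocks = deal_into_blocks(range(16), n_blocks=4, seed=42)
# In a split-plot design, the readers in reader_blocks[b] read only case_blocks[b].
for b, (cases, readers) in enumerate(zip(case_blocks, reader_blocks)):
    print(b, len(cases), sorted(readers))
```

Because each reader group reads only its own case block, every reader reads 120 rather than 480 cases, twice each, which is the main practical appeal of the split-plot layout.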
Funding for this study: This study was funded by Median Technologies

On Behalf of the REALITY/RELIVE Investigators: (participating centers and investigators: Baptist Cancer Center, Memphis, TN, USA – Raymond Osarogiagbon; Clínica Universidad de Navarra, Madrid, Spain – Luis Seijo, Gorka Bastarrika; Penn Medicine, Philadelphia, PA, USA – Anil Vachani; Hospital Universitario Fundación Jiménez Díaz, Madrid, Spain – Carolina Gotera; The University of Texas MD Anderson Cancer Center, Houston, TX, USA – Edwin Ostrin, Jennifer Dennison).
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: clinicaltrials.gov number: NCT06751576

6 min

A 'Grounded Theory' exploration of stakeholders' opinions of Radiology AI for Prioritised Imaging and Diagnosis of lung cancer (the 'RAPID-LC' study)

Clare Rainey, Cork / Ireland

Author Block: C. Rainey¹, A. Gill², S. L. Mcfadden²; ¹Cork/IE, ²Belfast/UK
Purpose: Lung cancer is a leading cause of death globally and is projected to reach 3.55 million deaths by 2050. Currently, many patients in Northern Ireland (NI) are identified at advanced stages, leading to poor survival. AI has been proposed to assist clinicians; however, the perceptions of clinicians and patients need to be considered to ensure that AI is used optimally and designed with the end user in mind.
Methods or Background: A grounded theory methodology was adopted. One-to-one interviews were conducted with members of the public (patients) and with clinicians (two nurses, an oncologist, a medical physicist, a radiographer, and an anesthetist) currently working in the lung cancer pathway in NI. Data were transcribed verbatim and analysed using Braun and Clarke’s thematic analysis.
Results or Findings: Seven themes and eight subthemes emerged from the data.
Patients trusted clinicians’ scrutiny of the AI system, whilst clinicians required robust ongoing validation and education. The public/patient participants were unconcerned about the cost implications of AI.
Both groups were positive about using AI to streamline care/reduce waiting times. They were more cautious in its use for symptomatic presentations.
Whilst the results are optimistic, various concerns exist around (i) trust, (ii) communication, (iii) validation, and (iv) infrastructure. These remain central to discussions around AI adoption in NI and internationally.
Conclusion: Clinicians require robust and transparent validation, while the public are happy to trust the clinician. Neither group feels that autonomous AI is acceptable currently. Both groups feel strongly that AI is needed to support healthcare systems. Education and co-design have been proposed to facilitate adoption.
Limitations: This study was conducted with participants from NI, where the lung cancer pathway may differ slightly from the rest of Europe.
Funding for this study: SBRI Research grant
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: UU Nursing and Health Research Filter Committee

6 min

A preliminary study: Homology-based detection of lung cancer lesions in CT images

Kentaro Doi, Osaka / Japan

Author Block: K. Doi, H. Numasaki, Y. Anetai, M. Imai, Y. Natsume-Kitatani; Osaka/JP
Purpose: We previously proposed the homology-based feature (HF) map to quantify pulmonary fibrotic lesions in CT images (Doi, et al. IJCARS 2025). However, lung cancer lesions degrade its quantitative performance. This study addressed this issue by detecting lung cancer lesions prior to the quantification of pulmonary fibrosis through optimization of the HF map calculation.
Methods or Background: We used forty publicly available CT images and the corresponding radiotherapy structure datasets from The Cancer Imaging Archive. These datasets include the coordinates of lung cancer lesions defined by radiologists, which were used as the ground truth. HF maps were computed from the collected CT images, and predicted lung cancer lesions were generated by thresholding the HF maps. The method is controlled by hyperparameters such as the tile size, tile shift size, range of CT values for analysis, and HF map threshold. These hyperparameters were optimized using the Dice coefficient, which measures the similarity between a predicted lung cancer lesion and the ground truth, as the objective function.
Results or Findings: The Dice coefficient reached 0.93 with the following optimized hyperparameters: tile size, 16 pixels; tile shift size, 2; range of CT values for analysis, -800 to 100; and HF map threshold, 85% of the maximum HF map value. This CT value range differs from that used for fibrosis quantification, possibly because the CT values of lung cancer lesions differ from those of fibrotic lesions.
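The thresholding and Dice-based tuning step can be sketched in Python as below. Computing the HF map itself (a homology-based feature evaluated over sliding tiles) is out of scope, so a toy array stands in for it and only the threshold fraction is searched; tile size, tile shift, and CT value range would each require recomputing the map. Function names and data are hypothetical.

```python
import numpy as np


def dice(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice similarity coefficient between two binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(2.0 * np.logical_and(pred, truth).sum() / denom)


def threshold_map(feature_map: np.ndarray, fraction: float) -> np.ndarray:
    """Predict lesion pixels as those >= fraction * max(feature_map)."""
    return feature_map >= fraction * feature_map.max()


def best_fraction(feature_map, truth, fractions):
    """Grid search over the threshold fraction, maximising Dice (hypothetical helper)."""
    scores = {f: dice(threshold_map(feature_map, f), truth) for f in fractions}
    best = max(scores, key=scores.get)  # first fraction reaching the top score
    return best, scores[best]


# Toy stand-in for an HF map: a strong 3x3 "lesion" and a weaker 2x2 blob.
hf = np.zeros((10, 10))
hf[2:5, 2:5] = 1.0
hf[6:8, 6:8] = 0.5
truth = hf >= 0.9
print(best_fraction(hf, truth, [0.3, 0.6, 0.85]))
```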
Conclusion: We have demonstrated the potential of the HF map to detect lung cancer lesions in CT images.
Limitations: The method may need to be optimized for each lung cancer type, since CT values differ between types. Moreover, more thorough validation on independent validation datasets is required.
Funding for this study: This work has been supported by JSPS KAKENHI [grant number 24K23909] and the Japan Science and Technology Agency COI-NEXT [grant JPMJPF2018].
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information:

6 min

In-vivo classification between lung adenocarcinoma and squamous cell lung cancer through late fusion of a clinical model and an MRI radiomic model

Frédérique Frouin, Orsay / France

Author Block: J-E. Al Khoury¹, C. Nioche¹, M. Lacroix², M. Hamadouche³, C. Suchel³, P-Y. Brillet³, F. Frouin¹; ¹Orsay/FR, ²Paris/FR, ³Bobigny/FR
Purpose: To improve the classification of lung adenocarcinoma (ADK) versus squamous cell lung cancer (SQc) by integrating clinical and MRI data.
Methods or Background: Clinical and MRI radiomic logistic regression models were developed from a training/validation set of 80 patients with advanced lung cancer (59 ADK and 21 SQc). A late fusion model combining the probabilities of the two models was defined. The three models were tested on 50 patients (30 ADK and 20 SQc). Recorded clinical data included age, weight, BMI, smoking status, and TNM staging. Following brain MRI to detect possible brain metastases, a contrast-enhanced T1-weighted (CET1w) chest MRI was performed. After preprocessing of the CET1w MRI, including intensity normalization and spatial resampling (1 mm³ isotropic voxels), the lung tumor was delineated in 3D. Radiomic features, including radial intensity measures (RIM), were extracted using LIFEx software (version 25.9.4). Model training was based on forward selection of the features that maximized balanced accuracy.
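Late fusion and the balanced-accuracy criterion can be illustrated with a short Python sketch. The abstract does not specify how the two probabilities were combined, so a weighted average with equal weights and a 0.5 decision threshold is assumed here; SQc is coded as the positive class, and all probabilities are hypothetical.

```python
def balanced_accuracy(y_true, y_pred) -> float:
    """Mean of sensitivity and specificity (label 1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def late_fusion(p_clinical, p_radiomic, weight: float = 0.5) -> list:
    """Weighted average of two models' probabilities (assumed fusion rule)."""
    return [weight * a + (1 - weight) * b for a, b in zip(p_clinical, p_radiomic)]


def to_labels(probs, threshold: float = 0.5) -> list:
    """Binarise probabilities at an assumed 0.5 threshold."""
    return [1 if p >= threshold else 0 for p in probs]


# Hypothetical probabilities of SQc (label 1) for four patients.
y = [1, 1, 0, 0]
p_clin = [0.70, 0.45, 0.55, 0.20]
p_rad = [0.45, 0.70, 0.30, 0.55]
for name, probs in [("clinical", p_clin), ("radiomic", p_rad),
                    ("late fusion", late_fusion(p_clin, p_rad))]:
    print(name, balanced_accuracy(y, to_labels(probs)))
```

In this toy case each model errs on different patients, so averaging lets one model's confident correct call outweigh the other's borderline error; balanced accuracy is used because the classes are imbalanced (more ADK than SQc).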
Results or Findings: Only one parameter (age) was selected for the clinical model. Only one RIM feature (mean gradient calculated between the mean intensities of successive layers within the tumor) was selected for the radiomic model. Based on a five-fold cross-validation framework, the balanced accuracy was 0.62 for clinical, 0.61 for radiomic, and 0.68 for late fusion models in the validation phase. When applied to the test set, the balanced accuracies were 0.62 for clinical, 0.62 for radiomic, and 0.67 for late fusion models.
Conclusion: The late fusion of the clinical model and the MRI radiomic model shows improved results for the classification task (ADK compared to SQc) when compared to each individual model. All three models were robust and showed stable performance in the test set.
Limitations: No limitations were identified
Funding for this study: Technical and logistic help from GE HealthCare
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: IRB protocol 32-2016, study number: 2016-A00813-48

6 min

Benchmarking lung tumour segmentation models: stratified performance of deep learning models across tumour sizes and cancer stages

Alonso Cerrato Nieto, Utrecht / Netherlands

Author Block: A. Cerrato Nieto, E. Scholten, S. S. Schalekamp, M. Prokop, C. Jacobs; Nijmegen/NL
Purpose: Accurate lung tumour segmentation in CT scans is crucial for staging, radiotherapy planning, and treatment monitoring. Published deep learning models for lung tumour segmentation show varying consistency across tumour stages and may fail in complex tumours. This study compared five publicly available models with an in-house model trained to be robust to variation in tumour sizes and cancer stages.
Methods or Background: A dataset of 588 CT scans from lung cancer patients (2006-2020) was retrospectively collected and annotated at Radboud University Medical Center. A deep learning model was trained using the nnU-Net architecture on subsets of Radboud patients (n=505), the NSCLC-Radiomics dataset (n=362) and the Medical Segmentation Decathlon dataset (n=56). Our model was compared with five publicly available models, including the Universal Lesion Segmentation baseline model, the Medical Segmentation Decathlon lung model, DuneAI, TotalSegmentator and nnInteractive. Segmentation accuracy was assessed using volumetric and boundary metrics, including stratified analyses by tumour size (≤30 mm, >30–50 mm, >50–70 mm, >70 mm) on our internal test dataset (n=83).
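The size-stratified analysis can be sketched in Python as follows, grouping per-case Dice scores into the four size bins from the abstract and reporting the median and interquartile range per bin. The sketch assumes tumour size is given as a single diameter in mm per tumour; the records are hypothetical.

```python
import statistics
from collections import defaultdict

# Size groups from the abstract: <=30, >30-50, >50-70, >70 mm.
SIZE_BINS = [
    (0.0, 30.0, "<=30 mm"),
    (30.0, 50.0, ">30-50 mm"),
    (50.0, 70.0, ">50-70 mm"),
    (70.0, float("inf"), ">70 mm"),
]


def size_group(diameter_mm: float) -> str:
    """Map a tumour diameter (assumed to be in mm) to its size group."""
    for low, high, label in SIZE_BINS:
        if low < diameter_mm <= high:
            return label
    raise ValueError(f"diameter must be positive, got {diameter_mm}")


def dice_by_size(records) -> dict:
    """records: iterable of (tumour diameter in mm, Dice score) pairs.

    Returns {size group: (median Dice, interquartile range)}.
    """
    groups = defaultdict(list)
    for diameter, score in records:
        groups[size_group(diameter)].append(score)
    summary = {}
    for label, scores in groups.items():
        if len(scores) >= 2:
            q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
        else:
            q1 = q3 = scores[0]  # IQR undefined for a single case; report 0
        summary[label] = (statistics.median(scores), q3 - q1)
    return summary


# Hypothetical per-case results (diameter mm, Dice).
records = [(25, 0.90), (28, 0.80), (30, 0.85), (45, 0.70), (75, 0.60)]
print(dice_by_size(records))
```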
Results or Findings: Our proposed model performed as well as or better than the best public models in volumetric Dice score (median ≥0.87), with increases ranging from 0.01 to 0.28 depending on the model and tumour size group. The model demonstrated substantially improved consistency, with interquartile ranges ≤0.10 for all tumour sizes. It also achieved higher surface Dice and lower Hausdorff distance, indicating more accurate tumour borders. Performance remained superior in clinically demanding cases with cavities, local invasion, and large masses.
Conclusion: Our model improves performance and robustness over prior models across tumour sizes, including challenging cases. It represents a promising step towards automated evaluation of lung tumours.
Limitations: Independent validation in larger multicentre datasets is required.
Funding for this study: Public private consortium with funding from NWO, Dutch Ministry of Economic Affairs, and MeVis Medical Solutions, Bremen, Germany.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: The institutional review board waived the need for informed consent because of the retrospective design and data pseudonymization.

6 min

Benchmarking of Artificial Intelligence and Radiologists for Indeterminate Lung Nodule Malignancy Risk Estimation on Screening CT: Results of the LUNA25 Challenge

Bogdan Obreja, Nijmegen / Netherlands

Author Block: D. Peeters¹, B. Obreja¹, N. Antonissen¹, Z. Saghir², U. Pastorino³, G. De Bock⁴, R. Vliegenthart⁴, M. Prokop¹, C. Jacobs¹; ¹Nijmegen/NL, ²Hellerup/DK, ³Milan/IT, ⁴Groningen/NL
Purpose: Accurate risk classification of indeterminate (5-15 mm) lung nodules can reduce unnecessary follow-up in lung cancer screening. AI may assist in risk classification; however, benchmarking studies are limited. Here, we present the results of the LUNA25 challenge, a public competition that evaluates AI and radiologist performance for malignancy risk estimation of indeterminate nodules at screening CT.
Methods or Background: LUNA25 consists of an AI study and a reader study. For AI development, participants had access to a public dataset of 4069 CT scans from the National Lung Screening Trial (NLST), with 555 malignant and 5608 benign nodules. AI evaluation was performed on an external test set with 156 malignant and 312 benign indeterminate solid and part-solid nodules from baseline scans of the Danish (DLCST), Dutch-Belgian (NELSON), and Italian (MILD) lung cancer screening trials. For the reader study, radiologists assessed 300 nodules from the test set, assigning each a malignancy risk score (0–100) and a management recommendation (low, intermediate, or high risk). Performance was compared using the area under the ROC curve (AUC), sensitivity, and specificity.
Results or Findings: On the subset of 300 nodules, the top-performing AI system achieved an AUC of 0.78 (95% CI: 0.73-0.84), significantly higher (p<0.001) than the average AUC of the 75 readers (0.69; 95% CI: 0.64-0.74). At the ≥ indeterminate risk threshold, the AI correctly classified 12% more malignant cases at matched specificity and produced 20% fewer false positives at matched sensitivity.
Conclusion: The top-performing AI system significantly outperformed the average radiologist in estimating malignancy risk for indeterminate lung nodules detected on screening CT, highlighting its potential use as a decision-support tool.
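The AUC and matched-operating-point comparisons can be sketched in Python as below. AUC is computed empirically as the probability that a malignant nodule receives a higher score than a benign one (equivalent to the Mann-Whitney U statistic divided by the number of pairs), and sensitivity at a target specificity is found by scanning score thresholds. Confidence intervals and the reader-study significance test are not covered, and the scores are hypothetical.

```python
def empirical_auc(malignant, benign) -> float:
    """Probability that a random malignant nodule outscores a random benign one.

    Ties count as 0.5; this equals the Mann-Whitney U statistic / (n1 * n2).
    """
    wins = 0.0
    for m in malignant:
        for b in benign:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(malignant) * len(benign))


def sens_spec(malignant, benign, threshold: float) -> tuple:
    """Call a nodule positive when its risk score is >= threshold."""
    sens = sum(s >= threshold for s in malignant) / len(malignant)
    spec = sum(s < threshold for s in benign) / len(benign)
    return sens, spec


def sensitivity_at_specificity(malignant, benign, target_spec: float) -> float:
    """Best sensitivity among thresholds whose specificity is >= target_spec."""
    candidates = sorted(set(malignant) | set(benign)) + [float("inf")]
    best = 0.0
    for t in candidates:
        sens, spec = sens_spec(malignant, benign, t)
        if spec >= target_spec:
            best = max(best, sens)
    return best


# Hypothetical 0-100 malignancy risk scores.
malignant = [80, 60, 40]
benign = [50, 30, 20, 40]
print(empirical_auc(malignant, benign), sensitivity_at_specificity(malignant, benign, 0.75))
```

To compare an AI system with readers "at matched specificity", target_spec would be set to the readers' specificity at the chosen risk category.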
Limitations: LUNA25 only benchmarks AI’s stand-alone performance and does not address workflow integration or radiologist-AI interaction.
Funding for this study: This work was supported by a public-private research project with funding from the Dutch Research Council (NWO), the Dutch Ministry of Economic Affairs, and MeVis Medical Solutions (Bremen, Germany), by a public-private project with funding from the Dutch Cancer Society (KWF Kankerbestrijding, project number 9037) and Siemens Healthineers, and by a project with funding from the Dutch Cancer Society (KWF Kankerbestrijding, project number 14113).
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: Ethical approval for the training set was granted, with the NLST trial receiving institutional review board approval at all 33 centers participating in the trial. In addition, informed consent was provided by all participants involved in the trial. Access to this dataset was granted through the National Cancer Institute's Cancer Data Access System (CDAS) under project numbers NLST-74, NLST-111, NLST-164, and NLST-267.
Ethical approvals for the testing set were obtained from the Ethics Committee of Copenhagen County (DLCST), the institutional review board of Fondazione IRCCS Istituto Nazionale Tumori di Milano (MILD), and the Dutch Minister of Health with support from the Dutch Health Council (NELSON), along with authorization from the Ethical Boards of participating centres.

6 min

Can traditional radiomics help in ultra-low dose early lung cancer diagnosis?

Anna Mrukwa, Gliwice / Poland

Author Block: A. Mrukwa¹, M. Socha¹, J. Polanska¹, R. Dziadziuszko²; ¹Gliwice/PL, ²Gdańsk/PL
Purpose: Lung cancer is the deadliest cancer worldwide, largely because it is usually diagnosed late. If lung cancer were diagnosed early, during screening programs, many patients could be saved. Nodules on LDCT scans are minuscule and hard to spot, so it is important to create a system that assists with malignancy assessment while retaining method explainability. Radiomic features were designed for diagnosis at advanced stages and thus may not be suitable for this task.
Methods or Background: Nodules from the Pilot Pomeranian Lung Cancer Screening Program were extracted using MiMSeg on lungs without bronchovascular bundles. Only lesions meeting the size requirements for radiomic feature calculation were considered, which excluded 49 lesions. In total, 363 malignant and 1,054 benign nodules were used. PyRadiomics features, together with the additionally introduced sphericity and spikularity features, were evaluated with the Kruskal–Wallis test followed by post hoc analysis using Dunn’s test.
Results or Findings: Effect sizes between groups were at most medium, and post hoc analysis showed that the calcification group differed most, while suspicious lesions were not distinct from the other groups. Consequently, a binary multilayer perceptron trained on the features achieved a balanced accuracy of only 54.9%, and 53.3% when using only medium-effect features. For multiclass classification, balanced accuracy was 22.8% with all features and 25.7% with selected features. This improvement came from the calcification class (over half correctly classified) at the expense of the other classes (accuracy for suspicious lesions dropped from 0.189 to 0.057).
Conclusion: Traditional radiomic methods perform poorly on LDCT and do not provide a reliable basis for early lung cancer diagnosis. Additionally, the smallest nodules are not suited to this type of analysis. Deep learning methods therefore appear to hold the greatest promise.
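The Kruskal–Wallis step and a rank-based effect size can be sketched in pure Python as follows. The abstract does not name its effect-size measure, so the eta-squared estimate based on H, (H - k + 1)/(n - k), is used here as one common choice. Dunn's post hoc test is not shown, and the input values are hypothetical.

```python
from collections import Counter
from itertools import chain


def average_ranks(values) -> list:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis(groups) -> tuple:
    """Tie-corrected Kruskal-Wallis H and an H-based eta-squared (assumed effect size)."""
    pooled = list(chain.from_iterable(groups))
    n, k = len(pooled), len(groups)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    h /= 1 - tie_term / (n ** 3 - n)  # tie correction (no-op without ties)
    eta_sq = (h - k + 1) / (n - k)
    return h, eta_sq


# Hypothetical feature values for three nodule groups.
print(kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```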
Limitations: No limitations were identified.
Funding for this study: No funding was received for this study.
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information:

6 min

Evaluating research quality of lung cancer CT radiomics using RQS 1.0 and METRICS

Maurizio Balbi, Turin / Italy

Author Block: M. Balbi¹, V. Giannini¹, N. C. Culasso², A. Defeudis³, A. U. Cavallo⁴, A. Stanzione⁵, A. Ponsiglione⁵, R. Cuocolo⁶, A. Veltri¹; ¹Turin/IT, ²Orbassano/IT, ³Candiolo/IT, ⁴Rome/IT, ⁵Naples/IT, ⁶Baronissi/IT
Purpose: To assess the methodological quality of lung cancer CT radiomics studies using the Radiomic Quality Score (RQS) 1.0 and the METhodological RadiomICs Score (METRICS).
Methods or Background: Lung cancer CT radiomics studies published up to December 31, 2024, were scored by 6 human readers with different levels of expertise using RQS 1.0 and METRICS. Median scores were computed, and differences across clinical aim, publication year, and patient number were assessed using the Kruskal-Wallis test. Where significant differences were found, pairwise Mann–Whitney U tests were performed for group-wise comparisons.
Results or Findings: A total of 834 studies were included. Median METRICS and RQS 1.0 percentage scores were 0.63 (IQR, 0.51–0.72) and 0.36 (IQR, 0.52–0.71), respectively. The main methodological limitations included the lack of external validation or any validation strategy, the unavailability of open data, and the retrospective, single-center study design. Median METRICS scores of studies published in 2024 (0.67) and 2023 (0.69) were significantly higher than those published before 2023 (≤0.65; all p < 0.04). Similarly, median RQS 1.0 scores of studies published in 2024 (0.42) were significantly higher than those previously published (≤0.39; all p < 0.01). Studies including >100 patients showed significantly higher METRICS and RQS 1.0 scores than those with ≤100 patients (all p < 0.001). Differences across clinical aims were observed, with patterns varying between METRICS and RQS 1.0.
Conclusion: Research quality in lung cancer CT radiomics was rated as good by METRICS while RQS 1.0 yielded comparatively lower scores. Both tools highlighted an improvement in quality over time and higher scores in studies involving >100 patients.
Limitations:
- Lack of reproducibility analyses.
- Lack of evaluation according to RQS 2.0 (published after data collection and analysis).
Funding for this study: No funding.
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information:

6 min

Comparative analysis of multiple AI software packages on detection performance of the reference nodule for lung cancer screening application

Xiaotong Ouyang, Groningen / Netherlands

Author Block: X. Ouyang¹, K. Togka², D. Han², I. Schuldink², B. Jiang¹, C. Van Der Aalst¹, H. J. De Koning¹, M. Oudkerk²; ¹Rotterdam/NL, ²Groningen/NL
Purpose: Over the past five years, artificial intelligence (AI) for pulmonary nodule detection on computed tomography (CT) has rapidly advanced, and multiple commercial AI software packages are now available. However, independent validation, particularly at the reference nodule level, remains limited. This study compares four CE-certified commercial AI software packages for reference nodule detection on low-dose CT, aiming to provide scientific evidence for their application in lung cancer screening (LCS).
Methods or Background: A total of 560 baseline LDCT scans were consecutively selected from three Dutch 4-IN-THE-LUNG-RUN (4ITLR) centers following the trial’s protocol. Expert radiologists established consensus reference standards. Four CE-certified (three FDA-cleared) commercial AI packages (A–D) were evaluated using an internal automated platform. In total, 554 participants were divided into two cohorts according to the NELSON 2.0-European Position Statement, based on the reference standard: positive (77 with the largest nodule ≥100 mm³) and negative (477 with the largest nodule <100 mm³ or no nodules). AI detection in the positive cohort was classified as “Correct positive” or “Negative misclassification.” Subgroup analyses were performed by nodule volume, pulmonary lobe location, and attachment, and logistic regression was used to examine the effect of these factors on AI detection.
Results or Findings: Correct positive rates were 60/77 (77.9%) for software A, 71/77 (92.2%) for B, 59/77 (76.6%) for C, and 66/77 (85.7%) for D. Detection by software A increased with nodule volume, whereas software D showed negative coefficients. C and D each missed 2 of 6 masses. All packages performed better for right-lung and non-attached nodules, with significant inter-software differences (p<0.05). Logistic regression confirmed higher detectability for right-sided and non-attached nodules (log-odds>0).
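How logistic regression yields the log-odds coefficients referred to in the results can be sketched with NumPy, fitting by Newton-Raphson (iteratively reweighted least squares). This is a bare-bones illustration without convergence checks or protection against perfectly separated data. The right-lung coding and the counts are hypothetical, chosen so that the fitted coefficient equals ln(9).

```python
import numpy as np


def fit_logistic(X, y, n_iter: int = 25) -> np.ndarray:
    """Logistic regression by Newton-Raphson (iteratively reweighted least squares).

    X: (n, p) predictors without an intercept column; y: (n,) 0/1 outcomes.
    Returns [intercept, beta_1, ..., beta_p] on the log-odds scale.
    """
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))  # predicted detection probability
        w = p * (1.0 - p)
        gradient = X.T @ (y - p)
        hessian = X.T @ (X * w[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta


# Hypothetical data: 1 = reference nodule in the right lung, 0 = left lung.
right_lung = [1] * 10 + [0] * 10
detected = [1] * 9 + [0] * 1 + [1] * 5 + [0] * 5
coef = fit_logistic(np.array(right_lung).reshape(-1, 1), detected)
print(coef)  # a positive right-lung coefficient means higher detectability
```

With 9/10 right-lung and 5/10 left-lung nodules detected, the fitted intercept is logit(0.5) = 0 and the right-lung coefficient is logit(0.9) - logit(0.5) = ln(9), i.e. a positive log-odds as described in the results.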
Conclusion: Despite high reported performance in the literature, commercial AI packages show variable detection in a low-dose CT lung cancer screening dataset. Reference nodule detectability is influenced by pulmonary location, nodule volume, and attachment, highlighting the need for independent, comprehensive validation before and during LCS application.
Limitations: Not applicable
Funding for this study: Not applicable
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information:

6 min

Automated CT Quantification of Interstitial Lung Abnormalities and Prognostic Associations in Lung Cancer Screening

Pardeep Vasudev, London / United Kingdom

Author Block: P. Vasudev, M. Azimbagirad, B. Selvarajah, A. Bhamani, S. Aslani, D. Alexander, SUMMIT consortium, S. Janes, J. Jacob; London/UK
Purpose: Interstitial lung abnormalities (ILA) are increasingly recognised in lung cancer screening cohorts and may indicate early fibrotic change. Their prognostic importance has been shown, but large-scale reproducible assessment remains challenging. Automated CT-based quantification offers a scalable solution. The aim was to determine whether automated measures are associated with mortality and whether prognostic performance improves when expert radiologist ILA diagnosis is added.
Methods or Background: This was a retrospective analysis of a multi-centre prospective lung cancer screening cohort. Low-dose chest CT scans from 4,411 participants were included, with ILA status determined by expert radiologist assessment: 3,622 had no ILA and 789 had ILA (17.9% of participants). An automated pipeline segmented the lungs, airways, and vessels, and quantified abnormal parenchymal density by normalising CT intensities and applying mean- and standard deviation-based thresholds to generate density maps. Multivariable Cox proportional hazards regression assessed associations between parenchymal density metrics, expert radiologist ILA diagnosis, and mortality, adjusting for age, sex, smoking history, and forced vital capacity (FVC). Hazard ratios (HR) with 95% confidence intervals (CI) and p-values were reported.
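The density-map step can be sketched with NumPy as below. The abstract does not give the exact normalisation or cut-offs, so this sketch assumes intensities are z-scored within the lung mask (airways and vessels already removed) and that voxels more than k = 2 standard deviations above the lung mean are labelled high density; both choices and the toy volume are hypothetical.

```python
import numpy as np


def high_density_map(ct_hu: np.ndarray, lung_mask: np.ndarray, k: float = 2.0) -> np.ndarray:
    """Flag lung voxels whose normalised intensity exceeds k standard deviations.

    Assumption: intensities are z-scored using the mean and SD of the voxels
    inside lung_mask, which is assumed to already exclude airways and vessels.
    """
    lung_mask = lung_mask.astype(bool)
    values = ct_hu[lung_mask].astype(float)
    sd = values.std()
    density = np.zeros(ct_hu.shape, dtype=bool)
    if sd == 0:
        return density  # uniform lung: nothing stands out
    z = (values - values.mean()) / sd
    density[lung_mask] = z > k
    return density


def abnormal_volume_ml(density: np.ndarray, voxel_volume_ml: float) -> float:
    """Volume of flagged voxels, given the volume of one voxel in mL."""
    return float(density.sum()) * voxel_volume_ml


# Toy 10x10x10 volume: uniform -850 HU lung with ten denser voxels at -300 HU.
ct = np.full((10, 10, 10), -850.0)
ct[0, 0, :] = -300.0
mask = np.ones_like(ct, dtype=bool)
dmap = high_density_map(ct, mask)
print(int(dmap.sum()), abnormal_volume_ml(dmap, voxel_volume_ml=0.001))
```

A per-participant volume such as this could then enter the Cox model as a covariate alongside age, sex, smoking history, and FVC.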
Results or Findings: Abnormal high-density lung volume was independently associated with increased mortality (HR 1.03, 95% CI 1.01–1.04, p=2.04×10⁻³). Prognostic associations strengthened when expert radiologist ILA diagnosis was incorporated (HR 1.03, 95% CI 1.01–1.04, p=9.72×10⁻⁹). Significant covariates included age (HR 1.08, 95% CI 1.06–1.11, p=8.18×10⁻¹⁵), sex (HR 0.60, 95% CI 0.45–0.79, p=2.21×10⁻⁴), and FVC (HR 0.97, 95% CI 0.97–0.98, p=1.42×10⁻¹⁰). Smoking history was not statistically significant.
Conclusion: Automated CT quantification of parenchymal density abnormalities was associated with mortality in lung cancer screening participants. Prognostic associations were stronger when expert radiologist ILA diagnosis was combined with automated measures, supporting their use as objective biomarkers for further validation.
Limitations: No limitations were identified.
Funding for this study: No funding was received for this study.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: Ethical approval for the lung cancer screening study was obtained from an NHS Research Ethics committee (17/LO/2004) and the NHS Health Research Authority’s Confidentiality Advisory Group (18/CAG/0054).