From Imaging to Outcomes: A PI-RADS–Driven Radiomics and Clinical Machine Learning Model for Detecting Clinically Significant Prostate Cancer
Author Block: D. Samaras1, G. Agrotis2, M. Vakalopoulou3, M. Vlychou1, I. Tsougos1; 1Larissa/GR, 2Amsterdam/NL, 3Paris/FR
Purpose: This study aimed to develop and evaluate a machine-learning (ML) framework that follows the PI-RADS protocol, and thereby simulates radiologists’ decision-making, for detecting clinically significant prostate cancer (csPCa) on multiparametric MRI (mpMRI).
Methods or Background: The publicly available PI-CAI (Prostate Imaging Cancer AI) dataset was used, comprising 1,500 cases from 1,476 patients acquired at 11 centers on seven MRI scanners. Of these, 1,075 cases were benign or showed clinically insignificant prostate cancer (cinsPCa), and 425 showed csPCa, defined as Gleason score (GS) ≥ 3+4. Ground-truth labels were derived from biopsies performed by urologists, radiologists, or trained medical personnel under supervision. The ML framework used a two-branch architecture: T2-weighted (T2W) images for the transition zone, and diffusion-weighted imaging (DWI) with apparent diffusion coefficient (ADC) maps for the peripheral zone, matching the dominant sequence that PI-RADS assigns to each zone. In addition to a radiomics-only model, a combined radiomics + clinical model was developed that incorporated PSA, age, and prostate volume. Features were extracted with PyRadiomics and included shape and texture features from original and filtered images. The feature space was reduced in stages: low-variance filtering (threshold 0.01), Pearson correlation pruning (ρ ≥ 0.85), and Wilcoxon rank-sum testing (p ≤ 0.1), followed by supervised feature selection restricted to the training folds. The dataset was split into 80% training/validation and 20% testing, with five-fold cross-validation. Performance metrics were AUC, sensitivity, specificity, accuracy, balanced accuracy, and F1-score.
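The staged feature reduction described above can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the greedy pruning order, the use of the absolute correlation, the normal approximation for the rank-sum p-value, and the toy data are all assumptions made for illustration. It uses only the Python standard library and should be applied to training-fold samples only.

```python
import math
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the average rank of their group."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def ranksum_pvalue(a, b):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, no tie correction)."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    w = sum(ranks[:n1])                       # rank sum of the first group
    mu = n1 * (n1 + n2 + 1) / 2               # expected rank sum under H0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))   # equals 2 * (1 - Phi(|z|))


def filter_features(X, y, var_thr=0.01, corr_thr=0.85, p_thr=0.1):
    """X maps feature name -> list of values; y is a list of 0/1 labels."""
    # Stage 1: drop features whose variance does not exceed the threshold.
    kept = [f for f in X if pvariance(X[f]) > var_thr]
    # Stage 2: greedy pruning; keep a feature only if its absolute correlation
    # with every already-kept feature is below the threshold (assumed rule).
    pruned = []
    for f in kept:
        if all(abs(pearson(X[f], X[g])) < corr_thr for g in pruned):
            pruned.append(f)
    # Stage 3: keep features whose class distributions differ (p <= p_thr).
    selected = []
    for f in pruned:
        pos = [v for v, label in zip(X[f], y) if label == 1]
        neg = [v for v, label in zip(X[f], y) if label == 0]
        if ranksum_pvalue(pos, neg) <= p_thr:
            selected.append(f)
    return selected


# Toy data (hypothetical, not from PI-CAI): 8 samples, 4 features.
y_toy = [0, 0, 0, 0, 1, 1, 1, 1]
X_toy = {
    "f_const": [1.0] * 8,                    # removed by the variance filter
    "f_signal": [1, 2, 3, 4, 5, 6, 7, 8],    # separates the classes
    "f_dup": [3, 5, 7, 9, 11, 13, 15, 17],   # linear copy of f_signal
    "f_noise": [3, 1, 4, 2, 2, 4, 1, 3],     # same distribution in both classes
}
selected = filter_features(X_toy, y_toy)
```

On this toy input only `f_signal` survives all three stages. In a real pipeline the thresholds would act on features extracted by the radiomics library, and whether features are scaled before the variance filter is not stated in the abstract.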
Results or Findings: The combined model (radiomics + clinical) outperformed the radiomics-only model, achieving a higher AUC on both the training set (0.79±0.02 vs. 0.76±0.02) and the test set (0.78 vs. 0.73).
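For readers unfamiliar with how such figures are obtained, the sketch below computes an AUC from labels and scores and summarizes per-fold AUCs as mean and standard deviation. It assumes that the ± values denote the standard deviation across cross-validation folds, which the abstract does not state. The function names and all numbers in the example are hypothetical and are not the study's data.

```python
from statistics import mean, stdev


def roc_auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def summarize_folds(fold_aucs):
    """Return (mean, sample standard deviation) of per-fold AUC values."""
    return mean(fold_aucs), stdev(fold_aucs)


# Hypothetical toy example: four cases with model scores.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)

# Hypothetical per-fold AUCs, for illustration only.
fold_mean, fold_sd = summarize_folds([0.5, 0.75, 1.0])
```

The pairwise formulation is quadratic in the number of cases, which is acceptable for a sketch; library implementations typically use a rank-based computation instead.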
Conclusion: Combining PI-RADS-guided radiomics with PSA, age, and prostate volume improved csPCa detection over radiomics alone. The approach shows potential to support biopsy decisions and, ultimately, to improve patient outcomes.
Limitations: The modest external test set and absence of deep learning benchmarks limit generalizability. Validation on larger, multicenter cohorts and integration into clinical workflows are warranted.
Funding for this study: This work has been partially supported by project MIS 5154714 of the National Recovery and Resilience Plan Greece 2.0 funded by the European Union under the NextGenerationEU Program.
Has your study been approved by an ethics committee? Not applicable