Research Presentation Session: Genitourinary Hot Topic with Keynote Lecture

RPS 1907 - Hot Topic: AI-assisted prostate cancer diagnosis

March 7, 12:30 - 13:30 CET

10 min
Keynote Lecture
Binit Sureka, Jodhpur / India
6 min
Improving biopsy decision-making in prostate cancer using AI-assisted biparametric MRI
Silvia Bottazzi, Rome / Italy
Author Block: S. Bottazzi, A. Iacono, I. Isufi, L. D'Erme, S. Persiani, G. Avesani, L. Russo, E. Sala, B. Gui; Rome/IT
Purpose: To evaluate the biopsy benefit-to-harm ratio of biparametric MRI (bpMRI) alone versus AI-assisted bpMRI in a decision-support setting.
Methods or Background: This retrospective study included patients undergoing pre-biopsy MRI between 2021 and 2024. The MRIs were analysed in two sequential phases separated by a 45-day washout period. In the first phase, five readers (2 expert and 3 basic, according to ESUR-ESUI criteria) independently assigned PI-RADS scores on bpMRI. In the second phase, AI-generated reports were made available and the readers reassessed the bpMRI with AI decision support. The benefit-to-harm metrics were biopsy selectivity (GG≥2/GG1) and biopsy efficiency (GG≥2/[GG1+benign]). They were calculated at two biopsy thresholds: PI-RADS ≥3, and a composite threshold of PI-RADS ≥4 or PI-RADS 3 with PSA density (PSAd) ≥0.15 ng/ml². Metrics were computed per reader and summarised as means (±SD) overall and for each subgroup (expert and basic readers).
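The two ratios and the two biopsy thresholds defined above can be written out as a minimal Python sketch. The function names and the per-reader counts are hypothetical and serve only to illustrate the arithmetic; they are not taken from the study.

```python
from statistics import mean, stdev


def biopsy_selectivity(gg2_plus: int, gg1: int) -> float:
    """GG>=2 cancers found per GG1 cancer found (GG>=2 / GG1)."""
    return gg2_plus / gg1


def biopsy_efficiency(gg2_plus: int, gg1: int, benign: int) -> float:
    """GG>=2 cancers found per GG1 or benign biopsy (GG>=2 / [GG1 + benign])."""
    return gg2_plus / (gg1 + benign)


def biopsy_indicated(pirads: int, psad: float, composite: bool) -> bool:
    """Apply one of the two thresholds described in the abstract."""
    if composite:
        # PI-RADS >=4, or PI-RADS 3 with PSA density >= 0.15
        return pirads >= 4 or (pirads == 3 and psad >= 0.15)
    return pirads >= 3


# Hypothetical biopsy outcomes for two readers (illustrative only).
readers = [
    {"gg2_plus": 40, "gg1": 10, "benign": 10},
    {"gg2_plus": 36, "gg1": 9, "benign": 15},
]

selectivities = [biopsy_selectivity(r["gg2_plus"], r["gg1"]) for r in readers]
efficiencies = [
    biopsy_efficiency(r["gg2_plus"], r["gg1"], r["benign"]) for r in readers
]

# Per-reader values are summarised as mean and sample standard deviation.
print(f"selectivity: {mean(selectivities):.3f} ± {stdev(selectivities):.3f}")
print(f"efficiency:  {mean(efficiencies):.3f} ± {stdev(efficiencies):.3f}")
```

Both ratios are undefined when the denominator is zero (for example, a reader with no GG1 findings); the sketch leaves that case unhandled.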
Results or Findings: AI assistance improved benefit-to-harm metrics overall. At PI-RADS ≥3, biopsy selectivity increased from 3.889±0.884 to 5.078±1.602 and efficiency from 1.688±0.404 to 1.992±0.053. At the composite threshold (PI-RADS ≥4 or PI-RADS 3 with PSAd ≥0.15 ng/ml²), selectivity increased from 4.317±0.735 to 5.400±1.188 and efficiency from 2.166±0.699 to 2.363±0.250. Basic readers showed the greatest benefit: at PI-RADS ≥3, efficiency reached 2.000 and NNB 0.50; at the composite threshold, efficiency reached 2.260, with higher selectivity at both thresholds. Among experts, selectivity improved at PI-RADS ≥3 (from 4.688 to 5.378) and at the composite threshold (from 4.989 to 5.833), while efficiency remained stable.
Conclusion: Across the evaluated thresholds, the composite criterion, which combines PI-RADS with PSAd, provided the highest biopsy selectivity and efficiency. With AI decision support, basic readers achieved performance comparable to experts, while expert metrics remained stable (efficiency) or slightly improved (selectivity). These findings support AI-assisted bpMRI interpretation to improve biopsy decision-making.
Limitations: Retrospective study with a limited number of readers.
Funding for this study: N/A
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: Study ID7680
6 min
Artificial intelligence or radiologist interpretation for prostate cancer diagnosis
Alexander Ng, London / United Kingdom
Author Block: A. Ng, A. Asif, A. Shah, A. Dudko, R. Kumar, P. Rajwa, F. Giganti, D. Pendse, V. Kasivisvanathan; London/UK
Purpose: With the global incidence of prostate cancer predicted to double in the next 20 years, alongside the potential adoption of biparametric MRI and national MRI screening programmes, demand for prostate MRI is set to rise substantially. Interpretation, however, has a steep learning curve, with optimal performance achieved by expert genitourinary radiologists. With a rising demand for medical imaging and a projected 40% radiologist shortfall by 2027, a prompt international solution is warranted. PARADIGM aims to evaluate whether artificial intelligence (AI) is non-inferior to radiologists in detecting clinically significant prostate cancer (Gleason grade group ≥ 2).
Methods or Background: PARADIGM is an international, prospective, multicentre, non-inferiority, within-patient, level-1 evidence diagnostic study. A total of 500 men will be recruited over 18 months. Men will undergo standard-of-care MRI at either 1.5 or 3.0 T, using at least a pelvic phased-array coil. The radiologist and a primary AI algorithm will report the MRI blinded to each other. The radiologist will then be unblinded and produce a merged report, with the ability to overrule AI findings for safety. Suspicious lesions identified by either AI or radiologist will undergo targeted biopsies, with optional perilesional and/or systematic biopsies. The primary outcome is the proportion of men with clinically significant cancer. Planned secondary outcomes include the proportion of men with clinically insignificant cancer (Gleason grade group 1), and test performance characteristics of AI and radiologists.
Results or Findings: 45 centres across 14 countries expressing interest are undergoing pre-trial MRI quality control. PARADIGM will open to recruitment in Q1 2026.
Conclusion: PARADIGM will provide the first prospective, level-1 evidence on the diagnostic performance of AI in the detection of clinically significant prostate cancer on MRI.
Limitations: PARADIGM will not investigate workload reductions. Approvals are in progress.
Funding for this study: The PARADIGM trial is supported by The John Black Charitable Foundation and the European Association of Urology Research Foundation.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: The PARADIGM trial has not yet been approved by an ethics committee. Regulatory approvals are currently underway and results of ethics committee approvals will be shown during the presentation if received by date of presentation.
6 min
AI-accelerated versus conventional diffusion-weighted imaging for prostate MRI: A head-to-head comparison of quality and quantitative metrics
Vlad Sacalean, Freiburg Im Breisgau / Germany
Author Block: V. Sacalean1, O. Gebler1, W. Liu2, R. Strecker2, E. Weiland2, F. Bamberg1, J. Weiß1, M. Russe1, H. Engel1; 1Freiburg Im Breisgau/DE, 2Erlangen/DE
Purpose: Diffusion-weighted imaging (DWI) is central to prostate MRI but prolongs examinations. We compared an AI-accelerated reduced-field-of-view readout-segmented EPI diffusion sequence (AI-DWI) with a conventional full-field readout-segmented sequence (c-DWI), hypothesising shorter acquisition with non-inferior perceived diagnostic quality and characterising effects on diffusion metrics.
Methods or Background: This prospective, single-center study of diagnostic accuracy enrolled 62 consecutive men with elevated PSA levels between March and May 2025. The AI-DWI sequence was compared against the standard c-DWI sequence in each patient. Three radiologists independently scored overall image quality, anatomic differentiability, lesion conspicuity and artefacts on five-point Likert scales. Quantitative analysis compared the mean apparent diffusion coefficient (ADC) and seven additional texture features (maximum, minimum, standard deviation, coefficient of variation, entropy, skewness and kurtosis) using a five-millimetre region of interest in each index lesion. Wilcoxon signed-rank tests were used for ordinal scores, and paired t-tests for quantitative metrics.
Results or Findings: The AI-DWI sequence demonstrated a significantly shorter acquisition time compared to c-DWI (3 min 59 s vs. 4 min 21 s; p < 0.01). There was no significant difference in subjective scores for overall image quality, lesion conspicuity, artefacts, or anatomic differentiability (p > 0.05 for all). AI-DWI yielded significantly lower mean ADC values (975.92 ± 174.57 vs. 1013.21 ± 189.34; adj. p < 0.01) and maximum ADC values (adj. p < 0.01). No significant differences were found for standard deviation, coefficient of variation, entropy, kurtosis, minimum, or skewness (adj. p > 0.05).
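For orientation, the reported acquisition times and mean ADC values can be converted into absolute and relative differences. The short Python sketch below uses only the figures quoted above; the variable names are illustrative, and ADC units are not stated in the abstract, so none are assumed here.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a 'min s' duration into seconds."""
    return 60 * minutes + seconds


# Acquisition times as reported in the abstract.
c_dwi_s = to_seconds(4, 21)   # conventional DWI
ai_dwi_s = to_seconds(3, 59)  # AI-accelerated DWI

saved_s = c_dwi_s - ai_dwi_s
saved_pct = 100 * saved_s / c_dwi_s

# Mean ADC values as reported (units not given in the abstract).
adc_ai, adc_c = 975.92, 1013.21
adc_diff = adc_ai - adc_c
adc_diff_pct = 100 * adc_diff / adc_c

print(f"time saved: {saved_s} s ({saved_pct:.1f}% of c-DWI)")
print(f"mean ADC difference: {adc_diff:.2f} ({adc_diff_pct:.1f}% relative to c-DWI)")
```

Under these figures the AI-DWI sequence is 22 s shorter (about 8.4% of the conventional acquisition), and its mean ADC is about 3.7% lower.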
Conclusion: The AI-DWI sequence allows for a meaningful reduction of acquisition time while preserving excellent subjective image quality compared to the c-DWI. Quantitatively, it yields lower mean and maximum ADC values, while showing no significant differences in the remaining quantitative metrics.
Limitations: Single-centre design. Lack of histopathological correlation.
Funding for this study: The research MRI sequence was provided under an unrestricted collaboration agreement between Siemens Healthineers and the Department of Diagnostic and Interventional Radiology, Medical Center – University of Freiburg. Siemens Healthineers AG provided technical support; study conception and design, as well as data analysis and interpretation, were conducted independently.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: The study protocol was approved by the Ethics Committee of the Albert-Ludwigs-Universität Freiburg (Approval #22-1185). All participants provided written informed consent prior to inclusion. This prospective study consecutively enrolled 64 men referred for multiparametric prostate MRI between March and May 2025; two were excluded due to incomplete or severely degraded studies, yielding a final cohort of 62 participants.
6 min
In Prostate Cancer Diagnosis, Deep Learning Model’s Performance Declines as MRI Quality Reduces, while Radiologists’ Performance Remains the Same
Eduardo H. P. Pooch, Amsterdam / Netherlands
Author Block: E. H. P. Pooch, G. Agrotis, S. Ursprung, A. Dehghanpour, R. Beets-Tan, T. Janssen, I. G. Schoots; Amsterdam/NL
Purpose: To assess the impact of biparametric MRI (bpMRI) scan quality, as determined by PI-QUAL scores, on the diagnostic performance of a deep learning (DL) model and radiologists in detecting Grade Group (GG) ≥2 prostate cancer in men with suspected prostate cancer.
Methods or Background: An nnU-Net GG≥2 cancer segmentation model used 1500 bpMRI scans for training, 1000 for testing (PI-CAI cohort), and 573 scans for external validation (PROMIS cohort). The external cohort analysis included MRI assessment by a radiologist (R1) using PI-RADS v2.1 and the original PROMIS study Likert scores as the second assessment (R2). Two readers (QR1, QR2) assessed the image quality of the PROMIS MRI scans and determined a consensus Prostate Imaging Quality (PI-QUAL) v2 score. The reference standard was GG≥2 cancer, confirmed by transperineal saturation biopsy. The diagnostic performance (AUC) of the model and of the radiologists was compared. Bootstrap testing (1000 iterations) was used to calculate 95% confidence intervals (CI) and to determine the statistical significance of the performance differences between quality subgroups.
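A bootstrap confidence interval for an AUC, of the kind described above, can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the authors' pipeline: a percentile interval is assumed (the abstract does not name the interval type), the function names are hypothetical, and the labels and scores are made-up toy data.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive case outscores a
    random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample cases with replacement,
    recompute the AUC each time, and read off the tail percentiles."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample contained a single class; draw again
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data (hypothetical): 1 = GG>=2 cancer on biopsy, 0 = no GG>=2 cancer.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.45, 0.7, 0.9, 0.6]

point = auc(labels, scores)
ci = bootstrap_auc_ci(labels, scores, n_boot=1000)
print(f"AUC {point:.2f} [{ci[0]:.2f}-{ci[1]:.2f}]")
```

The same resampling loop can be applied separately within each quality subgroup, with the distribution of the difference in AUCs used to judge significance.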
Results or Findings: On the external dataset, the readers achieved AUC[R1] 0.85 [0.82-0.88] and AUC[R2] 0.76 [0.73-0.79]. The DL model’s performance declined on reduced-quality scans (n=141; AUC[DL-PIQUAL1] 0.63 [0.53-0.71]), whereas on high-quality scans (n=432) it was higher (AUC[DL-PIQUAL2-3] 0.71 [0.66-0.75]; p<0.05). In contrast, the readers’ performance remained consistent across scan quality (AUC[R1-PIQUAL1] 0.88 [0.82-0.93] and AUC[R1-PIQUAL2-3] 0.91 [0.88-0.93], p=0.35; AUC[R2-PIQUAL1] 0.79 [0.71-0.86] and AUC[R2-PIQUAL2-3] 0.80 [0.76-0.84], p=0.49).
Conclusion: The diagnostic performance of the AI model declined significantly on reduced-quality MRI scans as determined by PI-QUAL v2 scoring, whereas radiologists maintained consistent performance regardless of scan quality.
Limitations: Single external cohort, single AI model, subjective quality evaluation
Funding for this study: No
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information:
6 min
AI in the Hot Seat: Can a Commercial AI tool Triage Prostate MR using a dedicated 3-point scale?
Francesco Giganti, London / United Kingdom
Author Block: F. Giganti; London/UK
Purpose: To evaluate a commercial AI algorithm's ability to triage prostate MRI into actionable risk categories for clinically significant prostate cancer (csPCa), benchmarked against radiologist-assigned PI-RADS scores.
Methods or Background: Pathology reports from targeted biopsies of 39 consecutive biopsy-naive men with MRI studies at Rolling Oaks Radiology (CA, USA) were used as ground truth. Radiologists assigned PI-RADS scores in their original reports.

Imaging was analysed by the DeepHealth algorithm, which automatically segmented the prostate and lesions using a 3-point scale: Low (unlikely csPCa, avoid biopsy); Medium (likely csPCa, consider biopsy); High (likely csPCa, biopsy recommended).

CsPCa was defined as grade group ≥2. AI predictions were dichotomised using High alone or High+Medium as positive. PI-RADS thresholds of ≥3, ≥4, and 5 were compared. Patient-level analysis used the highest scoring lesion.
Results or Findings: Twenty patients (51%) had csPCa. Studies included four scanner manufacturers (GE, Siemens, Philips, Hitachi) with 3T, open, and hybrid PET/MR systems.

For ≥Medium risk, the AI sensitivity was 89% (69–97%) and specificity was 45% (26–66%). PI-RADS ≥3 and ≥4 yielded 100% (83–100%) and 95% (75–99%) sensitivity, with specificities of 25% (11–47%) and 50% (30–70%). The PPV for ≥Medium risk was 61% (42–76%), comparable to that of PI-RADS ≥3 (56%, 39–71%) and ≥4 (64%, 46–79%).

High risk achieved 47% (27–68%) sensitivity and 100% (84–100%) specificity, versus PI-RADS 5 with 42% (23–64%) sensitivity and 95% (76–99%) specificity. The AI's High-risk category had 100% PPV (70–100%) with 67% NPV (49–81%).
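The sensitivity, specificity, PPV and NPV figures above all derive from a 2×2 table of test result against biopsy outcome. The Python sketch below shows that calculation together with one common interval for a proportion. The abstract does not report the underlying counts or name the interval method, so the counts are hypothetical and the Wilson score interval is an assumption.

```python
from math import sqrt
from statistics import NormalDist


def wilson_ci(successes: int, n: int, level: float = 0.95):
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Patient-level test characteristics from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical counts for a dichotomised AI output
# (positive = Medium or High), not taken from the study.
tp, fp, fn, tn = 18, 10, 2, 10

metrics = diagnostic_metrics(tp, fp, fn, tn)
sens_lo, sens_hi = wilson_ci(tp, tp + fn)

for name, value in metrics.items():
    print(f"{name}: {100 * value:.0f}%")
print(f"sensitivity 95% CI: {100 * sens_lo:.0f}-{100 * sens_hi:.0f}%")
```

Switching the positive definition from Medium or High to High alone only changes which cases count as test-positive; the formulas stay the same.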
Conclusion: This pilot study demonstrates the feasibility of AI-based prostate MRI stratification using a 3-point scale, with classifications aligned with expert PI-RADS scores. Further validation is ongoing.
Limitations: Small sample size and a case-level analysis only.
Funding for this study: None
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information:
6 min
Artificial intelligence-assisted reading of non-contrast prostate MRI: Application and concordance with expert interpretation in a screening population within the PROSA Trial
Emanuele Messina, Rome / Italy
Author Block: E. Messina, A. Borrelli, L. Laschena, A. Dehghanpour, M. Pecoraro, V. Panebianco; Rome/IT
Purpose: Bi-parametric MRI (bpMRI), a non-contrast imaging approach, has been explored as a potential method for screening for clinically significant prostate cancer (csPCa). At the same time, artificial intelligence (AI) is increasingly recognized as a potential supportive tool. This study aimed to assess the performance of an AI-based software for csPCa screening with bpMRI, focusing on its value in assisting less-experienced readers.
Methods or Background: Retrospective analysis of the PROSA trial, a prospective, randomized, single-center study that enrolled 759 men eligible for csPCa screening. BpMRI scans were obtained following PI-RADS v2.1 guidelines and independently reviewed by an expert radiologist, a less-experienced reader, the AI software, and the less-experienced reader assisted by AI. Diagnostic accuracy was evaluated through ROC curve analysis and inter-reader agreement (Cohen’s kappa), with the expert’s assessment serving as the reference standard.
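Cohen's kappa, used above for inter-reader agreement, corrects the observed agreement for the agreement expected by chance. A minimal Python sketch of the unweighted statistic follows; the function name and the ten paired readings are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Proportion of cases on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's category frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one and the same category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical readings: 1 = MRI positive, 0 = MRI negative.
expert = [1, 1, 0, 0, 0, 0, 1, 0, 0, 0]
reader = [1, 0, 0, 0, 0, 0, 1, 0, 1, 0]

print(f"kappa = {cohens_kappa(expert, reader):.2f}")
```

In this toy example the raters agree on 8 of 10 cases, chance agreement is 0.58, and kappa is therefore (0.80 − 0.58) / (1 − 0.58) ≈ 0.52.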
Results or Findings: Out of 499 bpMRI scans, the less-experienced reader supported by AI achieved the best diagnostic performance (sensitivity 76.5%, specificity 97.2%, accuracy 95.8%, AUC 0.868), outperforming both AI-alone (sensitivity 58.8%, specificity 96.6%, accuracy 94.0%, AUC 0.777) and the unaided less-experienced reader (sensitivity 67.6%, specificity 95.1%, accuracy 93.2%, AUC 0.814). AI support also enhanced inter-reader agreement (κ=0.84), reducing the number of PI-RADS 3 cases (77→53), and increased exact concordance with the expert from 32.5% to 54.5%, while lowering diagnostic discordance.
Conclusion: AI has the potential to assist less-experienced radiologists and improve the consistency of bpMRI readings, especially considering equivocal cases. In addition, its integration into radiology workflows may reduce reporting workload and facilitate prioritization of suspicious findings, providing important benefits in large-scale screening programs.
Limitations: The reference standard was the expert reader’s assessment rather than histopathology, since only MRI-positive cases undergo biopsy; histology for all would be unfeasible in screening. The AI software was trained mostly on older patients with clinical suspicion, not on a younger screening population.
Funding for this study: No
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: Local EC
6 min
AI-supported prostate MRI lesion classification: impact at different reader experience levels
Hüdanur Bayraktaroglu, Munich / Germany
Author Block: H. Bayraktaroglu, M. Heimer, P. Lohse, J. F. Kuhnle, R. Lorbeer, C. Stief, J. Ricke, C. C. Cyran, P. M. Kazmierczak; Munich/DE
Purpose: To investigate the impact of a commercially available AI tool for prostate lesion classification according to PI-RADS v2.1 on the diagnostic accuracy of radiologists at different experience levels.
Methods or Background: In this IRB-approved retrospective single-center study, prostate MRI datasets of 477 patients (470 mpMRI, 7 bpMRI; median age 68 years) acquired at 3 Tesla between 09/22 and 11/24 were analysed. Three blinded radiologists with different experience levels (low, intermediate, high) independently assessed the datasets in a two-step approach: first without AI support, then after reviewing AI results with the option to change the initial PI-RADS score. Scoring differences were tested using the McNemar-Bowker test, and agreement with the reference standard (board-certified radiologist, >12 years of experience in abdominal imaging) was quantified by weighted Cohen’s κ.
Results or Findings: Across all readers, AI support led to redistribution of PI-RADS scores, most pronounced in the least experienced reader (p=0.002), including upscoring of PI-RADS 2 lesions: ten to PI-RADS 3, five to PI-RADS 4, six to PI-RADS 5 [21/194, 10.8%]. In 7/11 of the upscored cases, TRUS-guided transperineal fusion biopsy confirmed csPCa (ISUP ≥2). AI improved agreement with the reference standard in all readers, reaching statistical significance in the low-experience reader (Cohen’s κ from 0.54 [0.46-0.61] to 0.59 [0.52-0.66], p=0.009), whereas changes for the other readers were not statistically significant (p=0.143 and p=0.731). Overall, agreement of the AI results with the reference standard remained moderate (κ=0.59 [0.52-0.66], p<0.001).
Conclusion: AI support significantly improved the diagnostic accuracy of the low-experience reader, but not of the intermediate- or high-experience readers. However, in view of the moderate agreement of the AI results with the reference standard, supervision by an experienced uroradiologist remains mandatory.
Limitations: No limitations were identified.
Funding for this study: No funding was received for this study.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: The study was approved by the local institutional review board (project nr.: 24-0945).
6 min
Evaluation of an AI algorithm’s performance in MRI follow-up of patients undergoing active surveillance for prostate cancer
Juliette Spilleboudt, Rennes / France
Author Block: J. Spilleboudt1, C. Ruppli2, G. Herpe3, D. Bouda4, G. D'Assignies2, L. Beuzit1; 1Rennes/FR, 2Paris/FR, 3Poitiers/FR, 4Nantes/FR
Purpose: The PRECISE scoring system is a recently developed tool designed to standardize MRI follow-up in patients undergoing active surveillance (AS) for prostate cancer. This study aimed to evaluate the performance of a deep-learning–based artificial intelligence (AI) software prototype in assessing the likelihood of radiological tumor progression on serial MRI scans using the PRECISE criteria. The performances of the AI prototype and a junior radiologist were compared to those of an expert radiologist, who served as the ground truth.
Methods or Background: A total of 96 patients undergoing active surveillance were included, each with two available MRI scans. For each patient, prostate lesions were detected, measured and classified according to the PI-RADS v2.1 guidelines by a junior radiologist, an expert radiologist, and the AI algorithm. The PRECISE score was independently assessed by each radiologist, while the AI algorithm inferred the score based on predefined calculation rules. Balanced accuracy was calculated using a threshold of PRECISE score 3 (i.e., scores ≤3 vs >3), comparing the predictions of the junior radiologist and the AI software to those of the expert radiologist.
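Balanced accuracy at the PRECISE threshold described above is the mean of sensitivity and specificity after dichotomising each score as ≤3 or >3. The Python sketch below illustrates this with the expert's score as reference; the function names and the six example scores are hypothetical.

```python
def is_positive(precise_score: int, threshold: int = 3) -> bool:
    """Dichotomise a PRECISE score: scores above the threshold are positive."""
    return precise_score > threshold


def balanced_accuracy(reference_scores, predicted_scores, threshold=3):
    """Mean of sensitivity and specificity against the reference reader."""
    pairs = [
        (is_positive(r, threshold), is_positive(p, threshold))
        for r, p in zip(reference_scores, predicted_scores)
    ]
    tp = sum(r and p for r, p in pairs)
    fn = sum(r and not p for r, p in pairs)
    tn = sum(not r and not p for r, p in pairs)
    fp = sum(not r and p for r, p in pairs)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical PRECISE scores for six patients (illustrative only).
expert = [2, 3, 4, 5, 3, 4]
junior = [3, 3, 4, 3, 4, 5]

print(f"balanced accuracy = {balanced_accuracy(expert, junior):.2f}")
```

Because it averages the two rates, balanced accuracy is less affected than plain accuracy when positive and negative cases are unevenly represented in the cohort.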
Results or Findings: The difference between the PRECISE scores inferred by the AI prototype and those assigned by the junior radiologist was not statistically significant. The AI prototype inferred PRECISE scores with lower accuracy than the junior radiologist, but the difference was not significant. Using a threshold at PRECISE 3, the balanced accuracy was 0.67 for the junior reader and 0.62 for the AI (p-value 0.44).
Conclusion: The AI prototype inferred PRECISE scores with lower accuracy than a junior radiologist, though the difference was not statistically significant. However, the AI prototype demonstrated superior performance in lesion detection and segmentation.
Limitations: The ground truth is assessed by a single expert radiologist. Our study is also limited by its retrospective design and single-center setting.
Funding for this study: No funding
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: