Research Presentation Session: Artificial Intelligence & Machine Learning & Imaging Informatics

RPS 605 - Impact of AI on diagnosis of prostate cancer

February 28, 16:30 - 17:30 CET

7 min
Artificial intelligence and radiologists at prostate cancer detection in MRI: outcomes of the PI-CAI challenge
Anindo Saha, Nijmegen / Netherlands
Author Block: A. Saha, J. Bosma, J. J. Twilt, D. Yakar, M. Elschot, J. Veltman, J. Fütterer, M. De Rooij, H. Huisman; Nijmegen/NL
Purpose: Diagnostic performance of AI systems at detecting clinically significant prostate cancer in MRI, in comparison to radiologists using PI-RADS v2.1, has not been studied at scale. Autonomous AI systems can alleviate the increasing demand in medical imaging and reduce overdiagnosis in prostate cancer management.
Methods or Background: We trained, tuned, and tested an independently developed AI system at detecting Gleason grade group ≥2 prostate cancer, using a retrospective cohort of 10,207 MRI examinations (9129 patients) from four European tertiary care centres. In parallel, we facilitated an observer study with 62 radiologists (45 centres, 20 countries) and 400 testing cases. Reference standard was histopathology and ≥3 years of follow-up. Our study design was established and preregistered with 16 international multidisciplinary experts.
Results or Findings: In the subset of 400 testing cases used for the observer study, the AI system demonstrated superior diagnostic performance on average, with an AUROC of 0.91 (95% CI, 0.87-0.94), compared with the pool of 62 radiologists using PI-RADS v2.1, with an AUROC of 0.86 (95% CI, 0.83-0.89). In all 1000 testing cases, the AI system showed marginally lower specificity of 68.9% (95% CI, 65.3-72.4%) than the standard of care during routine practice, which had a specificity of 69.0% (95% CI, 65.5-72.5%), when thresholded to match the same sensitivity of 96.1% (95% CI, 94.0-98.2%) as the PI-RADS ≥3 operating point.
Conclusion: An AI system, trained on thousands of cases, is superior in differentiating significant prostate cancer at MRI in comparison to radiologists at PI-RADS v2.1, but marginally less specific in comparison to the standard of care in routine practice.
Limitations: The study utilised a retrospective design and was based on histologic verification guided by routine practice. There was an absence of intercontinental, multi-ethnic patient data and MRI examinations from all major commercial vendors.
Funding for this study: This study received funding from the EU Horizon 2020: ProCAncer-I (grant number 952159), Health~Holland (grant number LSHM20103).
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: This study was approved by institutional or regional review boards at each contributing center (identifiers: REK 2017/576; CMO 2016-3045; IRB 2018-597; ZGT23-37), and was conducted in accordance with the principles of the Declaration of Helsinki. Informed consent was exempted, given the retrospective scientific use of deidentified MRI scans and clinical data.
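The sensitivity-matched comparison reported in the abstract above (thresholding the AI score so that its sensitivity matches the PI-RADS ≥3 operating point, then comparing specificity) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's evaluation code: the function names and the toy data are hypothetical, and a case is called positive when its score is at or above the threshold.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Treat score >= threshold as positive; return (sensitivity, specificity)."""
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if label == 1:
            tp += predicted_positive
            fn += not predicted_positive
        else:
            fp += predicted_positive
            tn += not predicted_positive
    return tp / (tp + fn), tn / (tn + fp)


def threshold_for_sensitivity(scores, labels, target_sensitivity):
    """Highest observed score threshold whose sensitivity reaches the target."""
    for threshold in sorted(set(scores), reverse=True):
        sensitivity, _ = sensitivity_specificity(scores, labels, threshold)
        if sensitivity >= target_sensitivity:
            return threshold
    raise ValueError("target sensitivity cannot be reached")


# Toy data, invented for illustration only (1 = significant cancer).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
pirads = [5, 4, 4, 3, 3, 2, 2, 1]
ai_scores = [0.95, 0.80, 0.70, 0.40, 0.60, 0.30, 0.20, 0.10]

# Reader operating point: PI-RADS >= 3 counts as positive.
reader_sens, reader_spec = sensitivity_specificity(pirads, labels, 3)

# Threshold the AI score to reach the same sensitivity, then compare specificity.
ai_threshold = threshold_for_sensitivity(ai_scores, labels, reader_sens)
ai_sens, ai_spec = sensitivity_specificity(ai_scores, labels, ai_threshold)
```

Choosing the highest threshold that still reaches the target sensitivity keeps the number of false positives as low as possible at that operating point, which is what makes the specificity comparison fair.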
7 min
Data integration using AI, PI-RADS, and clinical data to reduce false positives in prostate MRI
Antony William Rix, Cambridge / United Kingdom
Author Block: A. W. Rix1, P. Burn2, N. Vasdev3, A. Bradley4, A. Andreou5, J. Aning6, T. Barrett1, A. R. Padhani7, A. Shah8; 1Cambridge/UK, 2Taunton/UK, 3Stevenage/UK, 4Truro/UK, 5Bath/UK, 6Bristol/UK, 7Northwood/UK, 8Winchester/UK
Purpose: This study aimed to determine how multi-modal decision support models, integrating clinical data, PI-RADS, and AI, could help optimise patient selection for biopsy following MRI for suspected prostate cancer.
Methods or Background: Clinical history, MRI, PI-RADS, and histopathology data were obtained retrospectively from a five-site, multi-vendor study of a diagnostic patient population. 352 patients were assigned for model training/tuning, and 235 patients (Grade Group (GG) ≥2 prevalence 34%) for held-out testing. GG≥2 cancer was verified by standard-of-care MRI-directed biopsy. Patients scored PI-RADS 1/2 without biopsy were considered negative. Automated AI-based software that identifies and scores patients/lesions for risk of GG≥2 was separately trained using the same training data. Multi-modal machine learning models were trained for combinations of AI scores, clinical variables including PSA-density (PSAD), and the original reporting radiologists’ PI-RADS scores. Sensitivity, specificity, and AUC were compared per-patient on the held-out test data with the PI-RADS assessments and AI scores alone.
Results or Findings: The original PI-RADS scores identified GG≥2 patients with sensitivity 1.00 (95% CI 1.00-1.00), specificity 0.67 (0.61-0.75) and AUC 0.94 (0.91-0.97). AI detected GG≥2 patients with sensitivity 0.97 (0.93-1.00), specificity 0.55 (0.47-0.62) and AUC 0.88 (0.84-0.92) using bpMRI data. Combining AI scores and PSAD based on TZ volume (TZ-PSAD) gave sensitivity 0.95 (0.90-0.99, p<0.001), specificity 0.70 (0.63-0.77, p<0.001) and AUC 0.90 (0.85-0.93, p=0.25). Combining PI-RADS, AI, and TZ-PSAD gave sensitivity 0.99 (0.96-1.00, p<0.001), specificity 0.83 (0.77-0.89, p<0.001), and AUC 0.96 (0.93-0.98, p=0.003). TZ-PSAD gave slightly better AUC than whole-prostate PSAD. Other clinical variables had no statistically significant benefit. Findings with bpMRI and mpMRI AI were similar.
Conclusion: Decision support models combining PI-RADS, AI scores, and PSAD could significantly reduce false positive biopsies while maintaining sensitivity, compared to AI or PI-RADS assessments alone.
Limitations: This study used standard-of-care limited biopsy for the ground truth.
Funding for this study: Funding was received from Lucida Medical.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: This study was approved with the UK HRA IRAS number: 278640.
7 min
DL algorithm for MRI prostate volume can automatically tailor the threshold of PSA density in combination with other risk factors for the prediction of clinically significant PCa
Alessandro Venturi, Florence / Italy
Author Block: A. Venturi1, A. Colarieti2, D. Fazzini1, M. Interlenghi1, E. Schiavon1, M. Alì1, I. Castiglioni1, S. Papa1, F. Sardanelli2; 1Milan/IT, 2San Donato Milanese, Milan/IT
Purpose: The aim of our study was to determine the optimal threshold for Prostate-Specific Antigen density (PSAd) when the prostate volume is automatically computed by a deep learning (DL) algorithm on T2-weighted MRI images as a contouring method. This standardised, repeatable, and reliable predictor was then combined with other predictors of clinically significant prostate cancer (csPCa).
Methods or Background: We conducted a multicentric retrospective study, including patients assessed by mpMRI prior to prostate biopsy. csPCa was defined as a PCa with any ISUP grade group ≥2 (Gleason ≥3+4). We trained a U-Net-based DL algorithm on T2-weighted images and tested it using the Dice Similarity Coefficient (DSC) against three board-certified radiologists who segmented the prostate slice by slice, blinded to each other. Twenty repetitions were performed.
Results or Findings: We included 279 patients, aged 65.5±8.0 years. The developed DL algorithm achieved a reliability (DSC) of 0.86. Repeatability was 100%. The computed PSAd ranged from 0.02 to 2.36 ng/ml/cm3. A PSAd threshold of 0.10 ng/ml/cm3 showed the best balanced sensitivity/specificity of 0.66/0.64, respectively, on an external dataset of 86 patients. However, when combined with patient age, a PSAd threshold of 0.11 ng/ml/cm3 and an age threshold of 67 improved sensitivity up to 0.84, without affecting the specificity.
Conclusion: Our results showed how a PSAd threshold can be obtained by an automatic DL algorithm applied on T2-weighted images, considering the slice-by-slice prostate volume (i.e. not based on geometric approximations, such as ellipsoid diameters), and specifically optimised in combination with patient age. The inclusion of radiomics features from T2-weighted and DWI could allow a further specific optimisation of the PSAd threshold.
Limitations: These patient cohorts were collected exclusively by Italian centres.
Funding for this study: No funding was received for this study.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: This study was approved by the Ethics Committees of relevant centres.
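The quantities in the abstract above (Dice similarity between two segmentations, a prostate volume obtained by counting segmented voxels rather than from an ellipsoid approximation, and the resulting PSA density) can be illustrated with a short sketch. All names, voxel spacings, and values below are hypothetical and chosen only to make the arithmetic easy to follow; this is not the authors' pipeline.

```python
import math


def dice_coefficient(mask_a, mask_b):
    """Dice similarity of two flat binary masks: 2|A∩B| / (|A| + |B|)."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    return 2 * overlap / total if total else 1.0


def voxel_volume_ml(n_voxels, spacing_mm):
    """Volume from a voxel count and (dx, dy, dz) spacing; 1 ml = 1000 mm3."""
    dx, dy, dz = spacing_mm
    return n_voxels * dx * dy * dz / 1000.0


def ellipsoid_volume_ml(length_cm, width_cm, height_cm):
    """Geometric approximation for comparison: pi/6 * L * W * H."""
    return math.pi / 6 * length_cm * width_cm * height_cm


def psa_density(psa_ng_ml, volume_ml):
    """PSA density in ng/ml/cm3 (1 ml = 1 cm3)."""
    return psa_ng_ml / volume_ml


# Toy example with invented numbers.
dsc = dice_coefficient([1, 1, 1, 1, 0, 0], [0, 1, 1, 1, 1, 0])
volume = voxel_volume_ml(40000, (0.5, 0.5, 3.0))
psad = psa_density(4.5, volume)
```

In this toy case the two masks share three of their four voxels each, the segmented volume is 30 ml, and a PSA of 4.5 ng/ml gives a PSA density of 0.15 ng/ml/cm3.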
7 min
Impact of AI on the diagnosis of prostate cancer with mpMRI for novice radiologists: results of a single-centre study
Tommaso Russo, Milan / Italy
Author Block: T. Russo, G. Brembilla, E. Camisassa, M. Cosenza, R. Pennella, L. Quarta, G. Gandaglia, A. Briganti, F. De Cobelli; Milan/IT
Purpose: The purpose of this study was to assess the impact of a commercial AI software (Quantib Prostate) on the diagnostic accuracy and interreader agreement of novice radiologists in the interpretation of multiparametric MRI of the prostate (mpMRI).
Methods or Background: Accurate interpretation of mpMRI of the prostate requires training on large case series and is affected by interreader variability. AI software has been developed to overcome these limitations and assist radiologists in evaluating mpMRI. This is a single-centre retrospective study on 110 patients who underwent mpMRI for clinical suspicion of PCa (+/- targeted biopsy). All mpMRIs were reviewed by three novice readers (radiology residents; Reader 1, 2, 3 – R1, R2, R3) with four years (R1 and R2) and one year (R3) of experience in body imaging. All MRI exams were interpreted and reported in a sequential fashion: first, radiologists interpreted the exam without AI assistance; then, they were unblinded to AI results and re-reported the MRI exam. Histopathological results from MRI-targeted and concomitant systematic biopsies were considered the standard of reference; clinically significant PCa (csPCa) was defined as ISUP>1. The primary objective was to compare the diagnostic accuracy of the readers without and with AI assistance.
Results or Findings: 61% (67/110) of patients had any PCa (ISUP≥1), and 43% (48/110) had csPCa (ISUP≥2). The diagnostic performance of R1 and R2 remained similar with and without Quantib Prostate. R3’s sensitivity and overall accuracy for csPCa improved from 81% and 55% to 91% and 60%, respectively. Percentage of interreader agreement was 74% (CI 0.684-0.807) without Quantib and 73% (CI 0.678-0.801) with Quantib.
Conclusion: AI-based software (Quantib Prostate) may improve the diagnostic accuracy of novice radiologists for identifying csPCa.
Limitations: This was a retrospective single-centre study.
Funding for this study: No funding was received for this study.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: This study was approved by an ethics committee.
7 min
Multireader evaluation of a deep learning computer-aided system for prostate MRI in men with prostate cancer being considered for active surveillance
Laura Isabel Loebelenz, Bern / Switzerland
Author Block: L. I. Loebelenz1, A. Samani2, A. Azam2, D. Manea3, D. Prezzi2, A. Sharkey2, O. Williams2, S. J. Withey2, V. Goh2; 1Bern/CH, 2London/UK, 3Iasi/RO
Purpose: The Prostate Imaging-Reporting and Data System (PI-RADS) has standardised practice but variation in radiologist reporting performance remains an issue. Artificial intelligence (AI) may improve this. We aimed to evaluate the impact of a commercial deep learning (DL) software across readers of different experience in men with low-intermediate risk prostate cancer being considered for active surveillance.
Methods or Background: This retrospective study included men with low-intermediate risk prostate cancer. Five readers with varying levels of experience (<1 year to ≥5 years of experience), trained in three different countries, evaluated the initial bi-parametric prostate MRI, with and without DL-assistance in a randomised design. PI-RADSv2.1 scores were recorded and compared between reads and between readers, and against ground truth using metrics including area under the receiver operating characteristics curve (AUC). Fleiss-Kappa analysis was performed for interreader agreement. Radiological ground-truth was independent expert scoring/annotation of focal lesions. Histological ground-truth was the International Society of Urological Pathologists grade group (GG) score.
Results or Findings: 100 men were included with mean age 61±7 years and mean PSA density 0.15±0.09 (SD). There were 23 ISUP GG ≥2 cancers on histology. At an individual-reader level, for PI-RADS scoring compared to the radiological ground truth, AUC ranged from 0.65 to 0.80. DL-assistance increased AUC, although the magnitude of benefit varied across the reader pool, AUC ranging from 0.69 to 0.82. Additionally, DL-assistance appeared to reduce interreader variability. Reader agreement (weighted kappa) ranged from 0.24 to 0.56 without DL-assistance, compared to 0.45-0.55 with DL-assistance.
Conclusion: AI can improve performance for PI-RADS scoring particularly for non-expert readers in this cohort, and may also reduce variability in reader performance.
Limitations: This was a single centre, retrospective study.
Funding for this study: This study received no direct funding.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: This study was approved with the code: 18/NW/0297.
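The abstract above names Fleiss' kappa for interreader agreement. As a minimal sketch of that statistic (the standard unweighted form, not the weighted kappa also quoted in the results, and not the authors' code), the function below takes a table in which each row is a case and each column a rating category, for example PI-RADS scores. The function name and toy tables are hypothetical.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][j] = number of readers rating case i as category j."""
    n_cases = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every case needs the same number (>= 2) of ratings")

    # Observed agreement: mean share of agreeing reader pairs per case.
    p_observed = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_cases

    # Chance agreement from the overall category proportions.
    n_categories = len(counts[0])
    proportions = [
        sum(row[j] for row in counts) / (n_cases * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(p * p for p in proportions)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_observed - p_expected) / (1 - p_expected)


# Toy tables for five readers and two categories (invented for illustration).
kappa_perfect = fleiss_kappa([[5, 0], [0, 5], [5, 0], [0, 5]])
kappa_split = fleiss_kappa([[3, 2], [2, 3]])
```

Full agreement on every case gives a kappa of 1, while readers who split 3 to 2 on every case score slightly below zero, that is, worse than chance.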
7 min
Assessing radiomic stability: impact of annotation variability on radiomics features consistency in different anatomical regions
Carmen Prieto-de-la-Lastra, Majadahonda / Spain
Author Block: C. Prieto-de-la-Lastra1, A. Jimenez-Pastor2, A. Picó Peris2, D. Veiga Canuto2, L. Marti-Bonmati2; 1Madrid/ES, 2Valencia/ES
Purpose: Radiomic features are calculated from delineated regions of interest (ROIs), characterising the patient and anatomical region where they have been calculated. Therefore, the segmentation quality can substantially impact the power of radiomics. In this study, the discrepancies among different segmentations of the same ROIs are compared to analyse radiomics stability in different anatomical regions.
Methods or Background: Two datasets were inspected, each with different annotations: 100 MRI studies with the prostate gland (central and peripheral) and seminal vesicles segmented; and 960 MRI scans with annotated neuroblastic lesions. The original segmentations were modified with dilations and erosions using a structuring element (SE) of 1, 2, or 3 voxels, simulating the annotations from different radiologists. Therefore, 7 segmentations were generated for each case. 1015 radiomic features were calculated from each mask. The distributions of features across annotations were compared using the Wilcoxon test, paired and unpaired. The correlation among the different simulated annotators was analysed with the intraclass correlation coefficient (ICC). Finally, the most stable variables maintained across all the experiments were inspected.
Results or Findings: In the central prostate gland, there were 165 stable variables from the Wilcoxon analysis, 8 from the paired Wilcoxon evaluation, and 743 from the ICC tests. For the peripheral prostate gland, the results were 40, 0, and 406. In the seminal vesicles analysis, the results were 107, 3, and 514. Finally, the neuroblastoma dataset resulted in 34, 0, and 637 stable variables from each of the experiments, respectively. Furthermore, in both datasets, the number of stable features decreased as the size of the SE increased.
Conclusion: Radiomics is less stable when annotations differ substantially from the original ROIs, and is more susceptible in regions with sharper and more irregular shapes, such as the peripheral gland and cancer lesions.
Limitations: No limitations were identified.
Funding for this study: Funding was received from PRIMAGE (PRedictive In-silico Multiscale Analytics to support cancer personalised diagnosis and prognosis, empowered by imaging biomarkers), a Horizon 2020|RIA project (Topic SC1-DTH-07-2018), grant agreement no: 826494.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: This study was approved by the Institutional Review Board and written informed consent was waived for all participants.
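The perturbation scheme in the abstract above (one original mask plus dilations and erosions of 1, 2, and 3 voxels, giving 7 segmentations per case) can be sketched in a few lines. The abstract does not state the shape of the structuring element or the dimensionality used, so this sketch assumes a 2D image and a 4-connected cross applied repeatedly; all names and the toy ROI are hypothetical.

```python
# Assumed structuring element: the pixel itself plus its 4 direct neighbours.
NEIGHBOURS = ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1))


def dilate(mask):
    """Grow the foreground by one pixel in each direct-neighbour direction."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                for dr, dc in NEIGHBOURS:
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        out[rr][cc] = 1
    return out


def erode(mask):
    """Keep a pixel only if it and all 4 direct neighbours are foreground."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            out[r][c] = int(all(
                0 <= r + dr < rows and 0 <= c + dc < cols and mask[r + dr][c + dc]
                for dr, dc in NEIGHBOURS
            ))
    return out


def simulated_annotations(mask, max_steps=3):
    """Map offset (-3..3) to a mask: negative = eroded, positive = dilated."""
    variants = {0: mask}
    grown, shrunk = mask, mask
    for step in range(1, max_steps + 1):
        grown, shrunk = dilate(grown), erode(shrunk)
        variants[step], variants[-step] = grown, shrunk
    return variants


def area(mask):
    return sum(map(sum, mask))


# Toy 9x9 image with a 3x3 square ROI in the centre.
roi = [[1 if 3 <= r <= 5 and 3 <= c <= 5 else 0 for c in range(9)] for r in range(9)]
variants = simulated_annotations(roi)
areas = {offset: area(m) for offset, m in sorted(variants.items())}
```

The toy ROI also shows why small structures are fragile under this scheme: a 3x3 region shrinks to a single pixel after one erosion and vanishes after two, so any feature computed on it changes drastically.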
7 min
Deep learning-based risk estimation for personalised follow-up in low-risk prostate cancer surveillance
Christian Roest, Groningen / Netherlands
Author Block: C. Roest1, T. Kwee1, P. Van Leeuwen2, H. Huisman3, D. Yakar1; 1Groningen/NL, 2Amsterdam/NL, 3Nijmegen/NL
Purpose: Timely follow-up in active surveillance of low-risk prostate cancer (PCa) is crucial for early detection of disease progression and to minimise overuse of diagnostics. MRI-based deep learning (DL) may optimise follow-up timing by estimating progression risk.
Methods or Background: This multi-centre study included 1607 MRI scans of 1143 men undergoing MRI for suspicion of harbouring clinically significant (cs, defined as International Society of Urological Pathology grade group >1) PCa, who were negative for csPCa at the time of the MRI scan. A novel DL model was developed, which used MRI and routine clinical parameters to predict the risk of PCa progression (defined as csPCa at follow-up). The model was internally cross-validated in 829 exams, and externally validated in 778 exams. Cox-regression assessed whether the model predicted risk of progression. Time-dependent receiver-operating characteristic curve analysis was used to compare our proposed model to established risk estimation tools (European Randomised study of Screening for Prostate Cancer [ERSPC], Prostate Cancer Prevention Trial risk calculators [PCPT]) and PI-RADS. The area-under-the-curve was calculated five years after MRI. Optimised follow-up intervals were derived from Kaplan-Meier curves.
Results or Findings: DL scores predicted progression (internal: hazard-ratio [HR] 14.01, CI 6.61-30.65; p<0.001; external: HR 16.21, CI 3.48-75.5; p<0.001). DL achieved the highest area-under-the-curve in internal (0.75, CI 0.66-0.85) and external cohorts (0.7, CI 0.64-0.76). Internally, DL outperformed ERSPC (p=0.002) and PI-RADS (p=0.006). Externally, DL outperformed ERSPC (p=0.02) and PCPT scores (p<0.001). On internal validation, DL identified a 20% stratum of very-low-risk PCa with <10% risk of missed progression after 3.5 year follow-up, and <15% risk on external validation.
Conclusion: Our proposed DL model provided more accurate risk estimations compared to established methods. DL risk scores may help to personalise follow-up protocols for low-risk PCa.
Limitations: No limitations were identified.
Funding for this study: Funding was provided by a grant from Siemens-Healthineers.
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: This was a retrospective study.
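The abstract above derives follow-up intervals from Kaplan-Meier curves. A minimal sketch of the Kaplan-Meier product-limit estimator, and of reading an estimated progression risk off the curve at a chosen follow-up time, is given below. It is a generic textbook implementation with invented toy data, not the authors' analysis; the function names are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    times:  follow-up time per patient
    events: 1 if progression was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            removed += 1
            i += 1
        if observed:
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def progression_risk(curve, time):
    """Estimated cumulative risk 1 - S(t) at the given follow-up time."""
    survival = 1.0
    for event_time, s in curve:
        if event_time > time:
            break
        survival = s
    return 1.0 - survival


# Toy cohort (invented): follow-up in years, 1 = progression, 0 = censored.
times = [1, 2, 2, 3, 4, 5]
events = [1, 0, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
risk_at_3_years = progression_risk(curve, 3)
```

A follow-up interval for a risk stratum could then be chosen as the longest time at which `progression_risk` stays below an acceptable limit, which is one plausible reading of how a curve translates into a schedule.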
7 min
Development and external validation of PSMA-PET/MR based radiomics models to predict Gleason score in prostate cancer
Tianshuo Yang, Huai'an / China
Author Block: T. Yang1, W. Tao1, Y. Song1, J. Zhang2, X. Niu1, G. Fu1, G. Bai1; 1Huai'an/CN, 2Taizhou/CN
Purpose: This research aims to establish radiomics models based on PSMA PET/MR scans, with external validation, to predict the Gleason score (GS) of prostate cancer (PCa).
Methods or Background: A total of 192 PCa patients were enrolled in this study, including 160 patients in the internal validation set (Centre A) and 32 in the external validation set (Centre B and Centre C). PET/MR scans were performed prior to clinical treatment, and three kinds of radiopharmaceuticals ((18F)-PSMA-1007, (68Ga)Ga-PSMA-11 and (Al18F)-PSMA-BCH) were randomly applied in the PSMA-targeted PET examinations. The patients were divided into the low-risk group (GS≤7) and the high-risk group (GS>7). 1409 high-throughput features were extracted from each ROI and selected using the LASSO algorithm. Radiomics models were constructed from the selected features using logistic regression (LR), naive Bayes (NB), random forest (RF), support vector machine (SVM), and XGBoost machine learning algorithms, each with 30 repetitions of 4-fold cross-validation. The performance of every model was evaluated through the ROC curve. The optimal algorithm and radiomics model were chosen according to the AUC value.
Results or Findings: Between 12 and 14 radiomics features and the NB algorithm were selected to build the radiomics models. In the external validation set, the models based on PSMA-PET, T2WI, and ADC maps exhibited stable predictive performance, with AUC values of 0.762, 0.698, and 0.668 and accuracies of 75.0%, 65.6%, and 71.9%, respectively.
Conclusion: Our study demonstrated that the PSMA-targeted PET-based radiomics model achieved better performance in GS prediction than those based on T2WI and ADC in external cohort validation. A radiomics model of PSMA-targeted PET could be utilised to predict PCa prognosis noninvasively and help clinicians make individualised treatment plans for patients.
Limitations: The number of patients in our study was small, and further large-scale data research from multiple centres will be conducted in the future.
Funding for this study: This study received funding from the Huai’an Science and Technology Project (grant no. HAB202017 to WT).
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: This study was approved by the Ethics Committee of The Affiliated Huaian No.1 People’s Hospital of Nanjing Medical University (Date 2021.12.22 / No YX-2021-113-01).
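The resampling scheme named in the abstract above, 4-fold cross-validation repeated 30 times, can be sketched as a generator of train/test index splits. This is a plain, unstratified version written for illustration; the abstract does not say whether folds were stratified by risk group, and the function name and seed are hypothetical.

```python
import random


def repeated_kfold(n_samples, n_folds=4, n_repeats=30, seed=0):
    """Yield (train_indices, test_indices) for every fold of every repetition."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        # Reshuffle the patients before each repetition.
        indices = list(range(n_samples))
        rng.shuffle(indices)
        for fold in range(n_folds):
            test = indices[fold::n_folds]
            held_out = set(test)
            train = [i for i in indices if i not in held_out]
            yield train, test


# 160 patients, as in the internal set described above.
splits = list(repeated_kfold(160))
```

With 160 patients this yields 120 splits, each training on 120 patients and testing on 40; a model's cross-validated AUC would then be summarised over those 120 test folds.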

This session will not be streamed, nor will it be available on-demand!