Research Presentation Session: Artificial Intelligence & Machine Learning & Imaging Informatics

RPS 505 - Optimising radiological excellence: the synergy of AI and quality in radiology

February 28, 15:00 - 16:00 CET

7 min
Assessing visual quality variability in deep learning reconstructed MRI: a focus on lesions
Quintin Yves van Lohuizen, Groningen / Netherlands
Author Block: Q. Y. van Lohuizen1, S. Fransen1, C. Roest1, T. Kwee1, F. Simonis2, D. Yakar1, H. Huisman3; 1Groningen/NL, 2Enschede/NL, 3Nijmegen/NL
Purpose: This study assessed whether the structural similarity index measure (SSIM), the leading metric for image quality, is reliable for evaluating deep learning reconstructed (DLRecon) MRI scans. Images were reconstructed with a state-of-the-art DLRecon algorithm, and SSIM was assessed on three fields of view (FOVs): the full transversal image, the prostate region, and individual lesions.
Methods or Background: A retrospective analysis was conducted using two datasets. The recurrent inference machine (RIM), a k-space-based DLRecon algorithm, was trained on the public prostate NYU fastMRI k-space dataset (N=312) and externally validated using prostate MRI scans of eight patients from the University Medical Center Groningen. Clinically significant prostate cancer lesions (N=17) with PI-RADS scores from three to five were delineated by expert radiologists. Image quality was assessed using SSIM on three FOVs: the full transversal FOV, the prostate, and the lesion level. Assessments were conducted at varying acceleration factors from 2x to 8x.
Results or Findings: Significant differences in SSIM values were observed across FOVs in MRI scans reconstructed by the DLRecon algorithm. Specifically, lesion FOVs had lower SSIM values (0.482 ± 0.057 at 8-fold acceleration) compared to the full transversal FOV (0.905 ± 0.018) and the prostate FOV (0.870 ± 0.023). All differences were statistically significant (p < 0.001, Wilcoxon tests). Furthermore, linear mixed-effects models revealed a significantly steeper rate of SSIM degradation in lesion-specific FOVs, suggesting greater variability in image quality in these critical areas.
Conclusion: DLRecon algorithms like the RIM showed significantly lower SSIM values in lesion-specific FOVs compared to the full transversal and prostate FOVs. This discrepancy calls into question the adequacy of SSIM as a standalone quality metric, emphasising the need for more targeted quality assessments in future DLRecon algorithm development.
Limitations: No limitations were identified.
Funding for this study: Funding was provided by Health~Holland and Siemens Healthineers.
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: The study is retrospective.
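The FOV comparison described in this abstract can be illustrated with a minimal sketch. It assumes a simplified single-window (global) SSIM rather than the sliding-window variant normally used in practice, images stored as nested lists of floats in [0, 1], and a hypothetical 2 x 2 lesion box in a toy 8 x 8 image. None of these choices come from the study.

```python
from statistics import fmean

def global_ssim(ref, rec, data_range=1.0):
    """Single-window SSIM of two equally sized 2D images (lists of rows)."""
    x = [v for row in ref for v in row]
    y = [v for row in rec for v in row]
    c1 = (0.01 * data_range) ** 2  # standard stabilising constants
    c2 = (0.03 * data_range) ** 2
    mx, my = fmean(x), fmean(y)
    vx = fmean([(a - mx) ** 2 for a in x])
    vy = fmean([(b - my) ** 2 for b in y])
    cov = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    numerator = (2 * mx * my + c1) * (2 * cov + c2)
    denominator = (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    return numerator / denominator

def crop(img, top, left, height, width):
    """Return a rectangular field of view as a new nested list."""
    return [row[left:left + width] for row in img[top:top + height]]

# Toy reference: an 8 x 8 checkerboard texture (illustrative only).
ref = [[0.2 if (i + j) % 2 == 0 else 0.8 for j in range(8)] for i in range(8)]

# Toy reconstruction: identical except that the hypothetical lesion box
# (rows 3-4, columns 3-4) has lost its texture and is flattened to 0.5.
rec = [row[:] for row in ref]
for i in (3, 4):
    for j in (3, 4):
        rec[i][j] = 0.5

full_ssim = global_ssim(ref, rec)
lesion_ssim = global_ssim(crop(ref, 3, 3, 2, 2), crop(rec, 3, 3, 2, 2))
print(f"full FOV SSIM: {full_ssim:.3f}, lesion FOV SSIM: {lesion_ssim:.3f}")
```

In this toy case the error is confined to the lesion box, so the full-FOV score stays high while the lesion-level score collapses, which is the kind of dilution effect the abstract warns about.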
7 min
Image quality and sharpness improvement in coronary CT angiography using a deep-learning super-resolution reconstruction algorithm: a phantom study
Amir Pourmorteza, Atlanta, GA / United States
Author Block: T. W. Holmes1, S. Sharma2, S. Ross2, K. Schultz2, P. Gleason1, J. Schuzer2, Z. Yu2, R. Thompson2, A. Pourmorteza1; 1Atlanta, GA/US, 2Vernon Hills, IL/US
Purpose: The purpose of this study was to investigate the performance of a super-resolution deep-learning-based reconstruction (DLR) algorithm named precision image quality engine (PIQE) developed for cardiac CT against two clinical reconstruction algorithms: adaptive iterative dose reduction (AIDR) and high-resolution DLR (AiCE).
Methods or Background: We 3D-printed inserts containing microfluidic channels (d = 0.25-3.5 mm) with embedded stents and calcified plaques, filled with dilutions of iodinated contrast agent. The inserts were placed inside a 12-cm diameter water tank and scanned on a clinical CT scanner with prospective ECG-gating at 120 kVp and exposures of 25, 50, 250, and 400 mAs. Images were reconstructed with matched parameters using AIDR, AiCE, and PIQE: 512 x 512 matrix, 0.312 x 0.312 x 0.5 mm³ voxel size. PIQE images were also reconstructed with a 1024 x 1024 matrix and a 0.156 x 0.156 x 0.5 mm³ voxel size. We evaluated CT number stability, contrast-to-noise ratio (CNR), and image sharpness as a function of radiation dose.
Results or Findings: CT number deviations from the 400 mAs baseline were measured in iodine, water, and fat inserts and were in the ranges [-1.1, 3.1], [-1.1, 3.4], and [-2.2, 0.26] for AIDR, AiCE, and PIQE, respectively. CNRs between iodine, water, fat (soft plaque), and calcium (hard plaque) were 36% to 97% higher for PIQE than for AIDR, with the maximum CNR improvement observed in the lowest-dose (25 mAs) scans. Compared to AIDR, AiCE images showed a 0%-37% increase in CNR in the low-dose scans (25 and 50 mAs); however, their CNR was 11% to 27% lower in the higher-dose scans (250 and 400 mAs). The 10% MTF cutoff was 8.98, 10.68, 10.44, and 13.61 lp/cm for AIDR, AiCE, PIQE, and PIQE1024, respectively.
Conclusion: Overall, DLR algorithms improved CNR and image sharpness by 16%-18% at normal-resolution voxel size. Furthermore, PIQE improved image sharpness by 51% when reconstructed at high-resolution voxel size.
Limitations: More experiments mimicking different patient sizes are warranted.
Funding for this study: This study was funded through a sponsored research agreement with Canon Medical Research USA.
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: No information provided by the submitter.
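The sharpness figures in this abstract can be related to the reported MTF cutoffs with a short calculation. The sketch below assumes that sharpness gain is expressed as the relative change in the 10% MTF cutoff against AIDR; the abstract does not state how its percentages were derived, so this is one plausible reading. The cnr helper uses a common definition of contrast-to-noise ratio, and its inputs are hypothetical.

```python
# 10% MTF cutoff frequencies in lp/cm, as reported in the abstract.
mtf10 = {"AIDR": 8.98, "AiCE": 10.68, "PIQE": 10.44, "PIQE1024": 13.61}

def relative_gain_percent(value, baseline):
    """Relative change of value against baseline, in percent."""
    return 100.0 * (value - baseline) / baseline

# Sharpness gain of each DLR reconstruction relative to AIDR.
gains = {
    name: relative_gain_percent(cutoff, mtf10["AIDR"])
    for name, cutoff in mtf10.items()
    if name != "AIDR"
}

def cnr(mean_a, mean_b, noise_sd):
    """Contrast-to-noise ratio: absolute mean difference over noise SD.

    This is one common definition; the study may have used another.
    """
    return abs(mean_a - mean_b) / noise_sd

for name, gain in gains.items():
    print(f"{name}: {gain:.1f}% higher 10% MTF cutoff than AIDR")
```

Under this assumption the gains land close to the 16%-18% and 51% figures quoted in the conclusion.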
7 min
Elevating TIPSS procedures: AI denoising's impact on cone-beam CT image quality and diagnostic confidence
Reza Dehdab, Tuebingen / Germany
Author Block: R. Dehdab, A. S. Brendlin, S. Afat, G. Grözinger, K. Nikolaou, J. Mück; Tuebingen/DE
Purpose: This study aimed to assess the utility of artificial intelligence (AI)-based denoising techniques in enhancing cone-beam computed tomography (CBCT) images during transjugular intrahepatic portosystemic shunt (TIPSS) procedures, taking into account variations in body mass index (BMI).
Methods or Background: A retrospective review of 60 patients who underwent TIPSS between 2016 and 2022 was approved. Patients were categorised and paired based on BMI and divided according to image acquisition durations of three seconds (3s) and six seconds (6s). CBCT images were processed with AI denoising (ClariCT.AI, ClariPI®). Image quality was evaluated quantitatively and qualitatively in terms of image noise, radiation dose, contrast-to-noise ratio (CNR), and diagnostic confidence. Relationships between BMI, noise, CNR, and radiation dose were also analysed.
Results or Findings: AI-based deep learning denoising (DLR) reduced image noise in both the 3s and 6s acquisition groups. Despite higher initial noise in the 3s group, noise equalised after DLR (3s original vs DLR: t=57.06, p<0.001; 6s original vs DLR: t=41.11, p<0.001). BMI correlated positively with noise in the 3s group (r=0.564, p=0.006) but not in the 6s group (r=0.3738, p=0.09). Both groups showed a strong correlation between BMI and radiation dose (3s: r=0.681, p<0.001; 6s: r=0.681, p<0.001). Diagnostic confidence was higher in 6s DLR (Kruskal-Wallis H=152.63, p<0.001), with 3s DLR achieving confidence comparable to 6s originals (Mann-Whitney U=3943.5, p=0.83).
Conclusion: AI-enhanced denoising of CBCT images during TIPSS procedures not only significantly improved image quality but also reduced radiation exposure, highlighting the potential of AI in medical imaging for optimising diagnostic accuracy while prioritising patient safety.
Limitations: The study's relatively small sample size and single-centre setting may limit the generalisability of the findings.
Funding for this study: No funding was received for this research.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: This study was retrospectively approved by the Institutional Review Board with a waiver for the need for informed consent (#167/2022BO2).
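The BMI-noise and BMI-dose relationships in this abstract are reported as correlation coefficients. As a minimal sketch, assuming these are Pearson coefficients (the abstract does not name the method), r can be computed as below. The BMI and noise values are invented purely for illustration and are not study data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical example values (not from the study).
bmi = [20, 24, 28, 32, 36]
noise = [10, 13, 12, 17, 18]

r_bmi_noise = pearson_r(bmi, noise)
print(f"r = {r_bmi_noise:.3f}")
```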
7 min
3D human reconstruction algorithm: a novel technique for prospective image quality control in chest radiography
Yuqi Tan, Chengdu / China
Author Block: Y. Tan, P. Wu, Z. Ye, Y. Hou, C. Xia, Z. Li; Chengdu/CN
Purpose: The purpose of this study was to propose a novel real-time 3D human reconstruction algorithm and investigate its performance in identifying incorrect body postures and radiographer’s operations in chest radiography.
Methods or Background: A total of 83 chest posteroanterior (PA) images and 71 chest lateral (LA) images acquired by different radiographers were included in this study. The 3D human reconstruction algorithm, which was mainly based on a series of deep neural networks including SMPL-X and HybrIK-X, took a photo as input and output a 3D mesh containing body morphology information. A camera fixed in front of the beam limiter was used to capture photos at the time of exposure. Automatic measurement tools were developed for 3D human evaluation. Indexes including shrug (PA), scapula position (PA), arms up (LA), postures, exposure field, and projection point were assessed in both subjective and 3D human evaluation. Subjective results were regarded as the reference standard. Sensitivity, specificity, and Kappa agreement were calculated for each index.
Results or Findings: In the 3D human evaluation of chest PA images, the accuracy of identifying the exposure field was 100%. The sensitivity, specificity, and Kappa value were 0.82, 0.92, and 0.73 for shrug; 0.86, 0.83, and 0.69 for scapula position; 1, 0.93, and 0.73 for postures; and 0.54, 0.86, and 0.42 for projection point. In the 3D human evaluation of chest LA images, the accuracy of identifying the exposure field was also 100%. The sensitivity, specificity, and Kappa value were 0.83, 0.94, and 0.63 for arms up; 0.95, 0.80, and 0.65 for postures; and 0.57, 1, and 0.68 for projection point.
Conclusion: The 3D human reconstruction algorithm showed good ability in identifying incorrect body postures and radiographer’s operations in chest radiography. Further improvement of this algorithm is needed to enhance its accuracy.
Limitations: The sample size was small and the measurements of projection point in 3D human evaluation needed modification.
Funding for this study: No funding was received for this study.
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: This study was retrospective, so did not require an ethics review.
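The per-index sensitivity, specificity, and Kappa values reported in this abstract follow from a 2 x 2 agreement table between the reference standard and the algorithm. The sketch below assumes Kappa means Cohen's kappa for two binary ratings; the labels are hypothetical and only illustrate the calculation.

```python
def binary_agreement(reference, predicted):
    """Sensitivity, specificity and Cohen's kappa for binary labels (1 = finding present)."""
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r and p)
    fn = sum(1 for r, p in pairs if r and not p)
    tn = sum(1 for r, p in pairs if not r and not p)
    fp = sum(1 for r, p in pairs if not r and p)
    n = len(pairs)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    observed = (tp + tn) / n
    # Agreement expected by chance, from the marginal totals.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return sensitivity, specificity, kappa

# Hypothetical labels for one index, e.g. "shrug" (not study data).
reference = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]   # subjective evaluation
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]   # 3D human evaluation

sens, spec, kappa = binary_agreement(reference, predicted)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}, kappa={kappa:.2f}")
```

Kappa corrects raw agreement for chance, which is why an index can show high specificity and still only moderate Kappa when one class dominates.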
7 min
MRI-radiomics for MGMT promoter methylation prediction in glioma: methodological quality, systematic review, and meta-analysis
Fabio Martino Doniselli, Milan / Italy
Author Block: F. M. Doniselli, R. Pascuzzo, M. Moscatelli, M. Grisoli, L. M. Sconfienza; Milan/IT
Purpose: This study aimed to evaluate the methodological quality and diagnostic accuracy of MRI-based radiomic studies predicting O6-methylguanine-DNA methyltransferase (MGMT) promoter methylation status in gliomas.
Methods or Background: PubMed Medline, EMBASE, and Web of Science were searched to identify MRI-based radiomic studies on MGMT methylation in gliomas published until December 31, 2022. Three raters evaluated the methodological quality of the studies with the Radiomics Quality Score (RQS, 16 components) and the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis Or Diagnosis (TRIPOD, 22 items) scales. Risk of bias and applicability concerns were assessed with the QUADAS-2 tool. A meta-analysis was performed to estimate the pooled area under the curve (AUC) and to assess inter-study heterogeneity.
Results or Findings: We included 26 studies. The median RQS total score was 8 out of 36 (22%, range: 8%-44%). Thirteen studies performed external validation. All studies reported AUC or accuracy, but only 4 (15%) performed calibration and decision curve analysis. No studies performed phantom analysis, cost-effectiveness analysis, or prospective validation. The overall TRIPOD adherence score was between 50% and 70% in 16 studies and below 50% in 10 studies. The pooled AUC was 0.78 (95% CI: 0.73-0.83), with high inter-study heterogeneity (I² = 93.4%). Studies with external validation and including only WHO-grade IV gliomas had significantly lower AUC values (0.65, 95% CI: 0.57-0.73, p<0.01).
Conclusion: Study RQS and adherence to TRIPOD guidelines were generally low. Radiomic prediction of MGMT methylation status showed great heterogeneity of results and lower performance in grade IV gliomas, which hinders its current implementation in clinical practice.
Limitations: Some included studies did not report AUC values; grey literature was not included.
Funding for this study: No funding was received for this study.
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: This study was a meta-analysis.
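The pooled AUC and I² statistic reported in this abstract can be illustrated with a minimal random-effects sketch. It assumes DerSimonian-Laird pooling of untransformed AUC values with known standard errors; the abstract does not specify the pooling model or any transformation, and the three input studies are hypothetical.

```python
def dersimonian_laird(estimates, std_errors):
    """Random-effects pooled estimate, 95% CI and I² (in percent)."""
    weights = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, estimates)) / sum_w
    # Cochran's Q measures dispersion around the fixed-effect estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c)  # between-study variance
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    re_weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(w * y for w, y in zip(re_weights, estimates)) / sum(re_weights)
    se_pooled = (1.0 / sum(re_weights)) ** 0.5
    ci = (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)
    return pooled, ci, i2

# Hypothetical per-study AUCs and standard errors (not study data).
aucs = [0.65, 0.78, 0.90]
ses = [0.02, 0.02, 0.02]

pooled_auc, ci95, i_squared = dersimonian_laird(aucs, ses)
print(f"pooled AUC={pooled_auc:.3f}, 95% CI=({ci95[0]:.3f}, {ci95[1]:.3f}), I2={i_squared:.1f}%")
```

I² expresses the share of total variation attributable to between-study differences rather than sampling error, so a value above 90% signals that a single pooled AUC summarises very dissimilar studies.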
7 min
Automatic uncertainty-based quality controlled segmentation of multimodal cardiac MR images
Stéphanie Bricq, Dijon / France
Author Block: J. Michaud, T. W. Arega, S. Bricq; Dijon/FR
Purpose: Although deep learning-based segmentation methods have shown promise in automating the segmentation of cardiac MRI images, they are not widely used in clinical practice due to the lack of robustness and reliability of the models. We propose an uncertainty-based quality control (QC) framework to identify failed segmentations and to enhance the reliability of multimodal cardiac MRI segmentation models.
Methods or Background: Automatic and accurate myocardial tissue characterisation is highly dependent on the quality of the segmentation result. We propose an automatic, quality-controlled segmentation of T1 mapping and LGE images. The cardiac structures were segmented from LGE, native, and post-contrast T1 mapping images using a Bayesian Swin Transformer-based U-Net. The quality of the segmentation output was assessed using uncertainty-based QC metrics. These uncertainty features were used as inputs to a random forest-based classifier to evaluate the segmentation quality and reject bad segmentations. The proposed framework was tested on a private cardiac MR dataset covering various diseases.
Results or Findings: The proposed uncertainty-based quality control framework was robust in detecting inaccurate segmentations. On binary classification (bad or good segmentation), the QC method achieved an area under the ROC curve (AUC) of 0.922 for native T1 images, 0.886 for post-contrast T1 images, and 0.918 for LGE images.
Conclusion: The proposed framework automatically segments cardiac structures on multimodal MR images and rejects inaccurate segmentation results. It can be applied to other segmentation methods to detect segmentation failures and to enhance the reliability of the segmentation models.
Limitations: In addition to identifying segmentation failures, it would be interesting to understand their underlying causes. Insight into why certain segmentation results are rejected could help enhance the model’s reliability.
Funding for this study: Funding was received from the French National Research Agency (ANR) with reference number: ANR-19-CE45-0001-01-ACCECIT.
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: No information provided by the submitter.
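The uncertainty-based QC idea in this abstract can be sketched as follows. The abstract does not say which uncertainty metrics were used, so this example assumes one common choice: the mean predictive entropy of the averaged foreground probability over several stochastic forward passes of a Bayesian network. The probability maps are hypothetical, and the downstream random forest classifier is omitted.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a Bernoulli variable with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

def mean_predictive_entropy(samples):
    """Image-level uncertainty feature.

    samples: list of T flattened foreground-probability maps of equal
    length, one per stochastic forward pass.
    """
    n_pixels = len(samples[0])
    mean_probs = [
        sum(sample[i] for sample in samples) / len(samples)
        for i in range(n_pixels)
    ]
    return sum(binary_entropy(p) for p in mean_probs) / n_pixels

# Hypothetical three-pixel probability maps from two forward passes.
confident_passes = [[0.99, 0.01, 0.98], [0.97, 0.02, 0.99]]
uncertain_passes = [[0.90, 0.10, 0.50], [0.20, 0.80, 0.50]]

confident_score = mean_predictive_entropy(confident_passes)
uncertain_score = mean_predictive_entropy(uncertain_passes)
print(f"confident: {confident_score:.3f} bits, uncertain: {uncertain_score:.3f} bits")
```

In a framework like the one described, features of this kind would be computed per segmentation and passed to a classifier that flags likely failures for rejection or manual review.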
7 min
The performance of a commercial artificial intelligence algorithm in an external quality assurance scheme regularly used by humans in the NHS breast screening programme
Yan Chen, Nottingham / United Kingdom
Author Block: Y. Chen, A. Taib, I. Darker, J. James; Nottingham/UK
Purpose: The purpose of this study was to compare the performance of human readers in the National Health Service Breast Screening Programme and a commercially available artificial intelligence (AI) algorithm when interpreting Personal Performance in Mammographic Screening (PERFORMS) external quality assurance test sets.
Methods or Background: Two consecutive PERFORMS sets, each consisting of 60 challenging normal, benign, and biopsy-proven malignant mammography cases, were assessed by human readers between 2018 and 2021 and by AI in 2022. Suspicion-of-malignancy scores were assigned to detected features. Each breast was considered separately, and the highest score was used to assess performance using a pre-defined recall threshold. Sensitivity, specificity, and ROC analysis were used to compare the performance of AI and human readers retrospectively.
Results or Findings: A total of 552 human readers interpreted both PERFORMS sets. There were 161 normal, 70 malignant, and 9 benign breasts. There was no difference in the area under the receiver operating characteristic curve between AI and human readers (0.93 and 0.88 respectively, p=0.15). At the developer-suggested recall threshold, there was no difference between AI and human reader sensitivity (84% and 90% respectively, p=0.34), but the specificity of the AI was higher than that of the human readers (89% and 76% respectively, p=0.003). Using recall thresholds set to match mean human reader performance (90% sensitivity and 76% specificity), the AI showed no difference in performance, with a sensitivity of 91% (p=0.73) and a specificity of 77% (p=0.85).
Conclusion: Diagnostic performance of AI was comparable to that of the average human reader when evaluating cases from two enriched test sets of the PERFORMS scheme.
Limitations: The PERFORMS sets consist of a small number of mammograms. These are not representative of typical screening populations as they are enriched with challenging malignant cases.
Funding for this study: Funding for this study was received from Lunit Inc.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: Following discussion with the local research and development team, the requirement for ethical approval was waived due to the retrospective nature of the study.
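The breast-level scoring rule described in this abstract (highest feature score per breast, compared against a recall threshold) can be sketched as below. The 0-100 score scale, the threshold of 50, the treatment of breasts without detected features as score 0, and the grouping of benign with normal breasts as non-malignant are all assumptions made for illustration.

```python
def breast_level_metrics(breasts, threshold):
    """Sensitivity and specificity at a recall threshold.

    breasts: list of (feature_scores, is_malignant) tuples, one per breast.
    A breast is recalled when its highest feature score reaches the threshold.
    """
    tp = fn = tn = fp = 0
    for scores, is_malignant in breasts:
        recalled = max(scores, default=0) >= threshold
        if is_malignant:
            if recalled:
                tp += 1
            else:
                fn += 1
        else:
            if recalled:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical breasts with suspicion scores (not study data).
example_breasts = [
    ([10, 85], True),    # malignant, recalled
    ([40], True),        # malignant, missed
    ([], False),         # no features detected, not recalled
    ([30, 20], False),   # not recalled
    ([70], False),       # false-positive recall
]

sensitivity, specificity = breast_level_metrics(example_breasts, threshold=50)
print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```

Lowering the threshold trades specificity for sensitivity, which is how a recall threshold can be tuned to match a target operating point such as mean human reader performance.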

This session will not be streamed, nor will it be available on-demand!