A Foundation Model Framework for Multi-View MRI Classification of Extramural Vascular Invasion and Mesorectal Fascia Invasion in Rectal Cancer
Author Block: Y. Zhang1, S. A. Mali1, H. C. Woodruff1, S. Amirrajab1, E. I. Crespo2, A. Jimenez-Pastor2, L. Marti-Bonmati2, Z. Salahuddin1, P. Lambin1; 1Maastricht/NL, 2Valencia/ES
Purpose: Accurate MRI-based identification of extramural vascular invasion (EVI) and mesorectal fascia invasion (MFI) is crucial for risk-stratified rectal cancer treatment. However, subjective visual assessment and inter-institutional variability limit diagnostic consistency. Therefore, this study aims to develop and externally evaluate a multi-center, foundation-model-driven framework that automatically classifies EVI and MFI on axial and sagittal T2-weighted MRI.
Methods or Background: A total of 331 pre-treatment rectal cancer MRI scans from three European hospitals (La Fe University and Polytechnic Hospital, Unidade Local de Saúde Hospital, and Centre Hospitalier Universitaire d’Angers) were retrospectively analyzed. A self-supervised frequency-domain harmonization pipeline was applied to reduce scanner-related variability. Three classifiers were compared: SeResNet; the universal biomedical pretrained transformer (UMedPT) with a multilayer perceptron (MLP) head; and a logistic-regression variant using frozen UMedPT features (UMedPT_LR). All models were trained on 265 scans and evaluated on a separate test set of 66 scans. Gradient-weighted class activation mapping (Grad-CAM) was used to visualize the image regions driving model predictions.
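The abstract does not describe how its self-supervised frequency-domain harmonization works. As a rough illustration of the general idea of harmonizing images in the frequency domain, the sketch below uses a different and simpler technique: amplitude swapping in the style of Fourier Domain Adaptation (FDA), which gives a 2D slice the low-frequency amplitude spectrum of a reference slice while keeping the slice's own phase. This is not the authors' pipeline and it is not self-supervised; the function name `low_freq_amplitude_swap` and the window parameter `beta` are hypothetical.

```python
import numpy as np


def low_freq_amplitude_swap(source, reference, beta=0.05):
    """Give `source` the low-frequency amplitude spectrum of `reference`.

    Hypothetical FDA-style stand-in for frequency-domain harmonization, NOT the
    authors' self-supervised pipeline. `beta` (assumed) sets the half-width of the
    swapped low-frequency window as a fraction of the smaller image side.
    """
    source = np.asarray(source, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if source.shape != reference.shape or source.ndim != 2:
        raise ValueError("expected two 2D slices of identical shape")
    if not 0.0 <= beta < 0.5:
        raise ValueError("beta must lie in [0, 0.5)")

    # Centred 2D spectra: after fftshift the zero-frequency term sits at (h//2, w//2)
    f_src = np.fft.fftshift(np.fft.fft2(source))
    f_ref = np.fft.fftshift(np.fft.fft2(reference))
    amp_src, phase_src = np.abs(f_src), np.angle(f_src)
    amp_ref = np.abs(f_ref)

    # Square window around the zero frequency; it is symmetric, so the
    # Hermitian symmetry of a real image's spectrum is preserved
    h, w = source.shape
    cy, cx = h // 2, w // 2
    b = int(np.floor(min(h, w) * beta))
    win = (slice(cy - b, cy + b + 1), slice(cx - b, cx + b + 1))
    amp_src[win] = amp_ref[win]

    # Recombine the partly swapped amplitude with the source phase and invert
    f_new = amp_src * np.exp(1j * phase_src)
    return np.real(np.fft.ifft2(np.fft.ifftshift(f_new)))
```

Because the phase spectrum carries most of the structural information, swapping only low-frequency amplitudes mainly changes global intensity and contrast. For example, the output slice takes on the reference slice's mean intensity while its anatomy is preserved. A larger `beta` transfers more scanner-specific appearance, at the risk of introducing artifacts.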
Results or Findings: For EVI, UMedPT_LR with fused axial and sagittal features achieved the best classification (area under the receiver operating characteristic curve, AUC = 0.82). For MFI, the best performance was achieved by UMedPT on harmonized axial images (AUC = 0.77). These results outperform those of the challenge winners. Frequency-domain harmonization improved MFI classification but had variable effects on EVI. Multi-view fusion of axial and sagittal features consistently improved EVI classification. The conventional convolutional neural network (CNN) baseline, SeResNet, underperformed, particularly in F1 score and balanced accuracy. Grad-CAM showed model attention concentrated on peritumoral regions for EVI and on mesorectal fascia margins for MFI.
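The multi-view fusion and the reported metrics can be illustrated with a minimal sketch. The abstract states only that axial and sagittal features were combined, so this sketch assumes fusion means concatenating per-patient feature vectors. It uses a plain NumPy logistic regression as a stand-in for the UMedPT_LR head and computes AUC, F1 score, and balanced accuracy from their standard definitions. The feature arrays are synthetic placeholders for frozen UMedPT embeddings, and all function names and hyperparameters are hypothetical.

```python
import numpy as np


def fuse_views(axial_feats, sagittal_feats):
    """Assumed fusion rule: concatenate per-patient axial and sagittal feature vectors."""
    return np.concatenate([axial_feats, sagittal_feats], axis=1)


def fit_logistic_regression(X, y, lr=0.1, epochs=2000, l2=1e-3):
    """Batch-gradient-descent logistic regression on frozen features (illustrative stand-in)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    mu, sd = X.mean(axis=0), X.std(axis=0) + 1e-8  # standardize with training statistics
    Xs = (X - mu) / sd
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Xs @ w + b)))  # sigmoid probabilities
        w -= lr * (Xs.T @ (p - y) / len(y) + l2 * w)  # cross-entropy + L2 gradient step
        b -= lr * np.mean(p - y)

    def predict_proba(X_new):
        Z = (np.asarray(X_new, dtype=float) - mu) / sd
        return 1.0 / (1.0 + np.exp(-(Z @ w + b)))

    return predict_proba


def roc_auc(y_true, scores):
    """AUC = P(random positive scores above random negative), ties counted as 1/2."""
    y_true, scores = np.asarray(y_true), np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUC needs at least one positive and one negative case")
    diff = pos[:, None] - neg[None, :]
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))


def f1_and_balanced_accuracy(y_true, y_pred):
    """F1 for the positive class and balanced accuracy = (sensitivity + specificity) / 2."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return float(f1), float((sensitivity + specificity) / 2)


if __name__ == "__main__":
    # Synthetic placeholder features; real inputs would be frozen UMedPT embeddings
    rng = np.random.default_rng(0)
    y = rng.integers(0, 2, size=80)
    axial = rng.normal(size=(80, 16)) + 0.5 * y[:, None]
    sagittal = rng.normal(size=(80, 16)) + 0.3 * y[:, None]
    X = fuse_views(axial, sagittal)
    predict = fit_logistic_regression(X[:60], y[:60])
    scores = predict(X[60:])
    print("AUC:", roc_auc(y[60:], scores))
    print("F1, balanced accuracy:", f1_and_balanced_accuracy(y[60:], (scores >= 0.5).astype(int)))
```

Balanced accuracy averages sensitivity and specificity. Unlike plain accuracy, it is not inflated by always predicting the majority class, which is one reason F1 and balanced accuracy can expose weak classifiers on imbalanced labels even when AUC looks acceptable.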
Conclusion: The proposed foundation-model-driven framework, which combines frequency-domain harmonization with multi-view feature fusion, achieves state-of-the-art performance in automated MRI classification of EVI and MFI and demonstrates strong generalizability across the three participating centers.
Limitations: The study is limited by a modest sample size, the absence of center-specific analyses, and limited validation. Larger multi-institutional cohorts, advanced imaging techniques, and in silico trials are needed to improve generalizability and support clinical translation.
Funding for this study: The authors acknowledge financial support from the ERC Advanced Grant (ERC-ADG-2015 n° 694812 - Hypoximmuno) and ERC-2020-PoC: 957565-AUTO.DISTINCT. The authors also acknowledge financial support from the European Union’s Horizon research and innovation programme under the following grant agreements: CHAIMELEON n° 952172 (main contributor), ImmunoSABR n° 733008, EuCanImage n° 952103, TRANSCAN Joint Transnational Call 2016 (JTC2016 CLEARLY n° UM 2017-8295), IMI-OPTIMA n° 101034347, AIDAVA (HORIZON-HLTH-2021-TOOL-06) n° 101057062, REALM (HORIZON-HLTH-2022-TOOL-11) n° 101095435, RADIOVAL (HORIZON-HLTH-2021-DISEASE-04-04) n° 101057699 and EUCAIM (DIGITAL-2022-CLOUD-AI-02) n° 101100633. This study was also supported by the China Scholarship Council (grant 202208110055).
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: The study was approved by the Ethics Committees of all participating centers: La Fe University and Polytechnic Hospital (Valencia, Spain), Unidade Local de Saúde Hospital (Portugal), and Centre Hospitalier Universitaire d’Angers (France).