Long-term Evaluation of the Objective Structured Clinical Examination (OSCE) in Radiology Resident Training: A Multi-dimensional Assessment from Examiners' and Examinees' Perspectives
Author Block: N. Ding, X. Gao, H. Sun, L. Song, X. Wang, Y. Chen, D. Zhang, H. Xue, Z. Jin; Beijing/CN
Purpose: This study aimed to evaluate the effectiveness, reliability, and validity of the Objective Structured Clinical Examination (OSCE) in radiology resident training, from the perspectives of both examiners and examinees.
Methods or Background: This retrospective observational study analyzed subjective evaluations and objective examination data collected over 6 years (2018–2021, 2023, and 2024). Subjective evaluations were gathered via questionnaires from 198 examiners and 818 examinees to assess the perceived difficulty of, and satisfaction with, the OSCE. Objective data for each OSCE station, including examination scores, difficulty indices, and discrimination indices, were analyzed using correlation analysis and t-tests.
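The abstract does not state how the difficulty and discrimination indices were calculated. As an illustrative sketch only, the Python below assumes two common conventions: difficulty as 1 minus the mean score divided by full marks (higher means harder), and discrimination as the difference between the mean scores of the top and bottom 27% of examinees, divided by full marks. The function names, the 27% cut-off, the 100-point scale, and the sample scores are all hypothetical and are not taken from the study.

```python
from statistics import mean


def difficulty_index(scores, full_marks=100.0):
    """Assumed definition: 1 - (mean score / full marks); higher = harder."""
    return 1.0 - mean(scores) / full_marks


def discrimination_index(scores, full_marks=100.0, fraction=0.27):
    """Assumed definition: (top-group mean - bottom-group mean) / full marks.

    The groups are the highest- and lowest-scoring `fraction` of examinees.
    """
    ranked = sorted(scores, reverse=True)
    k = max(1, round(len(ranked) * fraction))  # group size, at least 1
    return (mean(ranked[:k]) - mean(ranked[-k:])) / full_marks


# Hypothetical scores for one station (not study data).
station_scores = [95, 90, 88, 86, 85, 84, 82, 80, 78, 72]

station_difficulty = difficulty_index(station_scores)
station_discrimination = discrimination_index(station_scores)
print(f"difficulty index:     {station_difficulty:.2f}")
print(f"discrimination index: {station_discrimination:.2f}")
```

Under the assumed difficulty definition, a station with a mean score of about 85 to 88 out of 100 would have a difficulty index of roughly 0.12 to 0.15.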
Results or Findings: The OSCE performed stably over the 6 years, with consistent difficulty and discrimination across all stations. Mean scores for individual stations varied, but overall final scores remained stable (85.48–88.48), and consistency improved (station range narrowed to 85.51–93.9 by 2024). Difficulty indices (0.12–0.15) and discrimination indices remained stable (most p < 0.05). Strong correlations between station scores and final scores indicated good discrimination. Examinees rated the overall difficulty higher than examiners did (p < 0.001), whereas the objective indices aligned with the examiners' assessments.
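The correlation between station scores and final scores can be illustrated with a Pearson coefficient. The sketch below is a minimal example under stated assumptions: the final score is taken to be the unweighted mean of two stations, and the names `pearson_r`, `station_a`, and `station_b` and all data values are hypothetical. The abstract does not specify the type of correlation or how the final score was composed.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical scores of five examinees at two stations (not study data).
station_a = [90, 85, 88, 70, 95]
station_b = [88, 80, 91, 75, 93]

# Assumption: final score is the unweighted mean of the station scores.
final_scores = [(a + b) / 2 for a, b in zip(station_a, station_b)]

r_a = pearson_r(station_a, final_scores)
r_b = pearson_r(station_b, final_scores)
print(f"station A vs final: r = {r_a:.2f}")
print(f"station B vs final: r = {r_b:.2f}")
```

Because each station contributes to the final score, a station-to-final correlation of this kind is inflated by part-whole overlap. Correlating each station with the total of the remaining stations is a common stricter alternative.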
Conclusion: The OSCE is a reliable, valid, and effective assessment tool in radiology resident training. Evaluating the OSCE from both subjective and objective perspectives supported the robustness and validity of the examination.
Limitations: This study had certain limitations. First, the retrospective design inherently limits causal inferences, though our 6-year dataset provides robust observational evidence. Second, while the 3-point Likert scale (difficult/moderate/easy) was chosen to maximize response rates during time-sensitive post-exam evaluations, future studies could adopt a 5-point scale for more nuanced feedback.
Funding for this study: This study was supported by the 2024 Graduate Education and Teaching Reform Project of Peking Union Medical College (grant number: 2024yjsjg006).
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: This study was approved by the Institutional Review Board of the conducting institution (No. S-K2067). All participants provided written informed consent prior to inclusion in the study.