A Fine-Tuned Large Language Model Chatbot for Multi-Scenario Radiology Cancer Care: Randomized Controlled Trial on Interaction Optimization, Emotional Support, and Provider Burnout Reduction
Author Block: L. Zhang; Chongqing/CN
Purpose: To develop and validate a scenario-specific fine-tuned LLM chatbot for optimizing clinical interactions between cancer patients and radiology healthcare providers (RHPs).
Methods or Background: In an RCT across three hospitals, 36,511 minutes of dialogue were collected from 12 sites in three scenarios: Appointment Triage (AT), Pre-examination Preparation (PP), and Radiology Clinic Services (RCS). The dialogue was transcribed and curated into 27,120 validated dialogues. The chatbot (REC) was developed by fine-tuning DeepSeek R1 on 80% of the dialogues with scenario-specific prompts. Two sub-trials evaluated REC: Sub-trial 1 included 1,424 patients in AT/PP; Sub-trial 2 included 638 patients in RCS. In both, patients were randomized 1:1 to RHP+REC or RHP alone. A total of 150 RHPs were similarly randomized. Primary outcomes were patient-rated dialogue quality (empathy, frustration, emotional regulation, factuality, integrity, and satisfaction); secondary outcomes included provider burnout and image quality.
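As an illustration only, the 80% training split and the 1:1 allocation described above could be sketched as follows. The abstract does not report how dialogues were sampled or which randomization procedure was used, so the shuffling, the fixed seed, the alternating assignment, and the function names `split_dialogues` and `randomize_1_to_1` are all assumptions made for this sketch, not the authors' method.

```python
import random


def split_dialogues(dialogue_ids, train_percent=80, seed=0):
    """Shuffle dialogue IDs and split them into training and held-out sets.

    Assumption: a simple random split; the abstract only states that
    80% of the dialogues were used for fine-tuning.
    """
    rng = random.Random(seed)
    shuffled = list(dialogue_ids)
    rng.shuffle(shuffled)
    # Integer arithmetic avoids floating-point rounding at the cut point.
    cut = len(shuffled) * train_percent // 100
    return shuffled[:cut], shuffled[cut:]


def randomize_1_to_1(participant_ids, arms=("RHP+REC", "RHP"), seed=0):
    """Assign participants to two arms in a 1:1 ratio.

    Assumption: shuffle once, then alternate arms. Real trials often use
    blocked or stratified randomization, which is not modeled here.
    """
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    return {pid: arms[i % 2] for i, pid in enumerate(shuffled)}


# Placeholder integer IDs stand in for the real dialogues and patients.
train, held_out = split_dialogues(range(27120))
sub_trial_1 = randomize_1_to_1(range(1424))
sub_trial_2 = randomize_1_to_1(range(638))
print(len(train), len(held_out))
```

With the reported counts, an exact 80% split of the 27,120 dialogues gives 21,696 for fine-tuning and 5,424 held out, and an exact 1:1 allocation gives 712 patients per arm in Sub-trial 1 and 319 per arm in Sub-trial 2.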
Results or Findings: 1. Dialogue Quality:
AT/PP: RHP+REC significantly improved factuality (AT: 4.12 vs. 3.39; PP: 4.52 vs. 3.79; both P < 0.001), integrity, and satisfaction, and reduced frustration (PP: 3.24 vs. 3.95, P = 0.002).
RCS: RHP+REC excelled in factuality (4.58 vs. 3.69, P < 0.001) and satisfaction (4.03 vs. 3.52, P = 0.003) but underperformed in empathy (3.88 vs. 4.42, P = 0.002).
2. Burnout:
RHP+REC reduced exhaustion (1.85 vs. 2.40, P < 0.01) and depersonalization (2.18 vs. 3.96, P = 0.003).
3. Image Quality:
REC improved image quality for CT (4.35 vs. 4.00, P < 0.01) and MRI (4.12 vs. 3.79, P = 0.02).
Conclusion: REC optimized radiology workflows and reduced burnout.
Limitations: First, REC requires provider validation of its output, which limits scalability; future versions should enhance autonomous validation while ensuring safety. Second, effectiveness depends on the quality of the training data; continuous updates and broader datasets are needed for generalizability. Third, future work should improve clinical adaptability and multimodal integration (e.g., imaging, physiological data), with real-time feedback for continuous learning.
Funding for this study: None
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: The research protocol was approved by the ethics review committees of all participating hospitals (H1, H2, H3), and the entire research process strictly followed the ethical principles of the Declaration of Helsinki.