Research Presentation Session: Artificial Intelligence & Machine Learning & Imaging Informatics

RPS 1605 - Exploring the frontiers of AI-enhanced radiology reporting

March 1, 16:00 - 17:30 CET

7 min

Utilising ChatGPT-4 for conversion of free-text head and neck cancer CT reports into structured reports

Amit Gupta, Ansari Nagar / India

Author Block: A. Gupta, K. Rangarajan, A. Garg; New Delhi/IN
Purpose: The purpose of this study was to assess the performance of generative pre-trained transformer 4 (GPT-4) for conversion of free-text computed tomography (CT) reports of head and neck cancer (HNCa) patients into structured reports using a predefined template.
Methods or Background: We retrieved 50 CT reports of HNCa patients from our department. A structured CT report template for HNCa was prepared, enumerating the anatomical sites and their respective subsites. Other key imaging findings were also included: status of cervical lymph nodes, airway compromise, and involvement of other neck structures and vessels. After prompt engineering in the GPT-4 chat portal, the prompt that produced the best structured reports was selected. A radiologist evaluated the generated structured reports by recording the number of places with missing information, misinterpreted information, and additional information not present in the original report. The reporting template was then modified to explicitly incorporate the areas where mistakes occurred, and new GPT-4 responses were recorded.
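The evaluation step above (counting missing, misinterpreted, and additional information per report) can be sketched as a field-by-field comparison. This is a minimal illustration, not the authors' procedure: the field names are hypothetical, and exact string matching stands in for the radiologist's judgement used in the study.

```python
# Sketch: compare a generated structured report with the reference content
# of the original free-text report, one template field at a time.
# Field names are hypothetical; exact string equality is a simplifying
# assumption standing in for expert review.

def compare_reports(reference: dict[str, str], generated: dict[str, str]) -> dict[str, list[str]]:
    """Return template fields that are missing, misinterpreted, or additional."""
    # Reference has content but the generated report left the field empty.
    missing = [f for f, v in reference.items() if v and not generated.get(f)]
    # Both have content but it disagrees.
    misinterpreted = [
        f for f, v in reference.items()
        if v and generated.get(f) and generated[f] != v
    ]
    # Generated report has content the original report never stated.
    additional = [f for f, v in generated.items() if v and not reference.get(f)]
    return {"missing": missing, "misinterpreted": misinterpreted, "additional": additional}


reference = {
    "primary_site": "left lateral tongue",
    "cervical_nodes": "level II necrotic node",
    "extranodal_extension": "present",
    "tracheostomy_tube": "in situ",
    "distant_metastases": "none",
}
generated = {
    "primary_site": "left lateral tongue",
    "cervical_nodes": "level II necrotic node",
    "extranodal_extension": "",
    "distant_metastases": "lung metastases",
    "airway": "",
}
print(compare_reports(reference, generated))
```

Summing these lists over all reports gives per-category error counts of the kind reported in the results.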
Results or Findings: GPT-4 successfully converted all 50 free-text reports into structured reports. There were ten places with missing information: tracheostomy tube (n=3), non-inclusion of the sternocleidomastoid with the strap muscles (n=2), extranodal tumour extension (n=3), and contiguous involvement of neck structures by the nodal mass rather than the primary tumour (n=2). Four pieces of information were misinterpreted: abbreviations (n=2) and non-suspicious lung nodules regarded as distant metastases (n=2). GPT-4 did not add any information absent from the original reports. After the missing areas were incorporated into the reporting template and the prompts were repeated, GPT-4 corrected all the reports with no repeated or new mistakes.
Conclusion: The GPT-4 model can be used to structure free-text radiology reports using plain language prompts and a simple yet comprehensive reporting template.
Limitations: Fine-tuning using the GPT-4 application programming interface (API) was not done in our study.
Funding for this study: No funding was received for this study.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: The Institutional Ethics Committee approved this study.

7 min

Enhancing radiology reporting efficiency through structured reports: a quantitative analysis

Paweł Paczuski, Legionowo / Poland

Author Block: P. Bombinski¹, P. P. Paczuski¹, K. Paczuski¹, B. Duranc¹, A. Kusak²; ¹Warsaw/PL, ²Lodz/PL
Purpose: This study explores the impact of structured reporting on radiologists' efficiency, standardisation, and clinician comprehension. We propose and analyse key metrics to quantify the acceleration of report creation using predefined templates and trigger mechanisms.
Methods or Background: Structured reports apply checklist-driven templates for standardised radiological reporting. These templates comprise a checklist of observations and predefined triggers, ensuring systematic reporting. Radiologists can click on checklist items or trigger larger report segments, such as "norm" for healthy examinations, thereby reducing the need for free-text input. Structured reports can be generated using a keyboard, mouse, or voice dictation and commands.
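The checklist-and-trigger mechanism described above can be sketched as a report builder that maps user actions to predefined text. The checklist items, their wording, and the action types are hypothetical; only the idea of a "norm" trigger expanding into a larger segment comes from the abstract.

```python
# Sketch of checklist items and triggers expanding into report text.
# All item names and sentences are illustrative assumptions.
CHECKLIST = {
    "liver_normal": "Liver of normal size and echogenicity.",
    "gallbladder_normal": "Gallbladder without stones.",
}

# A trigger inserts a larger predefined segment in one action.
TRIGGERS = {
    "norm": " ".join(CHECKLIST.values()),
}


def build_report(actions: list[tuple[str, str]]) -> str:
    """Assemble a report from ('click' | 'trigger' | 'text', payload) actions."""
    parts = []
    for kind, payload in actions:
        if kind == "click":
            parts.append(CHECKLIST[payload])
        elif kind == "trigger":
            parts.append(TRIGGERS[payload])
        elif kind == "text":
            parts.append(payload)  # free-text fallback (keyboard or dictation)
        else:
            raise ValueError(f"unknown action: {kind}")
    return " ".join(parts)


print(build_report([("trigger", "norm")]))
```

One trigger replaces typing every sentence of a normal examination, which is the source of the keystroke savings the study measures.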
Results or Findings: Our results were based on 10,000 reports of various radiological examinations reported by 20 radiologists. Our proposed metrics for evaluating the efficacy of structured reporting are: number of keystrokes (each key press), number of checklist clicks (each interaction with the checklist), checklist accepted suggestions (number of checklist suggestions included in the final document), contextual accepted suggestions (number of contextual suggestions included in the final document), keystrokes saved, time saved, and total time spent producing the document. Our findings demonstrate that structured reporting significantly reduces keystrokes and accelerates report generation, with an average time saving of 30% compared to conventional typing. Furthermore, 84% of the checklist suggestions were accepted, improving report standardisation and reducing errors.
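A few of the proposed metrics can be computed from per-report logs as sketched below. The log fields and the exact definitions of "keystrokes saved" (final characters minus keys actually pressed) and "time saved" (relative to a baseline time for typing the same report) are assumptions for illustration; the abstract does not give formulas.

```python
from dataclasses import dataclass


@dataclass
class ReportLog:
    keystrokes: int            # key presses actually made
    report_chars: int          # characters in the final document
    suggestions_shown: int     # checklist suggestions offered
    suggestions_accepted: int  # checklist suggestions kept in the final document
    seconds: float             # time spent producing the document
    baseline_seconds: float    # hypothetical time to type the same report in full


def keystrokes_saved(log: ReportLog) -> int:
    # Assumed definition: characters the radiologist did not have to type.
    return max(log.report_chars - log.keystrokes, 0)


def time_saved_pct(log: ReportLog) -> float:
    return 100 * (log.baseline_seconds - log.seconds) / log.baseline_seconds


def acceptance_rate(logs: list[ReportLog]) -> float:
    shown = sum(l.suggestions_shown for l in logs)
    accepted = sum(l.suggestions_accepted for l in logs)
    return accepted / shown if shown else 0.0


log = ReportLog(keystrokes=120, report_chars=900, suggestions_shown=25,
                suggestions_accepted=21, seconds=140.0, baseline_seconds=200.0)
print(keystrokes_saved(log), time_saved_pct(log), acceptance_rate([log]))
```

Aggregating acceptance over all logs (rather than averaging per-report rates) weights each suggestion equally; which convention the study used is not stated.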
Conclusion: Structured reporting offers a promising approach to enhance radiologists' reporting efficiency. By utilising predefined templates and triggers, radiologists can create reports more rapidly while ensuring a higher level of standardisation. Clinicians benefit from clearer, more consistent reports, which can lead to better patient care. This study underscores the potential for structured reporting to bring significant advancements in radiology practices, establishing a new benchmark for efficiency and standardisation.
Limitations: No limitations were identified.
Funding for this study: No funding was received for this study.
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: No information provided by the submitter.

7 min

Leveraging GPT-4 for structured radiology reporting: a multilingual proof-of-concept study

Felix Busch, Munich / Germany

Author Block:

Purpose:

Methods or Background:

Results or Findings:

Conclusion:

Limitations:

10 per report.

Funding for this study:

Has your study been approved by an ethics committee?

Ethics committee - additional information:

7 min

Automated anonymisation of radiology reports: comparison of publicly available natural language processing and large language models for HIPAA-compliant data use

Marcel Christian Langenbach, Boston / United States

Author Block:

Purpose:

Methods or Background:

Results or Findings:

00 for MRN, ACC, and dates, followed by NLPsp, which had lower precision (0.86) and F1-score (0.92) for dates, with non-dates classified as dates in 54 instances (28 cases). The LLM had perfect precision for all PHI categories but the lowest recall: 0.96 for ACC (4 instances missed in 4 cases) and 0.52 for dates (134/333 instances missed in 69 cases), with F1-scores of 0.98 and 0.68, respectively. Importantly, NLPac and NLPsp did not remove relevant medical information, whereas the LLM removed relevant information in 10% (n=10).
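The F1-scores above follow from precision and recall via the harmonic mean. The sketch below reproduces the LLM's reported values for ACC and dates; the counts passed to `precision_recall` in the test are illustrative, not taken from the study.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = TP/(TP+FP); recall = TP/(TP+FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# LLM: perfect precision, recall 0.96 (ACC) and 0.52 (dates).
print(round(f1(1.0, 0.96), 2), round(f1(1.0, 0.52), 2))
```

With precision fixed at 1.0, F1 is driven entirely by recall, which is why the LLM's missed dates dominate its score.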

Conclusion:

Limitations:

Funding for this study:

Has your study been approved by an ethics committee?

Ethics committee - additional information:

7 min

A RAdiology Data EXtraction (RADEX) tool for fast and accurate information curation from free-text reports: case study on thyroid ultrasound examinations

Lewis James Howell, Leeds / United Kingdom

Author Block: L. J. Howell, A. Zarei, T. M. Wah, S. Karthik, H. H. L. Ng, J. McLaughlan; Leeds/UK
Purpose: Extracting information from 'free-text' radiology reports is important for service evaluation, audit, unbiased cohort selection, case retrieval, and translational research including labelling medical datasets for artificial intelligence analysis. While machine learning methods have potential for automating this task, reliance on large labelled datasets and specific computing requirements limits their usefulness. Methods using human-defined rules offer a practical alternative, enabling better utilisation of information-rich radiology reports.
Methods or Background: Our tool, RAdiology Data EXtraction (RADEX), leverages clinicians' domain expertise for information extraction. It uses regular expressions (regex) for efficient and flexible text pattern-matching, including wildcard and proximity searches, Boolean logic, and negation handling. This rule-based approach enables clinical users to define complex queries without specialised software knowledge, giving an easy-to-understand method which allows predictions to be reviewed and rules updated in response to changing requirements and terminology. This transparency is vital for building trust and ensuring regulatory compliance.
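The combination of proximity search, negation handling, and Boolean logic can be illustrated with Python's built-in `re` module. This is a hypothetical rule for a single class ("thyroid nodule"), not RADEX's actual rule syntax; the negation cues and the four-word window are assumptions.

```python
import re

# Hypothetical rule set for one class; RADEX's own rule format may differ.
FINDING = re.compile(r"\bnodules?\b", re.IGNORECASE)
# Proximity search: a negation cue followed by "nodule(s)" with at most
# four intervening words (window size is an assumption).
NEGATED = re.compile(
    r"\b(?:no|without|negative for)\W+(?:\w+\W+){0,4}?nodules?\b",
    re.IGNORECASE,
)


def classify_nodule(report: str) -> str:
    """Return 'present', 'absent', or 'not mentioned' for thyroid nodules."""
    sentences = re.split(r"(?<=[.!?])\s+", report)
    mentioned = False
    for sentence in sentences:
        if FINDING.search(sentence):
            mentioned = True
            # Boolean logic: finding mentioned AND NOT negated -> present.
            if not NEGATED.search(sentence):
                return "present"
    return "absent" if mentioned else "not mentioned"


print(classify_nodule("No thyroid nodules. Normal echotexture."))
```

Because each rule is a readable pattern, a clinician reviewing a misclassified report can see which cue fired and adjust the cue list or window, which is the transparency argument made above.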
Results or Findings: RADEX was applied to reports of neck and thyroid ultrasound examinations performed between 2015 and 2019 across five different hospitals. Nineteen sonographic observations were classified, including presence and multiplicity of thyroid nodules, British Thyroid Association thyroid nodule grading(s), altered thyroid echotexture, thyroiditis, thyroidectomy, nodal abnormality, and parathyroid adenomata. On an expert-labelled dataset of 400 reports, RADEX achieved >90% accuracy in all classes. Processing >10,000 reports took less than 60 seconds on a standard laptop.
Conclusion: This free open-source tool provides a scalable approach to extracting structured data from free-text reports, prioritising usability and explainability. It leverages regex's powerful pattern-matching without requiring knowledge of its complex syntax, suiting research and audit tasks where free-text information is key to understanding, but manual review is time-consuming and expensive.
Limitations: The main limitation of the study is that generalisability to other datasets/languages was not evaluated.
Funding for this study: Funding was provided by the UK Research and Innovation (UKRI) Engineering and Physical Sciences Research Council (EPSRC).
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: REC review is not required

7 min

Large language models for structured reporting with speech recognition: a comparative feasibility study

Benedikt Kämpgen, Würzburg / Germany

Author Block:

Purpose:

1186/s13244-023-01392-y). However, training this NLP-based system for additional SR templates requires considerable effort, e.g., modelling concepts, synonyms, and implicit knowledge.

Methods or Background:

Results or Findings:

18 / -0.29 / -0.21) and 50 real examples (-0.09 / -0.19 / +0.03) of urolithiasis CT reports, with LLM-based fictional (0.80 / 0.70 / 0.75) and real (0.81 / 0.77 / 0.86) versus original fictional (0.98 / 0.99 / 0.96) and real (0.90 / 0.96 / 0.83).

Conclusion:

Limitations:

Funding for this study:

Has your study been approved by an ethics committee?

Ethics committee - additional information:

7 min

Using large language models to improve quality and actionability of radiology reports

Kalyan Sivasailam, Bangalore / India

Author Block: N. Kumarasami, P. N., K. Sivasailam, B. Subramanian; Bangalore/IN
Purpose: The objective of this study was to provide radiologists with fine-tuned large language models (LLMs) to enhance the quality, clarity, and actionability of radiology reports, with a specific focus on CT abdomen reports. Current radiology reporting methods can lead to ambiguities or misdiagnoses, especially in a remote diagnostics/teleradiology set-up. Referring physicians and surgeons look for a detailed qualitative and quantitative description of each finding, based on their suspicions and the patient's symptoms, in order to narrow the differential diagnosis and plan the appropriate procedure(s) if surgical intervention is needed. Our focus was on the mechanics and technical architecture of integrating LLMs into radiology workflows to transform pathology findings into a detailed, actionable description that is useful and relevant for the referring physician or surgeon.
Methods or Background: The authors fine-tuned a foundation model to build a radiology-specific large language model focused on CT abdomen, using real-life reports and templates. The LLM was first instruction fine-tuned on a dataset of 4,500 question-answer pairs curated by the authors. Subsequently, a retrieval-augmented generation method was employed, augmenting the model with 120,000 real-world reports. In practice, radiologists interact with a chatbot-like interface and input the pathologies. Using the patient history, the LLM generates an initial draft report. Radiologists continue responding to the chatbot, culminating in a comprehensive report that includes differential diagnoses.
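The retrieval step of retrieval-augmented generation can be sketched as ranking stored reports by similarity to the radiologist's input and placing the top matches in the prompt. This sketch swaps in simple bag-of-words cosine similarity for whatever embedding model the authors used, and the prompt wording and helper names are hypothetical; the call to the fine-tuned LLM itself is omitted.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> Counter:
    """Lower-cased bag of words (stand-in for a learned embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k reports most similar to the query."""
    q = tokens(query)
    return sorted(corpus, key=lambda doc: cosine(q, tokens(doc)), reverse=True)[:k]


def build_prompt(pathologies: str, history: str, corpus: list[str], k: int = 2) -> str:
    """Hypothetical prompt combining history, findings, and retrieved examples."""
    examples = "\n---\n".join(retrieve(pathologies, corpus, k))
    return (
        "You are drafting a CT abdomen report.\n"
        f"Clinical history: {history}\n"
        f"Radiologist findings: {pathologies}\n"
        f"Reference reports:\n{examples}\n"
        "Draft a detailed, actionable report with differential diagnoses."
    )


corpus = [
    "Acute appendicitis with periappendiceal fat stranding.",
    "Simple hepatic cyst in segment VII.",
    "Left distal ureteric calculus with mild hydronephrosis.",
]
print(build_prompt("hepatic cyst", "Right upper quadrant pain", corpus, k=1))
```

Retrieval grounds the draft in wording from comparable real reports without retraining the model each time the report archive grows.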
Results or Findings: The LLMs were deployed in a remote diagnostics setup at 5C Network, India. Productivity went up by 270%. Queries from referring physicians dropped by 76%.
Conclusion: Incorporating LLMs into radiology workflows significantly enhances report clarity and accuracy, offering a promising avenue for optimised patient care and streamlined diagnostic processes.
Limitations: The set-up relies on radiologists identifying the primary pathologies correctly.
Funding for this study: Funding was received from 5C Network Private Limited, India.
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: No information provided by the submitter.

7 min

A natural language processing pipeline to extract relevant information from mammography reports

Nikola Cihoric, Bern / Switzerland

Author Block:

Purpose:

Methods or Background:

Results or Findings:

Conclusion:

Limitations:

Funding for this study:

Has your study been approved by an ethics committee?

Ethics committee - additional information:

7 min

Automatic structuring of radiology reports with on-premise open-source large language models

Piotr Woznicki, Warszawa / Poland

Author Block:

Purpose:

Methods or Background:

05;0.05) as the region of practical equivalence (ROPE).

Results or Findings:

87 (94% HDI: 0.83; 0.90) for English and 0.67 (0.60; 0.73) for German reports. MCC differences were all overlapping ROPE for English: LLM-Human1 0.012 (-0.037; 0.061), LLM-Human2 -0.002 (-0.05; 0.05), Human1-Human2 -0.01 (-0.07; 0.04), and German reports: LLM-Human1 0.001 (-0.08; 0.08), LLM-Human2 -0.065 (-0.157; 0.027), Human1-Human2 -0.066 (-0.157; 0.026).
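The Matthews correlation coefficient (MCC) and the ROPE comparison above can be sketched as follows. The ROPE bounds in the example are an assumed illustrative value, and the decision labels follow the usual HDI+ROPE convention: an interval fully inside the ROPE indicates practical equivalence, fully outside indicates a difference, and anything else overlaps.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def rope_decision(hdi: tuple[float, float], rope: tuple[float, float]) -> str:
    """Compare a highest-density interval with a region of practical equivalence."""
    lo, hi = hdi
    r_lo, r_hi = rope
    if r_lo <= lo and hi <= r_hi:
        return "inside"       # practically equivalent
    if hi < r_lo or lo > r_hi:
        return "outside"      # credibly different
    return "overlapping"      # undecided


# LLM-Human1 difference for English reports, with an assumed ROPE of (-0.05, 0.05).
print(rope_decision((-0.037, 0.061), (-0.05, 0.05)))
```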

Conclusion:

Limitations:

Funding for this study:

Has your study been approved by an ethics committee?

Ethics committee - additional information:

7 min

Integrating AI results into standardised structured radiology reports: feasibility and implementation

Cyril Thouly, Sion / Switzerland

Author Block: C. Thouly¹, B. Dufour¹, B. Rizk², D. Goyard³, P. Petetin⁴, H. Brat¹, F. Zanca¹; ¹Sion/CH, ²Villars-sur-Glane/CH, ³Paris/FR, ⁴Berre l'Etang/FR
Purpose: One of the main challenges the radiology industry currently faces is the integration of AI results into the clinical workflow. Healthcare professionals navigate multiple systems and interfaces (PACS, RIS, AI report), often with inefficient workflows. We aimed to demonstrate the feasibility and effectiveness of integrating AI-derived results into standardised structured reports (SSR) for radiology, enhancing clinical workflow and reporting accuracy.
Methods or Background: A collaboration was initiated among a RIS provider, an AI platform provider, and our R&D department within a multicentric radiology network. The structured AI results were sent to the RIS as HL7 ORU messages over TCP, with one message generated per analysis. Each element of the structured AI result was placed in an OBX segment of the HL7 message. PatientID and AccessionNumber were used to link the images in the PACS to the radiology report in the RIS. The segments were then incorporated into the SSR using a beacon in the RIS, with multiple iterations to refine layout, wording, and punctuation. The percentage of SSR fields pre-populated by AI was estimated.
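The message flow above can be sketched with plain string handling: one HL7 v2 ORU^R01 message per analysis, one OBX segment per AI result element, and MLLP framing for transmission over TCP. The application names, the use of OBX-3 for a plain-text result name, and placing the accession number in OBR-3 are assumptions (sites differ, e.g. some use OBR-18); HL7 escaping of special characters is omitted.

```python
# MLLP framing bytes commonly used to send HL7 v2 messages over TCP.
START_BLOCK, END_BLOCK, CARRIAGE_RETURN = b"\x0b", b"\x1c", b"\x0d"


def build_oru(patient_id: str, accession: str, results: dict[str, str], timestamp: str) -> str:
    """Build an ORU^R01 message with one OBX segment per AI result element."""
    segments = [
        # Hypothetical sending/receiving application names.
        f"MSH|^~\\&|AI_PLATFORM|SITE|RIS|SITE|{timestamp}||ORU^R01|{accession}-1|P|2.5",
        f"PID|1||{patient_id}",   # PID-3: patient identifier
        f"OBR|1||{accession}",    # OBR-3: accession number (assumed placement)
    ]
    for i, (name, value) in enumerate(results.items(), start=1):
        # OBX-2 value type ST, OBX-3 identifier, OBX-5 value, OBX-11 status F (final).
        segments.append(f"OBX|{i}|ST|{name}||{value}||||||F")
    return "\r".join(segments)  # HL7 v2 segments are separated by carriage returns


def mllp_frame(message: str) -> bytes:
    return START_BLOCK + message.encode("utf-8") + END_BLOCK + CARRIAGE_RETURN


def link_keys(message: str) -> tuple[str, str]:
    """Extract (patient ID, accession number) used to match PACS images to the RIS report."""
    fields = {seg.split("|")[0]: seg.split("|") for seg in message.split("\r")}
    return fields["PID"][3], fields["OBR"][3]


def parse_obx(message: str) -> dict[str, str]:
    """Map each OBX observation identifier to its value, ready to fill SSR fields."""
    out = {}
    for seg in message.split("\r"):
        fields = seg.split("|")
        if fields[0] == "OBX":
            out[fields[3]] = fields[5]
    return out


msg = build_oru("P123", "ACC456", {"BoneAge": "10y6m"}, "20240301160000")
print(link_keys(msg), parse_obx(msg))
```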
Results or Findings: AI results were promptly transmitted to the RIS as HL7 messages. On opening the report in the RIS, radiologists found pre-populated SSR subsections. To date, 40 bone age and 140 knee MRI SSR templates have been integrated into clinical workflows. For both bone age and knee MRI, 60% of the report was pre-populated.
Conclusion: Seamless integration of AI results into SSRs is achievable during routine clinical workflows. The active involvement of radiologists ensures that the resulting pre-populated reports align with their requirements.
Limitations: The success of this integration hinges on AI vendors delivering structured and standardised results. Inaccurate AI results present potential liability concerns for radiologists due to the risk of transmitting unchecked erroneous reports.
Funding for this study: No funding was received for this study.
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: No information provided by the submitter.

7 min

Structured reporting for efficient epidemiological and in-hospital prevalence analyses of pulmonary embolism

Tobias Jorg, Mainz / Germany

Author Block:

Purpose:

Methods or Background:

Results or Findings:

2/1 (M/F). Outpatients showed a lower prevalence (23%) than patients from regular wards (27%) and the intensive care unit (30%). Surgically referred patients had a higher prevalence than patients from internal medicine (34% vs 22%). Patients with central and bilateral PE had a significantly higher occurrence of right heart strain than patients with peripheral and unilateral PE.
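Once findings are stored as structured fields, prevalence by setting or referring department reduces to a simple group-by. The record layout below (`setting`, `pe_present`) is hypothetical; the abstract does not describe its data schema.

```python
from collections import defaultdict


def prevalence_by(records: list[dict], key: str) -> dict[str, float]:
    """Fraction of examinations with PE, grouped by a structured field."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for r in records:
        c = counts[r[key]]
        c[0] += bool(r["pe_present"])
        c[1] += 1
    return {group: pos / total for group, (pos, total) in counts.items()}


records = [
    {"setting": "outpatient", "pe_present": True},
    {"setting": "outpatient", "pe_present": False},
    {"setting": "icu", "pe_present": True},
]
print(prevalence_by(records, "setting"))
```

With free-text reports, the `pe_present` flag would first have to be extracted; structured reporting makes it available directly.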

Conclusion:

Limitations:

Funding for this study:

Has your study been approved by an ethics committee?

Ethics committee - additional information:

7 min

Extracting information from unstructured MRI reports with a local open-source GPT model

Bastien Le Guellec, Lille / France

Author Block: B. Le Guellec, A. Lefevre, C. Bruge, L. Hacein-Bey, J-P. Pruvo, G. Kuchcinski; Lille/FR
Purpose: We set out to use a local open-source GPT model to automate information extraction from unstructured MRI reports, and evaluated its performance on reports of emergency brain MRIs performed for patients with headaches.
Methods or Background: All consecutive radiological reports from a French quaternary centre in 2022 were retrospectively reviewed. Two radiologists identified MRIs that were performed for headaches. Four radiologists classified each report's conclusion as normal or abnormal. Abnormalities were labelled as either headache-generating or incidental. In parallel, Vicuna, an open-source GPT large language model, performed the same tasks. Vicuna's performance was evaluated using the radiologists' consensus as the gold standard.
Results or Findings: A total of 2398 reports were identified, of which 595 included headache in their indication. Median patient age was 35 years; 68% were female. The overall rate of causal findings in outpatients with headache was 23% (135/595). Our GPT-based method had an accuracy of >95% for simple information extraction tasks such as exam indication, patient sex and age, use of contrast medium injection, and classification of the study as normal or abnormal. Vicuna's accuracy was 82% for the most complex task: inferring causality between an abnormal MRI finding and the symptoms.
Conclusion: We found that an open-source GPT model can extract information from radiological reports with excellent accuracy without further training. We hypothesise that this method could also be applied to any information extraction task relying on unstructured medical records.
Limitations: Due to the monocentric design of our study, we could not test for variability in reporting styles or language. Further studies will be needed to explore the adaptability of the proposed framework, although it is expected to be high given the ability of generative language models to handle various languages seamlessly.
Funding for this study: No specific funding was received for this study.
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: This study was approved by the IRB of Lille University Hospital.