A RAdiology Data EXtraction (RADEX) tool for fast and accurate information curation from free-text reports: case study on thyroid ultrasound examinations
Author Block: L. J. Howell, A. Zarei, T. M. Wah, S. Karthik, H. H. L. Ng, J. McLaughlan; Leeds/UK
Purpose: Extracting information from free-text radiology reports is important for service evaluation, audit, unbiased cohort selection, case retrieval, and translational research, including labelling medical datasets for artificial intelligence analysis. Although machine learning methods have potential for automating this task, their reliance on large labelled datasets and specific computing requirements limits their usefulness. Methods using human-defined rules offer a practical alternative, enabling better utilisation of information-rich radiology reports.
Methods or Background: Our tool, RAdiology Data EXtraction (RADEX), leverages clinicians' domain expertise for information extraction. It uses regular expressions (regex) for efficient and flexible text pattern-matching, including wildcard and proximity searches, Boolean logic, and negation handling. This rule-based approach enables clinical users to define complex queries without specialised software knowledge. The resulting method is easy to understand: predictions can be reviewed, and rules updated, in response to changing requirements and terminology. This transparency is vital for building trust and ensuring regulatory compliance.
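The kinds of search named above (wildcard, proximity, Boolean logic, negation handling) can be sketched with Python's standard `re` module. This is an illustrative sketch only, not RADEX's query syntax or implementation: the helper names (`wildcard`, `near`, `is_affirmed`, `mentions_thyroid_nodule`), the list of negation cues, the word-gap limit, and the example rule are all assumptions made for the example.

```python
import re

# Hypothetical negation cues; a real rule set would be defined by clinicians.
NEGATION = re.compile(r"\b(?:no|not|without|absent|absence of)\b", re.IGNORECASE)


def wildcard(term):
    """Turn a user-facing term such as 'nodul*' into a whole-word regex."""
    return r"\b" + re.escape(term).replace(r"\*", r"\w*") + r"\b"


def near(a, b, max_words=5):
    """Proximity search: a and b in either order, at most max_words words apart."""
    gap = r"(?:\W+\w+){0,%d}?\W+" % max_words
    forward = wildcard(a) + gap + wildcard(b)
    reverse = wildcard(b) + gap + wildcard(a)
    return "(?:%s|%s)" % (forward, reverse)


def is_affirmed(pattern, report):
    """True if the pattern occurs in a sentence with no negation cue before it."""
    # Simplification: splitting on '.' would also split decimals such as '1.2 cm'.
    for sentence in re.split(r"[.\n]+", report):
        for match in re.finditer(pattern, sentence, re.IGNORECASE):
            if not NEGATION.search(sentence[:match.start()]):
                return True
    return False


def mentions_thyroid_nodule(report):
    """Boolean OR of two proximity searches (an illustrative rule, not a validated one)."""
    return any(
        is_affirmed(near("thyroid", term, max_words=6), report)
        for term in ("nodul*", "lesion*")
    )


print(mentions_thyroid_nodule("There is a 12 mm nodule in the right thyroid lobe."))  # True
print(mentions_thyroid_nodule("No thyroid nodules seen."))                            # False
print(mentions_thyroid_nodule("Normal thyroid. Nodular liver."))                      # False
```

Because each rule is an ordinary, readable pattern, a reviewer can see which rule produced a prediction and edit it directly when terminology changes.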
Results or Findings: RADEX was applied to reports from neck and thyroid ultrasound examinations performed between 2015 and 2019 across five hospitals. Nineteen sonographic observations were classified, including presence and multiplicity of thyroid nodules, British Thyroid Association thyroid nodule grading(s), altered thyroid echotexture, thyroiditis, thyroidectomy, nodal abnormality, and parathyroid adenomata. On an expert-labelled dataset of 400 reports, RADEX achieved >90% accuracy in all classes. Processing >10,000 reports took less than 60 seconds on a standard laptop.
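Per-class accuracy against expert labels, as reported above, can be computed as in the following minimal sketch. The function name, the class names, and the toy labels are hypothetical and serve only to show the calculation; they are not the study's data or results.

```python
def per_class_accuracy(predicted, expert):
    """Fraction of reports on which the predicted label equals the expert label, per class.

    Both arguments are lists with one dict per report, mapping class name -> bool.
    """
    classes = expert[0].keys()
    n_reports = len(expert)
    return {
        name: sum(p[name] == e[name] for p, e in zip(predicted, expert)) / n_reports
        for name in classes
    }


# Toy labels for four reports and two classes (illustrative only).
expert = [
    {"nodule": True, "thyroiditis": False},
    {"nodule": False, "thyroiditis": False},
    {"nodule": True, "thyroiditis": True},
    {"nodule": False, "thyroiditis": False},
]
predicted = [
    {"nodule": True, "thyroiditis": False},
    {"nodule": False, "thyroiditis": True},
    {"nodule": True, "thyroiditis": True},
    {"nodule": False, "thyroiditis": False},
]

accuracy = per_class_accuracy(predicted, expert)
print(accuracy)
```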
Conclusion: This free, open-source tool provides a scalable approach to extracting structured data from free-text reports, prioritising usability and explainability. It leverages regex's powerful pattern-matching without requiring knowledge of its complex syntax, which suits research and audit tasks where free-text information is key to understanding but manual review is time-consuming and expensive.
Limitations: The main limitation of the study is that generalisability to other datasets and languages was not evaluated.
Funding for this study: Funding was provided by the UK Research and Innovation (UKRI) Engineering and Physical Sciences Research Council (EPSRC).
Has your study been approved by an ethics committee? Not applicable
Ethics committee - additional information: REC review is not required