Development and validation of a natural language classifier to automatically assign MRI abdomen/pelvis protocols from free-text clinical indications

Author: Jae Ho Sohn

Background: An important part of radiology workflow is assigning protocols for clinical imaging studies. In most hospitals, clinicians order medical imaging exams along with a free-text description of why the patient needs the imaging and what pathology they are looking for. The process of assigning imaging positions, intravenous and oral contrast type, timing of imaging after contrast administration, and specific imaging parameters is called protocoling. Accurate and timely assignment of protocols is important for patient safety and hospital efficiency because an inaccurate protocol hinders radiologists from making an accurate diagnosis and introduces inefficiency into the workflow. MRI protocols, especially for the abdomen and pelvis, tend to be more subjective and variable because they involve more complex medical physics and a larger number of contrast agents, with which radiologists are often less familiar. The aim of this study is to develop and validate an artificial intelligence-based natural language classifier using IBM Watson (the question-answering system known from Jeopardy!) to automatically assign MRI abdomen/pelvis protocols based on free-text clinical indications.
Methods: A total of 253 free-text clinical indications and their final assigned MR abdomen and pelvis protocols were retrospectively retrieved from an in-house radiology communication tool. Each entry was manually confirmed to be free of any protected health information. The final assigned MR protocols were grouped into 14 protocol categories by consensus among the authors. The dataset was split into a training set of 170 entries and a test set of 83 entries. Supervised machine learning was performed via the Watson Bluemix service, which carries out a series of steps: string analysis, hypothesis generation, hypothesis and evidence scoring, and final merging and ranking. The final natural language classifier was evaluated on the test set. Incorrectly classified cases were examined for any consistent error pattern.
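The train/test workflow described in the Methods can be sketched in a few lines. This is a minimal illustration only: it substitutes a simple multinomial naive Bayes classifier for the Watson service, which is an assumption made here and not the method used in the study. The function and class names, the shortened protocol labels, and the toy records are all hypothetical and are not taken from the study data.

```python
import math
import random
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (real indications would need more care)."""
    return text.lower().split()


def split_dataset(records, n_train, seed=0):
    """Shuffle (indication, protocol) pairs and split them into train/test lists."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


class NaiveBayesProtocolClassifier:
    """Multinomial naive Bayes with add-one smoothing (a stand-in, NOT Watson)."""

    def fit(self, records):
        self.label_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in records:
            self.label_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def predict(self, text):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(text):
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy records for illustration only; NOT taken from the study data.
toy_records = [
    ("eval for choledocholithiasis", "MRI/MRCP abdomen"),
    ("rule out choledocholithiasis, elevated lfts", "MRI/MRCP abdomen"),
    ("rlq pain, eval for appendicitis", "MR appendicitis protocol"),
    ("pregnant, suspected appendicitis", "MR appendicitis protocol"),
]
model = NaiveBayesProtocolClassifier().fit(toy_records)
print(model.predict("eval for appendicitis"))
```

With the study's numbers, split_dataset(records, 170) applied to 253 records would yield a 170-entry training set and an 83-entry test set. How the original split was actually drawn is not stated in the abstract.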
Results: The three most common indications for MR of the abdomen and/or pelvis were evaluation for choledocholithiasis (n=67, 26.5%), appendicitis (n=41, 16.2%), and liver mass (n=24, 9.5%). The most common protocol was MRI/MRCP abdomen with gadoxetate disodium (n=92, 36.4%). Training the natural language classifier took 12 minutes and 43 seconds. Testing took <1 second. The final classifier had an overall accuracy of 93% (77 out of 83). The 6 incorrectly classified cases all had relatively rare clinical indications that were not well represented in the training set. For example, a study with the clinical indication “pt pregnant so plan to avoid ct, eval for poss incarcerated ing hernia” was incorrectly classified as “MR appendicitis protocol” instead of “MR pelvis without contrast.”
Conclusion: We successfully developed and validated a natural language classifier to serve as a clinical decision support tool for automatically assigning MRI abdomen and/or pelvis protocols, achieving an accuracy of 93%.
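The accuracy figure and the error review described above can be sketched as follows. This is an illustrative outline, not the authors' evaluation code: the helper names accuracy and misclassified are hypothetical, and only the counts (77 correct out of 83; 67, 41, and 24 out of 253) come from the abstract.

```python
def accuracy(predicted, actual):
    """Fraction of test cases whose predicted protocol matches the assigned one."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def misclassified(indications, predicted, actual):
    """Return (indication, predicted, actual) triples for manual error review."""
    return [
        (text, p, a)
        for text, p, a in zip(indications, predicted, actual)
        if p != a
    ]


# Counts reported in the abstract.
reported_accuracy = 77 / 83
print(f"{reported_accuracy:.1%}")  # 92.8%, reported as 93%

# Indication frequencies out of the 253 retrieved entries.
for name, n in [("choledocholithiasis", 67), ("appendicitis", 41), ("liver mass", 24)]:
    print(f"{name}: {n / 253:.1%}")
```

Listing the misclassified triples side by side is one simple way to look for the kind of consistent error pattern the Methods mention, such as rare indications that are underrepresented in the training set.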

Co-Author/Co-Investigator Names/Professional Titles: 1. Fouad Al-Adel (Radiology Fellow, Radiology & Biomedical Imaging, UCSF Medical Center) 2. Joseph Mesterhazy (Software Engineer, Radiology & Biomedical Imaging, UCSF Medical Center)