Utilization of Large Healthcare Data for Epidemiology, Business, and Health Informatics

Author: Duncan Yeung

The Information Systems Department (ISD) at CHOC Children’s obtained the Cerner Health Facts® database, which consists of de-identified electronic health records from over 480 facilities throughout the United States. The database covers three major patient types: inpatient, emergency room, and outpatient. The current version of the database is estimated to exceed 4 TB, which makes assimilating it and integrating it with our current infrastructure a nontrivial task. The problem spans both computer systems and data analytics considerations. In this presentation, we discuss the challenges of setting up and using the Cerner Health Facts® database: unzipping the raw flat files of structured data on our servers; restructuring the database schema to reduce disk usage; choosing computer processing systems that support seamless querying and retrieval of data; and providing appropriate statistical and machine learning systems for analyzing the dataset in epidemiological, business, and medical informatics studies. We are concurrently evaluating secure cloud solutions for both storage and analysis of the data.
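
As an illustration of the unzipping challenge mentioned above, one option is to read a compressed flat file as a stream instead of expanding it on disk first. The sketch below is only a minimal example under stated assumptions: it assumes gzip compression, a pipe-delimited layout with a header row, and a column named patient_type. None of these details are confirmed for the Health Facts® files, and the function name count_by_column is hypothetical.

```python
import csv
import gzip
from collections import Counter


def count_by_column(gz_path, column, delimiter="|"):
    """Tally the values of one column while streaming a gzip-compressed flat file.

    Assumptions (hypothetical, not taken from the database documentation):
    the file is gzip-compressed, delimited text with a header row.
    """
    counts = Counter()
    # gzip.open in text mode decompresses incrementally, so the full
    # uncompressed file is never written to disk or held in memory.
    with gzip.open(gz_path, mode="rt", newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        for row in reader:
            counts[row[column]] += 1
    return counts
```

A call such as count_by_column("encounters.txt.gz", "patient_type") (file name hypothetical) would return the number of rows per patient type while keeping only one row in memory at a time, which may help when the uncompressed files are larger than the available disk space.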

Co-Author/Co-Investigator Names/Professional Titles: William Feaster, MD, MBA, CMIO; Neil Garde; Louis Ehwerhemuepha, PhD