Cloud Computing & Big Data

Author: Cedric Manlhiot

Coauthor(s): Cedric Manlhiot, Joe Duhamel, Carl Virtanen, Edgar Crowdy, Kate Westcott, Heather J. Ross

Status: Work In Progress

Design and architecture of a novel health data integration platform to empower clinical research, computational biomedicine and individualization of care

For historical, technological and administrative reasons, the majority of health data remains highly siloed and difficult to access, integrate or utilize. Even in modern electronic medical records, free text still dominates, information flow is limited, connectivity to the outside world is restricted, external data sources are poorly integrated and analytics capabilities are insufficient. This is a significant impediment to the implementation of individualized medicine protocols and to the routine deployment of predictive analytics in clinical care.

The Ted Rogers Centre for Heart Research Computational Biomedicine (TRCHR CB) Platform has been designed specifically to address this problem by providing a data integration and analytics platform that is deployed and functions in parallel with the electronic medical records system. Through a decentralized architecture, data is de-identified, linked, converted to a standard format and encrypted at the source before it ever leaves the data contributor's infrastructure. Using technology-agnostic protocols for ingesting data, the TRCHR CB platform will be able to consume data from disparate sources including electronic medical records, imaging, genomics, physiological monitoring devices and eventually wearable devices and mHealth applications. Data ingested through the platform is stored in a common data lake hosted on a private, high-performance compute cloud to allow for distributed computing and computationally intensive analytics. Finally, a project management system will be implemented to ensure data security, enforce data governance and protect patient privacy. Best practices and privacy-by-design frameworks were used to determine the policies and procedures that govern operation of the platform. Through a set of application programming interfaces (APIs), clinical applications will be able to both submit information to the platform and consume the available information in real time. Web services will then be used to submit clinical information to predictive algorithms, using both classic biostatistical and machine learning methods, to generate patient-level predictions at the bedside and individualize patient care.
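As an illustration only, the at-source de-identification and linkage step described above could look something like the following Python sketch. It is not the TRCHR CB implementation: the field names, the `SITE_LINKAGE_KEY` secret, the `pseudonym` and `prepare_record` helpers and the envelope layout are all hypothetical, and keyed hashing (HMAC-SHA256) is simply one common way to derive a stable linkage token. The encryption step is omitted here because it would normally rely on a dedicated cryptography library.

```python
import hashlib
import hmac
import json

# Hypothetical secret held only at the data contributor site (assumption).
SITE_LINKAGE_KEY = b"example-linkage-key"

# Hypothetical list of direct identifiers to strip before data leaves the site.
DIRECT_IDENTIFIERS = {"name", "health_card_number", "address"}


def pseudonym(health_card_number: str, key: bytes = SITE_LINKAGE_KEY) -> str:
    """Derive a stable token so one patient links across sources."""
    digest = hmac.new(key, health_card_number.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def prepare_record(raw: dict, source: str) -> dict:
    """De-identify one source record and wrap it in a common envelope."""
    token = pseudonym(raw["health_card_number"])
    # Keep only fields that are not direct identifiers.
    payload = {k: v for k, v in raw.items() if k not in DIRECT_IDENTIFIERS}
    return {"patient_token": token, "source": source, "payload": payload}


# Two made-up records for the same patient from different sources.
emr = {
    "name": "Jane Doe",
    "health_card_number": "1234-567-890",
    "ejection_fraction": 35,
}
wearable = {"health_card_number": "1234-567-890", "resting_heart_rate": 72}

records = [prepare_record(emr, "emr"), prepare_record(wearable, "wearable")]
print(json.dumps(records, indent=2))
```

In this sketch both records carry the same `patient_token`, so they can be linked in the data lake without the name or health card number ever leaving the contributor site.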

Through this novel data integration platform, we hope to provide a technical solution for the problem of siloed health data, expand the types of data sources that can be used for clinical analytics and provide the infrastructure to bring predictive analytics to the bedside. In doing so, we will close the technological gap in the delivery of individualized medicine without requiring a major evolution of electronic medical records platforms.