Author: Andrew Goodwin
Physiological data streams are increasingly being collected and stored for research purposes. The CCU at SickKids records over 250,000 patient-hours of high-frequency waveforms every year. These data streams are collected in a clinical environment where long-term storage is not the primary goal, and as a result they may be subject to issues that adversely affect their suitability for research. Such physiological data streams are, in a way, a by-product of the clinical experience. Researchers often discover that their retrospective studies, which were approved based on apparent data availability, are supplied with information that is not of sufficient quality or completeness to effectively test their hypotheses. Valuable research time may then be spent assessing and vetting physiological data. The challenge for physiological database managers is how to ensure that archives of high-quality information can be made available to researchers in a clinical setting where “perfect” data collection is impossible. Archived physiological data may be discontinuous, imprecise, and/or inaccurate. Discontinuities may result from removal of monitor leads, networking issues, or patients undergoing a procedure or moving from bed to bed; imprecision may be present due to the operational limitations of the monitoring devices; and inaccuracies in timing and measurements may be introduced by poorly calibrated equipment. Furthermore, properties of the data streams may be affected by device settings: adjustments made for visualization or operational reasons may be reflected in the output data streams (data may be smoothed, clipped, filtered, etc.). Collected data may also be irreversibly affected by upstream processing; for example, limitations of the inbuilt beat detection algorithms used in patient monitors may momentarily affect reported heart rate values.
Our approach is to faithfully collect and store all data in the form in which it was provided by the medical devices, including its flaws and inconsistencies. Algorithms have been developed to automatically detect and characterize known data quality issues. These quality metrics are stored and indexed alongside the raw data so that they may act as a guide for researchers requesting information from the database. Data quality issues uncovered in the future may be retrospectively characterized and applied to the database as additional indexes. In concert with this approach, we periodically audit all monitoring devices in the unit, assessing their precision and accuracy for both reported values and timing. We use this information to continuously look for opportunities to improve clinical data collection practices. As a result, our database can provide a report that indicates whether it contains information of sufficient quality and quantity to test a given data-driven physiological hypothesis. Indeed, we hope to define the meaning and relevance of “data quality” in this context. Combined with other capabilities of the system, this data framework will serve as an optimal data source for high-performance physiological modelling and automated machine learning approaches.
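As an illustration only, the kind of quality indexing described above might be sketched as follows. This is not the authors' implementation: the function names, the gap tolerance, and the coverage and clipping thresholds are all hypothetical, chosen to show how discontinuities and clipped samples could be characterized without altering the raw data, and how a simple sufficiency report could be derived from them.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Gap:
    start: float  # timestamp of the last sample before the gap (seconds)
    end: float    # timestamp of the first sample after the gap (seconds)


def find_gaps(timestamps: Sequence[float], expected_period: float,
              tolerance: float = 1.5) -> List[Gap]:
    """Flag sample-to-sample intervals longer than tolerance * expected_period.

    The tolerance factor is a hypothetical choice, not a value from the abstract.
    """
    gaps = []
    for prev, curr in zip(timestamps, timestamps[1:]):
        if curr - prev > tolerance * expected_period:
            gaps.append(Gap(prev, curr))
    return gaps


def coverage(timestamps: Sequence[float], expected_period: float,
             gaps: Sequence[Gap]) -> float:
    """Fraction of the recorded time span that is actually covered by samples."""
    if len(timestamps) < 2:
        return 0.0
    span = timestamps[-1] - timestamps[0]
    # Each gap is missing its duration minus one normal sampling interval.
    missing = sum(g.end - g.start - expected_period for g in gaps)
    return 1.0 - missing / span


def clipped_fraction(values: Sequence[float], lo: float, hi: float) -> float:
    """Fraction of samples sitting at or beyond the device's output limits."""
    if not values:
        return 0.0
    return sum(1 for v in values if v <= lo or v >= hi) / len(values)


def quality_report(timestamps: Sequence[float], values: Sequence[float],
                   expected_period: float, lo: float, hi: float,
                   min_coverage: float = 0.9,
                   max_clipped: float = 0.05) -> Dict[str, object]:
    """Summarize quality metrics; the raw samples are never modified.

    min_coverage and max_clipped are hypothetical, study-specific thresholds.
    """
    gaps = find_gaps(timestamps, expected_period)
    cov = coverage(timestamps, expected_period, gaps)
    clip = clipped_fraction(values, lo, hi)
    return {
        "gaps": gaps,
        "coverage": cov,
        "clipped_fraction": clip,
        "sufficient": cov >= min_coverage and clip <= max_clipped,
    }


if __name__ == "__main__":
    # Toy 1 Hz stream with samples missing between t=3 s and t=8 s.
    ts = [0.0, 1.0, 2.0, 3.0, 8.0, 9.0, 10.0]
    vs = [0.0, 5.0, 10.0, 5.0, 5.0, 5.0, 5.0]
    print(quality_report(ts, vs, expected_period=1.0, lo=0.0, hi=10.0))
```

Because the metrics are computed from the stored samples rather than applied to them, new checks can be added later and run retrospectively over the same archive, in the spirit of the additional indexes mentioned above.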
Co-Author/Co-Investigator Names/Professional Title: Andrew Goodwin, B.Eng; Robert Greer, B.Eng; Dr. Danny Eytan, MD, PhD; Anirudh Thommandram, B.Eng, M.A.Sc.; Dr. Peter Laussen, MB.BS., FCICM
Funding Acknowledgement: SickKids Foundation: David and Stacey Cynamon Chair in Pediatric Critical Care, The Hospital for Sick Children and University of Toronto