DaQuaTa International Workshop 2017

Lyon, France - December 11-12, 2017

Angela Bonifati and Bastien Rance

Abstract:

Hospitals and life-science institutes produce a tremendous amount of data on a daily basis during the healthcare process and ordinary scientific activity. Such data is highly valuable for improving care delivery and prevention, and can also play a pivotal role in prospective clinical research. However, clinical, biological and imaging data are usually gathered through diverse data collection channels and procedures with varying degrees of reliability and trustworthiness. As a consequence, the collected data is usually scattered over heterogeneous data sources and suffers from quality problems that hamper its use for analysis purposes.

In this talk, we will present an empirical study on biological data series revealing classical data quality issues such as missing data and outliers. We will observe that the distribution of data can evolve over time, creating unconventional “distribution-glitches” that can cause interpretation errors of high severity.