Thesis of Donato Tiano

Learning Models on Healthcare Data with Quality Indicators

Defense date: 8 December 2022

Advisor: Angela Bonifati


Time series are collections of data obtained through measurements over time. Their purpose is to support event extraction and to represent the observed phenomena as understandable patterns for later use. These patterns are discovered and extracted from the data with several techniques, including machine learning, statistics, and clustering. Time series are further divided according to the number of sources used to monitor a phenomenon: a series is univariate when there is a single data source and multivariate when there are several.

A time series is not a simple structure, because each observation has a strong relationship with the other observations. This interrelationship is the main characteristic of time series, and every extraction operation has to deal with it; how it is handled depends on the operation. The main problem with existing techniques is that they apply no pre-processing to the time series. Raw time series have several undesirable effects, such as noisy points or the large memory space required by long series.

We propose new data mining techniques that build models from the most representative features of the time series. Using features has a profound impact on the scalability of systems: extracting a feature reduces an entire series to a single value. This improves the management of time series and reduces the time and space complexity of the solutions.

FeatTS is a clustering method for univariate time series that extracts the most representative features of the series. FeatTS converts these features into graph networks to capture the interrelationships between signals, and a co-occurrence matrix then merges all the communities detected in these graphs.
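
As an illustration of the FeatTS pipeline, the following is a minimal sketch, not the FeatTS implementation. It assumes two hypothetical "representative" features (mean and maximum), builds one graph per feature by linking series whose feature values differ by at most a hypothetical `threshold`, and uses connected components as a simple stand-in for the community detection step. The co-occurrence matrix then counts, for every pair of series, how many detected communities contain both.

```python
from itertools import combinations
from statistics import mean


def feature_graph_communities(values, threshold):
    """Hypothetical stand-in for the graph step: link two series whose feature
    values differ by at most `threshold`, then return the connected components
    as communities (a list of sets of series indices)."""
    n = len(values)
    adjacency = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if abs(values[i] - values[j]) <= threshold:
            adjacency[i].add(j)
            adjacency[j].add(i)
    seen, communities = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        communities.append(component)
    return communities


def co_occurrence_matrix(n, all_communities):
    """Count, for every pair of series, how many communities contain both."""
    matrix = [[0] * n for _ in range(n)]
    for community in all_communities:
        for i in community:
            for j in community:
                matrix[i][j] += 1
    return matrix


series = [[1, 2, 3, 2], [1, 2, 3, 3], [10, 11, 9, 10], [10, 12, 9, 11]]
features = [mean, max]  # hypothetical choice of representative features
communities = []
for feature in features:
    values = [feature(s) for s in series]  # each series reduced to one value
    communities.extend(feature_graph_communities(values, threshold=1.0))
matrix = co_occurrence_matrix(len(series), communities)
print(matrix)  # [[2, 2, 0, 0], [2, 2, 0, 0], [0, 0, 2, 2], [0, 0, 2, 2]]
```

Series 0 and 1 share a community under both features, as do series 2 and 3, so their co-occurrence counts are high. A final clustering step could then group series using this matrix.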
The intuition behind FeatTS is that if two time series are similar, they often belong to the same community, and the co-occurrence matrix reveals this.

Time2Feat is a new clustering method for multivariate time series. It offers two kinds of extraction to improve the quality of the features. Intra-Signal Features Extraction acquires features from each individual signal of the multivariate time series, while Inter-Signal Features Extraction obtains features from pairs of signals belonging to the same multivariate time series. Both extractions provide interpretable features, which makes further analysis possible. The whole clustering process is also lighter, which reduces the time needed to obtain the final clusters. Both FeatTS and Time2Feat represent the state of the art in their respective fields.

AnomalyFeat is an algorithm that detects anomalies in univariate time series. Its distinguishing characteristic is that it works on online time series, i.e., series whose values arrive as a stream. In continuity with the previous solutions, we extend the feature-based approach to revealing anomalies in the series. AnomalyFeat unifies the two most popular approaches to anomaly detection: clustering and recurrent neural networks. Clustering is used to discover the density area of each newly obtained point.
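
Returning to Time2Feat, the difference between its two extractions can be sketched as follows. This is a minimal illustration under stated assumptions, not the Time2Feat feature set: the intra-signal features (mean, population standard deviation, minimum, maximum) and the inter-signal features (Pearson correlation, mean absolute difference) are hypothetical choices. A multivariate series is modeled as a list of equal-length signals, and every feature keeps a readable name so that the resulting vector stays interpretable.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev


def intra_signal_features(signal):
    """Hypothetical features computed on one signal in isolation."""
    return {"mean": mean(signal), "std": pstdev(signal),
            "min": min(signal), "max": max(signal)}


def pearson(x, y):
    """Pearson correlation of two equal-length signals (0.0 if one is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    norm = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return cov / norm if norm else 0.0


def inter_signal_features(x, y):
    """Hypothetical features computed on a pair of signals of the same series."""
    return {"corr": pearson(x, y),
            "mean_abs_diff": mean(abs(a - b) for a, b in zip(x, y))}


def extract_features(mts):
    """Flatten the intra- and inter-signal features of one multivariate series
    (a list of signals) into a dict of named features."""
    features = {}
    for i, signal in enumerate(mts):
        for name, value in intra_signal_features(signal).items():
            features[f"s{i}_{name}"] = value
    for i, j in combinations(range(len(mts)), 2):  # every pair of signals
        for name, value in inter_signal_features(mts[i], mts[j]).items():
            features[f"s{i}_s{j}_{name}"] = value
    return features


mts = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0]]
feats = extract_features(mts)
print(len(feats))  # 18: 3 signals x 4 intra features + 3 pairs x 2 inter features
print(feats["s0_s1_corr"], feats["s0_s2_corr"])  # 1.0 -1.0
```

Because every multivariate series is reduced to a fixed-length vector of named values, any standard clustering algorithm could be applied afterwards, and each cluster can be explained in terms of the features that drive it.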

Jury:
Mr. Alexandre Aussem, Professor, LIRIS, Université Claude Bernard Lyon 1 (Examiner)
Ms. Karine Zeitouni, Professor, Université Paris-Saclay (Reviewer)
Mr. Omar Boucelma, Professor, Université Aix-Marseille (Reviewer)
Ms. Sonia Ben Mokhtar, Research Director, LIRIS, CNRS UMR 5205, Lyon (President of the jury)
Ms. Stefania Gabriela Dumbrava, Associate Professor (Maître de conférences), ENSIIE Paris (Examiner)
Ms. Angela Bonifati, Professor, LIRIS, Université Claude Bernard Lyon 1 (Thesis advisor)
Mr. Albert Bifet, Professor, University of Waikato & Télécom ParisTech (Invited member)