HDR of Mehdi Kaytoue
Subject:
Summary:
The process of collecting and analyzing data to answer predictive, explanatory, and decision-making questions has been known as “data science” for more than thirty years. First used only by scientists, mainly statisticians, the term is now widely used in both academia and industry. This can be explained in two ways: (i) data is ubiquitous, large, and varied, and (ii) there is a growing awareness of the far-reaching potential of data. This potential can be economic, societal, scientific, or related to health care, and it rests not only on the data that an entity holds, but also on data that it can obtain (sensors, social networks, open data, etc., freely or not), making data a kind of crude oil that still needs algorithms, methods, and methodologies to be properly refined.
One component of data science, Knowledge Discovery in Databases (KDD), deals in particular with the Data-Information-Knowledge process, with the aim of explaining relationships or discovering hidden properties. In contrast to a purely statistical approach, one family of methods has enjoyed considerable success over the last twenty years: data mining, and especially pattern mining. Their goal is to describe, summarize, and raise hypotheses from data. In particular, pattern mining makes it possible to efficiently find regularities of various types (such as frequent patterns in a set of transactions, molecular sub-graphs characteristic of toxicity, locally co-expressed gene groups, etc.). Where conventional approaches aim to validate or invalidate a hypothesis given a priori, the search for patterns is seen as a technique for enumerating all the possible hypotheses (a set of exponential size w.r.t. the input data) that satisfy some given constraints or maximize a certain measure of interest for the expert. Once discovered, the best hypotheses can then be tested, validated or invalidated, and ultimately accepted as units of knowledge.
My scientific adventure began with the study of a binary relationship, very often illustrated by supermarket transaction data linking customers and the products they buy. How can we make this relationship speak? What knowledge, behavioral habits, recommendations, etc. can we characterize? This initial question allowed me to travel through different application fields (biology, neuroscience, social networks, and video game analytics), seeking to implement or adapt data mining methods to understand certain phenomena while formalizing data and patterns as rigorously as possible. This is the story told in this manuscript, organized along three main research axes: the formalism framing the methods (Formal Concept Analysis), the methodological and algorithmic aspects of data mining, and finally Knowledge Discovery “in practice” through several concrete applications encountered during collaborations with other scientists or industrial partners.
Defense date: Wednesday, February 12, 2020
Jury:
Dr. Karell Bertet | Associate Professor | Université de la Rochelle | Reviewer |
Dr. Florent Masseglia | Research Director | INRIA | Reviewer |
Pr. Christel Vrain | Professor | Université d’Orléans | Reviewer |
Pr. Michael Berthold | Professor | Universität Konstanz (Germany) | Examiner |
Pr. Angela Bonifati | Professor | Université Claude Bernard Lyon 1 | Chair |
Pr. Jean-François Boulicaut | Professor | INSA Lyon | Examiner |
Pr. Johannes Fürnkranz | Professor | Universität Linz (Austria) | Examiner |
Dr. Amedeo Napoli | Research Director | CNRS | Examiner |