Thesis of Guillaume Bosc

Formalization and implementation of massive and heterogeneous data mining heuristic methods

Defense date: 11/09/2017

Advisor: Jean-Francois Boulicaut
Coadvisor: Mehdi Kaytoue


This PhD thesis focuses on developing and testing new generic pattern mining methods that are mathematically well defined. Given a pattern language and a database expressed in that language, the task is to extract the complete, correct, and non-redundant set of patterns that satisfy user-defined constraints. However, current pattern mining methods are not suitable as is for data that are both massive and heterogeneous, because of the strict correctness and completeness requirements that the collection of patterns must meet. Meeting these requirements through exhaustive enumeration is not feasible in the era of big data.
We therefore propose to study the problem of pattern enumeration from a new point of view: (i) a heuristic traversal of the search space that still comes with guarantees, and (ii) the joint use of pattern languages of different expressiveness.
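
To make the contrast between exhaustive and heuristic enumeration concrete, here is a minimal sketch on a toy itemset database. Everything in it is an assumption made for illustration: the data, the function names, the choice of closed frequent itemsets as the "complete, correct, non-redundant" collection, and the use of beam search as the heuristic. The abstract does not name the heuristic used in the thesis, and this beam search offers no completeness guarantee; it only shows that a budgeted traversal can still return correct patterns while giving up completeness.

```python
from itertools import combinations

# Toy transaction database (hypothetical data, for illustration only).
DATABASE = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]


def support(pattern, database):
    """Number of transactions that contain every item of the pattern."""
    return sum(1 for t in database if pattern <= t)


def closure(pattern, database):
    """Largest itemset shared by all transactions that cover the pattern."""
    covering = [t for t in database if pattern <= t]
    if not covering:
        return frozenset(pattern)
    return frozenset(set.intersection(*covering))


def exhaustive_closed(database, min_support):
    """All frequent closed itemsets, found by testing every candidate."""
    items = sorted(set().union(*database))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            p = frozenset(combo)
            s = support(p, database)
            # Frequent (constraint) and closed (non-redundant).
            if s >= min_support and closure(p, database) == p:
                result[p] = s
    return result


def beam_closed(database, min_support, beam_width):
    """Heuristic traversal: expand only the beam_width best patterns per level."""
    items = sorted(set().union(*database))
    found = {}
    beam = [frozenset()]
    while beam:
        candidates = {}
        for p in beam:
            for item in items:
                if item in p:
                    continue
                q = closure(p | {item}, database)
                s = support(q, database)
                if s >= min_support and q not in found:
                    candidates[q] = s
        # Rank by decreasing support; break ties alphabetically for determinism.
        ranked = sorted(candidates.items(), key=lambda kv: (-kv[1], sorted(kv[0])))
        kept = ranked[:beam_width]
        found.update(kept)
        beam = [q for q, _ in kept]
    return found


complete = exhaustive_closed(DATABASE, min_support=3)
narrow = beam_closed(DATABASE, min_support=3, beam_width=1)
wide = beam_closed(DATABASE, min_support=3, beam_width=10)
```

In this sketch every pattern returned by `beam_closed` is frequent and closed, so correctness is preserved, but a narrow beam returns only part of what `exhaustive_closed` finds. The exhaustive version tests every subset of items, which is what stops scaling on massive data.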