Thesis of Ahmed Anes Bendimerad

Mining Useful Patterns in Attributed Graphs


In this thesis, we address the problem of pattern discovery in vertex-attributed graphs. This kind of structure consists of a graph augmented with attributes associated to vertices. Vertex-attributed graphs provide a powerful abstraction that can be used to represent many datasets in an intuitive manner. Mining these graphs can be very useful for many applications, such as analyzing social networks, biological networks, the World Wide Web, etc. Several methods have been proposed to identify patterns in these structures. Generally, these methods define a pattern as a subgraph whose vertices satisfy some structural constraints (e.g., density, connectivity) and have a subset of attributes with homogeneous values. 
When mining vertex-attributed graphs, the principled integration of both graph and attribute data poses two important challenges. First, we need to define a pattern syntax (the abstract form of patterns) that is intuitive and lends itself to efficient search. A pattern being intuitive means that it can be easily interpreted and assimilated by the user. Considering that a pattern is generally defined over a subgraph, a pattern can be often huge in terms of vertices, which makes it difficult to grasp. Thus, the assimilation cost of a pattern is an important question that needs to be addressed. The second challenge is the formalization of the pattern interestingness. A pattern is generally relevant if it depicts some local properties that are somehow exceptional, otherwise, it will be already expected from the overall properties of the graph. Furthermore, the interestingness of patterns is subjective in practice, i.e., it significantly depends on the final user, her background knowledge and her preferences. A user would consider that a pattern is useful if it brings some new knowledge to her, especially if this pattern informs about some features or topics that usually interest this user. Another common problem related to the interestingness of patterns is the redundancy issue in the result set. In other terms, a data mining approach may return a set of patterns that give redundant information, because these patterns cover very overlapping parts of vertices and attributes. Information redundancy can be also due to some semantic relation between different attributes, such as attribute hierarchies. For example, knowing that a community of a social network is characterized by a high interest in ``rock music'' makes it less informative that it also  has a high interest in ``music'', because ``rock music'' is a subtype ``music''. Consequently, the quality of patterns depends on many different factors.
We address these challenges for the problem of mining attributed graphs. More precisely, we first introduce the task of discovering exceptional attributed subgraphs, which is rooted in the Subgroup Discovery framework. The goal is to identify connected subgraphs whose vertices share characteristics that distinguish them from the rest of the graph. Then, we propose methods that aim to take into account the user and the domain knowledge when assessing the interestingness of patterns. We design a method that makes it possible to incorporate user's background knowledge and pattern's assimilation cost. This method is able to identify patterns that are both unexpected (thus informative) and easy to interpret. To ease the assimilation, alternative descriptions of exceptional attributed subgraphs are provided. Furthermore, we propose another graph mining approach that integrates user's preferences. This method exploits an interactive process with the user to bias the pattern interestingness. It has been defined for the task of geo-located event detection in social media. Then, we design an approach that is able to incorporate hierarchical attribute dependencies into the pattern interestingness, which allows to avoid redundancy related to this kind of semantic relations between attributes. In other terms, when the attributes are organized as a hierarchy, this method is able to account for the inference that the user would make about some attribute values when she is informed about values of other attributes. Finally, we conclude this thesis by discussing some research perspectives.

Advisor: Céline Robardet
Coadvisor: Marc Plantevit

Defense date: thursday, september 5, 2019

Aristides GionisProfesseur(e)Aalto UniversityRapporteur(e)
Marie-Christine RoussetProfesseur(e)Université Grenoble AlpesRapporteur(e)
Tijl De BieProfesseur(e)Ghent UniversityExaminateur​(trice)
Alexandre TermierProfesseur(e)Université de Rennes 1Président(e)
Siegfried NijssenMaître de conférenceUniversité Catholique de LouvainExaminateur​(trice)
Céline RobardetProfesseur(e)INSA-LyonDirecteur(trice) de thèse
Marc PlantevitMaître de conférenceUniversité Claude Bernard Lyon 1Co-directeur (trice)