Thesis of Abd Errahmane Kiouche


Subject:
Graph comparison and massive data: approach by decomposition, compression and sampling

Defense date: 09/11/2021

Advisor: Hamida Seba
Cotutelle: Karima Amrouche

Summary:

Graphs are mathematical structures constituted of vertices and edges that link or connect the graph vertices. Due to their powerful properties, graphs are used to model data in numerous real-world fields: sociology, transport, computer networking, biology, cheminformatics, image processing, etc. In many of these applications, there is a crucial need for an accurate and fast quantitative measure of the similarity between graphs. However, computing an accurate similarity between two graphs in a short time is a challenging problem for many reasons. Firstly, the size and the number of the generated graphs are increasing exponentially and continuously, and it is increasingly common to find large graphs everywhere. Secondly, with the appearance of dynamic graphs, in several real-world applications, the number of required comparisons to be made between graphs has increased considerably.  In this thesis, we investigate the strategies and mechanisms that allow us to speed-up and simplify graph comparison in different real-world fields. The main purpose is to propose new representations and similarity measures between graphs that scale-up to large graphs. The investigated strategies and mechanisms include representations enabling the comparison of graphs in the data stream model, the decomposition of large graphs to be able to compare them more easily, the simplification of graphs either by compression, sampling, or embedding them into vectors. As result, we propose 3 contributions :  The first one is based on a new incremental graph embedding to compare graphs in the streaming model. This  graph embedding is based on graph decomposition into substructures and graph edit distance. The main strength of this approach is that it is fast and allows to update incrementally and fastly the obtained graph vectors. We applied this approach successfully to the problem of anomaly detection in a graph stream. In our second contribution, we propose a graph dissimilarity measure that relies on the mechanism of graph sparsification and a new node embedding that captures the topological neighborhood of nodes. The role of the sparsification mechanism is to reduce the number of nodes in the compared graphs and thus reduce the overall comparison time.  We applied this comparaison technique to the problem of 2D pattern recognition where images are represented by graphs. Our third contribution is a graph compression method that preserves a proportion of the neighbors of each node in the graph.  The main aim of this approach is to simplify large graphs while preserving as many of vertices neighborhoods properties as possible to speed-up the graph comparison time. This compression has proven to be very useful in many applications where graph comparison is required.

We implemented the proposed approaches and condicted extensive tests and experimentations on various datasets derived from many applications to show the effectiveness of the employed mechanisms in speeding-up and simplifying graph comparison.

 


Jury:
M. STATE RaduProfesseur(e)SnT, Université du LuxembourgRapporteur(e)
M. TARI AbdelkamelProfesseur(e)ESTINRapporteur(e)
Mme. BENATCHBA KarimaProfesseur(e)Ecole Supérieure d'Informatique (ESI)Examinateur​(trice)
M. HACID Mohand SaidProfesseur(e)Université Lyon 1Examinateur​(trice)
Mme. AMROUCHE KarimaMaître de conférenceEcole Supérieure d'Informatique (ESI)Co-encadrant(e)
Mme. SEBA HamidaMaître de conférenceUniversité Lyon 1Directeur(trice) de thèse