Thesis of Abd Errahmane Kiouche

Graph comparaison and massive data


Graph comparison, also known as graph matching is a very active area of ​​research that knows several investigations in various domains: information retrieval, pattern recognition, databases, etc. Known solutions can be classified as exact or approximate according to the expected result of the comparison. Exact methods allow to find the result graphs that are exactly similar to the query graph. Approximate methods make it possible to calculate a sort of distance that closely reflects the degree of similarity between the compared graphs. This makes it possible to have the closest  results to the query graph when one has several graphs to compare with. This is called the top k (as with a search engine).

The algorithmic approaches to compare two graphs, especially in the exact case, have an exponential behavior and become difficult to implement when the size of the graphs becomes really important. A similar problem arises even in the case of approximate methods. As the size of data graphs or the number of graphs in a  database grows, it becomes critical to offer easy-to-implement and scalable comparison tools. This is the goal of this thesis.


Various approaches may be considered. One of the most effective approaches is to decompose the two graphs to be compared into small pieces, to compare the pieces and then to deduce a result for the comparison of the two initial graphs either by joining the intermediate results or by a distance calculation. The comparison of graphs through decomposition can be used in the exact case as well as in the approximate case. However, finding the best decomposition which makes it possible to reduce the number of joins necessary for obtaining the final result, in particular in the exact case, is a challenging issue. Another, more recent approach is the way graphs and nodes are represented. It turns out that with a suitable representation (more compact, simpler, etc.), we can considerably simplify the comparison of graphs. There are also solutions that use specialized platforms for big data processing like MapReduce and Pregel.


The aim of this thesis is to study the algorithmic around the comparison of graphs and to propose comparison and matching solutions that can be used efficiently in the case of large graphs or a large number of graphs. We will mainly consider data models that are adapted for big graphs such as graph streams and several application domains.


Advisor: Hamida Seba