Thesis of Chemseddine Nabti

Subject:

Graphs Structures and Algorithms for big data

Start date: 01/09/2014
Defense date: 15/12/2017

Advisor: Hamida Seba

Summary:

Big Data is an emerging paradigm that has gained a significant interest from both academia and Industry. Nowadays, several applications and domains such as high throughput instruments, Observation and monitoring systems, simulation models as well as new applications (e.g., social Networks, sensors technologies, Internet of things, etc.) Produce large amount of data that have to be collected, communicated and processed.
Graphs are in the core of the big data issues related to data sensing, data processing and data communications. As an effective way of formalizing problems and representing objects, graphs are used to represent the complex and heterogeneous linked data of the big data challenge.
In fact, data collection and communication rely on routes computation and optimization on network topologies represented by graphs and Linked Data processing relies on graph algorithms that aim to compare search or mine complex structured data.
In this thesis, we focus on big graph data by exploring the properties and parameters of these graphs that can be used to simplify their processing. To do so, we rely on a three layer framework based on graph substructures. In this framework, we aim to simplify graph processing algorithms so as they can be executed on single machines in acceptable times using the following three layers to design the algorithms:
Layer 1: Graph specific representations. How to represent efficiently large graphs so as to retain their properties and minimize the amount of memory required for their storage? The representation is mainly based on graph substructures.
Layer 2: Representation-oriented processing. How to adapt query processing and mainly traditional graph algorithms such as search, compare and mine on these representations or to their properties?
Layer 3: Representation-inferring. Do these graph representations highlight new properties on which we can rely to achieve more performance?
As an application domain, we intend to work on two main kinds of large graphs:
1. Social network analysis where the end is to analyze interactions and search for communities or particular interconnections.
2. Resource discovery in large scale connected objects on the internet of thing. Here, the aim is to use linked data properties to infer routes in the labyrinth of billion connected objects.