Thesis of Roland Kotto Kombi

Distributed Query Processing On Data Streams Adapted to User Communities

Defense date: 31/12/2018

Advisor: Philippe Lamarre
Coadvisor: Nicolas Lumineau


In the context of Big Data, techniques are emerging for storing, partitioning and querying huge amounts of data. All of these techniques rely on distributed and highly parallel approaches. Nevertheless, this evolution concerns not only data management but also data sources. Indeed, people produce more and more data streams and issue complex queries on them. Data streams can be conceptualized as infinite sequences of elements. A stream element is a pair that carries a timestamp, and many stream elements can share the same timestamp. Queries on data streams, called continuous queries, can in turn be represented as directed acyclic graphs (DAGs). Each vertex represents an atomic and potentially parallel operator, and the edges define the order in which operators are applied to the data. The main difference from queries on static datasets is that continuous queries never end. The objective of my PhD work, part of the ANR project SOCIOPLUG, is to design a distributed Data Stream Management System (DSMS) that executes continuous queries on data streams according to the following requirements:

- The ability to process complex queries on data streams without interruption while meeting latency and quality constraints.
- Respect for hardware limitations, especially limited memory capacity and the absence of disk storage.
- The ability to dynamically reconfigure the execution of query operators at runtime, denoted online elasticity, so as to consume only the resources that are necessary given the evolution of the execution environment and the costs of reconfiguration.
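
To make the stream and query model described above more concrete, here is a minimal single-process sketch in Python. It is an illustration only, not the system designed in the thesis. The names `StreamElement`, `QueryDAG`, `add_operator`, `add_edge` and `run` are hypothetical, the payload type of a stream element is an assumption (the text only states that an element is a pair with a timestamp), and operators are assumed to be stateless and single-input.

```python
from collections import deque
from typing import Callable, Dict, Iterable, Iterator, List, NamedTuple, Optional, Tuple


class StreamElement(NamedTuple):
    value: int      # assumed payload, chosen for illustration
    timestamp: int  # several elements may share the same timestamp


# An operator maps one element to one element, or returns None to drop it.
Operator = Callable[[StreamElement], Optional[StreamElement]]


class QueryDAG:
    """Continuous query: vertices are operators, edges give their order."""

    def __init__(self) -> None:
        self.operators: Dict[str, Operator] = {}
        self.successors: Dict[str, List[str]] = {}

    def add_operator(self, name: str, fn: Operator) -> None:
        self.operators[name] = fn
        self.successors.setdefault(name, [])

    def add_edge(self, upstream: str, downstream: str) -> None:
        self.successors[upstream].append(downstream)

    def run(self, source: str, stream: Iterable[StreamElement]) -> Iterator[Tuple[str, StreamElement]]:
        """Lazily push each element through the DAG, starting at `source`.

        Yields (sink name, element) for every element that reaches a vertex
        with no successor. The generator never ends if the stream never ends.
        """
        for element in stream:
            pending = deque([(source, element)])
            while pending:
                name, item = pending.popleft()
                out = self.operators[name](item)
                if out is None:
                    continue  # the operator filtered the element out
                if not self.successors[name]:
                    yield name, out
                for nxt in self.successors[name]:
                    pending.append((nxt, out))


# Example query: source -> keep_even -> double
dag = QueryDAG()
dag.add_operator("source", lambda e: e)
dag.add_operator("keep_even", lambda e: e if e.value % 2 == 0 else None)
dag.add_operator("double", lambda e: e._replace(value=e.value * 2))
dag.add_edge("source", "keep_even")
dag.add_edge("keep_even", "double")

# A real stream is unbounded; a finite prefix keeps the example terminating.
prefix = [StreamElement(1, 0), StreamElement(2, 0), StreamElement(4, 1)]
results = [element for _, element in dag.run("source", prefix)]
```

Because `run` is a generator, results are produced element by element and nothing is buffered beyond the element in flight, which loosely mirrors the memory constraint listed above. Distribution, operator parallelism and online elasticity are deliberately left out of this sketch.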