Thesis of Kamel Taouche


Subject:
Aggregated query processing in a large-scale distributed environment

Defense date: 30/09/2017

Advisor: Mohand-Said Hacid
Co-advisor: Emmanuel Coquery

Summary:

Because Linked Open Data (LOD) is decentralized, answering a complex query often requires accessing multiple data sources and combining the information they return. This thesis focuses on the evaluation of SPARQL queries over the LOD. Processing such queries requires communicating with multiple remote sources, because the relevant information can be spread across several sources on the Web of Data (WoD). The final answer is built by combining the intermediate results returned by each of these sources. In this context, this work aims to design a system that takes a SPARQL query as input and returns a result produced by aggregating fragments from various sources. The whole query management process should remain transparent to the user, who should not need to specify the sources that may contain the answer to the query. Such a system has to address two major challenges: on the one hand, selecting the potentially relevant sources that contain the expected information; on the other hand, handling redundancy and data overlap when combining results, since the same information can exist in several sources.
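
The two challenges named above, source selection and duplicate elimination, can be illustrated with a minimal sketch. It is not the system designed in the thesis: it assumes each source is an in-memory set of triples, the query is a single triple pattern in which None stands for a variable, and a source counts as relevant when it holds at least one triple with the requested predicate. The names select_sources, federated_query, the source_* keys and the ex: identifiers are all hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
# A triple pattern; None plays the role of a SPARQL variable (assumption).
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


def select_sources(sources: Dict[str, Set[Triple]], pattern: Pattern) -> List[str]:
    """Source selection: keep sources holding a triple with the pattern's predicate."""
    predicate = pattern[1]
    if predicate is None:
        # Nothing to filter on, so every source is potentially relevant.
        return sorted(sources)
    return sorted(
        name
        for name, triples in sources.items()
        if any(triple[1] == predicate for triple in triples)
    )


def matches(triple: Triple, pattern: Pattern) -> bool:
    """A triple matches when every bound position of the pattern is equal."""
    return all(bound is None or bound == value for value, bound in zip(triple, pattern))


def federated_query(sources: Dict[str, Set[Triple]], pattern: Pattern) -> List[Triple]:
    """Evaluate the pattern on each selected source and merge without duplicates."""
    merged: Set[Triple] = set()  # the set absorbs triples returned by several sources
    for name in select_sources(sources, pattern):
        merged.update(t for t in sources[name] if matches(t, pattern))
    return sorted(merged)


# Hypothetical data: source_a and source_b overlap on one triple.
SOURCES: Dict[str, Set[Triple]] = {
    "source_a": {("ex:alice", "ex:knows", "ex:bob"), ("ex:alice", "ex:age", "30")},
    "source_b": {("ex:alice", "ex:knows", "ex:bob"), ("ex:carol", "ex:knows", "ex:bob")},
    "source_c": {("ex:carol", "ex:age", "41")},
}

if __name__ == "__main__":
    query: Pattern = (None, "ex:knows", None)
    print(select_sources(SOURCES, query))
    print(federated_query(SOURCES, query))
```

In this sketch the user never names a source: source_c is skipped because it holds no ex:knows triple, and the triple present in both source_a and source_b appears only once in the merged answer. A real setting would replace the in-memory sets with remote SPARQL endpoints and would need a less naive relevance test.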