Thesis of Samuele Langhi


Subject:
Efficiently Tracking Inconsistencies over Data Streams

Start date: 01/10/2021
End date (estimated): 01/10/2024

Advisor: Angela Bonifati
Co-advisor: Riccardo Tommasini

Summary:

Continuous querying has emerged as a prevalent paradigm for managing streaming data, yet data quality in such scenarios remains underexplored. In this context, ensuring data consistency is paramount for accurate and trustworthy query results. Traditional approaches to quality-informed query answering in static environments do not carry over directly to streaming settings. We propose a novel approach, termed consistency-aware query answering, which annotates data with inconsistency degrees instead of directly fixing errors. By relying on provenance-based annotations over semirings, this approach tracks inconsistencies in fine detail without altering the stream itself. Two challenges remain, however: handling unbounded streams and maintaining efficiency. Our work addresses them in two steps. First, we introduce constraints tailored to streaming data, which make consistency analysis over unbounded streams possible by exploiting the characteristics of the streams. We also propose a graph-based approach for efficient inconsistency tracking. Second, we design streaming operators that preserve the formal guarantees of the annotation process by integrating provenance management. Our approach offers a promising solution for ensuring data quality in streaming environments.
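To make the idea of annotating rather than repairing more concrete, here is a minimal Python sketch built on several assumptions that the summary does not state: the constraint is a hypothetical windowed rule (two events with the same key whose timestamps differ by at most `window` must carry the same value), an event's inconsistency degree is its number of conflicts in a conflict graph, and annotations combine in the tropical semiring (min, +). The names `Event`, `WindowedConflictGraph`, `plus` and `times` are illustrative only and do not describe the thesis's actual constraints, operators or semiring.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    eid: int      # unique id, so identical readings stay distinct graph nodes
    ts: int       # event timestamp; assumed non-decreasing in arrival order
    key: str      # e.g. a sensor id
    value: float


# Tropical semiring (N ∪ {inf}, min, +, inf, 0), an assumed choice:
# times adds the inconsistency of tuples used together (e.g. in a join),
# plus keeps the least inconsistent of alternative derivations.
ZERO, ONE = float("inf"), 0


def plus(a, b):
    return min(a, b)


def times(a, b):
    return a + b


class WindowedConflictGraph:
    """Conflict graph for a hypothetical windowed constraint: two events with
    the same key whose timestamps differ by at most `window` must agree on
    value. A node's degree is used as its inconsistency annotation."""

    def __init__(self, window):
        self.window = window
        self.live = deque()             # events that may still gain conflicts
        self.degree = defaultdict(int)  # degree of each live node

    def _finalize(self, e):
        # Once e has left the window no new edge can touch it, so drop its state.
        return e, self.degree.pop(e, 0)

    def insert(self, e):
        """Add e; return (event, final annotation) for every event that expired."""
        done = []
        while self.live and e.ts - self.live[0].ts > self.window:
            done.append(self._finalize(self.live.popleft()))
        for other in self.live:
            if other.key == e.key and other.value != e.value:
                self.degree[e] += 1      # one undirected edge e -- other
                self.degree[other] += 1
        self.live.append(e)
        return done

    def flush(self):
        """Finalize all remaining events, e.g. at the end of a finite test stream."""
        done = [self._finalize(e) for e in self.live]
        self.live.clear()
        return done


# Usage: a toy stream for one sensor, with a window of 10 time units.
g = WindowedConflictGraph(window=10)
stream = [Event(1, 0, "s1", 20.0), Event(2, 5, "s1", 21.0),
          Event(3, 8, "s1", 20.0), Event(4, 30, "s1", 25.0)]
out = []
for ev in stream:
    out += g.insert(ev)
out += g.flush()
ann = {e.eid: d for e, d in out}        # ann == {1: 1, 2: 2, 3: 1, 4: 0}
joined = times(ann[2], ann[3])          # a tuple derived from events 2 and 3: 3
best = plus(joined, ann[1])             # least inconsistent of two derivations: 1
```

Because no edge can reach an event once it has left the window, its annotation becomes final at eviction and its state can be discarded, which keeps memory bounded on an unbounded stream. The stream itself is never modified: the sketch only emits annotations alongside the events.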