Thesis of Mauro Famà


Subject:
PolyFlow: Multimodel Streaming Data Management

Start date: 01/05/2024
End date (estimated): 01/05/2027

Supervisor: Riccardo Tommasini

Abstract:

The need to address data variety has paved the way for multi-model databases [1]. Multistores and polystores [2] expose a query interface over heterogeneous data models, aiming to reduce the number of ETL (Extract-Transform-Load) jobs required to obtain a uniform view over heterogeneous data [3]. The progress in multi-model data management and the rise of streaming data suggest that the time is ripe for a paradigm shift that enables multi-model and polyglot streaming data management.
PolyFlow aims to build a new generation of data systems, namely polystreaming systems, that elect DSMSs (Data Stream Management Systems) as their processing runtime. In the abstract Polystream architecture, an integrated continuous query processor (as opposed to a traditional polystore query processor) is loosely coupled to transient data views over heterogeneous streaming data. Over time, such views are maintained using DSMSs that support several streaming languages. Data integration techniques, appropriately adapted to operate over transient data, enable cross-model mapping and maintenance. In practice, the DSMSs act as wrappers that expose time-varying views upon request by processing heterogeneous input streams.
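As a rough illustration of the wrapper idea described above, the following Python sketch keeps a sliding-window view over each input stream and lets a continuous query join two views at a given time instant. It is only a toy under stated assumptions, not the PolyFlow design: the names `WindowedView`, `to_common`, and `continuous_join`, the sliding-window semantics, and the dictionary-based common record shape are all hypothetical, and items are assumed to arrive in timestamp order.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Callable, Deque, Dict, List, Tuple


@dataclass
class WindowedView:
    """Hypothetical wrapper: a time-varying view over one input stream."""
    width: int                                   # window width in time units
    to_common: Callable[[Any], Dict[str, Any]]   # maps a native item to a shared record shape
    _buffer: Deque[Tuple[int, Any]] = field(default_factory=deque)

    def push(self, timestamp: int, item: Any) -> None:
        # Assumption: items arrive in non-decreasing timestamp order.
        self._buffer.append((timestamp, item))

    def snapshot(self, now: int) -> List[Dict[str, Any]]:
        # Evict items that fell out of the window (now - width, now].
        while self._buffer and self._buffer[0][0] <= now - self.width:
            self._buffer.popleft()
        return [self.to_common(item) for ts, item in self._buffer if ts <= now]


def continuous_join(left: WindowedView, right: WindowedView,
                    key: str, now: int) -> List[Dict[str, Any]]:
    """Evaluate a cross-model equi-join over the two views at instant `now`."""
    index: Dict[Any, List[Dict[str, Any]]] = {}
    for rec in right.snapshot(now):
        index.setdefault(rec[key], []).append(rec)
    return [{**l, **r} for l in left.snapshot(now) for r in index.get(l[key], [])]


# A relational stream of (sensor, temperature) tuples ...
readings = WindowedView(width=10, to_common=lambda t: {"sensor": t[0], "temp": t[1]})
# ... and a graph stream of labelled edges, both mapped to the same record shape.
edges = WindowedView(width=10, to_common=lambda e: {"sensor": e["src"], "room": e["dst"]})

readings.push(1, ("s1", 20.5))
edges.push(5, {"src": "s1", "label": "locatedIn", "dst": "kitchen"})
readings.push(12, ("s1", 22.0))

result_at_12 = continuous_join(readings, edges, "sensor", now=12)
result_at_20 = continuous_join(readings, edges, "sensor", now=20)
print(result_at_12)
print(result_at_20)
```

At instant 12 the window still contains the edge and the latest reading, so the join returns one combined record; at instant 20 the edge has expired and the result is empty, which is what makes the view time-varying.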
The project concentrates its formalization efforts on the integration of stream processing languages and their underlying data abstractions. In addition, the PolyFlow vision embraces a declarative paradigm for data management; the project will therefore investigate means of obtaining efficient processing plans.

[1] Jiaheng Lu and Irena Holubová. “Multi-model Databases: A New Journey to Handle the Variety of Data”. In: ACM Comput. Surv. 52.3 (2019), 55:1–55:38. doi: 10.1145/3323214. url: https://doi.org/10.1145/3323214.
[2] Philipp Marian Grulich, Steffen Zeuch, and Volker Markl. “Babelfish: Efficient Execution of Polyglot Queries”. In: Proc. VLDB Endow. 15.2 (2021), pp. 196–210. url: http://www.vldb.org/pvldb/vol15/p196-grulich.pdf.
[3] Michael Stonebraker and Ugur Çetintemel. “‘One size fits all’: an idea whose time has come and gone”. In: Making Databases Work: the Pragmatic Wisdom of Michael Stonebraker. Ed. by Michael L. Brodie. ACM / Morgan & Claypool, 2019, pp. 441–462. doi: 10.1145/3226595.3226636.