Thesis of Pascal Wehrle


Subject:
Data Warehouses on Computing Grids

Defense date: 30/09/2007

Advisor: Robert Laurini
Co-advisor: Maryvonne Miquel

Summary:

To keep up with the constant increase in the volume of data to be stored and the growing complexity of data analysis, data warehouses are now deployed on powerful distributed systems. Computing grids have developed strongly in recent years, and work on middleware has produced solutions for using them in distributed computing. Much work remains, however, on the management and analysis of shared multidimensional data on the grid. Data warehouses have not yet been deployed on computing grids, and several data management problems in this context (dynamicity, tracing, efficient access) must be resolved. We propose an approach specific to computing grids, an architecture particularly well adapted to the absence of central control instances. The warehouse data is distributed on the grid and managed autonomously by the grid nodes. In particular, this work studies the modelling and construction of the distributed data warehouse on a computing grid, the publishing of available and/or computable data, and the execution of distributed OLAP queries on this kind of virtual data warehouse. We propose methods for indexing and query execution that allow data warehouses distributed on grids to operate efficiently. This research is applied within the GGM (Grid for Geno-Medicine) project of the French Ministry for Research's ACI Masse de Données to the design of a distributed data warehouse for geno-medical data on a computing grid.