Thesis of Usman Ahmed


Subject:
Dynamic Cubing for Hierarchical Multidimensional Data Space

Defense date: 18/02/2013

Advisor: Maryvonne Miquel
Co-advisor: Anne Tchounikine

Summary:

Data warehouses have been used in many applications for a long time. Traditionally,
new data in these warehouses is loaded through offline bulk updates, which
means that the latest data is not always available for analysis. This, however, is not acceptable
in many modern applications (such as intelligent buildings and smart grids)
that require the latest data for decision making. These applications require
fast, real-time, atomic integration of incoming facts into the data warehouse. Moreover,
the data defining the analysis dimensions, stored in the dimension tables of these warehouses,
must also be updated in real time whenever it changes. In this thesis,
such real-time data warehouses are called dynamic data warehouses. We propose
a data model for these dynamic data warehouses and present the concept of a Hierarchical
Hybrid Multidimensional Data Space (HHMDS), which consists of both ordered
and non-ordered hierarchical dimensions. The axes of the data space are non-ordered,
which allows them to evolve dynamically without any need for reordering. We define a
data grouping structure, called the Minimum Bounding Space (MBS), that enables efficient
partitioning of the data in the space. Various operators, relations and metrics are
defined for optimizing these data partitions, and analogies between
classical OLAP concepts and the HHMDS are established. We propose efficient
algorithms to store summarized or detailed data, in the form of MBSs, in a tree structure
called DyTree. Algorithms for OLAP queries over the DyTree are also detailed. The
nodes of the DyTree, holding MBSs with their associated aggregated measure values, represent
materialized sections of cuboids, and the tree as a whole is a partially materialized and
indexed data cube that is maintained through online atomic incremental updates. We
propose a methodology to experimentally evaluate partial data cubing techniques,
and we developed a prototype implementing this methodology. The prototype lets us
simulate and experimentally evaluate the structure and performance of the DyTree
against other solutions. An extensive study conducted with this prototype
shows that the DyTree is an efficient and effective partial data cubing solution for a
dynamic data warehousing environment.
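The summary does not give formal definitions of the MBS or the DyTree, so the following is only a minimal Python sketch under stated assumptions. It is meant to make two ideas concrete: grouping facts of a hybrid space into bounding regions, and keeping pre-aggregated measures up to date with atomic incremental inserts. It assumes that an MBS can be represented as a closed interval on each ordered dimension and a member set on each non-ordered dimension. The names `MBS`, `Leaf`, `ToyDyTree`, `volume`, `range_sum` and `max_leaf_size` are hypothetical. The structure is flat (a single level of leaves), ignores dimension hierarchies, and places each new fact using a least-enlargement heuristic borrowed from R-trees. It is not the DyTree algorithm from the thesis.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Tuple

# A fact's coordinates: (ordered dimension values, non-ordered dimension members).
Point = Tuple[Dict[str, float], Dict[str, str]]


@dataclass
class MBS:
    """Hypothetical Minimum Bounding Space: a closed interval per ordered
    dimension and a member set per non-ordered dimension."""
    intervals: Dict[str, Tuple[float, float]]
    members: Dict[str, FrozenSet[str]]

    @staticmethod
    def of_point(p: Point) -> "MBS":
        ordered, non_ordered = p
        return MBS({d: (v, v) for d, v in ordered.items()},
                   {d: frozenset([m]) for d, m in non_ordered.items()})

    def contains_point(self, p: Point) -> bool:
        ordered, non_ordered = p
        return (all(lo <= ordered[d] <= hi for d, (lo, hi) in self.intervals.items())
                and all(non_ordered[d] in s for d, s in self.members.items()))

    def within(self, other: "MBS") -> bool:
        """True if every point covered by self is also covered by other."""
        return (all(other.intervals[d][0] <= lo and hi <= other.intervals[d][1]
                    for d, (lo, hi) in self.intervals.items())
                and all(s <= other.members[d] for d, s in self.members.items()))

    def extended(self, p: Point) -> "MBS":
        """Smallest MBS covering both self and the point p."""
        ordered, non_ordered = p
        return MBS({d: (min(lo, ordered[d]), max(hi, ordered[d]))
                    for d, (lo, hi) in self.intervals.items()},
                   {d: s | {non_ordered[d]} for d, s in self.members.items()})

    def volume(self) -> float:
        """Hypothetical size metric: product of (width + 1) and member-set sizes."""
        v = 1.0
        for lo, hi in self.intervals.values():
            v *= hi - lo + 1
        for s in self.members.values():
            v *= len(s)
        return v


@dataclass
class Leaf:
    """Hypothetical leaf: an MBS, its materialized SUM/COUNT, and the raw facts."""
    mbs: MBS
    total: float = 0.0
    count: int = 0
    facts: List[Tuple[Point, float]] = field(default_factory=list)


class ToyDyTree:
    """Flat toy structure (not the thesis's DyTree). Each fact goes to the
    non-full leaf whose MBS grows least; that leaf's aggregates are updated
    in place, so no bulk recomputation is needed."""

    def __init__(self, max_leaf_size: int = 4):
        self.leaves: List[Leaf] = []
        self.max_leaf_size = max_leaf_size

    def insert(self, p: Point, measure: float) -> None:
        candidates = [c for c in self.leaves if len(c.facts) < self.max_leaf_size]
        if candidates:
            leaf = min(candidates,
                       key=lambda c: c.mbs.extended(p).volume() - c.mbs.volume())
            leaf.mbs = leaf.mbs.extended(p)
        else:
            leaf = Leaf(MBS.of_point(p))
            self.leaves.append(leaf)
        leaf.facts.append((p, measure))
        leaf.total += measure  # incremental update of the materialized aggregate
        leaf.count += 1

    def range_sum(self, query: MBS) -> float:
        result = 0.0
        for leaf in self.leaves:
            if leaf.mbs.within(query):
                result += leaf.total  # whole leaf inside query: use the aggregate
            else:
                result += sum(m for p, m in leaf.facts if query.contains_point(p))
        return result


tree = ToyDyTree(max_leaf_size=2)
tree.insert(({"time": 1}, {"sensor": "s1"}), 10.0)
tree.insert(({"time": 2}, {"sensor": "s1"}), 5.0)
tree.insert(({"time": 3}, {"sensor": "s2"}), 7.0)
query = MBS({"time": (1, 2)}, {"sensor": frozenset({"s1", "s2"})})
print(tree.range_sum(query))  # 15.0
```

Here a query region is itself expressed as an MBS. Leaves whose MBS lies entirely inside the region contribute their materialized aggregate directly, and only partially overlapping leaves fall back to scanning detailed facts. This loosely mirrors how materialized sections of cuboids can answer OLAP queries without touching every fact.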