Thesis of Thierno Diallo
Subject:
Start date: 14/12/2009
Defense date: 17/07/2013
Advisor: Jean-Marc Petit
Coadvisor: Sylvie Servigne
Summary:
Dirty data is a serious problem for businesses: it leads to incorrect decision making and inefficient daily operations, and ultimately wastes both time and money. A variety of integrity constraints, such as Conditional Functional Dependencies, have been studied for data cleaning. Data repairing methods based on these constraints are effective at detecting inconsistencies but limited in how they correct data; worse, they can even introduce new errors. Based on Master Data Management principles, a new class of data quality rules known as Editing Rules tells how to fix errors, pointing out which attributes are wrong and what values they should take. Editing Rules are defined in terms of master data, a single repository of high-quality data that provides various applications with a synchronized, consistent view of the enterprise's core business entities.
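As an illustration only, the following minimal Python sketch shows how a single Editing Rule could fix one input tuple using master data. It assumes the formulation ((X, Xm) → (B, Bm), tp[Xp]) commonly used for Editing Rules: if the input tuple matches the pattern tp and agrees with a master tuple on X/Xm, its attribute B takes the master value Bm. The class `EditingRule`, its fields, and the customer records are hypothetical and are not the method developed in this thesis.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

Record = Dict[str, str]


@dataclass
class EditingRule:
    """Hypothetical encoding of an editing rule ((X, Xm) -> (B, Bm), tp[Xp])."""
    x: List[str]    # input attributes compared with master data
    xm: List[str]   # corresponding master attributes (same length as x)
    b: str          # input attribute that the rule fixes
    bm: str         # master attribute that supplies the correct value
    pattern: Dict[str, str] = field(default_factory=dict)  # tp[Xp]: constants the input must carry

    def applies_to(self, t: Record) -> bool:
        # The input tuple must match every constant of the pattern tp on Xp.
        return all(t.get(a) == v for a, v in self.pattern.items())

    def fix(self, t: Record, master: List[Record]) -> Optional[Record]:
        """Return a repaired copy of t, or None if the rule does not fire."""
        if not self.applies_to(t):
            return None
        for s in master:
            # t[X] must equal s[Xm], attribute by attribute.
            if all(t.get(a) == s.get(am) for a, am in zip(self.x, self.xm)):
                fixed = dict(t)              # leave the input tuple untouched
                fixed[self.b] = s[self.bm]   # take the trusted value from master data
                return fixed
        return None


# Hypothetical master data and a dirty input tuple with a misspelled city.
master = [{"tel": "00-00-00-00-01", "city": "Lyon", "zip": "69100"}]
t = {"name": "Dupont", "type": "landline", "tel": "00-00-00-00-01", "city": "Lyn"}

# Rule: for landline numbers, a tuple whose tel matches a master tel takes the master city.
rule = EditingRule(x=["tel"], xm=["tel"], b="city", bm="city", pattern={"type": "landline"})

print(rule.fix(t, master))
```

This sketch stops at the first matching master tuple and does not check that the fix is certain (for instance, that all matching master tuples agree on Bm); a real repairing method would have to handle such cases.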
However, designing data quality rules is an expensive process that requires intensive manual effort.
In this setting, our first goal is to develop powerful mining techniques to discover Editing Rules. Our second goal is to propose efficient data repairing methods based on Editing Rules to clean data in the industrial context of the Orchestra Networks MDM software.
Jury:
Name | Title | Institution | Role |
Mr Dominique Laurent | Professor | Université Cergy Pontoise | Chair |
Ms Laure Berti-Equille | Research Director | IRD | Reviewer |
Mr Bart Goethals | Professor | Antwerp University | Reviewer |
Mr Martial Doré | | Orchestra Networks | Supervisor |
Ms Sylvie Servigne | Associate Professor | INSA Lyon | Co-supervisor |
Mr Jean-Marc Petit | Professor | INSA Lyon | Co-director |