Thesis of Thierno Diallo


Subject:
Editing Rules for Data Cleaning: Discovery and Application in a Master Data Management Context

Defense date: 17/07/2013

Advisor: Jean-Marc Petit
Coadvisor: Sylvie Servigne

Summary:

Dirty data is a serious problem for businesses, leading to incorrect decision making, inefficient daily operations, and ultimately wasted time and money. A variety of integrity constraints, such as Conditional Functional Dependencies, have been studied for data cleaning. Data repairing methods based on these constraints are effective at detecting inconsistencies but limited in how they correct data; worse, they can even introduce new errors. Based on Master Data Management principles, a new class of data quality rules known as Editing Rules tells how to fix errors, indicating which attributes are wrong and what values they should take. Editing Rules are defined in terms of master data, a single repository of high-quality data that provides various applications with a synchronized, consistent view of the core business entities.
However, designing data quality rules is an expensive process that involves intensive manual effort.
In this setting, our first goal is to develop powerful mining techniques to discover Editing Rules. Our second goal is to propose efficient data repairing methods based on Editing Rules to clean data in the industrial context of the Orchestra Networks MDM software.
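
As a rough illustration of the idea described in the summary, the sketch below shows how an Editing Rule might use master data to say which attribute of a record is wrong and which value it should take. It is a simplified assumption, not the formalism or the algorithms of the thesis: the `EditingRule` class, the `apply_rule` function, the attribute names and the sample records are all hypothetical. It also assumes that the attribute used for matching (here `zip`) is already known to be correct in the input record.

```python
from dataclasses import dataclass, field


@dataclass
class EditingRule:
    # Hypothetical, simplified representation (not the thesis's formal definition).
    match_on: dict  # input attribute -> master attribute that must agree
    fix: tuple      # (input attribute to overwrite, master attribute to copy from)
    pattern: dict = field(default_factory=dict)  # extra conditions on the input record


def apply_rule(rule, record, master):
    """Return a repaired copy of `record`, or `record` itself if the rule does not apply."""
    # The rule only applies to records that satisfy its pattern.
    if any(record.get(attr) != value for attr, value in rule.pattern.items()):
        return record
    # Find the master rows that agree with the record on the matching attributes.
    matches = [
        m for m in master
        if all(record.get(a) == m.get(ma) for a, ma in rule.match_on.items())
    ]
    target, source = rule.fix
    values = {m[source] for m in matches}
    # No match, or conflicting master values: leave the record untouched.
    if len(values) != 1:
        return record
    repaired = dict(record)
    repaired[target] = values.pop()
    return repaired


# Made-up sample data, for illustration only.
master = [
    {"zip": "69100", "city": "Villeurbanne"},
    {"zip": "69001", "city": "Lyon"},
]
rule = EditingRule(match_on={"zip": "zip"}, fix=("city", "city"))
dirty = {"name": "A. Martin", "zip": "69100", "city": "Lyon"}
repaired = apply_rule(rule, dirty, master)
unknown = {"name": "B. Durand", "zip": "75001", "city": "Paris"}
untouched = apply_rule(rule, unknown, master)
```

In this sketch a record is modified only when the master data gives exactly one candidate value, so a rule never guesses. This is meant to reflect the contrast drawn in the summary with constraint-based repairs, which can introduce new errors.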


Jury:
Mr Laurent Dominique, Professor, Université Cergy Pontoise, Chair
Mme Laure Berti-Equille, Research Director, IRD, Reviewer
Mr Bart Goethals, Professor, Antwerp University, Reviewer
Mr Doré Martial, Orchestra Networks, Supervisor
Mme Sylvie Servigne, Associate Professor, INSA Lyon, Co-supervisor
Mr Jean-Marc Petit, Professor, INSA Lyon, Thesis co-director