Thesis of Orhan Yazar
Multi-target regression (MTR) has attracted increasing attention in recent years. The main challenge in MTR is to build predictive models for problems with multiple continuous targets while accounting for the inter-target correlation, which can greatly influence predictive performance. MTR arises in several modern application areas, including ecology, biophysics and medicine.
Most existing methods overlook one aspect, namely the impact of the inputs on target correlations (i.e., conditional target correlation). In this thesis, we first propose a novel MTR framework, termed Conditionally Decorrelated Multi-Target Regression (CDMTR). CDMTR learns from MTR data in three elementary steps: clustering analysis, conditional target decorrelation, and induction of multi-target regression models. The clustering step investigates the underlying properties of the training data in order to decompose the original MTR problem into several MTR subproblems. The goal is to effectively capture correlations in the input-feature space to facilitate the subsequent discrimination process. In the second step, CDMTR performs, within each cluster, a principal component analysis (PCA) of the target space to derive linear combinations of the targets. In the third step, each transformed target (i.e., each principal component) is fed to a simple single-target regression method, which need not account for conditional target dependencies because the transformed targets are uncorrelated within each cluster.
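The three steps described above can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not name the clustering algorithm or the single-target regressor, so plain k-means and ridge regression are assumed here, and the function names (`fit_cdmtr`, `predict_cdmtr`) as well as the rule of routing a new sample to its nearest centroid are hypothetical choices made for illustration.

```python
import numpy as np

def assign(X, centers):
    """Index of the nearest centroid for each row of X."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (assumed clustering method)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centers)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def fit_cdmtr(X, Y, k=2, ridge=1e-3):
    """Step 1: cluster inputs. Step 2: PCA of targets per cluster.
    Step 3: one ridge regressor per principal component."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    centers = kmeans(X, k)
    labels = assign(X, centers)
    kept, models = [], []
    for j in range(k):
        mask = labels == j
        if mask.sum() < 2:          # too few samples to estimate a covariance
            continue
        Xc, Yc = X[mask], Y[mask]
        y_mean = Yc.mean(axis=0)
        # PCA of the target space: eigenvectors of the target covariance.
        cov = np.atleast_2d(np.cov(Yc - y_mean, rowvar=False))
        _, V = np.linalg.eigh(cov)
        Z = (Yc - y_mean) @ V       # decorrelated targets within this cluster
        # Each column of Z is regressed independently (closed-form ridge);
        # solving all columns at once is only a convenience.
        A = np.hstack([Xc, np.ones((len(Xc), 1))])
        W = np.linalg.solve(A.T @ A + ridge * np.eye(A.shape[1]), A.T @ Z)
        kept.append(j)
        models.append((W, V, y_mean))
    return centers[kept], models

def predict_cdmtr(X, centers, models):
    """Route each sample to its nearest cluster, predict the components,
    then rotate back to the original target space."""
    X = np.asarray(X, float)
    labels = assign(X, centers)
    A = np.hstack([X, np.ones((len(X), 1))])
    Y_hat = np.empty((len(X), models[0][1].shape[0]))
    for j, (W, V, y_mean) in enumerate(models):
        mask = labels == j
        Y_hat[mask] = (A[mask] @ W) @ V.T + y_mean
    return Y_hat
```

Because the eigenvector matrix `V` is orthogonal, predictions made on the principal components can be mapped back to the original targets with `V.T`, so the decorrelation step costs nothing at prediction time beyond one matrix product per cluster.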
Through this approach, we demonstrate that exploiting conditional target dependencies in MTR can greatly influence generalization performance, although the benefit depends closely on the properties of the data and on the type of loss to be minimized. Indeed, in MTR data where many inter-dependencies between the targets may be present, explicitly modeling all inter-target and input-output relationships is intuitively far more reasonable. In the second part of this thesis, the multi-target regression and optimal feature subset selection problems are formulated within a unified probabilistic framework, termed Conditionally Independent Target Subsets (CITS). CITS uses Bayesian networks to explicitly identify the different conditionally independent target subsets and their optimal sets of predictors, in order to improve the multi-target regression training process.
The approaches developed in this thesis were tested on several benchmark data sets with satisfactory results and show promise compared with competitive state-of-the-art alternatives. Extensive experiments are also conducted on the Panzani industrial database for assessing discount campaigns in the agri-food industry.
Advisor: Mohand-Said Hacid
Coadvisor: Haytham Elghazel
Defense date: Friday, June 11, 2021
|Mr Bennani Younes||Professor||Université Sorbonne Paris Nord||Reviewer|
|Ms Kuntz Pascale||Professor||Université de Nantes||Reviewer|
|Ms Amer-Yahia Sihem||Research Director||CNRS Grenoble||Examiner|
|Mr Benabdeslem Khalid||Associate Professor||Université Lyon 1||Examiner|
|Mr Hacid Mohand-Saïd||Professor||Université Lyon 1||Thesis advisor|
|Mr Elghazel Haytham||Associate Professor||Université Lyon 1||Thesis co-advisor|
|Ms Castin Nathalie||Industrial manager, Panzani||Invited member|