Thesis of Johan Leydet


Subject:
Understanding Machine Learning Vectors: From Confusion Matrices to Byzantine-Robust Federated Learning

Start date: 18/01/2023
End date (estimated): 18/01/2026

Advisor: Elod Egyed-Zsigmond
Coadvisors: Pierre-Edouard Portier, Sonia Ben Mokhtar, Diana Nurbakova

Summary:

This work studies classification in complex settings. The first part focuses on confusion matrices and aims to characterize a model's behavior from its outputs. The second part addresses Byzantine-robust federated learning, examining how to defend a distributed system based on local model updates. In both cases, we rely on machine learning vectors (model outputs and gradients) and investigate how to extract meaningful information from them.

The confusion matrix is a key tool for understanding, evaluating, and improving model behavior. It is a square matrix built from label–prediction pairs, using test labels and model predictions. Unlike aggregate metrics such as accuracy, it decomposes errors at the class level and provides a detailed view of model behavior.
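
As a minimal illustration of this construction, the sketch below counts label-prediction pairs with NumPy and compares the result with accuracy. The function name `confusion_matrix` and the toy labels are assumptions made for the example, not material from the thesis.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Entry (i, j) counts observations whose true class is i
    and whose predicted class is j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates correctly even when index pairs repeat.
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

# Toy test labels and model predictions (hypothetical data).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

cm = confusion_matrix(y_true, y_pred, n_classes=3)
accuracy = np.trace(cm) / cm.sum()  # single aggregate number
print(cm)
print(accuracy)
```

Accuracy summarizes the diagonal in one number, whereas the off-diagonal entries show which class is mistaken for which.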

In heterogeneous settings, where the label distributions of the training and test sets are imbalanced, the confusion matrix becomes harder to interpret. This imbalance biases the matrix entries and can obscure errors arising from similarities between classes.
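
The effect of test-set imbalance on raw counts can be seen on a small example. The sketch below applies row normalization, a common baseline and not the normalization proposed in this thesis; the counts and the name `row_normalize` are hypothetical.

```python
import numpy as np

def row_normalize(cm):
    """Divide each row by its total so that entry (i, j) becomes the
    fraction of class-i observations predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    row_sums = cm.sum(axis=1, keepdims=True)
    # Rows with no observations are left at zero instead of dividing by zero.
    return np.divide(cm, row_sums, out=np.zeros_like(cm), where=row_sums > 0)

# Class 0 has 1000 test observations, class 1 only 10 (hypothetical counts).
cm = np.array([[900, 100],
               [5, 5]])
rates = row_normalize(cm)
print(rates)
```

In raw counts, class 0 produces 100 errors against 5 for class 1, yet the normalized rows show that class 1 is misclassified half of the time. Row normalization only corrects for the test label distribution; it does not account for imbalance in the training set.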

Multi-label settings, where observations may belong to several classes, and soft-label settings, where observations are represented by non-negative vectors encoding degrees of membership or confidence, better capture real-world complexity. However, extending the confusion matrix to these settings is not straightforward.
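
One way to see the difficulty is to try a naive extension. The sketch below sums the outer products of label and prediction vectors, which is only one of several possible extensions and is not presented here as the thesis's formulation; the function name and the toy vectors are assumptions.

```python
import numpy as np

def outer_product_confusion(Y, P):
    """Sum over observations of outer(label_vector, prediction_vector).
    Y and P have shape (n_observations, n_classes)."""
    return np.asarray(Y, dtype=float).T @ np.asarray(P, dtype=float)

# Single-label case (one-hot rows): this reduces to the usual confusion matrix.
Y_single = np.array([[1, 0, 0], [0, 1, 0]])
P_single = np.array([[1, 0, 0], [0, 0, 1]])
cm_single = outer_product_confusion(Y_single, P_single)

# Multi-label case: one observation in classes 0 and 1, predicted perfectly.
Y_multi = np.array([[1, 1, 0]])
P_multi = np.array([[1, 1, 0]])
cm_multi = outer_product_confusion(Y_multi, P_multi)
print(cm_single)
print(cm_multi)
```

In the multi-label case the matrix has off-diagonal mass between classes 0 and 1 even though the prediction is exactly right, which shows that co-occurring labels can be mistaken for confusion.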

As a result, several competing extensions coexist. Most existing works visualize these matrices on benchmark datasets to interpret model behavior or compare them qualitatively with alternative formulations. However, these comparisons lack a clear quantitative criterion, making it unclear which method should be preferred in practice.

Moreover, no existing approach provides a unified formulation of the confusion matrix for single-label, multi-label, and soft-label settings, limiting both the consistency and the scope of machine learning evaluation.

Our main contributions to the confusion matrix are:
- Propose a principled normalization of the confusion matrix to better recover true class similarity in heterogeneous settings;
- Establish connections between normalization methods, importance sampling, and latent class representations;
- Introduce an optimal transport-based formulation extending confusion matrices to multi-label and soft-label settings, providing a unified framework and linking existing approaches;
- Introduce an experimental framework for evaluating whether confusion matrix extensions accurately recover true model confusion in complex classification settings.
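
To give an intuition for the optimal transport viewpoint, the sketch below builds a per-observation transport plan between a label vector and a prediction vector and sums the plans. Everything in it is an assumption made for illustration: both vectors are rescaled to unit mass, the cost is 0 on the diagonal and 1 elsewhere, and leftover mass is split proportionally (one of several optimal choices under that cost). It should not be read as the formulation developed in the thesis.

```python
import numpy as np

def transport_plan_01(label_vec, pred_vec):
    """An optimal coupling between two non-zero, non-negative vectors
    (rescaled to unit mass) under a 0/1 cost: keeping mass on the
    diagonal is free, moving it to another class costs 1."""
    a = np.asarray(label_vec, dtype=float)
    b = np.asarray(pred_vec, dtype=float)
    a, b = a / a.sum(), b / b.sum()
    shared = np.minimum(a, b)       # mass that can stay on the diagonal
    r, s = a - shared, b - shared   # leftover label / prediction mass
    plan = np.diag(shared)
    m = r.sum()                     # equals s.sum(), the mass that must move
    if m > 1e-12:
        plan += np.outer(r, s) / m  # proportional split, off-diagonal only
    return plan

def ot_confusion(Y, P):
    """Sum the per-observation transport plans."""
    return sum(transport_plan_01(y, p) for y, p in zip(Y, P))

perfect = transport_plan_01([1, 1, 0], [1, 1, 0])  # multi-label, all correct
wrong = transport_plan_01([1, 0, 0], [0, 1, 0])    # single-label, misclassified
partial = transport_plan_01([1, 1, 0], [1, 0, 1])  # class 1 predicted as class 2
print(ot_confusion([[1, 1, 0], [1, 0, 0]], [[1, 1, 0], [0, 1, 0]]))
```

Under these assumptions a perfectly predicted multi-label observation contributes only to the diagonal, and the single-label case reduces to placing one unit in the usual (true class, predicted class) cell.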

Federated learning enables multiple data holders (workers) to collaboratively train a model without sharing their data. Each worker locally updates the global model and sends its update (typically a gradient) to a central server. The server aggregates these updates, updates the model, and broadcasts it back to the workers; this process is repeated at each training round.
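
The training loop described above can be sketched on a toy problem. The example below assumes a linear model with squared loss, plain gradient averaging on the server, and synthetic noise-free data; the names `local_gradient` and `training_round`, the learning rate, and the number of workers are all hypothetical.

```python
import numpy as np

def local_gradient(w, X, y):
    """Gradient of the mean squared error of a linear model on one worker's data."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

def training_round(w, workers, lr=0.1):
    # Each worker computes an update on its own data; raw data never leaves it.
    grads = [local_gradient(w, X, y) for X, y in workers]
    # The server averages the updates and applies one gradient step.
    return w - lr * np.mean(grads, axis=0)

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0])
workers = []
for _ in range(4):
    X = rng.normal(size=(50, 2))
    workers.append((X, X @ w_true))  # noise-free targets for simplicity

w = np.zeros(2)
for _ in range(200):                 # one iteration = one training round
    w = training_round(w, workers)
print(w)
```

With only honest workers, the averaged gradient steers the global model toward the weights that generated the data.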

In adversarial settings, faulty or malicious (Byzantine) workers may disrupt the training process. Byzantine-robust federated learning aims to mitigate their impact. A common strategy is to discard outlier gradients, based on the assumption that Byzantine gradients deviate more from honest gradients than honest gradients deviate from each other.
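
A generic instance of this outlier-discarding strategy is sketched below: gradients farthest from the coordinate-wise median are dropped before averaging. This is an illustrative filter under assumed inputs, not a specific defense from the literature or from this thesis; `filter_and_average` and `n_discard` are hypothetical names.

```python
import numpy as np

def filter_and_average(grads, n_discard):
    """Drop the n_discard gradients farthest from the coordinate-wise
    median, then average the remaining ones."""
    grads = np.asarray(grads, dtype=float)
    center = np.median(grads, axis=0)
    dist = np.linalg.norm(grads - center, axis=1)
    keep = np.argsort(dist)[: len(grads) - n_discard]
    return grads[keep].mean(axis=0), np.sort(keep)

# Five honest gradients close to each other, two Byzantine gradients far away.
honest = np.array([[1.0, 1.1], [0.9, 1.0], [1.1, 0.9], [1.0, 1.0], [1.05, 0.95]])
byzantine = np.array([[-50.0, 50.0], [-50.0, 50.0]])
grads = np.vstack([honest, byzantine])

agg, kept = filter_and_average(grads, n_discard=2)
print(kept, agg)
```

The filter works here because the honest gradients are tightly clustered, which is exactly the assumption stated above.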

However, data heterogeneity (such as label skew, where workers observe different label distributions) also misaligns local updates. As a result, honest workers may appear as outliers, rendering these defense strategies ineffective.
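
The toy example below illustrates how label skew alone can make an honest worker look like an outlier. It assumes, as a simplification, that every worker shares the same average gradient per class, so that a worker's gradient is the mixture of class-wise gradients weighted by its label distribution; all values are hypothetical.

```python
import numpy as np

# Hypothetical class-wise gradients: row c is the average gradient on class c.
class_grads = np.array([[4.0, 0.0],
                        [0.0, 4.0]])

# Label distributions of four honest workers; the last one only sees class 0.
label_dists = np.array([[0.5, 0.5],
                        [0.5, 0.5],
                        [0.5, 0.5],
                        [1.0, 0.0]])

# Each worker's gradient is the label-weighted mixture of class-wise gradients.
worker_grads = label_dists @ class_grads

# Distance of each worker to the coordinate-wise median of all gradients.
center = np.median(worker_grads, axis=0)
dist = np.linalg.norm(worker_grads - center, axis=1)
most_suspicious = int(np.argmax(dist))
print(dist, most_suspicious)
```

No worker is malicious in this example, yet a distance-based filter would single out the worker with skewed labels.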

The main contributions on Byzantine-robust federated learning under label skew are:
- Introduce a lightweight local sampling-based defense against Byzantine attacks that does not require validation data;
- Propose a validation-free Byzantine filter based on the observation that honest updates lie within the convex hull of class-wise gradients.
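
The convex hull observation can be illustrated with a simple membership test. The sketch below is not the filter proposed in the thesis: it assumes that the class-wise gradients are known and affinely independent (so that the mixture weights are unique), and the name `in_convex_hull` and the toy vectors are hypothetical.

```python
import numpy as np

def in_convex_hull(update, class_grads, tol=1e-8):
    """Check whether `update` is a convex combination of the rows of
    `class_grads`. Assumes the rows are affinely independent."""
    G = np.asarray(class_grads, dtype=float)       # shape (n_classes, dim)
    # Solve G.T @ lam = update together with sum(lam) = 1.
    A = np.vstack([G.T, np.ones(len(G))])
    b = np.append(np.asarray(update, dtype=float), 1.0)
    lam, *_ = np.linalg.lstsq(A, b, rcond=None)
    residual = np.linalg.norm(A @ lam - b)
    # Inside the hull: the system is solved exactly with non-negative weights.
    return bool(residual <= tol and np.all(lam >= -tol))

# Three hypothetical class-wise gradients in two dimensions.
class_grads = np.array([[4.0, 0.0],
                        [0.0, 4.0],
                        [-4.0, -4.0]])

honest_update = np.array([0.5, 0.3, 0.2]) @ class_grads  # label-skewed mixture
byzantine_update = np.array([-10.0, 10.0])

print(in_convex_hull(honest_update, class_grads))
print(in_convex_hull(byzantine_update, class_grads))
```

Any label distribution, however skewed, yields non-negative weights that sum to one, so under this simplification an honest update passes the test while the far-away update does not.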

 


Jury:
Mr. Elöd Egyed-Zsigmond, Associate Professor, INSA Lyon, Thesis advisor
Ms. Diana Nurbakova, Associate Professor, INSA Lyon, Thesis co-advisor
Mr. Gabriele Gianini, Professor, Université de Milan-Bicocca, Reviewer
Mr. Marc Tommasi, Professor, Université de Lille, Reviewer
Mr. Gouy-Pailler, Research Director, CEA, Examiner