Thesis of James Sudlow
Subject:
Start date: 03/11/2025
End date (estimated): 03/11/2028
Advisor: Sara Bouchenak
Summary:
Federated learning (FL) is a promising paradigm that is gaining traction in the context of privacy-preserving machine learning for edge computing systems. Thanks to FL, several data owners called clients (e.g., organizations in cross-silo FL) can collaboratively train a model on their private data, without having to send their raw data to external service providers. FL was rapidly adopted in several thriving applications such as digital healthcare [1], which generates the world's largest volume of data [2]. In healthcare systems, the problems of privacy and bias are particularly important.
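To illustrate the FL training loop described above, the following is a minimal sketch of FedAvg-style aggregation; the linear model, the plain weighted averaging rule, and all names are illustrative assumptions, not the protocols to be developed in this thesis.

# Minimal FedAvg-style sketch (illustrative only): each client trains locally
# on its private data and only shares model parameters, never raw examples.
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    # Local training of a linear model with a squared-error loss (assumption:
    # real clients would train arbitrary models, e.g. neural networks).
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_round(global_w, clients):
    # The server sends the global model, clients return updated parameters,
    # and the server aggregates them weighted by local dataset size.
    updates, sizes = [], []
    for X, y in clients:
        updates.append(local_update(global_w, X, y))
        sizes.append(len(y))
    sizes = np.array(sizes, dtype=float)
    return np.average(np.stack(updates), axis=0, weights=sizes / sizes.sum())

# Toy usage: three clients with private (X, y) data, a few FL rounds.
rng = np.random.default_rng(0)
clients = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(3)]
w = np.zeros(3)
for _ in range(10):
    w = federated_round(w, clients)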
Although FL is a first step towards privacy by keeping the data local to each client, this is not sufficient, since the model parameters shared in FL remain vulnerable to privacy attacks [3], as shown in a line of recent literature [4]. Thus, there is a need to design new FL protocols that are robust to such privacy attacks. Furthermore, FL clients may have very heterogeneous and imbalanced data, which may result in an unfair FL model, with disparities among socioeconomic and demographic groups [5][6]. Recent studies show that the use of AI may further exacerbate disparities between groups, and that FL may be a vector of bias propagation among FL clients. In this context, recent works published at NDSS [7] and AAAI [8] show that fairness and privacy compete; handling them independently, as is usually done, may have negative side effects on each other.
Therefore, there is a need for a novel multi-objective approach to FL fairness and protection against privacy threats. This is particularly challenging in FL, where no global knowledge about the statistical properties of the overall heterogeneous data is available, although such knowledge is required by classical state-of-the-art techniques. This project tackles this challenge and aims to handle the issues raised at the intersection of FL model privacy and fairness through: (i) novel distributed FL protocols; (ii) a multi-objective approach that takes into account privacy, fairness, and utility, these objectives being antagonistic; (iii) the application of these techniques to FL-based digital health use cases.
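The following sketch only illustrates why these objectives are antagonistic through a simple weighted scalarization; the penalty definitions, proxies, and weights are placeholder assumptions and do not represent the multi-objective formulation to be developed in this thesis.

# Illustrative scalarized criterion combining utility, fairness, and privacy.
import numpy as np

def utility_loss(preds, labels):
    # Mean squared error as a stand-in for model utility.
    return float(np.mean((preds - labels) ** 2))

def fairness_penalty(preds, groups):
    # Gap in average prediction between two demographic groups
    # (a demographic-parity-style proxy).
    return abs(float(preds[groups == 0].mean() - preds[groups == 1].mean()))

def privacy_cost(noise_scale):
    # Less differential-privacy noise means weaker protection, so this
    # proxy cost grows as the noise scale shrinks (illustrative only).
    return 1.0 / noise_scale

def combined_objective(preds, labels, groups, noise_scale,
                       w_util=1.0, w_fair=1.0, w_priv=0.1):
    # Scalarization: the weights expose the trade-off among the three
    # antagonistic objectives (improving one typically degrades another).
    return (w_util * utility_loss(preds, labels)
            + w_fair * fairness_penalty(preds, groups)
            + w_priv * privacy_cost(noise_scale))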