PhD thesis of Fatima El Hattab


Subject:
Towards Effective Privacy Preserving Decentralized AI

Abstract:

Federated Learning opens interesting perspectives in privacy-sensitive domains, such as healthcare or user mobility, that have so far been reluctant to adopt AI and machine learning techniques. Indeed, with decentralized Federated Learning protocols, data is kept private on the client side instead of being sent to a remote service or cloud, as is done in classical approaches. However, Federated Learning raises a new set of challenges. Recent studies show that Federated Learning is vulnerable to malicious users participating in the distributed protocol, who may perform data poisoning attacks in order to make the global AI model deviate from its correct behavior [2][7][11]. Such users do not rigorously follow the protocol, either unintentionally, due to human or system errors, or intentionally, due to adversarial behavior. These deviations may result, for instance, in mislabelled disease data in digital healthcare systems, wrong radiation information in radiation detection systems, or (un)intentionally biased data in open data systems.

The state-of-the-art approaches to tackling malicious clients in classical distributed machine learning make assumptions that do not hold in decentralized Federated Learning systems, such as the assumption that clients’ data are independent and identically distributed [10]. In Federated Learning, however, the data present on client devices are collected by the clients themselves, based on the clients’ own usage patterns and local environments. Both the size and the distribution of the data vary heavily between clients. Thus, there is a need for novel algorithms and techniques to efficiently detect data poisoning attacks and counter them in Federated Learning systems.

The research objective of this PhD project is to derive novel Federated Learning protocols that are resilient to data poisoning attacks. The key tasks of this project are: (i) Exploring different types of data poisoning attacks in Federated Learning, under different use cases, such as disease data mislabelling in digital healthcare systems, or (un)intentionally biased data in open data systems. (ii) Deriving various data poisoning attack implementations (e.g., data label poisoning, data feature poisoning) on real-world datasets, and proposing detection mechanisms based on techniques such as generative adversarial networks (GANs) [8], model output and gradient monitoring, etc. (iii) Designing and experimenting with a wide range of defense approaches and hybrid protocols, such as software- and hardware-based protocols that combine decentralized protocols with trusted hardware execution environments such as Intel SGX [9] and ARM TrustZone [1].
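
To make tasks (i) and (ii) more concrete, the following is a minimal Python sketch of a data label poisoning attack and of one very simple form of update monitoring. It is an illustration under stated assumptions, not the method developed in this thesis: the function names (flip_labels, flag_suspicious), the representation of a client update as a flat list of floats, and the median-based threshold are all hypothetical choices made for this example.

```python
import random
import statistics


def flip_labels(labels, source, target, fraction, seed=0):
    """Label poisoning sketch: relabel a fraction of `source` samples as `target`."""
    rng = random.Random(seed)
    candidates = [i for i, y in enumerate(labels) if y == source]
    n_flip = int(len(candidates) * fraction)
    chosen = set(rng.sample(candidates, n_flip))
    return [target if i in chosen else y for i, y in enumerate(labels)]


def l2_norm(update):
    """Euclidean norm of a client update given as a flat list of floats."""
    return sum(x * x for x in update) ** 0.5


def flag_suspicious(updates, threshold=3.0):
    """Return indices of clients whose update norm lies far from the median norm.

    Assumption: "far" means more than `threshold` times the median absolute
    deviation (MAD). This is a simple heuristic chosen for illustration only.
    """
    norms = [l2_norm(u) for u in updates]
    med = statistics.median(norms)
    mad = statistics.median([abs(n - med) for n in norms])
    scale = mad if mad > 0 else 1e-12  # avoid a zero threshold when all norms agree
    return [i for i, n in enumerate(norms) if abs(n - med) > threshold * scale]


# Hypothetical example: a client flips half of its class-1 labels to class 0.
clean_labels = [0, 1, 1, 1, 1, 0]
poisoned_labels = flip_labels(clean_labels, source=1, target=0, fraction=0.5)

# Hypothetical example: three similar updates and one with a much larger norm.
client_updates = [[0.1, 0.1], [0.12, 0.09], [0.09, 0.11], [5.0, -4.0]]
suspicious = flag_suspicious(client_updates)
```

A norm-based check of this kind only catches updates that are unusually large or small. Because client data in Federated Learning are not identically distributed, honest clients may also produce atypical updates, which is one reason more elaborate detection mechanisms are needed.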


Supervisor: Sara Bouchenak
Co-supervisor: Vlad Nitu