Thesis of Lisa Chabrier


Subject:
EFFICIENT APPROXIMATION METHOD FOR LOCAL EXPLANATION OF MACHINE LEARNING MODELS, APPLIED TO THE INFERENCE OF LOCAL ACTIVITY OF GENE REGULATORY NETWORKS

Start date: 01/10/2021
End date (estimated): 01/10/2024

Advisor: Christophe Rigotti
Coadvisor: Sergio Peignier
Co-director: Anton Crombach

Summary:

The work presented in this thesis combines algorithm design for machine learning explainability with single-cell RNA-Seq data analysis. We focus on a local explainability method for machine learning models based on the 'SHapley Additive exPlanation' (SHAP) framework, which quantifies the contribution of each feature to a given prediction with scores called SHAP values. The method only requires evaluating the model on inputs, so it is compatible with any type of predictive model: it is model-agnostic. Computing SHAP values is a significant challenge, because the cost is exponential in the number of features. In our application context, only a subset of the model features is important. We therefore concentrated the computational effort on the SHAP values of the top-k most important features. To this end, we designed and implemented TopShap, an iterative algorithm that interleaves refinements of the SHAP value approximations with pruning steps that discard features that can no longer be in the top-k. We showed that TopShap is faster than post-processing the output of Kernel SHAP, the fastest model-agnostic approximation method. Next, we used TopShap to study rewiring events in gene regulatory networks (GRNs). This led to the design of a workflow named re_actShap, which was then extended and applied to cancer data provided by collaborators at the Centre de Recherche en Cancérologie de Lyon (CRCL).
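To make the exponential cost concrete, the sketch below computes exact Shapley values for a small toy model by enumerating every feature subset, then keeps the k largest in absolute value. It is an illustration only and is not the TopShap algorithm: the names `exact_shap_values`, `top_k_features` and `toy_model` are hypothetical, absent features are replaced by a single fixed baseline (one common simplification), and ranking by absolute value is an assumption about what "most important" means.

```python
from itertools import combinations
from math import factorial


def exact_shap_values(model, x, baseline):
    """Exact Shapley values by subset enumeration (cost grows as 2^n).

    `model` is treated as a black box: it is only called on inputs.
    Assumption: a feature that is "absent" takes its value from `baseline`.
    """
    n = len(x)

    def value(subset):
        # Features in `subset` come from x, all others from the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight of a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                # Marginal contribution of feature i to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def top_k_features(phi, k):
    """Indices of the k features with the largest |SHAP value| (assumed ranking)."""
    return sorted(range(len(phi)), key=lambda i: abs(phi[i]), reverse=True)[:k]


def toy_model(z):
    # Hypothetical model: three linear terms plus one interaction term.
    return 3.0 * z[0] - 2.0 * z[1] + 0.5 * z[2] + z[0] * z[3]


x = [1.0, 1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0, 0.0]
phi = exact_shap_values(toy_model, x, baseline)
top2 = top_k_features(phi, 2)
print(phi, top2)
```

The values in `phi` sum to the difference between the model output at `x` and at the baseline, which is the additivity property behind the SHAP name. Each feature requires a pass over all subsets of the remaining features, so this brute-force approach is only practical for a handful of features. That cost is what motivates approximation methods and restricting attention to the top-k.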