Thesis of Nathan Quiblier


Subject:
Approximate Bayesian Computation for multimodal microscopy data: exploring the mobility of transcription factors in the nucleus of living cells

Defense date: 17/12/2024

Advisor: Hugues Berry

Summary:

Understanding how transcription factors explore the nucleus to find their binding sites remains a bottleneck in our comprehension of gene expression. One example is transcriptional pause release, a crucial step in transcription, i.e. the synthesis of complementary RNA from DNA by RNA polymerase II (Pol II). Upon engagement of the transcription process, half of the Pol II molecules remain blocked at pause sites, thus effectively pausing transcription. The critical factor that triggers the release of this pause is P-TEFb (Positive Transcription Elongation Factor b), which phosphorylates Pol II, thus releasing it from the paused state. However, how exactly P-TEFb finds Pol II molecules in the nucleus is not fully understood. What factors shape the motion of P-TEFb: molecular crowding and its spatial distribution, binding to nonspecific DNA sites, binding to protein partners, or other mechanisms?

Current experimental approaches are hampered by their limited spatiotemporal scale and the complexity of nuclear organization. We propose to overcome this challenge by combining computer simulations with microscopy. The experimental partners in the consortium (I. Izeddin, Paris; L. Héliot, Lille; X. Darzacq, Berkeley, USA) are developing experimental approaches based on super-resolution microscopy to map the spatiotemporal dynamics of P-TEFb in the nucleus of living cells. The modeling partner (H. Berry, Inria Lyon) complements these experiments with a simulation approach based on Approximate Bayesian Computation (ABC), using Monte-Carlo simulations of P-TEFb diffusion in increasingly realistic computer models of the nucleus.

The PhD research project sits at the inception of the project's modeling effort and will initiate the development of the Monte-Carlo-based ABC code (3D Brownian motion and related models, modeling of the microscopy signals). The code will also allow for model selection, i.e. estimating which microscopic model is most likely to be at play in the cell. A key aspect of the project is code efficiency, since ABC typically needs millions of simulations. For this reason, the simulation code will be developed in C++, on the basis of an existing prototype developed in the lab. To accelerate execution, a large effort will be devoted to optimization and parallelization for execution on a computer cluster, including on new multi-core, many-core and GPU processors.
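
To make the idea of Monte-Carlo-based ABC concrete, here is a minimal sketch of ABC rejection sampling for a single parameter, the diffusion coefficient D of free 3D Brownian motion. It is an illustration under simplifying assumptions, not the project's code: the function names (simulate_msd, abc_rejection, mean), the uniform prior, the relative tolerance and the choice of the per-step mean squared displacement as summary statistic are all hypothetical, and no microscopy signal is modeled.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Simulate n_steps of free 3D Brownian motion with diffusion coefficient D
// and time step dt; return the mean squared displacement per step.
// For this model the expected value is 6 * D * dt.
double simulate_msd(double D, double dt, std::size_t n_steps, std::mt19937_64& rng) {
    assert(D > 0.0 && dt > 0.0 && n_steps > 0);
    // Each coordinate increment is Gaussian with variance 2 * D * dt.
    std::normal_distribution<double> step(0.0, std::sqrt(2.0 * D * dt));
    double sum_sq = 0.0;
    for (std::size_t i = 0; i < n_steps; ++i) {
        const double dx = step(rng);
        const double dy = step(rng);
        const double dz = step(rng);
        sum_sq += dx * dx + dy * dy + dz * dz;
    }
    return sum_sq / static_cast<double>(n_steps);
}

// ABC rejection: draw D from a uniform prior on [d_min, d_max], simulate,
// and keep D when the simulated summary statistic lies within a relative
// tolerance of the observed one. The kept values approximate the posterior.
std::vector<double> abc_rejection(double observed_msd, double dt, std::size_t n_steps,
                                  double d_min, double d_max, double tolerance,
                                  std::size_t n_draws, std::mt19937_64& rng) {
    assert(0.0 < d_min && d_min < d_max && observed_msd > 0.0);
    std::uniform_real_distribution<double> prior(d_min, d_max);
    std::vector<double> accepted;
    for (std::size_t i = 0; i < n_draws; ++i) {
        const double D = prior(rng);
        const double msd = simulate_msd(D, dt, n_steps, rng);
        if (std::abs(msd - observed_msd) <= tolerance * observed_msd) {
            accepted.push_back(D);
        }
    }
    return accepted;
}

// Arithmetic mean of the accepted samples (a simple posterior point estimate).
double mean(const std::vector<double>& values) {
    assert(!values.empty());
    return std::accumulate(values.begin(), values.end(), 0.0) /
           static_cast<double>(values.size());
}
```

Calling abc_rejection with an observed statistic and then mean on the result gives a rough point estimate of D. Each draw is independent of the others, which is why this kind of loop lends itself to the parallelization mentioned above. Under the same assumptions, model selection could be sketched by also drawing a model index from a prior and comparing acceptance counts across models.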

In parallel, the PhD student will develop mathematical tools to predict the coarse-grained behavior of the implemented microscopic models. To that end, the mean-field limit of these models will be studied, for instance using a formalism based on age-structured PDEs.
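
As an illustration only of what an age-structured formulation can look like (the equations of the project are not specified here), consider a hypothetical binding model in which n(t,a) is the density of bound molecules that have been bound for a time a, u(t) is the amount of unbound molecules, k_on is an assumed constant binding rate and k(a) an assumed age-dependent unbinding rate. A standard transport equation of McKendrick-von Foerster type then reads:

```latex
\begin{aligned}
\partial_t n(t,a) + \partial_a n(t,a) &= -k(a)\, n(t,a), \qquad a > 0, \\
n(t,0) &= k_{\mathrm{on}}\, u(t), \\
\frac{\mathrm{d}u}{\mathrm{d}t}(t) &= -k_{\mathrm{on}}\, u(t) + \int_0^{\infty} k(a)\, n(t,a)\, \mathrm{d}a .
\end{aligned}
```

Provided n(t,a) vanishes as a tends to infinity, integrating the first equation over a and adding the third shows that the total amount u(t) + \int_0^\infty n(t,a) da is conserved. A constant k(a) corresponds to exponentially distributed binding times, whereas an age-dependent k(a) allows non-exponential residence times.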