### Thesis of Bastien Doignies

**Subject:**

**Start date:** 15/09/2021

**End date (estimated):** 15/09/2024

**Advisor:** Victor Ostromoukhov

**Summary:**

This research project has several goals.

First, the student will develop a tangible approach to estimate the degree of uniformity of N-dimensional sequences of sampling distributions. This estimate will be used to tailor a reliable measure for the optimization process, in order to obtain sequences of sampling distributions well suited to a specific application domain. The typical applications we aim to improve are physical processes that occur in medical simulations, namely particle-tissue or wave-tissue interactions in beam therapy (e.g., X-ray or hadron therapy). We expect that machine-learning techniques will greatly improve the performance of the simulation. The goal here is to build a machine-learning-based estimation of the discrepancy. The bottleneck is that we cannot compute exact values of the discrepancy, only upper and lower bounds or numerical surrogates (optimal transport-based, for example). The idea is therefore to aggregate all this partial and approximate information on the discrepancy in order to learn the discrepancy metric. As a starting point towards this challenging goal, a Siamese network or, alternatively, a triplet network will serve as a baseline to obtain a first metric estimate.
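
To make the notion of "bounds or numerical surrogates" concrete, here is a minimal Python sketch of two quantities that could serve as training targets for such a learned metric. It is an illustration under stated assumptions, not the project's method. The first function uses Warnock's closed-form expression for the L2 star discrepancy, a computable surrogate; the second gives a lower bound on the (L-infinity) star discrepancy by testing randomly drawn anchored boxes. The function names, the number of boxes, and the sample point sets are hypothetical choices made for this example.

```python
import math
import random


def l2_star_discrepancy(points):
    """L2 star discrepancy of points in [0, 1)^d (Warnock's closed form)."""
    n = len(points)
    d = len(points[0])
    # Term 1: integral of the squared box volume over all anchored boxes.
    term1 = (1.0 / 3.0) ** d
    # Term 2: cross term between the empirical measure and the box volume.
    term2 = sum(math.prod((1.0 - x * x) / 2.0 for x in p) for p in points)
    # Term 3: pairwise interaction between the points.
    term3 = sum(
        math.prod(1.0 - max(a, b) for a, b in zip(p, q))
        for p in points
        for q in points
    )
    squared = term1 - 2.0 * term2 / n + term3 / (n * n)
    return math.sqrt(max(squared, 0.0))  # clamp tiny negative round-off


def star_discrepancy_lower_bound(points, n_boxes=2000, seed=0):
    """Lower bound on the star discrepancy from random anchored boxes.

    Any single box [0, corner) gives |fraction inside - volume| <= D*,
    so the maximum over a finite set of boxes is a valid lower bound.
    """
    rng = random.Random(seed)
    n = len(points)
    d = len(points[0])
    best = 0.0
    for _ in range(n_boxes):
        corner = [rng.random() for _ in range(d)]
        volume = math.prod(corner)
        inside = sum(all(x < c for x, c in zip(p, corner)) for p in points)
        best = max(best, abs(inside / n - volume))
    return best


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical 2-D point sets: uniform random points and a centered grid.
    white = [[rng.random(), rng.random()] for _ in range(64)]
    grid = [[(i + 0.5) / 8, (j + 0.5) / 8] for i in range(8) for j in range(8)]
    for name, pts in (("white noise", white), ("centered grid", grid)):
        print(name, l2_star_discrepancy(pts), star_discrepancy_lower_bound(pts))
```

Pairs of point sets labeled with such values are the kind of partial information a Siamese or triplet baseline could be trained on; how the different bounds and surrogates should be combined is the open question the project addresses.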

Second, it is well known that importance sampling strongly impacts the error in stochastic sampling. Importance sampling is efficient when the probability density function (PDF) of the integrand is known or at least precisely estimated. In this part of the project, the student will develop machine learning-based approaches to estimate the PDF of the integrand from the sparse available data. This is a challenging task, but one that may succeed for a variety of physical simulations of particle-tissue interactions. To do so, the student will build a training set of integrands and their corresponding numerical integral values and learn from it, for example via embedding learning with an encoder-decoder or, more simply, with a network. This integral approximation will then be used as a control variate to accelerate Monte Carlo convergence. While this may look like an easy task, designing the training dataset so that integrals can be reliably estimated afterwards is a challenge; in particular, the way integrands are fed to the neural network must be carefully chosen.
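
The control-variate step can be sketched in a few lines of Python. This is a toy illustration only: the integrand `exp(x)` on [0, 1] and the hand-written linear surrogate `1 + x` are assumptions standing in for a physical integrand and for the learned approximation, and all names are hypothetical. The only property the estimator relies on is that the surrogate's integral is known exactly.

```python
import math
import random


def integrand(x):
    # Toy integrand; its exact integral over [0, 1] is e - 1.
    return math.exp(x)


def surrogate(x):
    # Stand-in for a learned approximation of the integrand (assumption).
    return 1.0 + x


# Exact integral of the surrogate over [0, 1].
SURROGATE_INTEGRAL = 1.5


def sample_variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def estimate(n_samples=10000, seed=0):
    """Compare plain Monte Carlo with a control-variate estimator."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n_samples)]
    plain = [integrand(x) for x in xs]
    # Integrate only the residual f - g by sampling, then add back the
    # known integral of g. The estimator stays unbiased for any g.
    corrected = [integrand(x) - surrogate(x) + SURROGATE_INTEGRAL for x in xs]
    return {
        "plain_mean": sum(plain) / n_samples,
        "plain_var": sample_variance(plain),
        "cv_mean": sum(corrected) / n_samples,
        "cv_var": sample_variance(corrected),
    }


if __name__ == "__main__":
    result = estimate()
    print("exact value:", math.e - 1.0)
    for key, value in result.items():
        print(key, value)
```

The per-sample variance drops only to the extent that the surrogate tracks the integrand, which is why the quality of the learned approximation, and hence of the training set, matters.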

Short-term objectives would be to review variance reduction techniques from computer graphics and to adapt importance sampling strategies so as to estimate PDFs specific to medical physics simulations.

Long-term objectives would be to develop ML generative models or frameworks for fast MC integration in the MOCAMED context.