Thesis of Télio Dupuis

Subject:

Deep, self-supervised, and active sensorimotor learning of representations of manipulable objects

Start date: 20/05/2026
End date (estimated): 20/05/2029

Advisor: Frédéric Armetta

Summary:

This research focuses on learning representations from sequences of interactions with the environment. In particular, we will draw on the theory of sensorimotor contingencies, in which action structures not only the learned representations but also the dynamics of the interaction itself. Within this framework, we aim to learn predictive structures of the world that allow objects to be defined, in a self-supervised manner, as graphs of potential interactions. The thesis will address the following research questions:
- How to integrate action into existing self-supervised deep learning models (e.g., Transformers or State Space Models), and how this integration influences the model's structure and predictive capabilities.
- How to learn spatiotemporal structures that may correspond to notions of proto-objects. Hybrid approaches combining graphs and deep learning will be studied, in particular to learn multiscale structures that are locally organized and globally connected. These representations may also serve as a supervision signal for the self-supervised approaches used in the multimodal learning conducted in another part of the project.
- How to develop methods that are efficient in terms of training time and data usage. Because learning from actions requires a simulator, computation times are longer than when training on a fixed dataset. The possibility of offline pre-training (for example, on pre-recorded random behaviors) will be explored. In addition, active learning mechanisms (selecting timely actions that yield useful information) will be proposed to reduce the amount of training data required to reach a given level of performance. By formalizing testable hypotheses about the environment, these mechanisms will also reduce the size of the representations, since only the predictable subsets of inputs are retained. This research may also be combined with the policy selection mechanisms explored in another part of the project.
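
As a purely illustrative sketch of the last question (not the method that will be developed in the thesis), the toy example below shows one simple way active action selection and the retention of predictable inputs could fit together. Everything in it is an assumption made for illustration: the 5-state world `toy_step`, the count-based selection rule in `explore`, and the entropy threshold in `predictable_pairs` are hypothetical and stand in for learned deep models.

```python
import math
import random
from collections import Counter, defaultdict


def entropy(counter):
    """Shannon entropy (in bits) of an empirical outcome distribution."""
    total = sum(counter.values())
    return -sum((c / total) * math.log2(c / total) for c in counter.values())


def toy_step(state, action, rng):
    """Hypothetical 5-state world: actions 0 and 1 are deterministic, action 2 is noise."""
    if action == 0:
        return (state + 1) % 5
    if action == 1:
        return (state - 1) % 5
    return rng.randrange(5)


def explore(n_steps, n_actions=3, seed=0):
    """Interact with the toy world, always trying the least-tested action."""
    rng = random.Random(seed)
    outcomes = defaultdict(Counter)  # (state, action) -> Counter of next states
    state = 0
    for _ in range(n_steps):
        # Active selection (assumed rule): pick the action with the fewest
        # recorded outcomes in the current state; ties go to the lowest index.
        action = min(
            range(n_actions),
            key=lambda a: sum(outcomes.get((state, a), Counter()).values()),
        )
        next_state = toy_step(state, action, rng)
        outcomes[(state, action)][next_state] += 1
        state = next_state
    return outcomes


def predictable_pairs(outcomes, max_entropy=0.5, min_count=3):
    """Keep only (state, action) pairs whose observed outcome is nearly deterministic."""
    return {
        pair
        for pair, counts in outcomes.items()
        if sum(counts.values()) >= min_count and entropy(counts) <= max_entropy
    }


if __name__ == "__main__":
    kept = predictable_pairs(explore(600))
    print(sorted(kept))
```

In this toy setting, the pairs that survive the filter should be those involving the two deterministic actions, while the noisy action is discarded. This mirrors, in a very reduced form, the idea of keeping only the predictable subsets of inputs.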

These approaches will be tested either in simple environments (as we did with Tetris) or in a robotic simulation environment containing objects with simple shapes and properties, in conjunction with other research conducted within the MeSMRise project.