PhD Thesis of Kévin Hoarau


Subject:
Low-resource embedded vision with dynamic neural networks

Start date: 02/10/2025
Estimated end date: 02/10/2028

Supervisor: Stefan Duffner

Abstract:

Cameras are increasingly present in edge devices and embedded systems, where signals are processed and interpreted on the device or close to it, and where computational resources are scarce. More and more machine-learning models are integrated into these devices, and these models are becoming more powerful but also more energy-consuming.
State-of-the-art computer vision models are based on deep neural networks, and many techniques have been proposed to reduce their computational and memory complexity, and thus their energy consumption: for example, pruning, quantization, knowledge distillation and low-rank approximations.
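To illustrate, such static compression is often a one-line operation in modern frameworks; the snippet below applies magnitude pruning with PyTorch's built-in utility (the layer and the 30% sparsity level are arbitrary illustrative choices, not part of this proposal):

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(128, 10)
# Zero out the 30% of weights with the smallest magnitude (L1 criterion);
# the sparsity level here is an illustrative choice.
prune.l1_unstructured(layer, name="weight", amount=0.3)
print(float((layer.weight == 0).float().mean()))  # ~0.3 of weights are zero
```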
Another approach is to adapt the complexity of the model dynamically at run time, depending on internal factors, such as the complexity of the input, or on external factors, such as the battery charge or the temperature. This leads to dynamic neural networks that modulate the depth or width of their architecture, or activate different computation paths.
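As a minimal sketch of such conditional computation (the layer sizes, the gating head and the 0.5 skip threshold are illustrative assumptions, not a design from this proposal), a residual block can learn a gate that decides at run time whether to execute its body:

```python
import torch
import torch.nn as nn

class GatedBlock(nn.Module):
    """Residual block that can be skipped at run time (illustrative sketch).

    A tiny gating head predicts, from globally pooled features, whether
    the block's computation is worth executing for the current input.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.gate = nn.Linear(channels, 1)  # scalar "execute?" score

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Global average pooling -> gate probability in [0, 1].
        g = torch.sigmoid(self.gate(x.mean(dim=(2, 3))))  # shape (B, 1)
        if not self.training and (g < 0.5).all():
            return x  # skip the whole block: identity shortcut only
        # During training, scale the residual by the soft gate so the
        # gating head receives gradients.
        return x + g.view(-1, 1, 1, 1) * self.body(x)
```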
The aim of this PhD thesis is to investigate the potential of these dynamic neural network models for computer vision in low-power settings and with event-based cameras. 
Event-based cameras, also known as neuromorphic cameras or dynamic vision sensors, capture visual information in a fundamentally different way from traditional cameras. Instead of recording a sequence of static frames, they detect and record changes in the scene, such as motion or brightness changes, as a stream of asynchronous events.
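Concretely, each event is typically a tuple (x, y, t, p): pixel coordinates, a timestamp and a polarity indicating whether brightness increased or decreased. A minimal sketch of such a stream follows (the structured dtype is an illustrative assumption; real SDKs such as Prophesee's Metavision define their own event types):

```python
import numpy as np

# One event = (x, y, timestamp, polarity); illustrative dtype, the
# exact layout depends on the camera SDK.
event_dtype = np.dtype([
    ("x", np.uint16),   # pixel column
    ("y", np.uint16),   # pixel row
    ("t", np.int64),    # timestamp in microseconds
    ("p", np.int8),     # polarity: +1 brightness up, -1 brightness down
])

# A toy stream of three asynchronous events.
events = np.array(
    [(12, 40, 1_000, 1), (13, 40, 1_250, 1), (12, 41, 1_900, -1)],
    dtype=event_dtype,
)
print(events["t"])  # timestamps are irregular, not a fixed frame rate
```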
Traditional cameras capture images at a fixed frame rate, typically 30 or 60 frames per second, which can result in a significant amount of redundant information, especially in static scenes. In contrast, event-based cameras capture and transmit only the changes in the scene, which brings several advantages: lower latency, higher dynamic range, improved motion detection, reduced data rate and increased power efficiency.
The objective of this thesis is to develop new dynamic neural network architectures that are suitable for processing the data streams of an event-based camera. These data are dynamic in both space and time, as only pixels that change are transmitted. The neural network should thus process only the relevant spatial regions, and only at the instants where there is motion, and it should further modulate its size depending on the complexity or difficulty of the data stream. In this way, the full processing power is employed only when needed, and energy is saved otherwise. Several such approaches have been proposed in the literature for conventional CNNs, but not in the context of event-based cameras.
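A simple way to exploit this spatial sparsity, sketched below under purely illustrative assumptions (tile size, event threshold), is to accumulate the events of a short time window into per-tile activity counts and run the network only on the active tiles:

```python
import numpy as np

def active_tiles(xs, ys, height, width, tile=32, min_events=8):
    """Return the (row, col) indices of tiles that received enough events.

    xs, ys: pixel coordinates of the events in the current time window.
    tile: tile side in pixels; min_events: activity threshold.
    Both parameters are illustrative choices, not values from the thesis.
    """
    rows, cols = height // tile, width // tile
    counts = np.zeros((rows, cols), dtype=np.int32)
    np.add.at(counts, (ys // tile, xs // tile), 1)  # per-tile event count
    return np.argwhere(counts >= min_events)

# Toy example: a 128x128 sensor with events clustered in one corner.
xs = np.random.randint(0, 32, size=100)
ys = np.random.randint(0, 32, size=100)
print(active_tiles(xs, ys, 128, 128))  # only tile (0, 0) is active
```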
The thesis will focus on a given computer vision task, defined at the beginning, for example image classification, object detection or action recognition. We will also specify the data preprocessing pipeline and baseline models for comparison.
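As an example of such a preprocessing step, a common baseline representation is the voxel grid, which discretizes a window of events into a fixed number of temporal bins so that frame-based architectures can consume it; the sketch below is one simple variant, not the pipeline that will actually be chosen:

```python
import numpy as np

def events_to_voxel_grid(events, n_bins, height, width):
    """Accumulate event polarities into an (n_bins, H, W) grid.

    `events` is a structured array with fields x, y, t, p (see above);
    this is a simple baseline variant, not the thesis' final pipeline.
    """
    grid = np.zeros((n_bins, height, width), dtype=np.float32)
    t = events["t"].astype(np.float64)
    # Normalize timestamps to [0, n_bins) and clip the last event in.
    bins = ((t - t.min()) / max(t.max() - t.min(), 1) * n_bins).astype(int)
    bins = np.clip(bins, 0, n_bins - 1)
    np.add.at(grid, (bins, events["y"], events["x"]), events["p"])
    return grid
```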
Then a dynamic neural network, e.g. based on early exits or online pruning and built on a CNN, ViT or SSM backbone, will be implemented and evaluated on a defined benchmark, in terms of accuracy as well as computational complexity or energy consumption.
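For instance, an early-exit network attaches intermediate classifiers to the backbone and, at inference, stops at the first exit whose prediction is confident enough; a minimal PyTorch sketch follows (the backbone, exit heads and confidence threshold are all illustrative assumptions):

```python
import torch
import torch.nn as nn

class EarlyExitNet(nn.Module):
    """Backbone with intermediate classifiers (a minimal sketch)."""

    def __init__(self, in_ch: int, n_classes: int, threshold: float = 0.9):
        super().__init__()
        self.threshold = threshold  # illustrative confidence threshold
        self.stages = nn.ModuleList([
            nn.Sequential(nn.Conv2d(c_in, c_out, 3, stride=2, padding=1),
                          nn.ReLU(inplace=True))
            for c_in, c_out in [(in_ch, 32), (32, 64), (64, 128)]
        ])
        self.exits = nn.ModuleList([
            nn.Linear(c, n_classes) for c in (32, 64, 128)
        ])

    def forward(self, x: torch.Tensor):
        for i, (stage, head) in enumerate(zip(self.stages, self.exits)):
            x = stage(x)
            logits = head(x.mean(dim=(2, 3)))  # pooled features -> logits
            conf = logits.softmax(dim=-1).max(dim=-1).values
            # At inference, stop at the first sufficiently confident exit
            # (batch size 1 is assumed for simplicity in this sketch).
            if not self.training and conf.item() >= self.threshold:
                return logits, i  # remaining stages' compute is saved
        return logits, len(self.stages) - 1
```

In practice all exits are supervised jointly during training; only the inference-time exiting logic is shown here.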
Finally, the developed algorithms may also be tested on recorded event streams or even in real time on our Prophesee event-based camera.