Thesis of Benoît Roussel


Subject:
Occlusion Handling in Multi-Object Tracking

Start date: 01/12/2023
End date (estimated): 01/12/2026

Advisor: Liming Chen

Summary:

Occlusions (i.e., when objects are partially or completely obscured by other objects) remain a significant barrier to high
performance in scene understanding tasks. This doctoral research project aims to improve multi-object tracking (MOT) models,
for example for pedestrians and vehicles, to make them robust to occlusions. Occlusions are challenging because:

(i) Public dataset annotations typically prioritize visible data, which is easier for humans to annotate. This annotation bias
leads to a scarcity of labeled data that adequately covers occluded objects.

(ii) Even when non-visible parts of objects are fully annotated, models struggle to directly link hidden elements with visual
patterns and have to rely heavily on contextual cues from the spatio-temporal surroundings of the element, which
often requires significantly more training data. The same phenomenon arises in 3D detection/tracking, as it typically
requires looking beyond pixel-based visual patterns.

To address the above-mentioned difficulties, training on very large datasets without human supervision (or with supervision
limited to a few examples) is a promising approach. One way is to exploit the implicit signals present in the spatio-temporal
context of many unlabeled videos, using self-supervised learning. Another is to use synthetic data generated by simulation
engines, which can benefit from having perfect labels (thereby benefiting the aforementioned 3D tasks as well). Both offer the
advantage of being relatively unlimited in dataset size, the first focusing on the quality/realism of the data and the second
focusing on the quality of the labels. By combining the two, the goal is to leverage the large size and high quality of both
the data and labels, thereby enhancing the overall training process and ultimately improving the performance of scene
understanding algorithms in difficult and dense scenarios.