Thesis of Bonan Cuan

Deep Similarity Metric Learning for Multiple Object Tracking


Multiple object tracking, i.e. simultaneously tracking multiple objects in a scene, is an important but challenging visual task. Objects need to be accurately detected against the background and distinguished from each other to avoid erroneous trajectories. Since remarkable progress has been made in the field of object detection, “tracking-by-detection” approaches are widely adopted in multiple object tracking research. Objects in all frames are detected in advance, and tracking reduces to an association problem: linking detections of the same object across frames into trajectories.


Most tracking algorithms employ both motion and appearance models for data association. For multiple object tracking problems in which many objects of the same category are present, a fine-grained, discriminative appearance model is indispensable. In this thesis, we propose an appearance-based re-identification model using deep similarity metric learning to deal with multiple object tracking in single-camera videos. Two main contributions are reported in this dissertation:


First, we reviewed the state of the art in object re-identification, which provides the intra-category recognition required in multiple object tracking. Specifically, we thoroughly investigated the application of pairwise metric learning in this field. A deep Siamese network is employed to learn an end-to-end mapping from input images to a discriminative embedding space. Different metric learning configurations, varying the metric, the loss function, the deep network structure, etc., are tested and compared in order to determine the best re-identification model for tracking. With an intuitive and simple classification design, the proposed model achieves satisfactory re-identification results, comparable to state-of-the-art approaches using triplet loss when evaluated on benchmarks such as CUHK03. Our approach is easy and fast to train, and the learned embedding can be readily transferred to the domain of tracking tasks.
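
To make the idea of pairwise metric learning concrete, the following is a minimal sketch of two common pairwise measures on embedding vectors: cosine similarity and the standard contrastive loss. It is illustrative only and is not taken from the thesis; the function names, the `margin` parameter, and the choice of contrastive loss are assumptions, and the embeddings are plain lists of floats rather than outputs of a trained Siamese network.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def contrastive_loss(a, b, same_identity, margin=1.0):
    """Textbook contrastive loss for one pair of embeddings (assumed form).

    Pairs of the same identity are penalized by their squared Euclidean
    distance; pairs of different identities are penalized only when they
    lie closer than `margin`.
    """
    d = math.dist(a, b)  # Euclidean distance between the two embeddings
    if same_identity:
        return d ** 2
    return max(0.0, margin - d) ** 2
```

In a Siamese setup, both images of a pair would pass through the same network to produce `a` and `b`, and a loss of this kind would be averaged over many pairs during training.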


Second, we integrated the proposed re-identification model into data association as appearance guidance for multiple object tracking. For each object to be tracked in a video, we establish an identity-related appearance model based on the learned re-identification embedding. Similarities among detected object instances are exploited for identity classification, which determines the tracking result together with motion models. We also investigated the collaboration and interference between appearance and motion models. In contrast to most existing tracking algorithms, which combine both kinds of models via a simple sum of their scores, we propose an online model coupling to further improve tracking performance: when one model fails on ambiguous tracks, the other takes over the data association. Experiments on the Multiple Object Tracking Challenge benchmark demonstrate the effectiveness of our modifications, with state-of-the-art tracking accuracy.
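
The difference between summing scores and letting one model take over can be sketched as follows. This is a hypothetical illustration, not the coupling rule used in the thesis: the ambiguity test (a small gap between the two best scores), the `min_gap` threshold, and the function names `is_ambiguous` and `associate` are all assumptions made for the example.

```python
def is_ambiguous(scores, min_gap=0.1):
    """Assumed ambiguity test: the two best scores are closer than min_gap."""
    if len(scores) < 2:
        return False
    best, second = sorted(scores, reverse=True)[:2]
    return (best - second) < min_gap


def associate(appearance, motion, min_gap=0.1):
    """Return the index of the candidate detection chosen for one track.

    `appearance` and `motion` hold one score per candidate detection
    (higher is better). If exactly one model is ambiguous, the other
    model decides alone; otherwise the scores are simply summed.
    """
    app_ambiguous = is_ambiguous(appearance, min_gap)
    mot_ambiguous = is_ambiguous(motion, min_gap)
    if app_ambiguous and not mot_ambiguous:
        combined = motion  # motion takes over
    elif mot_ambiguous and not app_ambiguous:
        combined = appearance  # appearance takes over
    else:
        combined = [a + m for a, m in zip(appearance, motion)]
    return max(range(len(combined)), key=combined.__getitem__)
```

For example, with `appearance = [0.9, 0.88, 0.0]` and `motion = [0.1, 0.3, 0.5]`, the appearance model cannot separate the first two candidates, so under these assumptions the motion model decides and candidate 2 is chosen, whereas a plain sum of the scores would pick candidate 1.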

Advisor: Khalid Idrissi
Coadvisor: Christophe Garcia

Defense date: Thursday, September 12, 2019

Mr. CHATEAU Thierry, Professor, Université de Clermont-Auvergne, Examiner
Ms. CAPLIER Alice, Professor, Université de Grenoble, Examiner
Mr. PAINDAVOINE Michel, Professor, Université de Bourgogne, Jury president
Mr. GARCIA Christophe, Professor, INSA Lyon, Thesis co-advisor
Mr. IDRISSI Khalid, Associate Professor, INSA Lyon, Thesis advisor