Thesis of Salma Moujtahid


Subject:
Online object tracking and recognition using a moving camera

Defense date: 03/11/2016

Advisor: Atilla Baskurt
Co-advisor: Stefan Duffner

Summary:

The aim of this thesis is to develop new efficient and robust models to track and recognize objects and individuals in videos captured with moving cameras. Classical background-subtraction methods usually fail on this type of video; instead, more discriminative object-detection methods are needed. To this end, machine learning methods (based, for example, on AdaBoost or neural networks) can be used to adapt to the non-static constraints while processing in real time. Many tracking approaches of this type have been proposed: Online AdaBoost [11], SVM-based methods [1], Multiple Instance Learning (MIL) [2], and Random Forests [9]. However, these have mainly been evaluated on static, short videos (one or two minutes long at most), while remaining demanding in terms of processing time. For long videos with dynamic scenes, the models have to be learned and adapted as the video is processed, since the objects' appearance and shape can change considerably. This adaptation, or incremental learning, problem has been addressed in numerous papers (for example [11, 1, 2, 13]), but difficulties persist because the model tends to drift over time. The learned models must therefore robustly (re)identify objects or scene parts that (re)appear over time.
In this thesis, neural and deep learning approaches will be explored, which has not been done in this context before. These methods have already proved their effectiveness in static and generic settings (for example in face detection and recognition), and applying them in a real-time, dynamic context represents a real challenge and a novelty with respect to the field's state of the art.
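The online adaptation idea described above (a discriminative model updated frame by frame as the tracked object changes) can be illustrated with a toy tracking-by-detection loop. This is only a minimal sketch, not the method developed in the thesis: the 1-D "frames", the hand-picked window statistics used as features, and the perceptron-style update rule are all simplifying assumptions standing in for real image patches, learned features, and a proper online classifier.

```python
import numpy as np

def extract_feature(frame, x, w):
    # Stand-in for an image-patch descriptor: simple statistics
    # of a 1-D window of width w starting at position x.
    patch = frame[x:x + w]
    return np.array([patch.mean(), patch.std(), patch.max(), patch.min()])

class OnlineLinearTracker:
    """Tracking-by-detection with a linear scorer updated online
    (a perceptron-style rule, chosen here purely for illustration)."""
    def __init__(self, dim, lr=0.1):
        self.w = np.zeros(dim)
        self.lr = lr

    def score(self, feat):
        return float(self.w @ feat)

    def update(self, pos_feat, neg_feat):
        # Incremental adaptation: push the tracked window's score up
        # and a distractor window's score down.
        self.w += self.lr * (pos_feat - neg_feat)

def track(frames, init_x, win=8):
    rng = np.random.default_rng(0)
    tracker = OnlineLinearTracker(dim=4)
    x = init_x
    # Bootstrap the model from the first frame: the initial position is
    # the positive sample, a window elsewhere serves as the negative one.
    neg_x = (x + 2 * win) % (len(frames[0]) - win)
    tracker.update(extract_feature(frames[0], x, win),
                   extract_feature(frames[0], neg_x, win))
    positions = [x]
    for frame in frames[1:]:
        # Score candidate windows around the previous position.
        cands = list(range(max(0, x - 4), min(len(frame) - win, x + 5)))
        feats = [extract_feature(frame, c, win) for c in cands]
        best = int(np.argmax([tracker.score(f) for f in feats]))
        x = cands[best]
        # Online update: the best window is the new positive sample,
        # a randomly chosen other candidate is the negative one.
        neg = feats[rng.integers(len(feats))]
        tracker.update(feats[best], neg)
        positions.append(x)
    return positions
```

Real trackers of this family (e.g. Online AdaBoost or MIL cited above) follow the same loop structure: detect with the current model, then update the model with samples taken around the new position, which is exactly where the drift problem mentioned above originates.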
The methods developed during this thesis will be tested under real conditions on videos from moving cameras, captured with a mobile phone or by a mobile robot. An evaluation on international benchmark datasets will enable us to compare these methods to the state of the art.
There are many possible applications. For example, in the field of Human-Robot Interaction, an "intelligent" visual system would allow a robot to find its way through a complex, unknown environment and to recognize surrounding objects or individuals with whom it can interact (for instance, assisting a person who is visually impaired or has another disability).
The aspects of Human-Machine Interaction (HMI) and system adaptation to the environment could be addressed at a higher level in later stages, for example in the context of Intelligent Tutoring Systems (ITS) or Computer-Supported Cooperative Work (CSCW), in collaboration with the SILEX team of the LIRIS laboratory.