Thesis of Matthieu Rogez
Defense date: 09/06/2015
Advisor: Laure Tougne Rodet
Video-surveillance cameras are increasingly used in our daily lives. They are present almost everywhere: in cities, supermarkets, the underground, highways, airports… This large number of cameras makes it impossible to assign a human operator to each camera in order to detect “abnormal” events such as an intrusion.
A common solution is to record all video sequences and replay the relevant sequence after a problem has been reported. However, such an approach lacks reactivity and is therefore unsuitable for video-surveillance applications. An alternative is to automate the detection and recognition of moving objects. Such a system must detect all objects of interest (no omissions), and only them (no false detections), while operating in real time.
The usual approach for such systems consists of first segmenting the scene to estimate which pixels belong to objects, and then computing several features on these pixels in order to recognise the tracked objects.
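The abstract does not say which segmentation method or features the thesis relies on. As a minimal illustration of the generic pipeline just described, the sketch below assumes a running-average background model (a swapped-in textbook technique, not necessarily the one used in the thesis), thresholds the difference to obtain a foreground mask, groups foreground pixels into 4-connected blobs, and computes a few simple per-blob features. The function names, the `alpha` learning rate and the `threshold` value are hypothetical; frames are plain lists of grey levels so the example needs no third-party library.

```python
from collections import deque


def update_background(background, frame, alpha=0.05):
    """Running-average background model: B <- (1 - alpha) * B + alpha * F."""
    return [[(1 - alpha) * b + alpha * f for b, f in zip(brow, frow)]
            for brow, frow in zip(background, frame)]


def foreground_mask(background, frame, threshold=30):
    """A pixel is foreground when it differs from the background by more than `threshold`."""
    return [[abs(f - b) > threshold for b, f in zip(brow, frow)]
            for brow, frow in zip(background, frame)]


def blobs(mask):
    """Group foreground pixels into 4-connected blobs and return simple features for each."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    features = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill collects every pixel of this blob.
            queue = deque([(y, x)])
            seen[y][x] = True
            pixels = []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            features.append({
                "area": len(pixels),
                "bbox": (min(xs), min(ys), max(xs), max(ys)),  # (x0, y0, x1, y1)
                "aspect_ratio": (max(ys) - min(ys) + 1) / (max(xs) - min(xs) + 1),  # height / width
            })
    return features
```

A blob's aspect ratio is one cheap cue for telling a standing pedestrian (tall blob) from a car (wide blob); a practical system would combine it with richer appearance and motion features.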
Unlike methods that rely solely on the acquired images, this thesis proposes using the spatial and temporal context of the scene to improve object detection and tracking.
We included spatial context by gathering data from the cameras' pre-deployment studies, such as camera GPS locations and orientations, and by using geographical databases such as OpenStreetMap. These data allow us to build a geometric model of the scene as seen by the cameras, which takes into account nearby fixed obstacles, such as buildings, that can affect the cameras' field of view. This model allows us to reason about the actual position and size of detected objects.
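The abstract does not detail the geometric model itself. A minimal sketch of the underlying idea, assuming a pinhole camera at a known height above flat ground, with a known pitch (downward tilt), heading (clockwise from north), focal length in pixels and principal point: the pixel at an object's feet is back-projected onto the ground plane, the pixel at its head then yields its metric height, and the ground offset is converted to latitude/longitude with a local equirectangular approximation. The `cam` dictionary and its keys are hypothetical, and obstacles such as OpenStreetMap buildings are not modelled here.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)


def pixel_to_ground(u, v, cam):
    """Intersect the viewing ray of pixel (u, v) with a flat ground plane.

    Returns (right, forward) offsets in metres in the camera's horizontal frame,
    or None when the ray does not point below the horizon.
    """
    x = (u - cam["cx"]) / cam["f"]  # normalised image coordinates (y points down)
    y = (v - cam["cy"]) / cam["f"]
    th = cam["pitch"]
    down = y * math.cos(th) + math.sin(th)  # downward component of the ray
    if down <= 0:
        return None
    t = cam["height"] / down  # ray length until it reaches the ground
    return t * x, t * (math.cos(th) - y * math.sin(th))


def object_height(v_top, forward, cam):
    """Height of the point seen at image row v_top, assumed vertically above a foot point
    located `forward` metres in front of the camera."""
    y = (v_top - cam["cy"]) / cam["f"]
    th = cam["pitch"]
    s = forward / (math.cos(th) - y * math.sin(th))  # scale that reaches the same horizontal distance
    return cam["height"] - s * (y * math.cos(th) + math.sin(th))


def ground_to_gps(right, forward, cam):
    """Rotate the offset by the camera heading and convert metres to latitude/longitude."""
    psi = cam["heading"]
    east = right * math.cos(psi) + forward * math.sin(psi)
    north = -right * math.sin(psi) + forward * math.cos(psi)
    lat = cam["lat"] + math.degrees(north / EARTH_RADIUS_M)
    lon = cam["lon"] + math.degrees(east / (EARTH_RADIUS_M * math.cos(math.radians(cam["lat"]))))
    return lat, lon
```

For instance, with a camera 5 m above the ground tilted 45° downwards, the principal point maps to a ground point 5 m in front of the camera, and a head seen exactly on the horizon line belongs to an object as tall as the camera is high.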
To include temporal context, we used the formalism of state machines, which model the current state of each detected object and the transitions it is allowed to take. This allows us to tailor the processing of each object to its current state.
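The abstract does not list the states or transitions used in the thesis. The sketch below assumes four hypothetical states (NEW, TRACKED, OCCLUDED, LOST) and an explicit table of allowed transitions, so that any illegal transition raises an error. `search_radius` is a hypothetical example of state-dependent processing: an occluded object is searched for in a wider window the longer it stays unmatched.

```python
from enum import Enum, auto


class State(Enum):
    NEW = auto()       # just detected, not yet confirmed
    TRACKED = auto()   # confirmed and matched in the current frame
    OCCLUDED = auto()  # temporarily unmatched (e.g. hidden behind an obstacle)
    LOST = auto()      # unmatched for too long; the track is terminated


# Allowed transitions; anything else is rejected.
TRANSITIONS = {
    State.NEW: {State.NEW, State.TRACKED, State.LOST},
    State.TRACKED: {State.TRACKED, State.OCCLUDED},
    State.OCCLUDED: {State.OCCLUDED, State.TRACKED, State.LOST},
    State.LOST: set(),
}


class Track:
    def __init__(self, confirm_after=3, max_missed=10):
        self.state = State.NEW
        self.hits = 0
        self.missed = 0
        self.confirm_after = confirm_after
        self.max_missed = max_missed

    def _go(self, new_state):
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {new_state.name}")
        self.state = new_state

    def step(self, matched):
        """Update the track given whether it was matched to a detection in this frame."""
        if self.state is State.LOST:
            return self.state  # terminated tracks are never revived
        if matched:
            self.hits += 1
            self.missed = 0
            confirmed = self.state is not State.NEW or self.hits >= self.confirm_after
            self._go(State.TRACKED if confirmed else State.NEW)
        else:
            self.missed += 1
            if self.state is State.NEW:
                self._go(State.LOST)  # unconfirmed detections are dropped at once
            elif self.state is State.TRACKED:
                self._go(State.OCCLUDED)
            else:  # OCCLUDED
                self._go(State.LOST if self.missed > self.max_missed else State.OCCLUDED)
        return self.state


def search_radius(track, base=20.0):
    """State-dependent processing: widen the matching window while the object is occluded."""
    return {State.NEW: base, State.TRACKED: base,
            State.OCCLUDED: base * (1 + track.missed), State.LOST: 0.0}[track.state]
```

Keeping the transition table separate from the update logic makes the allowed behaviour easy to audit and to extend with new states.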
In addition, we addressed the problem of shadows, which are often erroneously segmented together with the objects that cast them, by modelling the light sources in the scene. For outdoor scenes, we use the GPS coordinates of the scene and the current time to predict the position of the sun. This model of the shadows within the scene allows us to predict which pixels are likely to be shadow pixels, and therefore helps the object segmentation.
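The abstract does not specify which solar-position algorithm is used. As a sketch, the code below swaps in the NOAA "General Solar Position Calculations" fractional-year approximation, a low-precision method that should be adequate for predicting shadow direction, and assumes flat ground: the shadow points away from the sun, and its length is the object height divided by the tangent of the solar elevation. The function names are hypothetical; naive datetimes are interpreted as UTC, and longitude is positive east.

```python
import math
from datetime import datetime, timezone


def sun_position(lat_deg, lon_deg, when):
    """Approximate solar (azimuth, elevation) in degrees; azimuth is clockwise from north."""
    if when.tzinfo is not None:
        when = when.astimezone(timezone.utc)  # naive datetimes are assumed to be UTC
    doy = when.timetuple().tm_yday
    hours = when.hour + when.minute / 60 + when.second / 3600
    g = 2 * math.pi / 365 * (doy - 1 + (hours - 12) / 24)  # fractional year (rad)
    eqtime = 229.18 * (0.000075 + 0.001868 * math.cos(g) - 0.032077 * math.sin(g)
                       - 0.014615 * math.cos(2 * g) - 0.040849 * math.sin(2 * g))  # minutes
    decl = (0.006918 - 0.399912 * math.cos(g) + 0.070257 * math.sin(g)
            - 0.006758 * math.cos(2 * g) + 0.000907 * math.sin(2 * g)
            - 0.002697 * math.cos(3 * g) + 0.00148 * math.sin(3 * g))  # declination (rad)
    true_solar_min = hours * 60 + eqtime + 4 * lon_deg
    ha = math.radians(true_solar_min / 4 - 180)  # hour angle, zero at solar noon
    lat = math.radians(lat_deg)
    cos_zen = math.sin(lat) * math.sin(decl) + math.cos(lat) * math.cos(decl) * math.cos(ha)
    elevation = 90 - math.degrees(math.acos(max(-1.0, min(1.0, cos_zen))))
    # atan2 gives the azimuth measured from south; adding 180 degrees measures it from north.
    az_from_south = math.atan2(math.sin(ha),
                               math.cos(ha) * math.sin(lat) - math.tan(decl) * math.cos(lat))
    azimuth = (math.degrees(az_from_south) + 180) % 360
    return azimuth, elevation


def shadow_on_ground(object_height_m, azimuth, elevation):
    """Direction (deg from north) and length (m) of the shadow on flat ground; None if the sun is down."""
    if elevation <= 0:
        return None
    return (azimuth + 180) % 360, object_height_m / math.tan(math.radians(elevation))
```

For example, at about 11:42 UTC on 21 June 2015 at latitude 45.76° N, longitude 4.83° E (a hypothetical camera site), the sun is close to due south at roughly 67.7° elevation, so a 1.8 m pedestrian casts a shadow of about 0.74 m pointing north. Projecting that ground segment into the image with the camera model gives the region where shadow pixels are expected.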