Thesis of Hatem Mousselly-Sergieh
Subject:
Defense date: 26/09/2014
Advisor: Jean-Marie Pinon
Coadvisor: Elod Egyed-Zsigmond
Summary:
Web 2.0 technology gave rise to a wide range of platforms for sharing images. Users can upload their images and collaboratively annotate them with keywords, called tags, to enable efficient management and retrieval. However, manual tagging is laborious and time-consuming. In recent years, the sheer amount of user-tagged photos accessible on the Web has drawn researchers' attention to developing methods for automatic image annotation. In contrast to traditional approaches, which rely on statistical modeling and machine learning, a newer research direction, called search-based image annotation, has shown better performance and scalability. The idea is to identify, for an unlabeled photo, a set of visually similar images in a pool of already tagged images using computer vision algorithms. The tags of the similar images are then propagated as annotations for the unlabeled photo.
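The search-based paradigm described above can be sketched in a few lines. This is a minimal illustration, not the thesis's implementation: the feature vectors, pool, and cosine similarity are toy stand-ins for real visual descriptors and matching.

```python
import math
from collections import Counter

def cosine(a, b):
    # Cosine similarity between two feature vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def annotate(query_vec, pool, k=2):
    """Propagate the tags of the k visually nearest images in the
    tagged pool, ordered by how often they occur among the neighbors."""
    ranked = sorted(pool, key=lambda img: cosine(query_vec, img["vec"]),
                    reverse=True)
    tags = Counter(t for img in ranked[:k] for t in img["tags"])
    return [t for t, _ in tags.most_common()]

# Toy pool of already tagged images (hypothetical data)
pool = [
    {"vec": [0.9, 0.1, 0.0], "tags": ["beach", "sea"]},
    {"vec": [0.8, 0.2, 0.1], "tags": ["sea", "sunset"]},
    {"vec": [0.0, 0.1, 0.9], "tags": ["mountain"]},
]
print(annotate([0.85, 0.15, 0.05], pool))  # -> ['sea', 'beach', 'sunset']
```

Tags shared by several neighbors ("sea") rank first, which is the basic voting scheme that the thesis's ranking model later refines.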
Currently, a considerable number of community photos are associated with location information, i.e., they are geotagged. In this thesis, we exploit this rich context and present algorithms and statistical models for automatic image annotation following the search-based paradigm. The objective is to address the main limitations of state-of-the-art approaches in terms of the quality of the produced tags and the speed of the complete annotation process.

First, we present a method for collecting image data from the Web based on location information, placing emphasis on the quality and the spatial representativeness of the data. To ensure tag quality, a novel approach for resolving the inherent ambiguity of user tags is proposed. The approach demonstrates the effectiveness of the Laplacian feature selection algorithm for improving tag representation, and introduces a new distance measure for tag relatedness that extends the well-known Jensen-Shannon divergence to account for statistical fluctuations.

To identify similar images for an unannotated photo, location information is again exploited to narrow the search space. The thesis also investigates methods for improving the accuracy and reducing the run time of image matching. A solution for speeding up matching based on Speeded Up Robust Features (SURF) is presented: a model identifies salient SURF keypoints using classification techniques, so that performance can be significantly improved by restricting the matching to reduced subsets of salient keypoints. In addition, a method for improving the accuracy of SURF matching based on iterative matching and an efficient keypoint clustering algorithm is proposed. The thesis further presents a statistical model based on Bayes' rule for ranking the mined annotations according to visual, textual, and user-related information.
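The tag relatedness measure builds on the standard Jensen-Shannon divergence between tag co-occurrence distributions; the sketch below shows that baseline only, not the thesis's extension for statistical fluctuations. The distributions and tag names are illustrative.

```python
import math

def kl_divergence(p, q):
    # Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    # Jensen-Shannon divergence: symmetric and, with log base 2,
    # bounded in [0, 1]; 0 means identical distributions
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Toy co-occurrence distributions of two tags over a shared vocabulary
p = [0.5, 0.3, 0.2, 0.0]   # e.g., contexts of tag "sea"
q = [0.4, 0.4, 0.1, 0.1]   # e.g., contexts of tag "ocean"
print(jsd(p, q))  # small value -> closely related tags
```

Because the divergence is symmetric and bounded, it can be turned directly into a distance for clustering or disambiguating tags.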
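The Bayes-rule ranking of mined annotations can be illustrated with a naive combination of independent evidence sources. This is a hypothetical sketch: the field names, priors, and likelihood values are invented for illustration and do not reproduce the thesis's actual model.

```python
import math

def rank_tags(candidates):
    """Rank candidate tags by a naive Bayes-style score: a tag prior
    combined with per-source likelihoods assumed independent."""
    def log_posterior(c):
        # log P(tag) + sum over sources of log P(evidence | tag)
        return math.log(c["prior"]) + sum(
            math.log(v) for v in c["evidence"].values())
    return sorted(candidates, key=log_posterior, reverse=True)

# Hypothetical candidate tags mined from visually similar images
candidates = [
    {"tag": "eiffel tower", "prior": 0.2,
     "evidence": {"visual": 0.9, "textual": 0.8, "user": 0.7}},
    {"tag": "paris", "prior": 0.5,
     "evidence": {"visual": 0.6, "textual": 0.9, "user": 0.5}},
    {"tag": "holiday", "prior": 0.3,
     "evidence": {"visual": 0.2, "textual": 0.3, "user": 0.9}},
]
print([c["tag"] for c in rank_tags(candidates)])
# -> ['paris', 'eiffel tower', 'holiday']
```

Working in log space keeps the product of small probabilities numerically stable, which is the usual reason Bayes-rule rankers sum log-likelihoods rather than multiply raw probabilities.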
Finally, the effectiveness of the proposed annotation approach as a whole and the efficiency of the individual contributions are demonstrated experimentally through comprehensive evaluation studies.