Thesis of Yan Liu


Subject:
Extending the bag-of-words to a large number of categories

Start date: 01/02/2007
End date (estimated): 01/02/2010

Advisor: Liming Chen

Summary:

Image categorization is the pattern recognition problem of assigning one or more labels to an image based on its semantic content. So-called “bag-of-words” approaches, inspired by the bag-of-words model used for text categorization, have shown state-of-the-art performance. Just as a textual document can be encoded by counting the number of occurrences of each word, an image can be characterized by counting the number of occurrences of each visual word. The visual vocabulary, which has to be learned automatically from a training set, helps to bridge the semantic gap between the low-level features extracted from images (based on color, texture, shape, etc.) and the high-level concepts to be recognized. The resulting histograms of visual-word counts are then used as the input of discriminative classifiers (typically, one per category).
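
As an illustration of the counting step described above, here is a minimal sketch in Python. It assumes the visual vocabulary has already been learned and is given as a list of word centers, and that each local descriptor is assigned to its nearest word under Euclidean distance. The function names and the toy 2-D data are hypothetical and chosen only for illustration; real descriptors have many more dimensions.

```python
from math import dist  # Euclidean distance, Python 3.8+


def nearest_word(descriptor, vocabulary):
    """Return the index of the visual word closest to the descriptor."""
    return min(range(len(vocabulary)),
               key=lambda k: dist(descriptor, vocabulary[k]))


def bag_of_words(descriptors, vocabulary):
    """Count how many descriptors are assigned to each visual word."""
    histogram = [0] * len(vocabulary)
    for d in descriptors:
        histogram[nearest_word(d, vocabulary)] += 1
    return histogram


# Toy data (assumption): three visual words, four local descriptors.
vocabulary = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, 0.2), (0.9, 1.1), (1.2, 0.8), (4.7, 5.3)]

histogram = bag_of_words(descriptors, vocabulary)
print(histogram)  # one count per visual word
```

The histogram has one bin per visual word, so its length is fixed by the vocabulary size regardless of how many descriptors the image yields. That fixed length is what lets it serve as the input of a classifier.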

The focus of this thesis will be to extend the “bag-of-words” approach to a large number of categories (from several hundred to several thousand). The solution that we propose to explore is based on hierarchical vocabularies. At the root of the hierarchy, a very coarse vocabulary will make it possible to recognize very general categories, whereas the leaves will contain specialized vocabularies that make it possible to recognize very fine-grained concepts.
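
One possible reading of this coarse-to-fine idea is sketched below in Python. It is not the method developed in the thesis, only an illustration under stated assumptions. Each node of a hypothetical tree holds its own vocabulary, and a simple nearest-prototype rule stands in for the trained discriminative classifiers. The `Node` class, the prototype histograms, the category labels and the toy data are all invented for the example.

```python
from math import dist  # Euclidean distance, Python 3.8+


def histogram(descriptors, vocabulary):
    """Normalized count of descriptors per nearest visual word."""
    counts = [0.0] * len(vocabulary)
    for d in descriptors:
        k = min(range(len(vocabulary)), key=lambda i: dist(d, vocabulary[i]))
        counts[k] += 1.0
    total = sum(counts) or 1.0
    return [c / total for c in counts]


class Node:
    """One node of the hierarchy (hypothetical structure)."""

    def __init__(self, label, vocabulary=None, children=None):
        self.label = label
        self.vocabulary = vocabulary or []  # visual words used at this node
        self.children = children or []      # list of (prototype histogram, Node)


def categorize(descriptors, node):
    """Descend from coarse to fine and return the path of labels."""
    path = [node.label]
    while node.children:
        h = histogram(descriptors, node.vocabulary)
        # Nearest-prototype rule: a stand-in for a trained classifier.
        _, node = min(node.children, key=lambda pc: dist(h, pc[0]))
        path.append(node.label)
    return path


# Toy hierarchy (assumption): a coarse root vocabulary and a finer one below.
animal = Node("animal",
              vocabulary=[(0.0, 0.0), (2.0, 2.0)],
              children=[([1.0, 0.0], Node("cat")),
                        ([0.0, 1.0], Node("dog"))])
root = Node("root",
            vocabulary=[(0.0, 0.0), (10.0, 10.0)],
            children=[([1.0, 0.0], animal),
                      ([0.0, 1.0], Node("vehicle"))])

descriptors = [(1.8, 2.1), (2.2, 1.9), (0.1, 0.0)]
path = categorize(descriptors, root)
print(path)
```

In this sketch the root vocabulary only separates broad groups, while the vocabulary stored at the "animal" node distinguishes descriptors that the root treats as identical. An image is therefore compared only against the vocabularies along its own path, not against every specialized vocabulary in the tree.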

My Ph.D. study is co-supervised by Professor Liming Chen and Florent Perronnin (Xerox Research Centre Europe).