Thesis of Mingyuan Jiu

Subject:

Spatial information and end-to-end learning for visual recognition

Start date: 01/09/2010
Defense date: 03/04/2014

Advisor: Atilla Baskurt
Coadvisor: Christian Wolf

Summary:

In this thesis, we present our research on visual recognition and machine learning. Two
types of visual recognition problems are investigated: human action recognition and human
body part segmentation. Our objective is to integrate spatial information, such as the label
configuration in feature space or the spatial layout of labels, into an end-to-end framework
to improve recognition performance.
For human action recognition, we apply the bag-of-words model and reformulate it
as a neural network for end-to-end learning. We propose two algorithms to make use of
label configuration in feature space to optimize the codebook. One is based on classical
error backpropagation: the codewords are adjusted by gradient descent.
The other is based on cluster reassignments, where the cluster labels are reassigned for
all the feature vectors in a Voronoi diagram. As a result, the
codebook is learned in a supervised way. We demonstrate the effectiveness of the
proposed algorithms on the standard KTH human action dataset.
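
As an illustration of the bag-of-words encoding that both algorithms start from, the following is a minimal sketch, not the implementation used in the thesis. It assumes hard assignment of each feature vector to its nearest codeword (that is, to the Voronoi cell containing it) under squared Euclidean distance. The function names, the toy codebook and the toy features are hypothetical, and the supervised codebook updates themselves are not shown.

```python
# Hypothetical sketch: bag-of-words encoding with a fixed codebook.
# The supervised codebook learning described in the summary is NOT implemented here.

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_codeword(feature, codebook):
    """Index of the codeword whose Voronoi cell contains the feature."""
    return min(range(len(codebook)),
               key=lambda k: squared_distance(feature, codebook[k]))

def bow_histogram(features, codebook):
    """Normalized histogram of codeword assignments for one sample."""
    counts = [0] * len(codebook)
    for feature in features:
        counts[nearest_codeword(feature, codebook)] += 1
    total = len(features)
    if total == 0:
        return [0.0] * len(codebook)
    return [count / total for count in counts]

# Toy data (assumed for illustration only).
codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]
features = [(0.1, -0.2), (0.9, 1.2), (1.1, 0.8), (3.5, 0.3)]
histogram = bow_histogram(features, codebook)
print(histogram)
```

In this sketch the histogram is what a classifier would consume; moving a codeword changes the Voronoi cells and therefore the histogram, which is the dependency that supervised codebook learning exploits.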
For human body part segmentation, we treat the segmentation problem as a
classification problem, where a classifier acts on each pixel. Two machine
learning frameworks are adopted: randomized decision forests and convolutional neural
networks (ConvNets). We integrate \textit{a priori} information on the spatial part layout,
expressed in terms of pairs of labels or pairs of pixels, into the training procedure of
both frameworks to make the classifier more discriminative, while classification in the
testing stage remains pixelwise. Three algorithms are proposed:
(i) Spatial part layout is integrated into the training procedure of randomized decision
forests; (ii) Spatial pre-training is proposed for feature learning in the ConvNets; (iii)
Spatial learning is proposed for classification with logistic regression (LR) or a
multilayer perceptron (MLP).
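
To make the pixelwise testing stage concrete, the following is a minimal sketch under stated assumptions, not the method of the thesis. Each pixel is labeled independently of its neighbours. The classifier here is a toy nearest-class-mean rule on a scalar pixel value, standing in for a trained decision forest or ConvNet; `classify_pixel`, `segment`, the toy image and the class means are all hypothetical. The spatial layout information described in the summary would enter only during training, which is not shown.

```python
# Hypothetical sketch: segmentation as independent per-pixel classification.
# The toy classifier below is a stand-in for a trained forest or ConvNet.

def classify_pixel(value, class_means):
    """Return the label whose (assumed) class mean is closest to the pixel value."""
    return min(range(len(class_means)),
               key=lambda c: abs(value - class_means[c]))

def segment(image, class_means):
    """Apply the classifier to every pixel; the label map has the image's shape."""
    return [[classify_pixel(value, class_means) for value in row]
            for row in image]

# Toy data (assumed for illustration only).
image = [[0.1, 0.2, 0.9],
         [0.0, 0.6, 1.0]]
class_means = [0.0, 1.0]
label_map = segment(image, class_means)
print(label_map)
```

Because no neighbouring labels are consulted at test time, the cost of labeling grows linearly with the number of pixels.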