Thesis of Yongzhe Yan


Subject:
Face Analysis for Aesthetic Augmented Reality Applications

Defense date: 19/06/2020

Advisor: Christophe Garcia
Co-advisor: Stefan Duffner

Summary:

Precise and robust facial component detection is of great importance for a good user experience in aesthetic augmented reality applications such as virtual make-up and virtual hair dyeing. In this context, this thesis addresses the problem of facial component detection via facial landmark detection and face parsing. The scope of this thesis is limited to deep learning-based models.

The first part of this thesis addresses the problem of facial landmark detection, with three contributions. The first contribution aims to improve the precision of the detection to pixel level: we propose a coarse-to-fine framework that leverages the fine-grained detail contained in low-level feature maps. The different stages are trained with different loss functions, among which we propose a boundary-aware loss that forces the predicted landmarks to stay on the facial boundary. The second contribution improves the robustness of the detection: we propose a 2D Wasserstein loss that integrates additional geometric information during training. Moreover, we propose several modifications to the conventional evaluation metrics in order to better measure model robustness. The third contribution provides a new perspective on facial landmark detection by exploring a novel tool, based on Canonical Correlation Analysis (CCA) of the landmark coordinates, to illustrate the relationships between facial landmarks. Two applications are built on this tool: (1) the interpretation of different facial landmark detection models, and (2) a novel weakly-supervised learning method that considerably reduces the manual effort required for dense landmark annotation.

The second part of this thesis tackles the problem of face parsing, with two contributions. The first is a framework for hair segmentation that uses a shape prior to enhance robustness against cluttered backgrounds. Additionally, we propose a spatial attention module, attached to this framework, that improves the quality of the predicted hair boundary. The second contribution is a fast face parsing framework for mobile phones that leverages temporal consistency to yield a more robust output mask. The implementation of this framework runs in real time on an iPhone X.

The sketches below illustrate, in simplified form, some of the technical ideas summarized above.
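For the boundary-aware loss of the first contribution, the following is only a minimal sketch, not the formulation of the thesis. It assumes the ground-truth facial boundary has been rasterized into a distance-transform map (zero on the boundary, increasing away from it); sampling that map at the predicted landmark coordinates yields a differentiable penalty that pulls landmarks onto the boundary. The function name and tensor shapes are hypothetical.

    import torch
    import torch.nn.functional as F

    def boundary_aware_penalty(pred_coords, dist_map):
        # pred_coords: (B, N, 2) landmark coordinates normalized to [-1, 1]
        # dist_map:    (B, 1, H, W) distance transform of the true boundary
        grid = pred_coords.unsqueeze(1)                # (B, 1, N, 2) sampling grid
        # Bilinear lookup of the distance map at each predicted landmark
        d = F.grid_sample(dist_map, grid, align_corners=False)  # (B, 1, 1, N)
        return d.mean()  # small when landmarks lie on the boundary

In practice such a term would be combined with a standard coordinate- or heatmap-regression loss rather than used on its own.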
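The 2D Wasserstein loss of the second contribution compares a predicted heatmap with its target as probability distributions rather than pixel by pixel. As a hedged illustration (a simplification, not the exact loss of the thesis), both heatmaps can be approximated by axis-aligned Gaussians via their first and second moments, for which the squared 2-Wasserstein distance has a simple closed form:

    import torch

    def heatmap_moments(h):
        # h: (B, H, W) heatmap; returns mean (B, 2) and std (B, 2)
        B, H, W = h.shape
        p = h.flatten(1).clamp(min=0)
        p = p / p.sum(dim=1, keepdim=True).clamp(min=1e-8)   # normalize to a distribution
        ys = torch.arange(H, dtype=p.dtype, device=p.device)
        xs = torch.arange(W, dtype=p.dtype, device=p.device)
        gy, gx = torch.meshgrid(ys, xs, indexing="ij")
        coords = torch.stack([gx.flatten(), gy.flatten()], dim=1)  # (H*W, 2), (x, y)
        mean = p @ coords                                          # E[x], E[y]
        var = p @ coords.pow(2) - mean.pow(2)                      # E[x^2] - E[x]^2
        return mean, var.clamp(min=1e-8).sqrt()

    def wasserstein2_loss(pred, target):
        mu_p, sd_p = heatmap_moments(pred)
        mu_t, sd_t = heatmap_moments(target)
        # Closed-form squared 2-Wasserstein distance between the two
        # axis-aligned Gaussian approximations
        return ((mu_p - mu_t).pow(2) + (sd_p - sd_t).pow(2)).sum(dim=1).mean()

Unlike a pixel-wise L2 loss, this distance grows with the spatial offset between the two modes, which is one way such a loss can inject geometric information during training.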
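The CCA tool of the third contribution can be illustrated with off-the-shelf scikit-learn. In the sketch below, the file name and the landmark groups are hypothetical (jawline versus mouth points in the common 68-point annotation scheme); it measures how strongly two groups of landmark coordinates co-vary across a dataset:

    import numpy as np
    from sklearn.cross_decomposition import CCA

    # landmarks: (n_samples, 68, 2) coordinates in the 68-point scheme
    landmarks = np.load("landmarks.npy")                     # hypothetical file
    jaw = landmarks[:, 0:17].reshape(len(landmarks), -1)     # jawline points
    mouth = landmarks[:, 48:68].reshape(len(landmarks), -1)  # mouth points

    cca = CCA(n_components=2)
    cca.fit(jaw, mouth)
    jaw_c, mouth_c = cca.transform(jaw, mouth)
    # Correlation of each canonical pair: how strongly the two groups co-vary
    print([np.corrcoef(jaw_c[:, i], mouth_c[:, i])[0, 1] for i in range(2)])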
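For the hair segmentation framework, a spatial attention module of the general kind described above can be sketched as a small convolutional branch that predicts a per-pixel gate re-weighting the feature maps. The layer sizes here are illustrative assumptions, not the architecture of the thesis:

    import torch
    import torch.nn as nn

    class SpatialAttention(nn.Module):
        def __init__(self, channels):
            super().__init__()
            self.gate = nn.Sequential(
                nn.Conv2d(channels, channels // 4, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // 4, 1, kernel_size=1),
                nn.Sigmoid(),                      # per-pixel weight in (0, 1)
            )

        def forward(self, x):                      # x: (B, C, H, W)
            return x * self.gate(x)                # gate broadcasts over channels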
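Finally, one simple way to exploit temporal consistency in video face parsing (again an assumption for illustration, not the mechanism of the thesis) is to exponentially smooth the per-frame segmentation logits so that the output mask stays stable between consecutive frames:

    import torch

    def smooth_masks(logits_per_frame, momentum=0.8):
        # logits_per_frame: iterable of (C, H, W) tensors, one per video frame
        state = None
        for logits in logits_per_frame:
            state = logits if state is None else momentum * state + (1 - momentum) * logits
            yield state.argmax(dim=0)   # temporally smoothed (H, W) class mask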