Deep Learning for Computer Vision

Scientific goals

The goal is to automatically learn invariant and discriminative hierarchical representations for various applications from labelled and unlabelled data. Particular emphasis is put on integrating structural information into learning machines, either through structured output learning or by structuring the prediction model itself.

Activity recognition

Deep attention models for activity recognition. Work of Fabien Bardel.


Unsupervised learning of invariant models.


Gesture recognition and pose estimation

We are working on learning deep representations for gesture recognition, taking into account appearance and temporal (motion) aspects.

Human pose estimation (full body pose or hand pose) often passes through an intermediate representation that segments the body into semantic parts. Deep learning is particularly well suited for this task.
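As a rough illustration of how a part-based intermediate representation can lead to a pose estimate, the sketch below takes a per-pixel part map and returns one location per part, computed as the centroid of its pixels. The function name, the label values and the centroid rule are assumptions made for this example; they are not taken from the work described here, which may derive pose from parts in a different way.

```python
def part_centroids(part_map, background=0):
    """Map each part label to the mean (row, col) position of its pixels.

    part_map is a 2D list of integer part labels, one per pixel.
    Using centroids as joint estimates is an illustrative assumption.
    """
    sums = {}
    for r, row in enumerate(part_map):
        for c, part in enumerate(row):
            if part == background:
                continue
            total_r, total_c, count = sums.get(part, (0, 0, 0))
            sums[part] = (total_r + r, total_c + c, count + 1)
    return {part: (tr / n, tc / n) for part, (tr, tc, n) in sums.items()}


# Hypothetical 4x4 part map: 0 = background, 1 = "head", 2 = "torso".
toy_part_map = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 2, 2, 0],
    [0, 2, 2, 0],
]
joints = part_centroids(toy_part_map)
```

In practice the part map would be the output of a learned segmentation model rather than a hand-written grid.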


Identity from motion

We are working on biometrics from motion, using the inertial sensors (such as gyroscopes) in smartphones.


Semantic full scene labelling

Our work on semantic full scene labelling aims at modelling spatial context extracted from labelled ground truth. A cascade of learning machines (similar to auto-context models) is trained to produce a segmentation map, where subsequent classifiers learn context based on outputs of previous classifiers.
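The cascade idea can be sketched in a few lines. The example below is a minimal illustration under strong simplifying assumptions: the "image" is a 1D row of pixels, each stage is a nearest-centroid classifier, and context is the class probabilities that the previous stage assigned to neighbouring pixels. All function names, the choice of classifier and the neighbour offsets are hypothetical and are not the models used in the work described above.

```python
import math


def fit_centroids(X, y):
    """Nearest-centroid classifier: one mean feature vector per class."""
    centroids = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        centroids[c] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_proba(centroids, x):
    """Softmax over negative squared distances to the class centroids."""
    scores = {c: -sum((a - b) ** 2 for a, b in zip(x, m))
              for c, m in centroids.items()}
    top = max(scores.values())
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


def build_inputs(pixels, prob_map, offsets, classes):
    """Raw pixel features, plus neighbours' class probabilities if available."""
    X = []
    n = len(pixels)
    for i, p in enumerate(pixels):
        x = list(p)
        if prob_map is not None:
            for off in offsets:
                j = min(max(i + off, 0), n - 1)  # clamp at the borders
                x.extend(prob_map[j][c] for c in classes)
        X.append(x)
    return X


def train_cascade(pixels, labels, n_stages=3, offsets=(-2, -1, 1, 2)):
    """Train stages in sequence; each one sees the previous stage's output."""
    classes = sorted(set(labels))
    stages, prob_map = [], None
    for _ in range(n_stages):
        X = build_inputs(pixels, prob_map, offsets, classes)
        stage = fit_centroids(X, labels)
        stages.append(stage)
        prob_map = [predict_proba(stage, x) for x in X]
    return {"stages": stages, "classes": classes, "offsets": offsets}


def label_scene(model, pixels):
    """Run the cascade and return the most probable class for each pixel."""
    prob_map = None
    for stage in model["stages"]:
        X = build_inputs(pixels, prob_map, model["offsets"], model["classes"])
        prob_map = [predict_proba(stage, x) for x in X]
    return [max(p, key=p.get) for p in prob_map]


# Toy 1D scene with one scalar feature per pixel and two classes.
toy_pixels = [(0.1,), (0.2,), (0.0,), (0.9,), (1.0,), (0.8,)]
toy_labels = [0, 0, 0, 1, 1, 1]
toy_model = train_cascade(toy_pixels, toy_labels)
toy_prediction = label_scene(toy_model, toy_pixels)
```

The first stage sees only the raw pixel features; every later stage sees those features together with the previous stage's probabilities at the chosen offsets, which is how label context from the ground truth enters the model. A real system would use 2D neighbourhoods, stronger classifiers and held-out predictions when training later stages.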


Full scene labelling of first-person videos and eye-tracker data.


Christian Wolf
INSA-Lyon / Université de Lyon