Thesis of Corentin Kervadec

Visual recognition and natural language processing through deep learning for scene comprehension


This PhD thesis lies at the intersection of three research areas: artificial intelligence (especially deep learning), computer vision, and natural language processing. The objective is twofold: (1) to propose neural models trained to extract descriptors from both modalities (visual and textual content), and (2) to study how to relate these learned descriptors and to propose a visual content search system capable of interacting with the user through complex textual queries.


Although significant progress has been made in this area in recent years, the results obtained remain far from human performance, especially when the interaction relies on complex queries (for example, questions whose answers are not binary). Several scientific obstacles remain to be overcome. First, extracting relevant descriptors for both modalities is in itself an open problem, still actively studied within the deep learning community. Linking and jointly exploiting these descriptors also remains a challenge. Some recent studies have shown that, in the context of Visual Question Answering (VQA), the proposed methods tend to underuse the visual content and to predict the answer largely from the question alone. This also raises the problem of evaluating such approaches, which is still an open topic today. Finally, it would be interesting to study the extension of this type of approach to video, beyond a simple frame-by-frame analysis. This would make it possible to address new types of queries, concerning, for example, the temporal localization of an event or of an interaction between objects, or the evolution of an object in a scene over time.

Advisor: Christian Wolf

Defense date: Thursday, December 9, 2021

Mr. David Picard, Professor, Ecole des Ponts - ParisTech, Reviewer
Mr. Nicolas Thome, Professor, CNAM, Reviewer
Ms. Cordelia Schmid, Research Director, Inria / DI - ENS, Examiner
Mr. Damien Teney, Doctor, IDIAP, Examiner
Ms. Zeynep Akata, Professor, University of Tübingen, Examiner
Mr. Christian Wolf, Associate Professor, INSA de Lyon / LIRIS CNRS UMR 5205, Thesis advisor
Mr. Moez Baccouche, Doctor, Orange, Examiner
Mr. Grigory Antipov, Doctor, Orange, Examiner