Thesis of Khadijetou Cheikh Mohamed Fadel
Start date: 03/09/2021
End date: 03/09/2024
Advisor: Véronique Eglin
The aim of this PhD is to propose a theoretical framework for dynamic information retrieval based on multimodal (text and image) queries that create interactions between the two modalities, and on efficient query reformulation and relevance feedback techniques. The thesis should pay special attention to a coherent fusion strategy that takes full advantage of the information collected from multiple sources and provides a satisfactory description of the query.
Implementing such an interactive, multimodal query platform is challenging, as it requires an efficient language model to interpret the content of document images:
The experiments will focus on a corpus of historical document images whose vocabulary is absent from modern lexicons and requires specific expertise. There are also many difficulties related to the quality of the sources (print quality, editing, variability between writers...), which require substantial preliminary work before extraction (especially for the construction of annotated datasets based on real data).
Embedding spaces that combine texts and images (photographs, engravings, and images of texts) do not yet exist. Fusing representations so that images complement the text for the same concept is a major obstacle. It will be addressed by developing new information representation models and new frameworks for document information retrieval based on machine learning techniques, combining language models from NLP with visual recognition of image content.
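As a purely illustrative sketch of what fusing the two modalities can mean in practice, the snippet below shows a simple late-fusion baseline: the text and image embeddings of a query or document are normalized, weighted, and concatenated, and documents are ranked by cosine similarity. This is not the model the thesis will develop; the function names, the `text_weight` parameter, and the assumption that embeddings are already available as plain lists of floats are all hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(text_vec, image_vec, text_weight=0.5):
    """Late fusion by weighted concatenation (illustrative assumption).

    Using square-root weights means the dot product of two fused vectors
    equals text_weight * cos(text) + (1 - text_weight) * cos(image).
    """
    w_text = math.sqrt(text_weight)
    w_image = math.sqrt(1.0 - text_weight)
    fused_text = [w_text * x for x in l2_normalize(text_vec)]
    fused_image = [w_image * x for x in l2_normalize(image_vec)]
    return fused_text + fused_image


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def rank(query_vec, documents):
    """Return (doc_id, score) pairs sorted by decreasing similarity."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in documents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With such a baseline, a document that matches the query in only one modality receives a partial score, which is the kind of behaviour a learned joint representation would be expected to improve on.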
Multimodal querying tools for accessing historical document sources are still in their infancy. Existing solutions only allow text and image to be processed separately. It is essential to provide access that supports both text and image query modes and enables exploration of these sources across their various semantics.