Thesis by Dimitri Gominski
Subject:
Defense date: 09/11/2021
Advisor: Liming Chen
Co-advisor: Mohsen Ardabilian
Summary:
With an ever-increasing volume of digitally accessible images, establishing connections to organize and analyse data is all the more important. A typical formulation for connecting images without using metadata is content-based image retrieval (CBIR). Like other computer vision applications, CBIR has benefited from the expressivity of convolutional neural networks (CNNs) and has achieved unprecedented results on standard benchmarks. However, it is hard to say whether this performance is explained by increasingly sophisticated architectures and models, or simply by the availability of a training dataset that matches the use case, i.e. one with similar visual and semantic characteristics. Indeed, the usual paradigm of pairing a model with a training dataset shows its limits as soon as one leaves the setting characterized by the training data: performance drops when the model is tested on different data, or on data with too much variability.
This thesis addresses this issue by taking a critical look at deep learning methods and their real application potential. In a context of multi-source geographical imagery, a benchmark is proposed to characterize a new research problem: heterogeneous, "low-data" image retrieval (i.e. without training data), with a use case where defining a training dataset and a baseline method is not straightforward: the interconnection of iconographic collections from different heritage institutions. Using this benchmark, new measures are proposed to assess the generalization ability of models in a CBIR context, followed by technical solutions that remove the need for the hazardous definition of a training dataset with similar visual and semantic characteristics. The discussion of the results suggests that too much importance is probably given to neural network architectures, and points to promising directions in CBIR: tools that are agnostic to the model used and that make it possible to exploit the comparative advantages of different models trained on different datasets. Finally, the value of this generalist approach is confirmed by a second application, land-use classification with high-resolution satellite imagery, a case where methods and data are abundant but scattered across a set of small datasets, which limits their application potential.
Jury:
Mr Bell Peter | Professor | Friedrich-Alexander Universität Erlangen-Nürnberg, Germany | Reviewer |
Mr Joly Philippe | Associate Professor | Université Paul Sabatier, Toulouse | Reviewer |
Ms Stoter Jantien | Professor | Delft University of Technology, Netherlands | President |
Mr Samaras Dimitris | Professor | Stony Brook University, United States | Examiner |
Ms Gouet-Brunet Valérie | Research Director | Université Gustave Eiffel | Co-supervisor |
Mr Chen Liming | Professor | Ecole Centrale de Lyon | Co-supervisor |