Thesis of Thibault Douzon
Subject:
Defense date: 24/10/2023
Advisor: Christophe Garcia
Coadvisor: Stefan Duffner
Summary:
Every day, businesses worldwide receive and process vast numbers of documents.
This thesis focuses on automating the extraction of information from these corporate documents using machine learning models.
Transformers, pre-trained with self-supervision, achieve high accuracy in document understanding. They also outperform recurrent networks at information extraction framed as word classification, while requiring less training data. Pre-training tasks tailored to corporate documents further improve performance, even for smaller models. Finally, efficient transformer-derived architectures reduce the evaluation cost for long sequences, which makes it possible to process sequences that combine several modalities.
Jury:
Ms. Lemaitre Aurélie | Associate Professor | Université Rennes 2 | Reviewer |
Mr. Paquet Thierry | Professor | Université de Rouen Normandie | Reviewer |
Mr. Tabbone Salvatore-Antoine | Professor | Université de Lorraine | Examiner |
Mr. Ogier Jean-Marc | Professor | La Rochelle Université | Examiner |
Mr. Garcia Christophe | Research Director | LIRIS INSA Lyon | Thesis advisor |
Mr. Duffner Stefan | Associate Professor | LIRIS INSA Lyon | Thesis co-advisor |
Mr. Espinas Jérémy | Doctor | Esker | Co-supervisor |
Mr. Bérard Jean-Jacques | Research Director | Esker | Invited member |