Thesis of Clément Sage
Subject:
Defense date: 01/10/2021
Advisor: Alexandre Aussem
Co-advisors: Véronique Eglin, Haytham Elghazel
Summary:
This thesis deals with information extraction from business documents that are either scanned or born-digital and possibly multilingual. Efficiently extracting information from documents issued by their partners is crucial for companies that handle huge daily document flows. Yet automating information extraction from business documents is challenging because of their semi-structured nature: an instance of a given document class, such as an invoice or a purchase order, necessarily contains a predefined set of information to retrieve, but the position and textual representation of that information are unconstrained.
Inspired by work in the Natural Language Processing (NLP) community, particularly on named entity recognition, this thesis proposes several approaches based on recurrent neural networks (RNNs) that iterate over the document words extracted by an Optical Character Recognition (OCR) engine.
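Framed like named entity recognition, word-level extraction assigns a label to each OCR word and then groups consecutive labeled words into field values. The following is a minimal, hypothetical illustration of that framing, not the architecture used in the thesis. It assumes a BIO tagging scheme, made-up field names (`INVOICE_NUMBER`, `TOTAL`), toy per-word features (normalized position and simple shape cues), and an untrained bidirectional Elman RNN with random weights standing in for a learned model.

```python
import math
import random
from typing import Dict, List

# Hypothetical BIO label set for two invoice fields (illustrative only).
LABELS = ["O", "B-INVOICE_NUMBER", "I-INVOICE_NUMBER", "B-TOTAL", "I-TOTAL"]


def word_features(word: str, x: float, y: float) -> List[float]:
    """Toy features for one OCR word: normalized position plus shape cues (assumption)."""
    return [
        x,                                       # horizontal position, assumed in [0, 1]
        y,                                       # vertical position, assumed in [0, 1]
        float(any(c.isdigit() for c in word)),   # 1.0 if the word contains a digit
        float(word.isupper()),                   # 1.0 if all cased characters are uppercase
    ]


def random_matrix(rows: int, cols: int, rng: random.Random) -> List[List[float]]:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def rnn_pass(inputs, w_in, w_h, hidden: int) -> List[List[float]]:
    """Elman RNN over the sequence: h_t = tanh(W_in x_t + W_h h_{t-1}), with h_0 = 0."""
    h = [0.0] * hidden
    states = []
    for x in inputs:
        h = [
            math.tanh(
                sum(w_in[i][j] * x[j] for j in range(len(x)))
                + sum(w_h[i][k] * h[k] for k in range(hidden))
            )
            for i in range(hidden)
        ]
        states.append(h)
    return states


def bi_rnn_tagger(features: List[List[float]], hidden: int = 8, seed: int = 0) -> List[str]:
    """Untrained bidirectional RNN: one label per word. Weights are random, so tags are arbitrary."""
    rng = random.Random(seed)
    n_in = len(features[0])
    forward = rnn_pass(features, random_matrix(hidden, n_in, rng),
                       random_matrix(hidden, hidden, rng), hidden)
    # Backward direction: run over the reversed sequence, then restore word order.
    backward = rnn_pass(features[::-1], random_matrix(hidden, n_in, rng),
                        random_matrix(hidden, hidden, rng), hidden)[::-1]
    w_out = random_matrix(len(LABELS), 2 * hidden, rng)
    tags = []
    for f, b in zip(forward, backward):
        state = f + b  # concatenate forward and backward hidden states
        scores = [sum(w * v for w, v in zip(row, state)) for row in w_out]
        tags.append(LABELS[max(range(len(LABELS)), key=lambda k: scores[k])])
    return tags


def decode_bio(words: List[str], tags: List[str]) -> Dict[str, List[str]]:
    """Group consecutive B-/I- tagged words into field values; stray I- tags act like O."""
    fields: Dict[str, List[str]] = {}
    current, span = None, []

    def flush():
        # Store the currently open field, if any, as one space-joined value.
        if current is not None:
            fields.setdefault(current, []).append(" ".join(span))

    for word, tag in zip(words, tags):
        if tag.startswith("B-"):
            flush()
            current, span = tag[2:], [word]
        elif tag.startswith("I-") and tag[2:] == current:
            span.append(word)
        else:
            flush()
            current, span = None, []
    flush()
    return fields
```

Because the weights are random, the predicted tags carry no meaning here; a real system would learn them from annotated documents. With hand-written tags, `decode_bio(["Invoice", "No", "INV-042"], ["O", "O", "B-INVOICE_NUMBER"])` yields `{"INVOICE_NUMBER": ["INV-042"]}`. A bidirectional pass lets each word's label depend on both its left and right context, such as a keyword like "Total" that precedes an amount.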
Jury:
Mr Doucet Antoine | Professor | Université La Rochelle | Reviewer |
Ms Lemaitre Aurélie | Associate Professor | Université Rennes 2 | Reviewer |
Ms Belaïd Yolande | Associate Professor (Maître de conférences) | Université de Lorraine | Examiner |
Ms Faci Noura | Associate Professor | Université Claude Bernard Lyon 1 | Examiner |
Mr Paquet Thierry | Professor | Université de Rouen et Normandie | President |
Mr Aussem Alexandre | Professor | Université Claude Bernard Lyon 1 | Thesis advisor |
Ms Eglin Véronique | Professor | INSA Lyon | Co-advisor |
Mr Elghazel Haytham | Associate Professor (Maître de conférences) | Université Claude Bernard Lyon 1 | Co-advisor |
Mr Bérard Jean-Jacques | Research Director | Esker | Guest |
Mr Espinas Jérémy | Researcher | Industrial supervisor, Esker | Guest |