Thesis of Bastien Moysset
Subject:
Detection, localization and typing of text in heterogeneous document images with deep neural networks
Defense date: 31/12/2018
Advisor: Christian Wolf
Summary:
This PhD thesis addresses the detection and localization of text lines in highly heterogeneous document databases with complex layouts. To this end, we plan to use machine learning techniques, in particular neural networks. The two main contributions to date are the use of a recurrent neural network within a CTC (Connectionist Temporal Classification) framework to segment a paragraph into lines, and the adaptation of a convolutional neural network to directly predict the bounding-box coordinates of lines within a full document image. One of the main remaining challenges is to take the context of the document into account and to model the interactions between lines.