Natural Language Processing with Large-Scale Integrated Classifiers

LIRIS seminar by Prof. Yutaka SASAKI, Toyota Technological Institute, Nagoya, Japan

On 02/09/2013, from 10:30 to 12:00. Room C4, Nautibus building, Université Lyon 1
Contact: S. Servigne and G. Damiand. +33 (0)

In this talk, I will present a hierarchical text classification system that uses more than 5,000 integrated classifiers. Last year, our team participated in the Third PASCAL Hierarchical Text Classification Challenge. The task was to classify 81,262 Wikipedia documents into 50,312 hierarchical Wikipedia categories, based on 456,866 training examples. This is a computationally challenging task that requires many algorithmic improvements to be solved within practical time and memory limits. Our system ranked in the top three among 17 participating systems. In the future, we plan to accelerate training and classification with massively parallel processing, such as GPGPU.