Thesis of Luca Veyrin-Forrer


Subject:
Machine Learning meets Data Mining: Towards sparse and interpretable deep neural networks

Summary:

 The aim of this PhD thesis is to tackle these two ambitious challenges, at the intersection of machine learning and data mining. The young researcher will work on the development of sparse neural networks based on new data mining approaches to analyze, understand and compress the produced models.
− A first task will consist in developing novel approaches for the simplification and compression of DNNs. Compressing models is very important as they can be very large (up to several Gigabytes) and require a great amount of computation that cannot be parallelized (e.g. very deep architectures). However, there is a high redundancy in the computations and model parameters. It has been shown that compression can save 10 to 100 times memory, while keeping almost the same prediction ability. Other approaches drastically reduce the precision of weight parameters, factorize the weight matrices, or perform optimizations on the network structure. Our previous researchwork showed the potential of such optimization techniques for DNNs. Most importantly, compressing the network by removing spurious information can help in its understanding and interpretation, which previous work has mostly tried to tackle by specific visualization techniques or by explicitly learning automatically extracted concepts. Pattern mining algorithms can also be used for analyzing neural activations, by identifying blocks in the weight matrices (or tensors) which represent noise and thus do not contribute to the target and final output activations, and by identifying paths of neuron activations strongly correlated with an output. We will consider neural network architectures that facilitate this mining process, e.g. by preferring
partially-connected structures and sub-modules and by avoiding fully-connected parts as much as possible as they “diffuse” the extracted information across the whole neural network.

− A better understanding of DNNs requires also to work on both the input and the output of the models in several ways:
• Integrating the priors to the models: data mining approaches can be used to characterize non-explicit priors;
• Using the results from data mining or other unsupervised learning techniques as priors;
• Characterizing prediction errors and learning specific models for these “extreme” cases, e.g. geographically localized errors, or to bias training samples to better handle these errors.

− A third task will be to investigate formalisms (i.e. languages) that make
DNNs interpretable or partially interpretable, and the definition of algorithms
that make possible the discovery or learning of DNN descriptions and parame-
terizations with regard to a related language.

 


Advisor: Céline Robardet
Coadvisor: Marc Plantevit
Codirection: Stefan Duffner