Thesis of Rui Yang

Continuous few-shot learning

Start date: 01/10/2020
End date (estimated): 01/10/2023

Advisor: Liming Chen


Current research and the rapid progress in AI have largely been triggered and empowered by the paradigm of deep learning (DL), which has dramatically improved the state of the art in a growing number of fields, e.g., computer vision and speech recognition [LeBeHi@Nature2016]. However, current state-of-the-art DL-based solutions are notoriously data-hungry systems and generally require a huge amount of labeled data for training, which, unfortunately, is very expensive and impractical in numerous applications and constitutes a roadblock to their adoption in much wider domains.

The research question we will investigate in this thesis is how we can still effectively train a DNN despite the lack of labelled data, using for instance only a few labeled examples, coined few-shot learning or one-shot learning [Lake et al.@CogSci2011], or even zero labeled data, named zero-shot learning [Norouzi et al.2013].

State of the art

The current research to mitigate the data famine issue of DNNs ranges from transfer learning to data simulation and few-shot learning. Generally speaking, transfer learning aims to transfer knowledge from a source domain where labeled data are abundant to a target domain where labeled data are scarce or even unavailable [Lu et al.@ACMComputingSurveys2020]. However, it is well known that the effectiveness of knowledge transfer depends much upon the degree of relatedness between the source and target tasks, and it is not always possible to find a related source task with abundant labeled data for a given target task. As such, data simulation for deep learning is much in vogue currently, e.g., Habitat-Sim by Facebook AI Research, HoME, DeepMind Lab by DeepMind, and AirSim by Microsoft. My group also released the Jacquard dataset for robotic grasping, a simulated dataset with more than 11K simulated objects, 54K different scenes, and 5M grasp locations.
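Few-shot learning is commonly formalised as N-way K-shot episodes: each episode samples N classes, then K labelled support examples and a handful of query examples per class. A minimal sketch of such episode sampling from a labelled pool (all names hypothetical, purely illustrative):

```python
import random
from collections import defaultdict

def sample_episode(labels, n_way=5, k_shot=1, n_query=5, rng=None):
    """Sample an N-way K-shot episode as (support, query) index lists.

    `labels` is a list where labels[i] is the class of example i.
    Each returned element is an (example_index, episode_class) pair,
    with classes relabelled 0..N-1 within the episode.
    """
    rng = rng or random.Random()
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    # Keep only classes with enough examples for support + query.
    eligible = [c for c, idxs in by_class.items() if len(idxs) >= k_shot + n_query]
    classes = rng.sample(eligible, n_way)
    support, query = [], []
    for ep_cls, c in enumerate(classes):
        picked = rng.sample(by_class[c], k_shot + n_query)
        support += [(i, ep_cls) for i in picked[:k_shot]]
        query += [(i, ep_cls) for i in picked[k_shot:]]
    return support, query

# Toy pool: 10 classes, 20 examples each.
labels = [c for c in range(10) for _ in range(20)]
support, query = sample_episode(labels, n_way=5, k_shot=1, n_query=5,
                                rng=random.Random(0))
print(len(support), len(query))  # → 5 25
```

A meta-learner is then trained over a large collection of such episodes rather than over a single fixed dataset.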

The notorious shortcoming of learning through simulation is the reality gap between simulated data and real data, making a learned model ineffective in the presence of real data.

With the goal of developing learning methods with human-like skills of learning from a few samples, few-shot learning has received increasing attention in the research community. The current research features two main approaches: metric-learning based, e.g., matching networks [Vinyals et al.@NIPS2016], prototypical networks [Snell et al.@NIPS2017], and relation networks [Sung et al.@CVPR2018], or meta-learning oriented, e.g., MAML [Finn et al.@ICML2017]. In our own work, we developed a von Mises-Fisher mixture model-based deep learning method that embeds input images into unit-length directional features and applied it to face recognition. Meta-learning oriented approaches learn either optimal initial parameters, e.g., MAML [Finn et al.@ICML2017], which enable maximal performance after a few gradient steps on a novel task, or even a gradient descent update rule [Andrychowicz et al.@NIPS2016, Ravi&Larochelle@ICLR2017]. The state of the art also features two standard benchmarks for few-shot learning, namely Omniglot [Lake et al.@CogSci2011] and MiniImageNet [Ravi&Larochelle@ICLR2017]. All these approaches open promising avenues for few-shot learning. However, while dealing with effective learning in a small-data regime for a given task, they also introduce a big-task regime where a large collection of tasks needs to be sampled to build training episodes. Furthermore, current approaches show high performance on benchmarks where training and testing tasks are closely related, e.g., Omniglot, but record a severe performance drop when they are much less related, as in MiniImageNet.
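As an illustration of the metric-learning family, prototypical networks classify a query by the distance of its embedding to per-class prototypes, i.e., the means of the support embeddings of each class. A toy sketch with synthetic 2-D embeddings (names and data made up, not tied to any specific implementation):

```python
import numpy as np

def prototypes(support_emb, support_lab, n_way):
    """Class prototype = mean of the support embeddings of that class."""
    return np.stack([support_emb[support_lab == c].mean(axis=0)
                     for c in range(n_way)])

def classify(query_emb, protos):
    """Assign each query to the nearest prototype (squared Euclidean)."""
    d = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

rng = np.random.default_rng(0)
# Toy "embeddings": 3 classes clustered around well-separated centres.
centres = np.array([[0., 0.], [5., 0.], [0., 5.]])
support_emb = np.concatenate([c + 0.1 * rng.standard_normal((5, 2))
                              for c in centres])
support_lab = np.repeat(np.arange(3), 5)
protos = prototypes(support_emb, support_lab, n_way=3)
query_emb = centres + 0.1 * rng.standard_normal((3, 2))  # one query per class
print(classify(query_emb, protos))  # → [0 1 2]
```

In an actual prototypical network the embedding function is a deep network trained episodically; here it is replaced by fixed synthetic points to keep the sketch self-contained.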
In addition, none of these approaches considers the most common real-life application setting where data for a given task arrive one after another in non-stationary streams and need to be leveraged to improve the meta-learning model and the learner model over time.

Methodology

We aim to develop methods and techniques for few-shot learning, but in a lifelong continuous learning setting. This means that we want the machine to have the human-like ability to learn quickly from a few samples, but also to continually acquire, fine-tune, and transfer knowledge and skills throughout its lifespan [Parisi et al.@JNN2019].

For this purpose, we propose to explore developmental learning [Petit et al.@IEEE_TCDS2016] and leverage three building blocks, namely simulation, transfer learning, and meta-learning. Specifically, in line with the theory of complementary learning systems [Kumaran et al.2016], we organise the knowledge of a machine into a bio-inspired autobiographical memory which comprises an episodic memory for online data, a semantic memory for learned tasks, and a procedural memory for learned models. We perform few-shot learning through meta-learning, e.g., MAML, and simulation, which enables the sampling of as many learning tasks as we want, e.g., Jacquard for robotic grasping or IVI-GAN for faces.
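Meta-learning an initialisation in the spirit of MAML can be illustrated on a toy family of 1-D regression tasks y = a·x: the inner loop adapts the parameter to a sampled task with one gradient step, and the outer loop updates the shared initialisation. The sketch below uses the first-order approximation (ignoring second-order terms, as in first-order MAML) and is purely illustrative; all hyperparameters are made up:

```python
import numpy as np

def task_loss_grad(w, x, y):
    """MSE loss and its gradient for a scalar linear model y_hat = w * x."""
    err = w * x - y
    return (err ** 2).mean(), 2 * (err * x).mean()

rng = np.random.default_rng(0)
w0, inner_lr, outer_lr = 0.0, 0.1, 0.01  # shared init and step sizes
for step in range(2000):
    outer_grad = 0.0
    for _ in range(4):                       # batch of tasks y = a * x
        a = rng.uniform(1.0, 3.0)
        x = rng.standard_normal(10)
        y = a * x
        _, g = task_loss_grad(w0, x, y)      # inner adaptation step
        w_adapted = w0 - inner_lr * g
        _, g_adapt = task_loss_grad(w_adapted, x, y)
        outer_grad += g_adapt                # first-order: drop d(w_adapted)/d(w0)
    w0 -= outer_lr * outer_grad / 4          # meta-update of the initialisation
# The learned initialisation drifts toward the centre of the task range (~2),
# from which one gradient step adapts well to any task a in [1, 3].
print(w0)
```

The full MAML algorithm differentiates through the inner update (a second-order term omitted here for brevity); the principle of learning an initialisation that is a few gradient steps away from every task is the same.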

The resultant meta-learner model is stored in the procedural memory. Transfer learning is performed when a novel task is instantiated: the learner models of related tasks are retrieved from the procedural memory and combined with the meta-learner's model to yield a novel initial learner model. Using online real data stored in the episodic memory, the learner's model of a given task can be continuously fine-tuned. A key research question here is the stability/plasticity dilemma, where plasticity measures the ability of a learning system to acquire new knowledge and refine existing knowledge from the continuous input, whereas stability prevents the novel input from significantly interfering with existing knowledge. This trade-off between plasticity and stability could be achieved by distinguishing the meta-learner's model, for stability, from the learners' models, for plasticity.

The tentative schedule of this PhD would be, after a thorough study of the state of the art, to take as a starting point of this research line our recent developmental learning framework applied to grasping robots [Petit et al.@IEEE_ICDL-Epirob2018]. Based on this approach, we then plan to go deeper into the related directions, namely simulation, transfer learning, and meta-learning.
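One simple way to instantiate the episodic memory and soften the stability/plasticity trade-off is experience replay: incoming stream items (plasticity) are mixed with items replayed from a fixed-size buffer of past data (stability). A minimal sketch using reservoir sampling, so that every item seen so far has an equal chance of being retained (all names hypothetical, not the framework's actual design):

```python
import random

class EpisodicMemory:
    """Fixed-size episodic memory maintained by reservoir sampling."""

    def __init__(self, capacity, rng=None):
        self.capacity = capacity
        self.buffer = []
        self.seen = 0
        self.rng = rng or random.Random()

    def add(self, item):
        """Insert a stream item; old items are evicted uniformly at random."""
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(item)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.buffer[j] = item

    def replay_batch(self, new_items, k):
        """Mix incoming items (plasticity) with k replayed old ones (stability)."""
        k = min(k, len(self.buffer))
        return list(new_items) + self.rng.sample(self.buffer, k)

mem = EpisodicMemory(capacity=100, rng=random.Random(0))
for t in range(1000):            # non-stationary stream: label drifts over time
    mem.add((t, t // 100))
batch = mem.replay_batch([("new", 10)], k=8)
print(len(mem.buffer), len(batch))  # → 100 9
```

Fine-tuning the learner on such mixed batches lets new data update the model while replayed data anchor previously acquired knowledge.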

Main publications of the supervisors related to the subject

Ying Lu, Lingkun Luo, Di Huang, Yunhong Wang, Liming Chen, "Knowledge Transfer in Vision Recognition: A Survey", ACM Computing Surveys, vol. 53, pp. 1-35, 2020.

Yuxing Tang, Josiah Wang, Xiaofang Wang, Boyang Gao, Emmanuel Dellandréa, Robert Gaizauskas, Liming Chen, "Visual and Semantic Knowledge Transfer for Large Scale Semi-supervised Object Detection", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40(12), pp. 3045-3058, 2018.

Ying Lu, Liming Chen, Alexandre Saidi, Emmanuel Dellandréa, Yunhong Wang, "Discriminative Transfer Learning Using Similarities and Dissimilarities", IEEE Transactions on Neural Networks and Learning Systems, vol. 29(7), pp. 3097-3110, 2018.

Maxime Petit, Amaury Depierre, Xiaofang Wang, Emmanuel Dellandréa, Liming Chen, "Developmental Bayesian Optimization of Black-Box with Visual Similarity-Based Transfer Learning", IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob), 2018.