Thesis of Assem Sadek
Start date: 25/02/2021
End date: 25/02/2024
Advisor: Christian Wolf
The context of this PhD thesis is research on virtual and real agents able to solve tasks autonomously in complex environments. It is part of the AI Chair project REMEMBER and targets intelligent agents that require high-level capabilities for navigation, reasoning and decision making. The required behavioral policies are complex: they involve high-dimensional input spaces (images, videos, inertial sensors) and high-level, structured output in the form of navigational and reasoning decisions.
Learning these behavioral policies crucially depends on the capacity to learn structured and semantically meaningful representations of the dynamic environment. A key requirement is the ability to learn these representations with minimal human intervention and annotation, since we can only afford the manual design and annotation of a limited number of complex representations. This requires an efficient exploitation of raw input data by means of supervised, unsupervised or self-supervised learning.
The project aims to advance the situational awareness and reasoning capabilities of robots by developing methods applicable in real-world scenarios. The objective is to augment the proven line of research based on geometric mapping and planning with machine learning, in order to address problems where conventional geometric models are insufficient. The methods will be based on, and include combinations of, geometric and semantic mapping and planning, sample-efficient learning, and synthetic-to-real transfer.
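To make the mapping-and-planning baseline concrete, the sketch below plans a shortest path on a 2D occupancy grid with breadth-first search. It is a toy illustration only: the grid, the start/goal cells, and the BFS planner are illustrative assumptions, not components specified by the project.

```python
from collections import deque

def bfs_plan(grid, start, goal):
    """Shortest path on a 2D occupancy grid (0 = free, 1 = occupied).

    Returns the list of cells from start to goal, or None if unreachable.
    """
    rows, cols = len(grid), len(grid[0])
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Reconstruct the path by walking back through parent links.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in parent):
                parent[(nr, nc)] = cell
                queue.append((nr, nc))
    return None  # goal not reachable from start

# Example map: a wall on row 1 forces a detour around the right side.
grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
path = bfs_plan(grid, (0, 0), (2, 0))
```

In practice the occupancy grid would be produced by the learned mapping components rather than written by hand; the planner itself is where geometric and learned representations meet.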
Our methodology is to learn rich and structured representations of an agent’s environment, which allow accurate localization and enable efficient navigation and planning. In order to increase sample efficiency and generalization to real-world scenarios, we plan to learn these representations through a combination of unsupervised, self-supervised and supervised learning. A large emphasis will be placed on learning depth, ego-motion and structure largely without manual annotations, and in tight integration with planning. In particular, we plan to explore metric representations with Bayesian techniques, which are able to model uncertainty in a principled way, and to combine them with high-capacity Deep Neural Networks, which can create high-level semantic predictions from low-level sensor data (RGB, stereo, LIDAR, inertial sensors, WiFi, etc.).
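As a minimal sketch of the kind of principled uncertainty handling mentioned above, the snippet below implements the standard Bayesian log-odds update for a single occupancy-grid cell. The inverse sensor model probabilities (`p_hit`, `p_miss`) and the observation sequence are assumed values chosen for illustration, not parameters from the project.

```python
import math

def logit(p):
    """Convert a probability to log-odds."""
    return math.log(p / (1.0 - p))

def update_cell(l, measurement_hit, p_hit=0.7, p_miss=0.4):
    """Bayesian log-odds update of one occupancy cell.

    l: current log-odds of the cell being occupied.
    A 'hit' observation pushes the log-odds up, a 'miss' pushes it down.
    p_hit / p_miss form an assumed inverse sensor model.
    """
    return l + (logit(p_hit) if measurement_hit else logit(p_miss))

def to_prob(l):
    """Convert log-odds back to an occupancy probability."""
    return 1.0 - 1.0 / (1.0 + math.exp(l))

l = 0.0  # uninformative prior: p(occupied) = 0.5
for _ in range(3):  # three consecutive 'hit' observations
    l = update_cell(l, measurement_hit=True)
p = to_prob(l)  # belief in occupancy after the evidence
```

Working in log-odds keeps the update a simple addition and avoids numerical issues near probabilities of 0 and 1, which is why it is the common formulation for metric occupancy maps.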
- Deep learning and 3D geometry
- Sim2Real Transfer
- End-to-end mapping and planning