Thesis of Arthur Aubret

Deep reinforcement learning of skills for multi-agent coordination.

Defense date: 30/11/2021

Advisor: Salima Hassas
Co-advisor: Laetitia Matignon


In reinforcement learning, an agent learns by trial and error to maximize the expectation of the rewards received while acting in its environment. In a multi-agent scenario, some tasks require multiple agents to cooperate; yet, despite recent advances in deep reinforcement learning, coordinating agents is known to be difficult, particularly as the number of agents grows. Communication can be an efficient way to coordinate agents; however, current models only include observations in the communication between agents and consider scenarios with few agents. To address these issues, we want to take advantage of recent work on intrinsic motivation. First, we want our agents to be able to communicate high-level information, for instance their intentions in addition to their observations, in order to improve their coordination. To do so, they have to learn a representation of their skills. Second, we aim for our agents to learn what to communicate, when, and to whom.
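As an illustrative sketch of the trial-and-error principle the abstract opens with (not part of the thesis, which concerns deep multi-agent RL), here is tabular Q-learning on a hypothetical 5-state chain where the agent must discover, by exploration alone, that moving right reaches the reward:

```python
import random

# Toy environment (an assumption for illustration): states 0..4 on a chain,
# reward 1.0 only when the goal state 4 is reached.
N_STATES = 5
ACTIONS = [-1, +1]            # move left or right along the chain
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

def step(state, action):
    """Environment dynamics: clamp to the chain, reward only at the goal."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def greedy(q, s):
    """Greedy action; ties broken at random so unexplored actions get tried."""
    best = max(q[(s, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if q[(s, a)] == best])

def train(episodes=500, seed=0):
    random.seed(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy trial and error: explore with probability EPS
            a = random.choice(ACTIONS) if random.random() < EPS else greedy(q, s)
            s2, r, done = step(s, a)
            # temporal-difference update toward the bootstrapped target
            target = r + (0.0 if done else GAMMA * max(q[(s2, b)] for b in ACTIONS))
            q[(s, a)] += ALPHA * (target - q[(s, a)])
            s = s2
    return q

q = train()
# After training, the greedy policy should move right from every non-goal state.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
```

The agent is never told which action is good; repeated interaction propagates the reward backwards through the Q-values, which is the learning loop that deep RL scales up with neural function approximation.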

Jury:
- Mr Alain Dutech, Professor, Reviewer (rapporteur)
- Mr David Filliat, Professor, Reviewer (rapporteur)
- Mr Pierre-Yves Oudeyer, Professor, Examiner
- Mr Alexandre Aussem, Professor, Université Lyon 1, Examiner
- Mrs Salima Hassas, Professor, Université Lyon 1, Thesis advisor
- Mrs Laëtitia Matignon, Associate Professor (maître de conférences), Université Lyon 1, Co-supervisor