Skill Machines: Temporal Logic Composition in Reinforcement Learning

Abstract

A major challenge in reinforcement learning is specifying tasks in a manner that is both interpretable and verifiable. One common approach is to specify tasks through reward machines—finite state machines that encode the task to be solved. We introduce skill machines, a representation that can be learned directly from these reward machines that encode the solution to such tasks. We propose a framework where an agent first learns a set of base skills in a reward-free setting, and then combines these skills with the learned skill machine to produce composite behaviours specified by any regular language, such as linear temporal logics. This provides the agent with the ability to map from complex logical task specifications to near-optimal behaviours zero-shot. We demonstrate our approach in both a tabular and high-dimensional video game environment, where an agent is faced with several of these complex, long-horizon tasks. Our results indicate that the agent is capable of satisfying extremely complex task specifications, producing near optimal performance with no further learning. Finally, we demonstrate that the performance of skill machines can be improved with regular off-policy reinforcement learning algorithms when optimal behaviours are desired.

Publication
In Deep Reinforcement Learning Workshop @ NeurIPS 2022
Geraud Nangue Tasse
Geraud Nangue Tasse
Associate Lecturer

I am an IBM PhD fellow interested in reinforcement learning (RL) since it is the subfield of machine learning with the most potential for achieving AGI.

Devon Jarvis
Devon Jarvis
Associate Lecturer

I am a PhD candidate and Associate Lecturer at Wits interested in studying systematic generalization and the emergence of modularity in the brain and machines.

Steven James
Steven James
Deputy Lab Director

My research interests include reinforcement learning and planning.

Benjamin Rosman
Benjamin Rosman
Lab Director

I am a Professor in the School of Computer Science and Applied Mathematics at the University of the Witwatersrand in Johannesburg. I work in robotics, artificial intelligence, decision theory and machine learning.