Accounting for the Sequential Nature of States to Learn Representations in Reinforcement Learning

Abstract

In this work, we investigate the properties of data that cause popular representation learning approaches to fail. In particular, we find that in environments where states do not significantly overlap, variational autoencoders (VAEs) fail to learn useful features. We demonstrate this failure in a simple gridworld domain, and then provide a solution in the form of metric learning. However, metric learning requires supervision in the form of a distance function, which is absent in reinforcement learning. To overcome this, we leverage the sequential nature of states in a replay buffer to approximate a distance metric and provide a weak supervision signal, under the assumption that temporally close states are also semantically similar. We augment a VAE with a triplet loss and demonstrate that this approach learns useful features for downstream tasks, without additional supervision, in environments where standard VAEs fail.
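The core idea can be illustrated with a short sketch. Below is a minimal, hypothetical PyTorch implementation (not the authors' code) of a VAE whose latent space is additionally shaped by a triplet margin loss: the positive is the temporally adjacent state from the replay buffer, the negative is a temporally distant state. Network sizes, the margin, and the weights `beta` and `lam` are illustrative assumptions.

```python
# Sketch: VAE + triplet loss with temporal adjacency as weak supervision.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TripletVAE(nn.Module):
    def __init__(self, obs_dim: int, latent_dim: int = 8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU())
        self.mu_head = nn.Linear(128, latent_dim)
        self.logvar_head = nn.Linear(128, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, obs_dim)
        )

    def encode(self, x):
        h = self.encoder(x)
        return self.mu_head(h), self.logvar_head(h)

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterisation
        return self.decoder(z), mu, logvar


def loss(model, anchor, positive, negative, beta=1.0, lam=1.0, margin=1.0):
    """ELBO on the anchor plus a triplet term on the latent means.

    anchor and positive are consecutive states from the replay buffer;
    negative is a state sampled far away in time.
    """
    recon, mu, logvar = model(anchor)
    recon_loss = F.mse_loss(recon, anchor, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    mu_pos, _ = model.encode(positive)
    mu_neg, _ = model.encode(negative)
    triplet = F.triplet_margin_loss(mu, mu_pos, mu_neg, margin=margin)
    return recon_loss + beta * kl + lam * triplet
```

In this sketch a training batch would be formed by sampling transitions (s_t, s_{t+1}) from the buffer as anchor-positive pairs, with negatives drawn from elsewhere in the buffer, reflecting the assumption that temporally close states are semantically similar.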

Publication
The 5th Multi-disciplinary Conference on Reinforcement Learning and Decision Making
Devon Jarvis
Associate Lecturer

I am a PhD candidate and Associate Lecturer at Wits interested in studying systematic generalization and the emergence of modularity in the brain and machines.

Richard Klein
PRIME Lab Director

I am an Associate Professor in the School of Computer Science and Applied Mathematics at the University of the Witwatersrand in Johannesburg, and a co-PI of the PRIME lab.

Steven James
Deputy Lab Director

My research interests include reinforcement learning and planning.