Self-supervision for Reinforcement Learning (SSL-RL)

May 8, 2021 // ICLR Workshop

Reinforcement learning (RL) entails letting an agent learn through interaction with an environment. The formalism is powerful in its generality, and presents us with a hard, open-ended problem: how can we design agents that learn efficiently, and generalize well, given only sensory information and a scalar reward signal? The goal of this workshop is to explore the role of self-supervised learning within reinforcement learning agents as a way to make progress on this problem.

Important Dates

Paper Submission Deadline March 2, 2021 AoE (extended from February 26, 2021)
Decision Notifications March 26, 2021
Camera Ready Paper Deadline April 15, 2021 AoE
Workshop May 8, 2021

Call for Papers

We invite both short (4-page) and long (8-page) anonymized submissions in the ICLR LaTeX format that develop algorithms, benchmarks, and ideas that allow reinforcement learning agents to learn more effectively by making self-supervised predictions about their environment. More concretely, we welcome submissions on, but not limited to, the following broad questions:

  • How can we leverage large amounts of unlabelled sensory data from diverse sources to bootstrap learning for reinforcement learning tasks?
  • Can we use auxiliary targets, generated in an unsupervised manner, to accelerate learning?
  • Can we meta-learn auxiliary targets that accelerate learning for a wide range of tasks?
  • Can we build benchmarks and protocols for systematically comparing existing self-supervision methods?
  • Do agents that learn multiple auxiliary predictions generalize better to new environments than those that learn purely by maximizing the reward?
  • Can we learn predictive state representations to accelerate learning and generalization?
  • Can we benefit from insights from cognitive science and neuroscience to build better self-supervisory objectives?

We welcome review and position papers that may foster discussion. We also encourage papers published at *non-ML* venues, e.g. in epistemology, cognitive science, psychology, or neuroscience, that are within the scope of the workshop. Note that, per ICLR guidelines, we do not accept work previously published at other machine learning conferences, but we are open to work currently under submission to a conference (such as ICML 2021).

Submissions should be uploaded to OpenReview: SSL-RL submission link

In case of any issues or questions, feel free to email the workshop organizers at:


Pierre-Yves Oudeyer
INRIA / Flowers team
Irina Higgins
Danijar Hafner
University of Toronto
Elise van der Pol
University of Amsterdam
John Langford
Microsoft Research, New York
Yael Niv


Self-supervised learning for an RL agent involves the agent learning (and possibly discovering) many predictions about its world. For example, a natural self-supervised prediction task for an agent is to learn the transition dynamics of the environment [1-4]. But given the rich sequential and interactive nature of RL environments, many other prediction tasks could be used as well [5-9]. Self-supervised learning has several possible benefits. First, the agent can directly use its learned primitives to facilitate future learning, endowing itself with learned priors instead of starting tabula rasa [10-12]. Second, the agent can benefit indirectly from the learned predictions by learning a representation that is useful for many different predictions [13]. Such a representation should also facilitate efficient learning [14-15] and exhibit better generalization [16-17].
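To make the first example concrete, here is a minimal sketch of learning transition dynamics as a self-supervised prediction task. It assumes a small environment with discrete states and actions, and simply counts observed transitions to estimate P(s' | s, a). The class name `TabularDynamicsModel` and the toy transitions are hypothetical and used only for illustration; the works cited above [1-4] learn neural models from high-dimensional observations rather than counting.

```python
from collections import Counter, defaultdict


class TabularDynamicsModel:
    """Count-based estimate of P(s' | s, a) for discrete states and actions.

    Hypothetical illustration: the prediction targets (observed next states)
    come from the agent's own experience rather than from the reward, so
    fitting this model is a self-supervised task.
    """

    def __init__(self):
        # counts[(s, a)][s_next] = how often s_next followed taking a in s
        self.counts = defaultdict(Counter)

    def update(self, s, a, s_next):
        """Record one observed transition (s, a) -> s_next."""
        self.counts[(s, a)][s_next] += 1

    def predict(self, s, a):
        """Return the empirical next-state distribution for (s, a).

        Returns an empty dict for state-action pairs never visited
        (dict.get avoids inserting a new key into the defaultdict).
        """
        c = self.counts.get((s, a))
        if not c:
            return {}
        total = sum(c.values())
        return {s_next: n / total for s_next, n in c.items()}


# Hypothetical toy experience from a two-state environment.
transitions = [(0, "right", 1), (0, "right", 1), (0, "right", 0), (1, "left", 0)]
model = TabularDynamicsModel()
for s, a, s_next in transitions:
    model.update(s, a, s_next)
print(model.predict(0, "right"))
```

A count table only scales to small discrete environments, but the training signal is of the same kind as in the neural approaches: supervision comes from the agent's own transitions, with no reward required.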


The aims of this workshop are to explore the potential benefits of self-supervision and how to specify self-supervised tasks, and to bring together people from different areas, including Cognitive Science, Reinforcement Learning, and Computer Vision, who share an interest in building better learning agents. The specific research questions we hope to tackle include:

  • How can we leverage unsupervised data to bootstrap learning in an MDP, and what primitives should we learn: dynamics, representation, skills or something else?
  • How can we measure progress in the development of self-supervised, general-purpose agents? Do we need to create a GLUE-like benchmark for RL?
  • What kind of structure in an MDP can an agent exploit to learn a task faster?
  • How can we design self-supervised objectives that encourage an agent to generalize well out of its training distribution?
  • Can we leverage insights from cognitive science on how humans acquire knowledge to build better self-supervised objectives?

Mind Match Program

The Mind Match event is intended as a catalyst for discussion between researchers with shared interests, similar to the Neuromatch and BAICS events. Participants will be split into small groups according to topics of interest, and will have a chance to chat informally, hold discussions, and potentially set up collaborations.

Sign up here, and we will notify you of your group and the meeting link before the event.



Amy Zhang
McGill University / Mila / Facebook
Ankesh Anand
University of Montreal / Mila
Bogdan Mazoure
McGill University / Mila
Devon Hjelm
Microsoft Research / University of Montreal
Khurram Javed
University of Alberta / AMII
Martha White
University of Alberta / AMII
Thang Doan
McGill University / Mila


  1. Finn, Chelsea, Ian Goodfellow, and Sergey Levine. "Unsupervised learning for physical interaction through video prediction." NeurIPS (2016).
  2. Ha, David, and Jürgen Schmidhuber. "Recurrent world models facilitate policy evolution." NeurIPS (2018).
  3. Hafner, Danijar, et al. "Learning latent dynamics for planning from pixels." ICML (2019).
  4. Kipf, Thomas, Elise van der Pol, and Max Welling. "Contrastive learning of structured world models." arXiv (2019).
  5. Schmidhuber, Jürgen. "A possibility for implementing curiosity and boredom in model-building neural controllers." Proc. of the International Conference on Simulation of Adaptive Behavior: From Animals to Animats (1991).
  6. Klyubin, Alexander S., Daniel Polani, and Chrystopher L. Nehaniv. "Empowerment: A universal agent-centric measure of control." IEEE Congress on Evolutionary Computation (2005).
  7. Sutton, Richard S., et al. "Horde: A scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction." The 10th International Conference on Autonomous Agents and Multiagent Systems, Volume 2 (2011).
  8. Mohamed, Shakir, and Danilo Jimenez Rezende. "Variational information maximisation for intrinsically motivated reinforcement learning." NeurIPS (2015).
  9. Pathak, Deepak, et al. "Curiosity-driven exploration by self-supervised prediction." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (2017).
  10. Ebert, Frederik, et al. "Visual foresight: Model-based deep reinforcement learning for vision-based robotic control." arXiv (2018).
  11. Sekar, Ramanan, et al. "Planning to Explore via Self-Supervised World Models." arXiv (2020).
  12. Lynch, Corey, et al. "Learning latent plans from play." CoRL (2020).
  13. Jaderberg, Max, et al. "Reinforcement learning with unsupervised auxiliary tasks." arXiv (2016).
  14. Eslami, SM Ali, et al. "Neural scene representation and rendering." Science (2018).
  15. Anand, Ankesh, et al. "Unsupervised state representation learning in Atari." NeurIPS (2019).
  16. Srinivas, Aravind, et al. "CURL: Contrastive Unsupervised Representations for Reinforcement Learning." ICML (2020).
  17. Zhang, Amy, et al. "Learning invariant representations for reinforcement learning without reconstruction." arXiv (2020).
  18. Mazoure, Bogdan, et al. "Deep reinforcement and infomax learning." NeurIPS (2020).
  19. Stooke, Adam, et al. "Decoupling representation learning from reinforcement learning." arXiv:2009.08319 (2020).
  20. Hansen, Nicklas, et al. "Self-Supervised Policy Adaptation during Deployment." arXiv (2020).
  21. Agarwal, Rishab, et al. "Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning." ICLR (2021).
  22. Schwarzer, Max, et al. "Data-Efficient Reinforcement Learning with Self-Predictive Representations." ICLR (2021).
Website theme inspired by the VIGIL workshop. Cover art by Matt Dixon.