Self-supervision for Reinforcement Learning (SSL-RL)

May 8, 2021 // ICLR Workshop

Reinforcement learning (RL) entails letting an agent learn through interaction with an environment. The formalism is powerful in its generality, and presents us with a hard open-ended problem: how can we design agents that learn efficiently, and generalize well, given only sensory information and a scalar reward signal? The goal of this workshop is to explore the role of self-supervised learning within reinforcement learning agents as a way to make progress on this problem.

Important Dates

Paper Submission Deadline March 7, 2021 AoE (extended from February 26, 2021)
Decision Notifications March 26, 2021
Camera Ready Paper Deadline April 25, 2021 AoE
Workshop May 8, 2021

Call for Papers

We invite both short (4-page) and long (8-page) anonymized submissions in the ICLR LaTeX format that develop algorithms, benchmarks, and ideas to allow reinforcement learning agents to learn more effectively by making self-supervised predictions about their environment. More concretely, we welcome submissions around, but not necessarily limited to, the following broad questions:

  • How can we leverage large amounts of unlabelled sensory data from diverse sources to bootstrap learning for reinforcement learning tasks?
  • Can we use auxiliary targets, generated in an unsupervised manner, to accelerate learning?
  • Can we meta-learn auxiliary targets that accelerate learning for a wide range of tasks?
  • Can we build benchmarks and protocols for systematically comparing existing self-supervision methods?
  • Do agents that learn multiple auxiliary predictions generalize better to new environments than those that learn purely by maximizing the reward?
  • Can we learn predictive state representations to accelerate learning and generalization?
  • Can we benefit from insights from cognitive science and neuroscience to build better self-supervisory objectives?

We welcome review and position papers that may foster discussion. We also encourage papers published at *non-ML* venues, e.g. in epistemology, cognitive science, psychology, or neuroscience, that are within the scope of the workshop. Note that, as per ICLR guidelines, we do not accept work previously published at other machine learning conferences, but we are open to work that is currently under submission to a conference (such as ICML 2021).

Submissions should be uploaded on OpenReview: SSL-RL submission link

In case of any issues or questions, feel free to email the workshop organizers at:


Pierre-Yves Oudeyer
INRIA/ Flowers team
Irina Higgins
Danijar Hafner
University of Toronto
Elise van der Pol
University of Amsterdam
John Langford
Microsoft Research, New York
Yael Niv


Self-supervised learning for an RL agent involves the agent learning (and possibly discovering) many predictions about its world. For example, a natural self-supervised prediction task within an agent is to learn the transition dynamics of the environment [1-4]. But given the rich sequential and interactive nature of RL environments, many other prediction tasks could be used as well [5-9]. Self-supervised learning has several possible benefits. First, the agent can directly use its learned primitives to facilitate future learning, by endowing itself with learned priors instead of starting tabula rasa [10-12]. Second, the agent can indirectly benefit from the learned predictions, by learning a representation that is useful for many different predictions [13]. Such a representation should also facilitate efficient learning [14-15] and exhibit better generalization [16-17].


The aims of this workshop are to explore the potential benefits of self-supervision and how to specify self-supervised tasks, and to bring together people from different areas, including Cognitive Science, Reinforcement Learning, and Computer Vision, with a common interest in building better learning agents. The specific research questions we hope to tackle include:

  • How can we leverage unsupervised data to bootstrap learning in an MDP, and what primitives should we learn: dynamics, representation, skills or something else?
  • How can we measure progress in the development of self-supervised, general-purpose agents? Do we need to create a GLUE-like benchmark for RL?
  • What kind of structure in an MDP can an agent exploit to learn a task faster?
  • How can we design self-supervised objectives that encourage an agent to generalize well out of its training distribution?
  • Can we leverage insights from cognitive science on how humans acquire knowledge to build better self-supervised objectives?

Mind Match Program

The Mind Match event is intended as a catalyst for discussion between researchers with shared interests, similar to the Neuromatch and BAICS events. Participants will be split into small groups according to topics of interest, and will have a chance to chat informally and potentially set up collaborations.

Sign up here, and we will notify you of your groups and the meeting link prior to the event.



Amy Zhang
McGill University / Mila / Facebook
Ankesh Anand
University of Montreal / Mila
Bogdan Mazoure
McGill University / Mila
Devon Hjelm
Microsoft Research / University of Montreal
Khurram Javed
University of Alberta / AMII
Martha White
University of Alberta / AMII
Thang Doan
McGill University / Mila

Accepted papers

  1. [Oral] Learning One Representation to Optimize All Rewards. Ahmed Touati, Yann Ollivier.
  2. [Oral] Learning Generalizable Robotic Reward Functions from "In-The-Wild" Human Videos. Annie S Chen, Suraj Nair, Chelsea Finn.
  3. [Spotlight] Reinforcement Learning with Prototypical Representations. Denis Yarats, Rob Fergus, Alessandro Lazaric, Lerrel Pinto.
  4. [Spotlight] Causal Inference Q-Network: Toward Resilient Reinforcement Learning. Chao-Han Huck Yang, Danny I-Te Hung, Yi Ouyang, Pin-Yu Chen.
  5. [Spotlight] Augmented World Models Facilitate Zero-Shot Dynamics Generalization From a Single Offline Environment. Philip Ball, Cong Lu, Jack Parker-Holder, Stephen Roberts.
  6. [Poster] COMBO: Conservative Offline Model-Based Policy Optimization. Tianhe Yu, Aviral Kumar, Rafael Rafailov, Aravind Rajeswaran, Sergey Levine, Chelsea Finn.
  7. [Poster] Demonstration-Guided Reinforcement Learning with Learned Skills. Karl Pertsch, Youngwoon Lee, Yue Wu, Joseph J Lim.
  8. [Poster] Generalizable Representations for Reinforcement Learning. Rutav Shah, Vikash Kumar.
  9. [Poster] Goal Reaching via Recursive Reweighting of Offline Data. Scott Emmons, Benjamin Eysenbach, Sergey Levine.
  10. [Poster] Learning State Representations via Temporal Cycle-Consistency Constraint in Model-Based Reinforcement Learning. Changmin Yu, Dong Li, Hangyu Mao, Jianye Hao, Neil Burgess.
  11. [Poster] Learning Task Informed Abstractions. Xiang Fu, Ge Yang, Pulkit Agrawal, Tommi S. Jaakkola.
  12. [Poster] Learning to Explore a Class of Multiple Reward-Free Environments. Mirco Mutti, Mattia Mancassola, Marcello Restelli.
  13. [Poster] Learning to Infer Unseen Contexts in Causal Contextual Reinforcement Learning. Hamid Eghbal-zadeh, Florian Henkel, Gerhard Widmer.
  14. [Poster] Less Suboptimal Learning and Control in Variational POMDPs. Baris Kayalibay, Atanas Mirchev, Patrick van der Smagt, Justin Bayer.
  15. [Poster] LOCO: Adaptive exploration in reinforcement learning via local estimation of contraction coefficients. Manfred Diaz, Liam Paull, Pablo Samuel Castro.
  16. [Poster] Minimum Description Length Skills for Accelerated Reinforcement Learning. Jesse Zhang, Karl Pertsch, Jiefan Yang, Joseph J Lim.
  17. [Poster] Model-Invariant State Abstractions for Model-Based Reinforcement Learning. Manan Tomar, Amy Zhang, Roberto Calandra, Matthew E. Taylor, Joelle Pineau.
  18. [Poster] Offline Reinforcement Learning with Pseudometric Learning. Robert Dadashi, Shideh Rezaeifar, Nino Vieillard, Leonard Hussenot, Olivier Pietquin, Matthieu Geist.
  19. [Poster] Optimism is All You Need: Model-Based Imitation Learning From Observation Alone. Rahul Kidambi, Jonathan Daniel Chang, Wen Sun.
  20. [Poster] Out-of-distribution generalization of internal models is correlated with reward. Khushdeep Singh Mann, Steffen Schneider, Alberto Chiappa, Jin Hwa Lee, Matthias Bethge, Alexander Mathis, Mackenzie W Mathis.
  21. [Poster] Pretraining Reward-Free Representations for Data-Efficient Reinforcement Learning. Max Schwarzer, Nitarshan Rajkumar, Michael Noukhovitch, Ankesh Anand, Laurent Charlin, R Devon Hjelm, Philip Bachman, Aaron Courville.
  22. [Poster] PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning. Angelos Filos, Clare Lyle, Yarin Gal, Sergey Levine, Natasha Jaques, Gregory Farquhar.
  23. [Poster] Relevant Action Matters: Motivating agent with action usefulness. Mathieu Seurin, Florian Strub, Philippe Preux, Olivier Pietquin.
  24. [Poster] Representation Matters: Offline Pretraining for Sequential Decision Making. Mengjiao Yang, Ofir Nachum.
  25. [Poster] Resolving Causal Confusion in Reinforcement Learning via Robust Exploration. Clare Lyle, Amy Zhang, Minqi Jiang, Joelle Pineau, Yarin Gal.
  26. [Poster] Self-Supervised Exploration via Latent Bayesian Surprise. Pietro Mazzaglia, Ozan Catal, Tim Verbelen, Bart Dhoedt.
  27. [Poster] Solipsistic Reinforcement Learning. Mingtian Zhang, Peter Noel Hayes, Tim Z. Xiao, Andi Zhang, David Barber.
  28. [Poster] State Entropy Maximization with Random Encoders for Efficient Exploration. Younggyo Seo, Lili Chen, Jinwoo Shin, Honglak Lee, Pieter Abbeel, Kimin Lee.
  29. [Poster] Unsupervised Feature Learning for Manipulation with Contrastive Domain Randomization. Carmel Rabinovitz, Niko Grupen, Aviv Tamar.
  30. [Poster] Variational Model-Based Imitation Learning in High-Dimensional Observation Spaces. Rafael Rafailov, Tianhe Yu, Aravind Rajeswaran, Chelsea Finn.

References


  1. Finn, Chelsea, Ian Goodfellow, and Sergey Levine. "Unsupervised learning for physical interaction through video prediction." NeurIPS (2016).
  2. Ha, David, and Jürgen Schmidhuber. "Recurrent world models facilitate policy evolution." NeurIPS (2018).
  3. Hafner, Danijar, et al. "Learning latent dynamics for planning from pixels." ICML (2019).
  4. Kipf, Thomas, Elise van der Pol, and Max Welling. "Contrastive learning of structured world models." arXiv (2019).
  5. Schmidhuber, Jürgen. "A possibility for implementing curiosity and boredom in model-building neural controllers." Proc. of the International Conference on Simulation of Adaptive Behavior: From Animals to Animats (1991).
  6. Klyubin, Alexander S., Daniel Polani, and Chrystopher L. Nehaniv. "Empowerment: A universal agent-centric measure of control." IEEE Congress on Evolutionary Computation (2005).
  7. Sutton, Richard S., et al. "Horde: A scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction." The 10th International Conference on Autonomous Agents and Multiagent Systems, Volume 2 (2011).
  8. Mohamed, Shakir, and Danilo Jimenez Rezende. "Variational information maximisation for intrinsically motivated reinforcement learning." NeurIPS (2015).
  9. Pathak, Deepak, et al. "Curiosity-driven exploration by self-supervised prediction." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (2017).
  10. Ebert, Frederik, et al. "Visual foresight: Model-based deep reinforcement learning for vision-based robotic control." arXiv (2018).
  11. Sekar, Ramanan, et al. "Planning to Explore via Self-Supervised World Models." arXiv (2020).
  12. Lynch, Corey, et al. "Learning latent plans from play." CoRL (2020).
  13. Jaderberg, Max, et al. "Reinforcement learning with unsupervised auxiliary tasks." arXiv (2016).
  14. Eslami, SM Ali, et al. "Neural scene representation and rendering." Science (2018).
  15. Anand, Ankesh, et al. "Unsupervised state representation learning in atari." NeurIPS (2019).
  16. Srinivas, Aravind, et al. "CURL: Contrastive Unsupervised Representations for Reinforcement Learning." ICML (2020).
  17. Zhang, Amy, et al. "Learning invariant representations for reinforcement learning without reconstruction." arXiv (2020).
  18. Mazoure, Bogdan, et al. "Deep reinforcement and infomax learning." NeurIPS (2020).
  19. Stooke, Adam, et al. "Decoupling representation learning from reinforcement learning." arXiv preprint arXiv:2009.08319 (2020).
  20. Hansen, Nicklas, et al. "Self-Supervised Policy Adaptation during Deployment." arXiv (2020).
  21. Agarwal, Rishab, et al. "Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning." ICLR (2021).
  22. Schwarzer, Max, et al. "Data-Efficient Reinforcement Learning with Self-Predictive Representations." ICLR (2021).
Website theme inspired by the VIGIL workshop. Cover art by Matt Dixon.