Distributed Representation in RL: from AI to Dopamine


Paul Masset | Harvard University, Cambridge, USA
Pablo Tano | University of Geneva, Switzerland


The temporal difference (TD) theory of dopaminergic activity in the basal ganglia has been one of the most influential ideas in neuroscience over the last two decades. It remains one of the few examples where a normative computation has been assigned to a genetically defined cell type. However, recent experimental work expanding the anatomical scope of recordings and the range of task designs has revealed heterogeneity in dopamine responses that is not readily explained within the canonical TD framework. Meanwhile, recent progress in AI has shown that extending the TD learning rule allows agents to learn more powerful representations that lead to improved performance across various domains. For example, improved performance has been obtained by extending the TD rule to learn the entire value distribution of states (instead of just their expectation), the expected values of states under multiple temporal discounts (instead of a single one), and the dynamics of transitions in the environment.
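As a concrete illustration of one of these extensions, the multi-discount idea can be sketched in a tabular setting: the classic TD(0) rule learns a single expected value per state, and the extension simply runs the same backup in parallel for a vector of discount factors. This is a minimal illustrative sketch, not code from any of the works discussed at the workshop; the variable names and parameter values are our own.

```python
import numpy as np

# Classic TD(0) learns E[return] for one discount gamma; the multi-discount
# extension mentioned above learns one value per gamma by applying the same
# TD backup in parallel (illustrative parameters).
gammas = np.array([0.5, 0.9, 0.99])      # multiple temporal discounts
n_states, alpha = 5, 0.1                 # small tabular example
V = np.zeros((n_states, len(gammas)))    # one value column per discount

def td_update(V, s, r, s_next):
    """One TD(0) backup applied simultaneously for every discount."""
    delta = r + gammas * V[s_next] - V[s]  # vector of TD errors, one per gamma
    V[s] += alpha * delta
    return delta

# Example: a single observed transition s=0 -> s=1 with reward 1.
deltas = td_update(V, s=0, r=1.0, s_next=1)
```

Because all values start at zero, the first backup produces the same TD error for every discount; the value estimates only differentiate across discounts as longer-horizon structure in the rewards is experienced.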

This progress in AI has yet to be incorporated into the computational neuroscience models and experimental designs that are currently competing to explain the heterogeneity in dopamine responses. Some experiments are starting to link our understanding of dopamine activity in the brain to the distributed representation learning ideas that have succeeded in AI, but it is still uncertain what role, if any, distributed TD backups play in the brain, how they are represented in neural activity, and how they affect state representations and learning. In this workshop, we propose to cover (1) the experimental work that has revealed the heterogeneity in dopamine responses, (2) the main AI algorithms that use TD extensions to learn better representations, and (3) the computational neuroscience models that could explain the observed heterogeneity as distributed TD learning across several dimensions.

Schedule (CEST)




Peter Dayan | Max Planck Institute for Biological Cybernetics, Tübingen, Germany
Factual and Counterfactual Dopamine


Mitsuko Watabe-Uchida | Harvard University, Cambridge, USA
Multiple axes of dopamine evaluation signals in the mouse striatum


15 min break


Marc Bellemare | Google Research & McGill University, Montreal, Canada
Distributional reinforcement learning: A language for characterizing randomness in outcomes


HyungGoo Kim | Sungkyunkwan University, South Korea
Diverse dopamine signals during spatial navigation


Jan Drugowitsch | Harvard Medical School, Cambridge, USA
Distinguishing neural codes for uncertain value coding


30 min break


Martha White | University of Alberta, Canada
Neurons as GVFs: Leveraging the Heterogeneity of General Value Functions


Rachel Lee | Princeton University, USA
Explaining dopaminergic response heterogeneity as a reflection of cortical state representation


Jakob Foerster | University of Oxford, UK
Off-Belief Learning


Panel Discussion