Tim Rocktäschel

1.4K Followers
124 Following
218 Posts
Open-Endedness Team Lead at DeepMind; Associate Professor at UCL Centre for AI, leading the UCL @dark Lab; ELLIS Scholar. Formerly University of Oxford and Facebook AI Research.
Scholar: https://scholar.google.co.uk/citations?hl=en&user=mWBY8aIAAAAJ&view_op=list_works&sortby=pubdate
Twitter: https://twitter.com/_rockt
Webpage: https://rockt.github.io/
Human-Timescale Adaptation in an Open-Ended Task Space

Foundation models have shown impressive adaptation and scalability in supervised and self-supervised learning problems, but so far these successes have not fully translated to reinforcement learning (RL). In this work, we demonstrate that training an RL agent at scale leads to a general in-context learning algorithm that can adapt to open-ended novel embodied 3D problems as quickly as humans. In a vast space of held-out environment dynamics, our adaptive agent (AdA) displays on-the-fly hypothesis-driven exploration, efficient exploitation of acquired knowledge, and can successfully be prompted with first-person demonstrations. Adaptation emerges from three ingredients: (1) meta-reinforcement learning across a vast, smooth and diverse task distribution, (2) a policy parameterised as a large-scale attention-based memory architecture, and (3) an effective automated curriculum that prioritises tasks at the frontier of an agent's capabilities. We demonstrate characteristic scaling laws with respect to network size, memory length, and richness of the training task distribution. We believe our results lay the foundation for increasingly general and adaptive RL agents that perform well across ever-larger open-ended domains.

arXiv.org
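One of the three ingredients the AdA abstract names is an automated curriculum that prioritises tasks at the frontier of the agent's capabilities. As a toy illustration of that idea (not AdA's actual mechanism; all names and the simple success-rate heuristic here are assumptions), a sampler can favour tasks the agent solves roughly half the time over tasks it always solves or always fails:

```python
import random

class FrontierCurriculum:
    """Toy curriculum sampler: prefer tasks where the agent's
    estimated success rate is near 0.5 (the 'frontier'), rather
    than tasks it always solves or always fails. Purely
    illustrative; AdA's actual curriculum is more sophisticated.
    """

    def __init__(self, task_ids, seed=0):
        # Optimistic prior: assume every task starts at the frontier.
        self.success_rates = {t: 0.5 for t in task_ids}
        self.rng = random.Random(seed)

    def priority(self, task_id):
        # p * (1 - p): maximal at a 50% success rate, zero at 0% or 100%.
        p = self.success_rates[task_id]
        return p * (1.0 - p)

    def sample(self):
        # Sample tasks with probability proportional to their priority.
        tasks = list(self.success_rates)
        weights = [self.priority(t) + 1e-6 for t in tasks]
        return self.rng.choices(tasks, weights=weights, k=1)[0]

    def update(self, task_id, succeeded, lr=0.1):
        # Exponential moving average of the observed success rate.
        p = self.success_rates[task_id]
        self.success_rates[task_id] = (1 - lr) * p + lr * float(succeeded)
```

After enough episodes, tasks the agent reliably solves (or reliably fails) get near-zero priority, so training time concentrates on tasks at the edge of its current ability.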
Imputing data is difficult, not just for AI. This is a terrestrial globe from around 1640 by Jacob Aertsz Colom (you can find it in @[email protected]). California is drawn as an island.

RT @[email protected]

Access to diverse partners is crucial when training robust cooperators or evaluating ad-hoc coordination. In our top 25% #iclr2023 paper, we tackle the challenge of generating diverse cooperative policies and expose the issue of "sabotages" affecting simpler methods.

A 🧵!

🐦🔗: https://twitter.com/_andreilupu/status/1618667577005441024


RT @[email protected]

"34% of mothers and 12% of fathers globally leave full-time STEMM employment after becoming parents" (Sugrue et al., unpublished) https://twitter.com/mothersinsci/status/1618322944056254480

🐦🔗: https://twitter.com/verena_rieser/status/1618515798225735681

Mothers in Science on Twitter

“Big news! We united 18 organizations & thousands of ♀️ scientists globally to eliminate gender inequities in #ResearchFunding & promote inclusion of caregivers in #STEMM. Follow and share to help us create systemic change! #action4STEMMoms. #WomenInSTEM https://t.co/IOZ597e9Ty”

Twitter

RT @[email protected]

Trajectory autoencoding planner (TAP) has been accepted by ICLR2023, and the arxiv paper is also updated🥳

TAP models offline RL trajectories with a VQ-VAE, reinterpreting planning as a search for optimal latent codes.

Highlighted updates and thoughts in 🧵1/N https://twitter.com/zhengyaojiang/status/1562078710265794560

🐦🔗: https://twitter.com/zhengyaojiang/status/1618272741131706368

Zhengyao Jiang on Twitter

“I'm excited to announce Trajectory Autoencoding Planner (TAP), a novel planning-based sequence modelling method that can scale to high dimensionality state-action space. (1/N) 🕸️Website: https://t.co/FPepEvjdpe 📜Paper: https://t.co/waUC69XpCB 💻Code: https://t.co/17XeSy4XQ0”

Twitter
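The core idea in the TAP announcement is that once trajectories are compressed into discrete latent codes, planning becomes a search over code sequences. As a minimal sketch of that reinterpretation (a brute-force stand-in; TAP itself uses a learned VQ-VAE encoder/decoder and a learned prior to guide the search, and every name here is an assumption):

```python
import itertools

def plan_in_latent_space(value_fn, codebook_size, horizon):
    """Toy version of planning as latent-code search: enumerate all
    code sequences of the given horizon and return the one that a
    (stand-in) value function scores highest. Illustrative only;
    TAP decodes codes back into trajectories and scores those, and
    does not enumerate exhaustively."""
    best_seq, best_val = None, float("-inf")
    for seq in itertools.product(range(codebook_size), repeat=horizon):
        v = value_fn(seq)
        if v > best_val:
            best_seq, best_val = seq, v
    return list(best_seq), best_val
```

For example, with a toy value function that prefers code 2 at every step, `plan_in_latent_space(lambda s: -sum((c - 2) ** 2 for c in s), codebook_size=4, horizon=3)` returns `([2, 2, 2], 0)`. The point of the discretisation is exactly this: a continuous state-action trajectory space becomes a small discrete search space.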

Great article about our work on AdA and @[email protected]'s DreamerV3. Scale is coming for RL!

https://www.lesswrong.com/posts/4xGAmZ9GTGAkszHoH/parameter-scaling-comes-for-rl-maybe

RT @[email protected]

I’m super excited to share our work on AdA: An Adaptive Agent capable of hypothesis-driven exploration which solves challenging unseen tasks with just a handful of experience, at a similar timescale to humans.

https://sites.google.com/corp/view/adaptive-agent/

See the thread for more details 👇 [1/N]

🐦🔗: https://twitter.com/FeryalMP/status/1616035293064462338

Parameter Scaling Comes for RL, Maybe - LessWrong

TLDR Unlike language models or image classifiers, past reinforcement learning models did not reliably get better as they got bigger. Two DeepMind RL papers published in January 2023 nevertheless show…

RT @[email protected]

Now that we can write Tiny Papers @[email protected], what should we write about?

I'd like to invite all established researchers to contribute Tiny Ideas as inspirations, seeds for discussions & future collaborations! #TinyIdeasForTinyPapers

I'll start. Note: bad ideas == good starts.

🐦🔗: https://twitter.com/savvyRL/status/1617974811875217411


RT @[email protected]

Since it's impossible to search for past tweets and it's that time of the year again - here is our brief "How to ML Paper" guide again: https://docs.google.com/document/d/16R1E2ExKUCP5SlXWHr-KzbVDx9DBUclra-EbU8IB-iE/edit#heading=h.16t67gkeu9dx

Good luck with #ICML2023 and remember there is always another deadline!

🐦🔗: https://twitter.com/j_foerst/status/1617670799552569345

How to ML Paper - A brief Guide

How to ML Paper - A brief Guide Feel free to comment / share and happy paper writing! Also, please see caveats* below. If you like this, why not follow How to ML on Twitter and share the advice/love? Canonical ML Paper Structure Abstract (TL;DR of paper): X: What are we trying to do and why is i...

Google Docs