Is it time to rethink exploration in reinforcement learning to be about more than just finding the best policy for the immediate task at hand?

@balloch and I say yes: https://arxiv.org/abs/2210.06168

@rockt and Grefenstette say yes: https://arxiv.org/abs/2211.07819

The Role of Exploration for Task Transfer in Reinforcement Learning

The exploration--exploitation trade-off in reinforcement learning (RL) is a well-known and much-studied problem that balances greedy action selection with novel experience, and the study of exploration methods is usually only considered in the context of learning the optimal policy for a single learning task. However, in the context of online task transfer, where there is a change to the task during online operation, we hypothesize that exploration strategies that anticipate the need to adapt to future tasks can have a pronounced impact on the efficiency of transfer. As such, we re-examine the exploration--exploitation trade-off in the context of transfer learning. In this work, we review reinforcement learning exploration methods, define a taxonomy with which to organize them, analyze these methods' differences in the context of task transfer, and suggest avenues for future investigation.

There aren’t many research groups that I feel operate on the same wavelength as my team. I’m not usually worried about scoops. But @rockt and Grefenstette… sometimes I wonder if they have a spy in my lab.
@Riedl @rockt Their paper is 29 pages long... so they got the upper hand 😂. Anyway, there is so much more than better exploration that is needed 😉 😆
@Riedl that's a nice compliment to get! We should also add a citation to that paper. Thanks for sharing!
@rockt I see that you didn’t address the question of whether you have a spy in my lab. Anyway, I always enjoy reading your papers. Keep up the great work!
@Riedl okay you got us. @rajammanabrolu was an inside agent, but now we are steering the lab in pitch black @dark ;)

@Riedl @balloch @rockt The bandit literature has been studying exploration for a while: a lot of theoretical work, and many interesting algorithms and results. They don't phrase it in terms of A(G)I or even cross-task adaptation, afaik, but it's the same focus, and it very much remains an open problem.

@hal and John Langford can probably say much more.

It's interesting to think about how style and goals shape framing and word choices across these somewhat different, but also similar, research cliques.

@yoavartzi @balloch @rockt @Halo
Our hypotheses are tightly scoped, as I don’t spend much time thinking about AGI.

Will look into bandits, thanks for the pointer!