Is it time to rethink exploration in reinforcement learning to be about more than just finding the best policy for the immediate task at hand?

@balloch and I say yes: https://arxiv.org/abs/2210.06168

@rockt and Grefenstette say yes: https://arxiv.org/abs/2211.07819

The Role of Exploration for Task Transfer in Reinforcement Learning

The exploration-exploitation trade-off in reinforcement learning (RL) is a well-known and much-studied problem that balances greedy action selection with novel experience, and the study of exploration methods is usually only considered in the context of learning the optimal policy for a single learning task. However, in the context of online task transfer, where there is a change to the task during online operation, we hypothesize that exploration strategies that anticipate the need to adapt to future tasks can have a pronounced impact on the efficiency of transfer. As such, we re-examine the exploration-exploitation trade-off in the context of transfer learning. In this work, we review reinforcement learning exploration methods, define a taxonomy with which to organize them, analyze these methods' differences in the context of task transfer, and suggest avenues for future investigation.

There aren’t many research groups that I feel operate on the same wavelength as my team. I’m not usually worried about scoops. But @rockt and Grefenstette… sometimes I wonder if they have a spy in my lab.
@Riedl @rockt Their paper is 29 pages long... so they've got the upper hand 😂. Anyway, there is so much more than better exploration that is needed 😉 😆