Asking for #rl opinions.

Is a value function a model in the RL sense? Why? Why not?

Feels like the difference between model-based and value-based methods is getting more and more arbitrary.

@proceduralia I would say it's not a transition function, which is usually what's meant by "model" in RL. It is, however, a model of a particular property of the environment. Value-equivalent models are a blend between the two: they capture the dynamics of values, but not of observations.

@jhamrick Yes! I was implicitly referring to value-equivalent/value-aware models.

Since they are not constrained to be similar to the actual transition model, I sometimes wonder if it is more natural to think of them simply as inducing particular inductive biases (maybe more precisely, learning architectures) for value-based RL, and not really as part of model-based methods.

@proceduralia yeah that's interesting, I've wondered the same myself! I tend to group them in with MB methods because many planning algorithms still work with value-equivalent models, e.g. MCTS, CEM, Dyna (sort of). But they of course then lack some of the properties that make models interesting, like being task-agnostic and (in principle) good for transfer. But then again, models trained on-policy aren't really task-agnostic and aren't great for transfer either. So I am undecided!
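The distinction discussed above can be made concrete with a toy sketch (my own construction, not from the thread): under a linear value function, a model can be arbitrarily wrong about next *states* while being exactly right about next *values*, as long as its error lies in the null space of the value weights. A standard transition loss penalizes it; a value-equivalent loss does not.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: linear dynamics s' = A s, linear value V(s) = w @ s.
S_DIM = 4
true_A = 0.3 * rng.normal(size=(S_DIM, S_DIM))  # true transition matrix
w = rng.normal(size=S_DIM)                      # value-function weights

def value(s):
    return w @ s

def transition_loss(model_A, s, s_next):
    # Standard model-based objective: predict the next state itself.
    return float(np.mean((model_A @ s - s_next) ** 2))

def value_equivalent_loss(model_A, s, s_next):
    # Value-equivalent objective: only predict the value of the next state.
    return float((value(model_A @ s) - value(s_next)) ** 2)

s = rng.normal(size=S_DIM)
s_next = true_A @ s

# Build a model that is wrong about observations but right about values:
# add an error whose columns are orthogonal to w, so w @ (wrong_A @ s)
# equals w @ (true_A @ s) for every s.
P = np.eye(S_DIM) - np.outer(w, w) / (w @ w)    # projects out the w direction
wrong_A = true_A + P @ rng.normal(size=(S_DIM, S_DIM))

print("transition loss:      ", transition_loss(wrong_A, s, s_next))       # clearly nonzero
print("value-equivalent loss:", value_equivalent_loss(wrong_A, s, s_next)) # ~0
```

This is the sense in which such models are "not constrained to be similar to the actual transition model": an entire equivalence class of dynamics matrices is indistinguishable under the value-equivalent loss.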