Asking for #rl opinions.

Is a value function a model in the RL sense? Why? Why not?

Feels like the difference between model-based and value-based methods is getting more and more arbitrary.

@proceduralia I generally agree with the response @jhamrick gave below. Value functions and transition & reward dynamics estimators are predictors of different things. The former predicts a summary of expected discounted future rewards, while the latter predicts the next-state distribution and rewards.

The big difference for me is you can use the latter to estimate the former, but not the other way around.
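
A minimal sketch of that asymmetry, assuming a tiny two-state problem under a fixed policy (the function name `evaluate`, the transition matrices, and the discount of 0.9 are all made up for illustration, not taken from the thread). Given a model, iterating the Bellman backup recovers the value function. Two different models can give the same value function, though, so the values alone cannot tell you which dynamics produced them.

```python
def evaluate(P, R, gamma=0.9, tol=1e-10):
    """Policy evaluation from a model: iterate V <- R + gamma * P V until it stops changing."""
    n = len(R)
    V = [0.0] * n
    while True:
        V_new = [
            R[s] + gamma * sum(P[s][t] * V[t] for t in range(n))
            for s in range(n)
        ]
        if max(abs(a - b) for a, b in zip(V_new, V)) < tol:
            return V_new
        V = V_new


# Hypothetical model A: each state loops back to itself.
P_a = [[1.0, 0.0], [0.0, 1.0]]
# Hypothetical model B: the two states swap on every step.
P_b = [[0.0, 1.0], [1.0, 0.0]]
# Same reward of 1 in every state for both models.
R = [1.0, 1.0]

V_a = evaluate(P_a, R)  # model -> values
V_b = evaluate(P_b, R)  # a different model -> the same values

# Both value functions are close to 1 / (1 - gamma) = 10 in every state,
# even though the transition dynamics differ.
same_values = all(abs(a - b) < 1e-9 for a, b in zip(V_a, V_b))
different_dynamics = P_a != P_b
```

Here `same_values` and `different_dynamics` are both true: the map from model to value function is many-to-one, which is why you cannot go the other way around.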

@proceduralia @jhamrick Bisimulation metrics give you something in between: they're finer grained than simply value differences, but are coarser grained than dynamics-equivalences.
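
For reference, a sketch of the usual definition, in the form introduced by Ferns et al. The notation is assumed here rather than taken from the thread: $R$ is the reward function, $P$ the transition kernel, $\gamma$ the discount, and $W_1(d)$ the Wasserstein-1 distance between next-state distributions with $d$ as the ground metric. The bisimulation metric is the fixed point of the first equation, and the second line is the bound that makes it at least as fine as value differences.

```latex
d(s,t) = \max_{a} \Big( \big| R(s,a) - R(t,a) \big|
         + \gamma \, W_1(d)\big( P(\cdot \mid s,a),\, P(\cdot \mid t,a) \big) \Big)

\big| V^*(s) - V^*(t) \big| \le d(s,t)
```

Two states at distance zero must have equal optimal values, but equal values do not force distance zero. In the other direction, two states can be at distance zero without having identical next-state distributions, as long as those distributions agree up to the metric itself.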

@proceduralia @jhamrick i looked at these types of equivalence relation implications in my phd thesis:
https://central.bac-lac.gc.ca/.item?id=NR78603&op=pdf&app=Library&oclc_number=1019479357

the tl;dr was this figure from section 3.6 (which unfortunately probably requires you to read more of my thesis to understand 🙃 )

@psc @jhamrick Thanks for the comments, Pablo! There is indeed a close relationship between value-aware models and bisimulation, and this is an interesting perspective on it!