Mastodawn

Pierluca D'Oro Nov 8, 2022

Asking for #rl opinions.

Is a value function a model in the RL sense? Why? Why not?

Feels like the difference between model-based and value-based methods is getting more and more arbitrary.

Pablo Samuel Castro Nov 9, 2022

@proceduralia I generally agree with the response @jhamrick gave below. Value functions and transition & reward dynamics estimators are predictors of different things. The former predicts a summary of the future (the expected discounted sum of future rewards), while the latter predicts the next-state distribution and rewards.

The big difference for me is that you can use the latter to estimate the former, but not the other way around.
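A minimal sketch of that asymmetry, not taken from the thread: for a fixed policy on a small finite chain, a dynamics model (transition matrix `P`, reward vector `R`) is enough to compute the value function by iterating the Bellman equation V = R + γPV. The function name `evaluate` and the toy models `P_stay` and `P_swap` are hypothetical. The two models have different dynamics yet yield identical values, so the values alone cannot tell you which model produced them.

```python
from typing import List


def evaluate(P: List[List[float]], R: List[float], gamma: float,
             tol: float = 1e-10) -> List[float]:
    """Iterative policy evaluation: repeat V <- R + gamma * P V until it settles.

    P[s][s2] is the probability of moving from s to s2 under the fixed policy,
    and R[s] is the expected immediate reward in state s.
    """
    n = len(R)
    V = [0.0] * n
    while True:
        V_new = [R[s] + gamma * sum(P[s][s2] * V[s2] for s2 in range(n))
                 for s in range(n)]
        # The update is a gamma-contraction, so successive iterates converge.
        if max(abs(a - b) for a, b in zip(V_new, V)) < tol:
            return V_new
        V = V_new


gamma = 0.9
R = [1.0, 1.0]
P_stay = [[1.0, 0.0], [0.0, 1.0]]  # each state loops back to itself
P_swap = [[0.0, 1.0], [1.0, 0.0]]  # the two states alternate
print(evaluate(P_stay, R, gamma))  # each entry is approximately 1 / (1 - 0.9) = 10
print(evaluate(P_swap, R, gamma))  # same values, different dynamics
```

Going from model to values is a fixed-point computation; going back is ill-posed, because many (P, R) pairs share the same V.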


Pablo Samuel Castro Nov 9, 2022

@proceduralia @jhamrick Bisimulation metrics give you something in between: they're finer-grained than value differences alone, but coarser-grained than dynamics equivalence.
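A minimal sketch of the "in between" claim, assuming a finite MDP with deterministic transitions (so the Kantorovich/Wasserstein term of the general bisimulation metric reduces to the distance between the two next states) and weights of 1 on the reward term and γ on the transition term. This is one standard formulation, not necessarily the one in the thesis mentioned later in the thread; `bisim_metric` and `optimal_values` are hypothetical names. In the toy MDP, states 0 and 1 have the same optimal value but different reward sequences, so the metric separates them even though their value difference is zero.

```python
from typing import List


def optimal_values(R: List[List[float]], nxt: List[List[int]], gamma: float,
                   tol: float = 1e-10) -> List[float]:
    """Value iteration on a deterministic MDP: V(s) = max_a R[s][a] + gamma * V(nxt[s][a])."""
    n, m = len(R), len(R[0])
    V = [0.0] * n
    while True:
        V_new = [max(R[s][a] + gamma * V[nxt[s][a]] for a in range(m))
                 for s in range(n)]
        if max(abs(x - y) for x, y in zip(V_new, V)) < tol:
            return V_new
        V = V_new


def bisim_metric(R: List[List[float]], nxt: List[List[int]], gamma: float,
                 tol: float = 1e-10) -> List[List[float]]:
    """Fixed-point iteration for a bisimulation metric on a deterministic MDP.

    d(s, t) = max_a |R[s][a] - R[t][a]| + gamma * d(nxt[s][a], nxt[t][a])
    """
    n, m = len(R), len(R[0])
    d = [[0.0] * n for _ in range(n)]
    while True:
        d_new = [[max(abs(R[s][a] - R[t][a]) + gamma * d[nxt[s][a]][nxt[t][a]]
                      for a in range(m))
                  for t in range(n)]
                 for s in range(n)]
        if max(abs(d_new[s][t] - d[s][t])
               for s in range(n) for t in range(n)) < tol:
            return d_new
        d = d_new


# One action per state. State 2 is absorbing with reward 0; state 3 pays 2 and
# then goes to state 2. State 0 pays 1 now; state 1 pays 0 now but reaches state 3.
R = [[1.0], [0.0], [0.0], [2.0]]
nxt = [[2], [3], [2], [2]]
V = optimal_values(R, nxt, 0.5)
d = bisim_metric(R, nxt, 0.5)
print(V[0], V[1], d[0][1])  # 1.0 1.0 2.0
```

With these weights the metric upper-bounds the optimal value difference, |V*(s) - V*(t)| <= d(s, t), and it is zero exactly for bisimilar states, while states with identical rewards and identical next states (dynamics-equivalent) are always at distance zero.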


Pablo Samuel Castro

@proceduralia @jhamrick I looked at these types of implications between equivalence relations in my PhD thesis:
https://central.bac-lac.gc.ca/.item?id=NR78603&op=pdf&app=Library&oclc_number=1019479357

The tl;dr was the figure in section 3.6 (which unfortunately probably requires you to read more of my thesis to understand 🙃).


Pierluca D'Oro Nov 11, 2022

@psc @jhamrick Thanks for the comments, Pablo! There is indeed a close relationship between value-aware models and bisimulation, and this is an interesting perspective on it!