Asking for #rl opinions.
Is a value function a model in the RL sense? Why? Why not?
Feels like the difference between model-based and value-based methods is getting more and more arbitrary.
@proceduralia I generally agree with the response @jhamrick gave below. Value functions and transition & reward dynamics estimators are predictors of different things. The former predicts a summary of expected discounted future rewards, while the latter predicts the next-state distribution and rewards.
The big difference for me is you can use the latter to estimate the former, but not the other way around.
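A rough sketch of that asymmetry, assuming a tiny tabular setting with a fixed policy (the two 2-state models, the discount of 0.9, and the helper name `evaluate_policy` are all made up for illustration): iterating the Bellman expectation backup on a model yields its value function, but two different models can yield the same values, so the values alone cannot identify the model.

```python
def evaluate_policy(P, R, gamma=0.9, tol=1e-10, max_iters=10_000):
    """Iterative policy evaluation on a tabular model under a fixed policy.

    P[s][t] is the probability of moving from state s to state t,
    R[s] is the expected immediate reward in state s.
    """
    n = len(R)
    V = [0.0] * n
    for _ in range(max_iters):
        # Bellman expectation backup: V(s) = R(s) + gamma * sum_t P(s, t) V(t)
        V_new = [
            R[s] + gamma * sum(P[s][t] * V[t] for t in range(n))
            for s in range(n)
        ]
        if max(abs(a - b) for a, b in zip(V, V_new)) < tol:
            return V_new
        V = V_new
    return V


# Hypothetical model A: state 0 always jumps to the absorbing state 1.
P_a = [[0.0, 1.0], [0.0, 1.0]]
R_a = [1.0, 0.0]

# Hypothetical model B: state 0 stays put half the time, with a smaller reward.
P_b = [[0.5, 0.5], [0.0, 1.0]]
R_b = [0.55, 0.0]

V_a = evaluate_policy(P_a, R_a)
V_b = evaluate_policy(P_b, R_b)

# Model -> value function works for both models, and both give
# (approximately) the same values, so value function -> model is ambiguous.
print(V_a, V_b)
```

Both models here have different dynamics and rewards yet agree on the values (up to the convergence tolerance), which is one way to see why the mapping only goes in one direction.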
@proceduralia @jhamrick i looked at these types of equivalence relation implications in my phd thesis:
https://central.bac-lac.gc.ca/.item?id=NR78603&op=pdf&app=Library&oclc_number=1019479357
the tl;dr was this figure from section 3.6 (which unfortunately probably requires you to read more of my thesis to understand 🙃 )