Mastodawn

catastrophic forgetter Nov 14, 2022

i would like to get off the ride now

I was originally aggravated that neither "Grokking Deep RL" nor the GAE paper justified why total reward and reward-to-go are interchangeable in the policy gradient. Surely the leap can't be that involved! Then I learned.
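For anyone who wants to see the claim numerically rather than take it on faith, here is a minimal sketch. It assumes a hypothetical toy problem: a stateless 3-step task with one parameter `theta`, a sigmoid policy over two actions, and made-up rewards `r_t = (t + 1) * a_t + 0.5`. The sketch enumerates every trajectory, so the expectations are exact rather than sampled. It then weights each score term grad log pi(a_t) by either the total return or the reward-to-go. The terms dropped by reward-to-go pair a past reward with a later action's score, and that score has zero expectation given everything before it, so both weightings should give the same expected gradient. This illustrates the claim and is not the proof from the Spinning Up page.

```python
import itertools
import math
from typing import Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def grad_log_pi(action: int, theta: float) -> float:
    """d/dtheta of log pi(action), with pi(1) = sigmoid(theta) and pi(0) = 1 - sigmoid(theta)."""
    p1 = sigmoid(theta)
    return (1.0 - p1) if action == 1 else -p1


def reward(t: int, action: int) -> float:
    """Hypothetical reward: action 1 pays more at later steps, plus a constant 0.5."""
    return (t + 1) * action + 0.5


def exact_gradients(theta: float, horizon: int = 3) -> Tuple[float, float]:
    """Exact expected policy gradient, weighting scores by total return vs. reward-to-go."""
    p1 = sigmoid(theta)
    g_total = 0.0
    g_to_go = 0.0
    # Enumerate all 2**horizon action sequences, so the expectation is exact (no sampling).
    for actions in itertools.product([0, 1], repeat=horizon):
        prob = math.prod(p1 if a == 1 else 1.0 - p1 for a in actions)
        rewards = [reward(t, a) for t, a in enumerate(actions)]
        total_return = sum(rewards)
        for t, a in enumerate(actions):
            score = grad_log_pi(a, theta)
            g_total += prob * score * total_return      # weight: total return R(tau)
            g_to_go += prob * score * sum(rewards[t:])  # weight: reward-to-go from step t
    return g_total, g_to_go


if __name__ == "__main__":
    theta = 0.0
    g_total, g_to_go = exact_gradients(theta)
    p1 = sigmoid(theta)
    # For horizon 3, J(theta) = 6 * p1 + 1.5, so dJ/dtheta = 6 * p1 * (1 - p1).
    analytic = 6 * p1 * (1 - p1)
    print(g_total, g_to_go, analytic)
```

Both estimates should agree (up to floating-point rounding) with the analytic gradient 6 * sigmoid(theta) * (1 - sigmoid(theta)), which is 1.5 at theta = 0. Exhaustive enumeration is used only because the toy problem is tiny; with sampled trajectories the two estimators share the same expectation but reward-to-go typically has lower variance.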

Ashish Gaurav Nov 14, 2022

@surprisal I am still a little confused by the proof, although I know from the GAE paper that both forms are acceptable. Could you share the link to the complete document?

catastrophic forgetter Nov 14, 2022

@ashishgaurav_13 OpenAI's "Spinning Up" website has a page with the proof. Haven't fully gone through it yet though:

https://spinningup.openai.com/en/latest/spinningup/extra_pg_proof1.html?highlight=extra%20material