@psc Deep networks are just one parameterization, right? If you choose another parameterization, you could (potentially) update it using gradient rules established in reinforcement learning (e.g., updating mu and sigma via the policy gradient rule, assuming a Gaussian policy with parameters mu and sigma). I don't know how popular that is in the real world, though. I think some early IRL work also used non-neural parameterizations for reward functions (e.g., linear combinations of hand-designed features).
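A minimal sketch of what that looks like: REINFORCE where the whole policy is a single Gaussian N(mu, sigma^2) over a scalar action, and the learned parameters are literally the two numbers mu and sigma (no network). The toy reward, the baseline trick, and all hyperparameters are my own illustration, not taken from any particular paper:

```python
import random

random.seed(0)

def reinforce_gaussian(target=2.0, steps=5000, lr=0.01):
    """Policy-gradient ascent on the parameters (mu, sigma) of a
    Gaussian policy, using a toy reward maximized at a == target."""
    mu, sigma = 0.0, 1.0
    baseline = 0.0  # running-average reward baseline to reduce variance
    for _ in range(steps):
        a = random.gauss(mu, sigma)     # sample an action from the policy
        r = -(a - target) ** 2          # toy reward, peaks at a == target
        adv = r - baseline
        baseline += 0.05 * (r - baseline)
        # Closed-form score function of a Gaussian policy:
        #   d log pi / d mu    = (a - mu) / sigma^2
        #   d log pi / d sigma = ((a - mu)^2 - sigma^2) / sigma^3
        g_mu = (a - mu) / sigma ** 2
        g_sigma = ((a - mu) ** 2 - sigma ** 2) / sigma ** 3
        mu += lr * adv * g_mu
        sigma += lr * adv * g_sigma
        sigma = min(max(sigma, 0.05), 2.0)  # keep sigma in a sane range
    return mu, sigma

mu, sigma = reinforce_gaussian()
print(mu, sigma)  # mu drifts toward the reward peak near 2.0
```

The clamp on sigma is there because the expected gradient shrinks sigma toward zero for this reward, and the score-function terms blow up as sigma gets small.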