what's the weirdest thing you've stumbled upon in #RL #reinforcementlearning?

i'll start:

if you're using neural nets*, you don't need any rewards at all to train an agent to the optimal policy on episodic cartpole.

* or positive initializations
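one plausible reading of the footnote, offered as an assumption and not as @psc's actual setup: with bootstrapped value targets (as in DQN or tabular Q-learning), a transition that ends the episode gets a fixed target of 0, while a non-terminal one copies γ·max Q(s′). if the values start positive (or, with a randomly initialized net, nonzero), episode-ending actions get pulled down faster than actions that keep the episode going, so the greedy policy learns to avoid termination with zero reward. the toy chain environment, `step`, `train`, `rollout`, γ = 0.9, α = 0.1, and the synchronous sweeps over every state-action pair are all hypothetical simplifications, not CartPole itself:

```python
import numpy as np

N_STATES = 5          # hypothetical toy "balance" chain: positions 0..4
LEFT, RIGHT = 0, 1    # actions move the position by -1 / +1
GAMMA = 0.9           # discount factor (assumed)
ALPHA = 0.1           # learning rate (assumed)


def step(s, a):
    """Deterministic toy dynamics: leaving [0, N_STATES-1] ends the episode."""
    s_next = s - 1 if a == LEFT else s + 1
    done = s_next < 0 or s_next >= N_STATES
    return s_next, 0.0, done  # the reward is always zero


def train(init_value=1.0, sweeps=200):
    """Tabular Q-learning updates applied synchronously to every (s, a) pair.

    A positive init_value stands in for the nonzero outputs of a random network.
    """
    q = np.full((N_STATES, 2), init_value)
    for _ in range(sweeps):
        new_q = q.copy()
        for s in range(N_STATES):
            for a in (LEFT, RIGHT):
                s_next, r, done = step(s, a)
                # Terminal transitions have target r = 0; others bootstrap.
                target = r if done else r + GAMMA * q[s_next].max()
                new_q[s, a] = q[s, a] + ALPHA * (target - q[s, a])
        q = new_q
    return q


def rollout(q, start=2, max_steps=500):
    """Follow the greedy policy and count steps survived (500 mirrors CartPole-v1's cap)."""
    s = start
    for t in range(max_steps):
        s, _, done = step(s, int(q[s].argmax()))
        if done:
            return t + 1
    return max_steps


print(rollout(train()))                 # 500
print(rollout(train(init_value=0.0)))   # 3
```

with zero initialization every Q-value stays exactly 0, argmax ties break toward LEFT, and the agent walks off the chain after 3 steps; with a positive init it survives all 500. in this sketch both kinds of value still shrink toward 0 (with zero reward the Bellman fixed point is Q = 0), so the learned preference lives only in the relative gap between terminating and non-terminating actions.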

@psc I've run into this with CartPole, and occasionally with one of my other environments (a custom driving environment) 😅 Turns out a randomly initialized policy is sometimes a decent policy.
@ashishgaurav_13 in cartpole, the initial random policies don't do well, but they still *always* converge to the optimal policy even with zero rewards!
@psc What I meant was that I found a weird seed by chance that gave me a decent initial policy. It was probably a once-in-a-million event.