If there is one thing the deep learning revolution has taught us, it's that neural nets will outperform hand-designed heuristics, given enough compute and data.

But we still use hand-designed heuristics to train our models. Let's replace our optimizers with trained neural nets!
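To make the idea concrete, here is a minimal conceptual sketch of a learned optimizer: a tiny neural net that eats gradient features and emits the parameter update. Everything here is illustrative (the feature set, the MLP sizes, and the random weights are all made up) — VeLO's actual architecture is a far more involved hypernetwork, and in practice the optimizer's weights are themselves meta-trained across many tasks rather than drawn at random.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny per-parameter MLP: maps (gradient, momentum) features to an update.
# Illustrative only -- not VeLO's actual architecture.
W1 = rng.normal(scale=0.1, size=(2, 8))
W2 = rng.normal(scale=0.1, size=(8, 1))

def learned_update(grad, momentum):
    """One 'learned optimizer' step: a small neural net consumes
    per-parameter gradient features and emits the update."""
    feats = np.stack([grad, momentum], axis=-1)  # (..., 2)
    h = np.tanh(feats @ W1)                      # (..., 8)
    return (h @ W2)[..., 0]                      # (...,)

# Toy quadratic loss f(x) = 0.5 * ||x||^2, so the gradient is x itself.
x = np.array([1.0, -2.0, 3.0])
m = np.zeros_like(x)
for _ in range(5):
    g = x                    # gradient of the toy loss
    m = 0.9 * m + g          # running momentum feature
    x = x + learned_update(g, m)
```

In a real learned optimizer, W1 and W2 would be meta-trained so that applying `learned_update` across thousands of diverse training problems minimizes the resulting losses; here they are random, so the sketch only shows the data flow, not good behavior.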

If you are training models with < 5e8 parameters, for < 2e5 training steps, then with high probability this LEARNED OPTIMIZER will beat or match the tuned optimizer you are currently using, out of the box, with no hyperparameter tuning (!).

https://velo-code.github.io
https://arxiv.org/abs/2211.09760

https://github.com/google/learned_optimization/tree/main/learned_optimization/research/general_lopt

@jascha Sounds very cool. How big is the overhead of running this vs. 'heuristic' optimizers? I.e., is this only a gain when training large models?
@EmilevanKrieken Overhead is relatively small in an absolute sense. It's about 10x the overhead of Adam, which is small compared to the cost of computing the gradient for problems of reasonable scale trained with a reasonable minibatch size. See the leftmost pane of the plot attached to the original tweet.
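A back-of-envelope calculation shows why ~10x Adam's per-step cost barely matters: the optimizer touches each parameter a constant number of times per step, while the gradient computation scales with parameters times batch size. All constants below are illustrative assumptions, not measurements from the paper.

```python
# Why ~10x Adam's per-step cost is still negligible (illustrative numbers).
P = 1e8           # model parameters (assumed)
B = 65536         # tokens per batch (assumed)

adam_flops = 10 * P           # assume ~10 FLOPs per parameter for Adam
velo_flops = 10 * adam_flops  # learned optimizer: ~10x Adam's overhead
grad_flops = 6 * P * B        # rough fwd+bwd cost per step (~6*P FLOPs/token)

overhead = velo_flops / grad_flops
print(f"optimizer overhead: {overhead:.2%} of gradient compute")
```

Under these assumptions the optimizer adds well under a tenth of a percent of the per-step compute; the ratio shrinks further as batch size grows, which is why the overhead only becomes noticeable for tiny models or tiny batches.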
@jascha That's very impressive, thank you!