If there is one thing the deep learning revolution has taught us, it's that neural nets will outperform hand-designed heuristics, given enough compute and data.

But we still use hand-designed heuristics to train our models. Let's replace our optimizers with trained neural nets!
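To make the idea concrete, here is a minimal conceptual sketch of a learned optimizer: a tiny neural net that eats gradient features and emits the parameter update. Everything here is illustrative (the feature set, the MLP sizes, and the random weights are all made up) — VeLO's actual architecture is a far more involved hypernetwork, and in practice the optimizer's weights are themselves meta-trained across many tasks rather than drawn at random.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny per-parameter MLP: maps (gradient, momentum) features to an update.
# Illustrative only -- not VeLO's actual architecture.
W1 = rng.normal(scale=0.1, size=(2, 8))
W2 = rng.normal(scale=0.1, size=(8, 1))

def learned_update(grad, momentum):
    """One 'learned optimizer' step: a small neural net consumes
    per-parameter gradient features and emits the update."""
    feats = np.stack([grad, momentum], axis=-1)  # (..., 2)
    h = np.tanh(feats @ W1)                      # (..., 8)
    return (h @ W2)[..., 0]                      # (...,)

# Toy quadratic loss f(x) = 0.5 * ||x||^2, so the gradient is x itself.
x = np.array([1.0, -2.0, 3.0])
m = np.zeros_like(x)
for _ in range(5):
    g = x                    # gradient of the toy loss
    m = 0.9 * m + g          # running momentum feature
    x = x + learned_update(g, m)
```

In a real learned optimizer, W1 and W2 would be meta-trained so that applying `learned_update` across thousands of diverse training problems minimizes the resulting losses; here they are random, so the sketch only shows the data flow, not good behavior.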

If you are training models with < 5e8 parameters, for < 2e5 training steps, then with high probability this LEARNED OPTIMIZER will beat or match the tuned optimizer you are currently using, out of the box, with no hyperparameter tuning (!).

https://velo-code.github.io
https://arxiv.org/abs/2211.09760

https://github.com/google/learned_optimization/tree/main/learned_optimization/research/general_lopt

@jascha Sounds very cool. How big is the overhead of running this vs. 'heuristic' optimizers? I.e., is this only a gain when training large models?
@EmilevanKrieken Overhead is relatively small in an absolute sense. It's about 10x the overhead of Adam, which is small compared to the cost of computing the gradient for problems of reasonable scale trained with a reasonable minibatch size. See the leftmost pane of the plot attached to the original tweet.
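A back-of-envelope calculation shows why ~10x Adam's per-step cost barely matters: the optimizer touches each parameter a constant number of times per step, while the gradient computation scales with parameters times batch size. All constants below are illustrative assumptions, not measurements from the paper.

```python
# Why ~10x Adam's per-step cost is still negligible (illustrative numbers).
P = 1e8           # model parameters (assumed)
B = 65536         # tokens per batch (assumed)

adam_flops = 10 * P           # assume ~10 FLOPs per parameter for Adam
velo_flops = 10 * adam_flops  # learned optimizer: ~10x Adam's overhead
grad_flops = 6 * P * B        # rough fwd+bwd cost per step (~6*P FLOPs/token)

overhead = velo_flops / grad_flops
print(f"optimizer overhead: {overhead:.2%} of gradient compute")
```

Under these assumptions the optimizer adds well under a tenth of a percent of the per-step compute; the ratio shrinks further as batch size grows, which is why the overhead only becomes noticeable for tiny models or tiny batches.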
@jascha That's very impressive, thank you!