If there is one thing the deep learning revolution has taught us, it's that neural nets will outperform hand-designed heuristics, given enough compute and data.

But we still use hand-designed heuristics to train our models. Let's replace our optimizers with trained neural nets!

If you are training models with < 5e8 parameters, for < 2e5 training steps, then with high probability this LEARNED OPTIMIZER will beat or match the tuned optimizer you are currently using, out of the box, with no hyperparameter tuning (!).

https://velo-code.github.io
https://arxiv.org/abs/2211.09760

Code: https://github.com/google/learned_optimization/tree/main/learned_optimization/research/general_lopt

Meta-training learned optimizers is HARD. Each meta-training datapoint is an entire optimization task, so building a large meta-training dataset is HARD. Each of N meta-training steps can contain N training steps applying the learned optimizer -- so compute is also extreme (N^2).
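To make the N^2 point concrete, here is a toy sketch (my own illustration, not the method from the paper): a one-parameter "learned optimizer" (just a log learning rate) meta-trained by finite differences on a quadratic inner task. Every meta-step runs full inner training loops, so total inner-step compute is meta_steps × inner runs × inner_steps.

```python
# Toy illustration of why meta-training is N^2 in compute.
# One meta-parameter (a log10 learning rate) stands in for a
# learned optimizer; the inner task is minimizing f(w) = w^2.

def inner_train(log_lr, inner_steps):
    """Run a full inner training loop with the 'learned' step size;
    return the final loss (the meta-objective)."""
    lr = 10 ** log_lr
    w = 1.0
    for _ in range(inner_steps):
        grad = 2 * w          # gradient of w^2
        w = w - lr * grad
    return w * w

def meta_train(meta_steps, inner_steps, meta_lr=0.1, eps=1e-3):
    log_lr = -3.0             # start with a too-small learning rate
    total_inner_steps = 0
    for _ in range(meta_steps):
        # Finite-difference meta-gradient: each meta-step costs
        # TWO complete inner training runs.
        up = inner_train(log_lr + eps, inner_steps)
        down = inner_train(log_lr - eps, inner_steps)
        total_inner_steps += 2 * inner_steps
        meta_grad = (up - down) / (2 * eps)
        log_lr -= meta_lr * meta_grad
    return log_lr, total_inner_steps

log_lr, cost = meta_train(meta_steps=200, inner_steps=200)
print(cost)  # 80000 inner steps: 2 * 200 meta-steps * 200-step tasks
```

If meta_steps and inner_steps both grow with N, the inner-step count grows as N^2 -- and a real learned optimizer's inner task is training an entire neural net, not a scalar quadratic.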

And the resulting learned optimizer works really well! We reached out to other researchers inside Brain and had them try it on their tasks, and subject to the scale constraints I mention above, it did as well as or better than what they were currently using, with no tuning.

Tasks include multiple vision models, multiple language models, decision transformers, distillation tasks, scientific modeling, and more.

Huge thanks to Luke Metz for leading this research direction for the last half dozen years (!!) -- it's wonderful to see it bear fruit.

Also, huge thanks to James Harrison who has completely taken over the project over the last several months, and is responsible for the careful analysis and coherent story in the paper, as well as the exciting ongoing work.

(Posting this *exclusive content* to Mastodon before Twitter :P You're winning by being here!)