Revisiting the Scaling Properties of Downstream Metrics in Large Language Model Training

"While scaling laws for Large Language Models (LLMs) traditionally focus on proxy metrics like pretraining loss, predicting downstream task performance has been considered unreliable. This paper challenges that view by proposing a direct framework to model the scaling of benchmark performance from the training budget. We find that for a fixe…"

https://machinelearning.apple.com/research/downstream-metrics


Apple Machine Learning Research