Comparing Muon and AdamW for training AI models. Muon may underfit while AdamW overfits. Both models reach high accuracy, but AdamW comes out slightly ahead. #Muon #AdamW #AI #MachineLearning #ĐàoTạoMôHình #TríTuệNhânTạo #Optimization #DeepLearning

https://www.reddit.com/r/LocalLLaMA/comments/1owa4ag/muon_underfits_adamw_overfits/

Practical Efficiency of Muon for Pretraining

Muon reaches the same loss with 10–15% fewer tokens and converges faster, preserving data efficiency even at very large batch sizes. It is recommended as a “drop-in” successor to AdamW at large scale.

📎https://arxiv.org/pdf/2505.02222

#DeepLearning #Optimization #AdamW

Understanding AdamW through Proximal Methods and Scale-Freeness

Zhenxun Zhuang, Mingrui Liu, Ashok Cutkosky, Francesco Orabona

https://openreview.net/forum?id=IKhEPWGdwK

#adamw #adam #gradients

Adam has been widely adopted for training deep neural networks due to less hyperparameter tuning and remarkable performance. To improve generalization, Adam is typically used in tandem with a...

OpenReview