#IlyaSutskever discusses the challenges of #AI #modelgeneralisation, comparing it to #humanlearning. He suggests that the current focus on #RLtraining, driven by evaluation metrics, might be limiting model adaptability. Sutskever proposes that expanding training environments or improving generalisation from pre-training data could enhance model performance across diverse tasks. https://www.dwarkesh.com/p/ilya-sutskever-2?eicker.news #tech #media #news
Ilya Sutskever – We're moving from the age of scaling to the age of research

“These models somehow just generalize dramatically worse than people. It's a very fundamental thing.”

Dwarkesh Podcast

SGLang has just resolved FP8 stability for RL training, finding that the problem lay in the quantization step. This is a big step forward for RLHF and local RL fine-tuning, and it simplifies the use of mixed precision.
#SGLang #FP8 #RLTraining #Quantization #AI #MachineLearning #HuấnLuyệnRL #TríTuệNhânTạo #HọcMáy

https://www.reddit.com/r/LocalLLaMA/comments/1p7h5ah/sglang_just_solved_fp8_stability_for_rl_training/