fly51fly (@fly51fly)

A joint team from Tsinghua and Stanford has released the paper 'Configuration-to-Performance Scaling Law with Neural Ansatz'. The work introduces an analytic assumption called the 'Neural Ansatz' to formulate and quantify a scaling law between model configuration and performance, offering theoretical results that could inform large-model design and performance prediction (authors H. Zhang, K. Wen, T. Ma, arXiv 2026).

https://x.com/fly51fly/status/2022064878392160722

#scalinglaw #neuralansatz #modelscaling #research

[LG] Configuration-to-Performance Scaling Law with Neural Ansatz H Zhang, K Wen, T Ma [Tsinghua University & Stanford University] (2026) https://t.co/jwaeLl05vj

🎩 Oh, joy! More pontification on how to cram "knowledge" into our beloved gigantic text parrots 🤖, because clearly, they're not bloated enough with random facts already. Thanks to the mysterious "Knowledge Infusion Scaling Law," we can look forward to even more impressive gibberish—sure to impress both humans and toaster ovens alike! 🍞📚
https://arxiv.org/abs/2509.19371 #textparrots #knowledgeinfusion #gibberish #scalinglaw #AIhumor #techsatire #HackerNews #ngated
How to inject knowledge efficiently? Knowledge Infusion Scaling Law for Pre-training Large Language Models

Large language models (LLMs) have attracted significant attention due to their impressive general capabilities across diverse downstream tasks. However, without domain-specific optimization, they often underperform on specialized knowledge benchmarks and even produce hallucinations. Recent studies show that strategically infusing domain knowledge during pretraining can substantially improve downstream performance. A critical challenge lies in balancing this infusion trade-off: injecting too little domain-specific data yields insufficient specialization, whereas excessive infusion triggers catastrophic forgetting of previously acquired knowledge. In this work, we focus on the phenomenon of memory collapse induced by over-infusion. Through systematic experiments, we make two key observations: 1) Critical collapse point: each model exhibits a threshold beyond which its knowledge-retention capabilities sharply degrade. 2) Scale correlation: these collapse points scale consistently with model size. Building on these insights, we propose a knowledge infusion scaling law that predicts the optimal amount of domain knowledge to inject into large LLMs by analyzing their smaller counterparts. Extensive experiments across different model sizes and pretraining token budgets validate both the effectiveness and generalizability of our scaling law.
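The prediction step the abstract describes (fit collapse points on small models, extrapolate to a larger one) can be sketched as a log-log power-law fit. This is a minimal illustration, not the paper's method: the model sizes, collapse-point values, and the assumed functional form D* = a * N^b are all invented for the example.

```python
import numpy as np

# Hypothetical collapse points: the domain-token budget at which each small
# model's knowledge retention starts to degrade. Values are illustrative only.
model_sizes = np.array([1e8, 3e8, 1e9])        # parameters N
collapse_tokens = np.array([2e8, 7e8, 2.5e9])  # domain tokens D* at collapse

# Assume a power law D* = a * N^b and fit it as a line in log-log space.
b, log_a = np.polyfit(np.log(model_sizes), np.log(collapse_tokens), 1)
a = np.exp(log_a)

# Extrapolate a safe infusion budget for a larger target model.
target_size = 7e9
predicted = a * target_size ** b
print(f"predicted collapse point for a 7B model: {predicted:.2e} tokens")
```

With the toy numbers above, the fitted exponent is close to 1, i.e. the collapse point grows roughly linearly with parameter count; the real exponent would come from the paper's experiments.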

arXiv.org

The end of the LLM era? Why big AI companies are investing in world models

As large language models exemplified by GPT and Claude hit the limits of text understanding, the AI industry has begun a major shift toward World Models that understand and simulate the physical world. The post surveys concrete examples from major companies, including Meta's AssetGen, Google's Gemini Robotics, and Nvidia's Omniverse, and examines the impact this technology could have across industries such as robotics, autonomous driving, and VR.

https://aisparkup.com/posts/5222

Geoffrey West - Why Do Power Laws Work So Widely?

YouTube
Geoffrey West - Do General Principles Govern All Science?

YouTube
How We Perceive Time | Sean Carroll

YouTube