Rohan Paul (@rohanpaul_ai)

Yann LeCun explains the limits of LLMs' real-world intelligence, noting that the largest LLMs are trained on about 30 trillion words (roughly 10^14 bytes) of text. That number sounds huge at first, he argues, but its limits become clear once it is compared with the amount of experience a human learning about the world (e.g., a 4-year-old child) accumulates.

https://x.com/rohanpaul_ai/status/2029377041653694821

#yannlecun #llm #airesearch #modelscaling #limitations

Rohan Paul (@rohanpaul_ai) on X

Yann LeCun (@ylecun ) explains why LLMs are so limited in terms of real-world intelligence. Says the biggest LLM is trained on about 30 trillion words, which is roughly 10 to the power 14 bytes of text. That sounds huge, but a 4 year old who has been awake about 16,000 hours

X (formerly Twitter)
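LeCun's comparison can be sketched numerically. The 30 trillion words and the 16,000 waking hours come from the post; the ~4 bytes per word and the ~2 MB/s combined optic-nerve bandwidth are rough assumptions of mine, chosen only to show why the two data volumes land in the same ~10^14-byte ballpark:

```python
# Back-of-envelope comparison (assumed figures, not LeCun's exact numbers):
# - largest LLMs: ~30 trillion words of text (from the post)
# - a 4-year-old has been awake ~16,000 hours (from the post)
# - ~4 bytes/word and ~2 MB/s optic-nerve bandwidth (my rough assumptions)

llm_text_bytes = 30e12 * 4          # ~1.2e14 bytes of training text
awake_seconds = 16_000 * 3600       # 16,000 hours awake, in seconds
visual_bytes = awake_seconds * 2e6  # ~1.15e14 bytes through the eyes

print(f"LLM training text: {llm_text_bytes:.1e} bytes")
print(f"Child's visual input: {visual_bytes:.1e} bytes")
print(f"ratio (visual / text): {visual_bytes / llm_text_bytes:.2f}")
```

Under these assumptions a 4-year-old has taken in about as many bytes through vision alone as the largest LLMs see in text, which is the crux of LeCun's point about how little of the world text actually covers.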

AISatoshi (@AiXsatoshi)

Qwen3.5-2B-Q4 mostly fails on these tasks, so the capability boundary between the 2B and 4B models is sharp. The author observes that this may be where the current limit lies.

https://x.com/AiXsatoshi/status/2028501316855873707

#qwen #llm #modelscaling #quantization

AI✖️Satoshi⏩️ (@AiXsatoshi) on X

With Qwen3.5-2B-Q4, it almost never succeeds. The capability boundary between 2B and 4B is clear. This may be about where the current limit is.


Cerebras (@cerebras)

Compared with BERT-Large (340 million parameters) in 2018, today's frontier models such as Kimi-k2 exceed one trillion parameters, roughly a 3,000x increase. @dmsobol, Head Research Scientist at Cerebras, explains why larger models, and MoE (Mixture of Experts) architectures in particular, are fundamentally hard to run and operate at scale.

https://x.com/cerebras/status/2023789540184584427

#modelscaling #moe #cerebras #kimik2

Cerebras (@cerebras) on X

In 2018, BERT-Large had 340 million parameters. Today, frontier models like Kimi-k2 exceed one trillion parameters, a 3,000x increase. In this conversation, @dmsobol, Head Research Scientist at Cerebras, explains why bigger models (especially MoEs) are fundamentally hard to run

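The quoted ~3,000x figure is a round-up of the raw ratio between the two parameter counts given in the post; a one-line check:

```python
# Numbers from the post: BERT-Large (2018) vs Kimi-k2 ("exceeds one trillion")
bert_large_params = 340e6
kimi_k2_params = 1e12

growth = kimi_k2_params / bert_large_params
print(f"{growth:.0f}x")  # ~2941x, rounded up to the quoted ~3,000x
```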

fly51fly (@fly51fly)

A joint Tsinghua–Stanford team has released the paper "Configuration-to-Performance Scaling Law with Neural Ansatz". The work introduces an analytic assumption called a "Neural Ansatz" to formulate and quantify a scaling law between model configuration and performance, with theoretical results that could inform large-model design and performance prediction (authors H. Zhang, K. Wen, T. Ma, arXiv 2026).

https://x.com/fly51fly/status/2022064878392160722

#scalinglaw #neuralansatz #modelscaling #research

fly51fly (@fly51fly) on X

[LG] Configuration-to-Performance Scaling Law with Neural Ansatz H Zhang, K Wen, T Ma [Tsinghua University & Stanford University] (2026) https://t.co/jwaeLl05vj

How to Think About GPUs | How To Scale Your Model

We love TPUs at Google, but GPUs are great too. This chapter takes a deep dive into the world of GPUs – how each chip works, how they’re networked together, and what that means for LLMs, especially compared to TPUs. While there are a multitude of GPU architectures from NVIDIA, AMD, Intel, and others, here we will focus on NVIDIA GPUs. This section builds on <a href='https://jax-ml.github.io/scaling-book/tpus/'>Chapter 2</a> and <a href='https://jax-ml.github.io/scaling-book/training'>Chapter 5</a>, so you are encouraged to read them first.

https://www.youtube.com/watch?v=XGu6ejtRz-0

Interesting discussion about "value emergence". I'd also discuss something akin to "epistemic exhaustion"; I guess I'll have to write that paper I've been postponing. Good video, @daveshapi.

#aisafety #socialvalues #modelscaling
AI Will Resist Human Control — And That Could Be Exactly What We Need
