Mastodawn

How to Scale Your Model

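This is an introduction to a book that systematically explains how to scale large language models (LLMs) efficiently on TPU and GPU hardware. It covers model parallelism techniques, hardware bottlenecks, compute and communication cost analysis, and detailed operation counts for the Transformer architecture. It also includes hands-on exercises with LLaMA 3 models and JAX-based profiling. The book aims to give AI researchers and engineers practical help in optimizing large models within hardware limits and achieving strong scalability.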

https://jax-ml.github.io/scaling-book/

#llm #tpu #gpu #modelscaling #parallelism

How To Scale Your Model

Training LLMs often feels like alchemy, but understanding and optimizing the performance of your models doesn't have to. This book aims to demystify the science of scaling language models: how TPUs (and GPUs) work and how they communicate with each other, how LLMs run on real hardware, and how to parallelize your models during training and inference so they run efficiently at massive scale. If you've ever wondered “how expensive should this LLM be to train” or “how much memory do I need to serve this model myself” or “what's an AllGather”, we hope this will be useful to you.
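The "how expensive should this LLM be to train" question largely comes down to arithmetic. Below is a minimal sketch. It assumes the widely used 6·N·D rule of thumb for dense Transformer training FLOPs: roughly 2 FLOPs per parameter per token forward and 4 backward. The function names are hypothetical, and so are all the inputs: the 70B-parameter / 15T-token model, the 8,192 chips, the 1e15 FLOP/s per-chip peak, and the 40% model FLOPs utilization (MFU). None of these figures come from the book.

```python
# Rough training-cost estimate using the common 6 * N * D rule of thumb
# (an approximation for dense Transformers; it ignores attention FLOPs).

def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training FLOPs: ~2 forward + ~4 backward per param per token."""
    return 6.0 * n_params * n_tokens


def training_days(total_flops: float, n_chips: int,
                  peak_flops_per_chip: float, mfu: float) -> float:
    """Wall-clock days given chip count, per-chip peak FLOP/s, and achieved MFU (0..1)."""
    effective_flops_per_s = n_chips * peak_flops_per_chip * mfu
    return total_flops / effective_flops_per_s / 86_400  # 86,400 seconds per day


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    flops = training_flops(n_params=70e9, n_tokens=15e12)
    days = training_days(flops, n_chips=8_192, peak_flops_per_chip=1e15, mfu=0.4)
    print(f"total FLOPs: {flops:.2e}")
    print(f"about {days:.0f} days")
```

The utilization term matters as much as the FLOP count. Halving MFU doubles the wall-clock estimate, which is why much of the book focuses on keeping chips busy.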

sayzard Mar 5

Rohan Paul (@rohanpaul_ai)

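Yann LeCun explains the limits of LLMs' real-world intelligence. He notes that the largest LLM is trained on about 30 trillion words (roughly 10^14 bytes) of text. That figure sounds huge, but he argues the limitation becomes clear when compared with the amount of experience a human, such as a 4-year-old child, accumulates while learning about objects in the real world.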

https://x.com/rohanpaul_ai/status/2029377041653694821

#yannlecun #llm #airesearch #modelscaling #limitations

Rohan Paul (@rohanpaul_ai) on X

Yann LeCun (@ylecun ) explains why LLMs are so limited in terms of real-world intelligence. Says the biggest LLM is trained on about 30 trillion words, which is roughly 10 to the power 14 bytes of text. That sounds huge, but a 4 year old who has been awake about 16,000 hours

X (formerly Twitter)
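The quoted post is cut off before the comparison is finished, so the sketch below only works through the numbers it does state. First, it computes the bytes per word implied by "30 trillion words ≈ 10^14 bytes". Second, it computes the average data rate at which 16,000 waking hours would add up to the same 10^14 bytes. That break-even rate is derived here for illustration; it is not a figure from the post.

```python
# Back-of-the-envelope check using only the figures quoted in the post.
WORDS = 30e12          # ~30 trillion words of LLM training text
TEXT_BYTES = 1e14      # ~10^14 bytes of text
AWAKE_HOURS = 16_000   # hours a 4-year-old has been awake

bytes_per_word = TEXT_BYTES / WORDS   # implied average bytes per word
awake_seconds = AWAKE_HOURS * 3_600   # waking time in seconds

# Average input rate at which 16,000 waking hours would equal 10^14 bytes
# (derived here for illustration, not stated in the post).
break_even_rate = TEXT_BYTES / awake_seconds  # bytes per second

print(f"implied bytes per word: {bytes_per_word:.2f}")
print(f"awake seconds: {awake_seconds:,}")
print(f"break-even rate: {break_even_rate / 1e6:.2f} MB/s")
```

In other words, a stream of under 2 MB/s sustained over a young child's waking hours already matches the quoted text volume.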

sayzard Mar 3

AISatoshi (@AiXsatoshi)

Qwen3.5-2B-Q4 mostly fails, so the author judges the capability boundary between the 2B and 4B models to be clear-cut. The author observes that this may be where the current limit lies.

https://x.com/AiXsatoshi/status/2028501316855873707

#qwen #llm #modelscaling #quantization

AI✖️Satoshi⏩️ (@AiXsatoshi) on X

With Qwen3.5-2B-Q4, it almost never succeeds. The capability boundary between 2B and 4B is clear. Perhaps this is about where the current limit is.


sayzard Feb 17

Cerebras (@cerebras)

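In 2018, BERT-Large had 340 million parameters; today's frontier models, such as Kimi-k2, exceed one trillion parameters, an increase of roughly 3,000x. Cerebras Head Research Scientist @dmsobol explains why larger models, especially mixture-of-experts (MoE) architectures, are fundamentally hard to operate and run at scale.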

https://x.com/cerebras/status/2023789540184584427

#modelscaling #moe #cerebras #kimik2

Cerebras (@cerebras) on X

In 2018, BERT-Large had 340 million parameters. Today, frontier models like Kimi-k2 exceed one trillion parameters, a 3,000x increase. In this conversation, @dmsobol, Head Research Scientist at Cerebras, explains why bigger models (especially MoEs) are fundamentally hard to run

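The 3,000x figure, and why trillion-parameter models are hard to run, can both be checked with a few lines of arithmetic. Below is a minimal sketch. It assumes bf16 weights (2 bytes per parameter) and a hypothetical 80 GB of memory per accelerator, and its helper names are also hypothetical. It ignores KV cache, activations, and runtime overhead, so the device count is only a lower bound.

```python
import math

BERT_LARGE_PARAMS = 340e6  # BERT-Large, 2018 (from the post)
FRONTIER_PARAMS = 1e12     # "exceed one trillion" (from the post; used as a lower bound)

growth = FRONTIER_PARAMS / BERT_LARGE_PARAMS


def weight_bytes(n_params: float, bytes_per_param: float = 2.0) -> float:
    """Memory for the weights alone; 2 bytes/param assumes bf16 storage."""
    return n_params * bytes_per_param


def min_devices(n_params: float, device_mem_bytes: float,
                bytes_per_param: float = 2.0) -> int:
    """Lower bound on devices needed just to hold the weights (no KV cache, activations, overhead)."""
    return math.ceil(weight_bytes(n_params, bytes_per_param) / device_mem_bytes)


if __name__ == "__main__":
    print(f"growth: {growth:,.0f}x")
    print(f"bf16 weights: {weight_bytes(FRONTIER_PARAMS) / 1e12:.1f} TB")
    print(f"devices at 80 GB each (hypothetical): {min_devices(FRONTIER_PARAMS, 80e9)}")
```

For an MoE, the full parameter count typically still has to be resident in memory, even though only some experts run for each token. The weights therefore have to be sharded across many devices before serving even starts.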

sayzard Feb 13

fly51fly (@fly51fly)

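Researchers from Tsinghua and Stanford have released the paper 'Configuration-to-Performance Scaling Law with Neural Ansatz'. The work introduces an analytic assumption called a 'Neural Ansatz' to formulate and quantify a scaling law relating model configuration to performance. Its theoretical results may inform large-model design and performance prediction (authors: H. Zhang, K. Wen, T. Ma; arXiv, 2026).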

https://x.com/fly51fly/status/2022064878392160722

#scalinglaw #neuralansatz #modelscaling #research

fly51fly (@fly51fly) on X

[LG] Configuration-to-Performance Scaling Law with Neural Ansatz H Zhang, K Wen, T Ma [Tsinghua University & Stanford University] (2026) https://t.co/jwaeLl05vj


Hacker News Aug 20, 2025

How to Scale Your Model: How to Think About GPUs

https://jax-ml.github.io/scaling-book/gpus/

#HackerNews #How #to #Scale #Your #Model: #How #to #Think #About #GPUs #ModelScaling #GPUs #DeepLearning #MachineLearning #AI

How to Think About GPUs | How To Scale Your Model

We love TPUs at Google, but GPUs are great too. This chapter takes a deep dive into the world of GPUs: how each chip works, how they're networked together, and what that means for LLMs, especially compared to TPUs. While there are a multitude of GPU architectures from NVIDIA, AMD, Intel, and others, here we will focus on NVIDIA GPUs. This section builds on Chapter 2 (https://jax-ml.github.io/scaling-book/tpus/) and Chapter 5 (https://jax-ml.github.io/scaling-book/training), so you are encouraged to read them first.
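How chips are networked together matters because collectives such as AllGather move data over those links. Below is a minimal pure-Python simulation of the classic ring AllGather. It assumes n devices arranged in a one-directional ring and models only which chunks move where, not timing or bandwidth. `ring_all_gather` is a hypothetical helper for illustration; it is not a JAX or NCCL API.

```python
from typing import List, Optional


def ring_all_gather(shards: List[List[int]]) -> List[List[Optional[List[int]]]]:
    """Simulate a ring AllGather over len(shards) devices.

    Device i starts with shards[i]. In step s (0 <= s < n - 1), device i sends
    the chunk that originated on device (i - s) % n to its neighbor (i + 1) % n.
    That is its own chunk at s = 0, and otherwise the chunk it received in step s - 1.
    After n - 1 steps every device holds all n chunks, indexed by origin.
    """
    n = len(shards)
    # held[i][j]: device i's copy of the chunk that originated on device j.
    held: List[List[Optional[List[int]]]] = [[None] * n for _ in range(n)]
    for i in range(n):
        held[i][i] = shards[i]
    for step in range(n - 1):
        for i in range(n):
            origin = (i - step) % n
            held[(i + 1) % n][origin] = held[i][origin]
    return held


if __name__ == "__main__":
    result = ring_all_gather([[0, 1], [2, 3], [4, 5], [6, 7]])
    for device, chunks in enumerate(result):
        # Every device ends with [0, 1, 2, 3, 4, 5, 6, 7].
        print(device, [x for chunk in chunks for x in chunk])
```

In this schedule each device sends n - 1 chunks in total. For a full array of B bytes split evenly, each device therefore transmits about (n - 1)/n · B bytes, which is why the ring variant is bandwidth-efficient for large arrays. Its latency, however, grows with the number of steps.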

guIA - guía a la IA Feb 13, 2025

https://www.youtube.com/watch?v=XGu6ejtRz-0 Interesting discussion about "value emergence". I'd also discuss something akin to "epistemic exhaustion"; I guess I'll have to write that paper I've been postponing. #aisafety #socialvalues #modelscaling Good video, @daveshapi

AI Will Resist Human Control — And That Could Be Exactly What We Need

YouTube