UniPool: A Globally Shared Expert Pool for Mixture-of-Experts

UniPool is a new MoE architecture that replaces the per-layer, independent expert sets of conventional Mixture-of-Experts (MoE) models with a single globally shared expert pool. Expert parameters therefore no longer have to grow linearly with depth, while routing within the shared pool stays efficient, stable, and balanced across experts. Across several LLaMA-based model sizes, UniPool consistently improves validation loss and perplexity over standard MoE, and it maintains or improves performance while shrinking the expert-parameter budget. The work points to a new design direction for depth scaling and expert-parameter allocation in MoE.

https://arxiv.org/abs/2605.06665

#mixtureofexperts #moe #llama #modelarchitecture #routing

UniPool: A Globally Shared Expert Pool for Mixture-of-Experts

Modern Mixture-of-Experts (MoE) architectures allocate expert capacity through a rigid per-layer rule: each transformer layer owns a separate expert set. This convention couples depth scaling with linear expert-parameter growth and assumes that every layer needs isolated expert capacity. However, recent analyses and our routing probe challenge this allocation rule: replacing a deeper layer's learned top-k router with uniform random routing drops downstream accuracy by only 1.0-1.6 points across multiple production MoE models. Motivated by this redundancy, we propose UniPool, an MoE architecture that treats expert capacity as a global architectural budget by replacing per-layer expert ownership with a single shared pool accessed by independent per-layer routers. To enable stable and balanced training under sharing, we introduce a pool-level auxiliary loss that balances expert utilization across the entire pool, and adopt NormRouter to provide sparse and scale-stable routing into the shared expert pool. Across five LLaMA-architecture model scales (182M, 469M, 650M, 830M, and 978M parameters) trained on 30B tokens from the Pile, UniPool consistently improves validation loss and perplexity over the matched vanilla MoE baselines. Across these scales, UniPool reduces validation loss by up to 0.0386 relative to vanilla MoE. Beyond raw loss improvement, our results identify pool size as an explicit depth-scaling hyperparameter: reduced-pool UniPool variants using only 41.6%-66.7% of the vanilla expert-parameter budget match or outperform layer-wise MoE at the tested scales. This shows that, under a shared-pool design, expert parameters need not grow linearly with depth; they can grow sublinearly while remaining more efficient and effective than vanilla MoE. Further analysis shows that UniPool's benefits compose with finer-grained expert decomposition.
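
To make the shared-pool idea concrete, here is a minimal sketch of a global expert pool with per-layer routers and a pool-level balancing term. The class names, the plain softmax top-k router (standing in for NormRouter), and the squared-deviation balance loss are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of a globally shared expert pool with per-layer routers.
# The class names, the plain softmax top-k router, and the squared-deviation
# balance loss are illustrative assumptions, not the paper's exact design.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedExpertPool(nn.Module):
    """One global pool of FFN experts, reused by every transformer layer."""

    def __init__(self, num_experts: int, d_model: int, d_ff: int):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x, gates, indices):
        # x: (tokens, d_model); gates, indices: (tokens, k)
        out = torch.zeros_like(x)
        for slot in range(indices.shape[1]):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e
                if mask.any():
                    out[mask] += gates[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out


class LayerRouter(nn.Module):
    """Per-layer top-k router into the shared pool (a stand-in for NormRouter)."""

    def __init__(self, d_model: int, num_experts: int, k: int = 2):
        super().__init__()
        self.proj = nn.Linear(d_model, num_experts, bias=False)
        self.k = k

    def forward(self, x):
        probs = F.softmax(self.proj(x), dim=-1)        # (tokens, num_experts)
        gates, indices = probs.topk(self.k, dim=-1)    # sparse top-k selection
        return gates, indices, probs


def pool_balance_loss(per_layer_probs):
    # Pool-level auxiliary loss: push aggregate expert usage *across all layers*
    # toward uniform, rather than balancing each layer separately.
    usage = torch.cat(per_layer_probs, dim=0).mean(dim=0)   # (num_experts,)
    uniform = torch.full_like(usage, 1.0 / usage.numel())
    return ((usage - uniform) ** 2).sum()
```

Every layer keeps its own LayerRouter, but all layers call the same SharedExpertPool instance, so the expert-parameter count stays fixed (and becomes a tunable pool-size hyperparameter) as depth grows.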

Sebastian Raschka (@rasbt)

He has published a new LLM Architecture Gallery: architecture diagrams of various large language models (LLMs) collected in one place for easy comparison and reference, with a page (https://sebastianraschka.com/llm-architecture-gallery/) that lets developers and researchers quickly look up each model's structure.

https://x.com/rasbt/status/2033167146302210058

#llm #modelarchitecture #gallery #ai

LLM Architecture Gallery

A gallery that collects architecture figures from The Big LLM Architecture Comparison and related articles, with fact sheets and links back to the original sections.

Tencent HY (@TencentHunyuan)

Arguing that a single static model is not enough, they released their latest work, 'Functional Neural Memory'. The approach generates custom parameters for each input, so the model can be steered by prompts on the fly, with the goals of instant personalization, better instruction following, and more flexible model behavior.

https://x.com/TencentHunyuan/status/2029644529578692723

#functionalneuralmemory #modelarchitecture #personalization #parametergeneration

Tencent HY (@TencentHunyuan) on X

One static model does not fit all😭 We just dropped our latest work: Functional Neural Memory. Instead of static models, we generate custom "parameters" for every single input. ✅Prompt your model anytime ✅Instant personalization ✅Better instruction following ✅Flexible &
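
The post does not spell out the mechanism, but "custom parameters for every single input" is typically realized with a hypernetwork that emits adapter weights conditioned on the input. The sketch below shows one such realization purely as an assumption; none of the names, shapes, or the mean-pooled conditioning come from Tencent's work.

```python
# Hypernetwork-style sketch of per-input parameter generation. Tencent has not
# published implementation details in this post; every name and shape here is
# an assumption for illustration only.
import torch
import torch.nn as nn


class InputConditionedAdapter(nn.Module):
    """Generates a small adapter's weights from a summary of the current input."""

    def __init__(self, d_model: int, d_adapter: int):
        super().__init__()
        self.d_model, self.d_adapter = d_model, d_adapter
        # Hypernetwork: pooled input representation -> flattened adapter weights.
        self.hyper = nn.Linear(d_model, 2 * d_model * d_adapter)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq, d_model)
        summary = hidden.mean(dim=1)                     # (batch, d_model)
        flat = self.hyper(summary)                       # per-input parameters
        split = self.d_model * self.d_adapter
        w_in = flat[:, :split].view(-1, self.d_model, self.d_adapter)
        w_out = flat[:, split:].view(-1, self.d_adapter, self.d_model)
        # Apply the freshly generated adapter to the same input it was built from.
        return hidden + torch.tanh(hidden @ w_in) @ w_out
```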

Emily (@IamEmily2050)

A post on how the Grok agent architecture will be adopted: instead of one large model that takes months of training and post-training, it uses small, efficient models that can be improved within weeks. The author predicts broad adoption over the coming months.

https://x.com/IamEmily2050/status/2023948296616575017

#grok #agents #modelarchitecture #efficientmodels #llms

Emily (@IamEmily2050) on X

It seems many people still don't understand how the new Grok agents architecture will be adopted by everyone in the coming months. Instead of one big model taking six months to finish training and post training, it uses small, efficient models that are easy to improve in weeks,

Aakash Harish (@0_Aakash_0)

The author argues that Spark is not a replacement for the full Codex model but a "speed layer". The sensible pattern is a division of labor: plan with the smarter model (Codex or Opus) and use Spark for fast execution work. A practical take on model-combination strategy.

https://x.com/0_Aakash_0/status/2022180694517301737

#spark #codex #opus #modelarchitecture

Aakash Harish (@0_Aakash_0) on X

@daniel_mac8 Your last point is the key insight here and I think it's actually the right mental model for the entire Spark lineup. Spark isn't a replacement for the full Codex model. It's a speed layer. The pattern that makes sense: 1. Use the smarter model (Codex or Opus) to plan
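
That division of labor maps onto a simple two-tier loop: ask the stronger model for a plan, then hand each step to the fast model. The sketch below is a hypothetical illustration; `call_model` and the model identifiers are placeholders, not any real SDK or endpoint names.

```python
# Sketch of the "plan with the smart model, execute with the fast model" split.
# `call_model` is a hypothetical placeholder for whichever provider client you
# use; the model identifiers are illustrative only.

def call_model(model: str, prompt: str) -> str:
    """Placeholder: send `prompt` to `model` via your provider's client."""
    raise NotImplementedError


def run_task(goal: str, planner: str = "smart-planner", executor: str = "spark-fast") -> list[str]:
    # 1. The slower, smarter model produces a step-by-step plan.
    plan = call_model(planner, f"Break this goal into small, concrete steps:\n{goal}")
    steps = [line.strip() for line in plan.splitlines() if line.strip()]
    # 2. The fast "speed layer" model executes each step.
    return [call_model(executor, f"Goal: {goal}\nDo only this step:\n{step}") for step in steps]
```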

New model architecture: Xiaomi MiMo (MiMo-V2-Flash)

MiMo explores multi-token prediction (MTP) to increase inference throughput by generating and verifying multiple draft tokens in parallel. By keeping the MTP block lightweight, it achieves significant speedups without increasing KV-cache overhead—pointing to architectural innovation beyond pure scaling.

#LLMs #ModelArchitecture #AIResearch
https://mimo.xiaomi.com/blog/mimo-v2-flash
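
For intuition, here is a toy greedy draft-and-verify step in the spirit of multi-token prediction: a lightweight head drafts several tokens, the main model checks them in one parallel pass, and only the agreed prefix is kept. The function names are hypothetical stand-ins; this is not MiMo's actual MTP block.

```python
# Toy greedy draft-and-verify step in the spirit of multi-token prediction (MTP).
# `draft_tokens_fn` and `main_logits_fn` are hypothetical stand-ins for a
# lightweight MTP head and the main model; this is not MiMo's actual code.
import torch


def mtp_step(tokens: torch.Tensor, main_logits_fn, draft_tokens_fn, num_draft: int = 4) -> torch.Tensor:
    draft = draft_tokens_fn(tokens, num_draft)           # cheap proposals, shape (num_draft,)
    candidate = torch.cat([tokens, draft])                # context + draft tokens
    logits = main_logits_fn(candidate)                    # one parallel pass, (len, vocab)
    preds = logits.argmax(dim=-1)                         # main model's greedy choice per position

    accepted = 0
    for i in range(num_draft):
        # preds[L-1+i] is the main model's choice for the token the draft put at L+i.
        if preds[tokens.numel() - 1 + i] == draft[i]:
            accepted += 1
        else:
            break

    verified = candidate[: tokens.numel() + accepted]     # keep the agreed prefix
    next_tok = preds[verified.numel() - 1].unsqueeze(0)   # main model supplies the next token
    return torch.cat([verified, next_tok])
```

When the draft is accepted, several tokens are committed per main-model pass; when it is rejected early, the loop degrades gracefully to ordinary one-token decoding, which is where the throughput gain without extra KV-cache comes from.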

Alex L. Zhang presents the Kolmogorov-Arnold Network with all the flair of an overcaffeinated grad student's diary 📚😴. It's a riveting tale of model architecture and training that no one asked for, complete with parts I, II, and III for your reading pleasure—if you've run out of paint to watch dry. 🎨🕰️
https://alexzhang13.github.io/blog/2024/annotated-kan/ #KolmogorovArnoldNetwork #GradStudentLife #ModelArchitecture #TrainingTales #OvercaffeinatedDiaries #HackerNews #ngated
Alex L. Zhang | The Annotated Kolmogorov-Arnold Network (KAN)

An annotated guide to the Kolmogorov-Arnold Network
