Nemotron 3 Super pushes the frontier with 40M supervised and alignment samples, leveraging a Mamba‑Transformer backbone and Mixture‑of‑Experts scaling. The model pairs stronger agentic reasoning and RL‑based fine‑tuning with tighter AI alignment. Dive into the details to see how this LLM reshapes open‑source AI. #Nemotron3 #MixtureOfExperts #AIAlignment #SupervisedFineTuning
🔗 https://aidailypost.com/news/nemotron-3-super-incorporates-40-million-supervised-alignment-samples
fly51fly (@fly51fly)
The 'DynaMoE' paper by G. Gülmez (2026), affiliated with LG, proposes dynamically activating experts at the token level and applying layer-wise adaptive capacity to improve the efficiency and resource allocation of Mixture-of-Experts (MoE) neural networks. Published on arXiv, the work focuses on MoE activation patterns and compute-cost optimization.
https://x.com/fly51fly/status/2030759672681279967
#moe #mixtureofexperts #dynamoe #arxiv #deeplearning

fly51fly (@fly51fly) on X
[LG] DynaMoE: Dynamic Token-Level Expert Activation with Layer-Wise Adaptive Capacity for Mixture-of-Experts Neural Networks
G Gülmez (2026)
https://t.co/zEc7F82uSw
Gökdeniz Gülmez (@ActuallyIsaak)
New research paper announcement: introducing DynaMoE, a Mixture-of-Experts framework in which the number of experts activated per token is decided dynamically and the total number of experts can be scheduled over time. It is an architecture study aimed at improving the efficiency and scalability of MoE models (a toy sketch of the dynamic-routing idea follows the quoted thread below).
https://x.com/ActuallyIsaak/status/2028770014883418222
#dynamoe #mixtureofexperts #moe #research

Gökdeniz Gülmez (@ActuallyIsaak) on X
Today I’m sharing a new research paper that explores a new idea in mixture of experts architecture called “DynaMoE”.
DynaMoE is a Mixture-of-Experts framework where:
- the number of active experts per token is dynamic.
- the number of all experts can be scheduled differently
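For readers who want the gist in code: the sketch below illustrates the general idea of per-token dynamic expert activation, where each token keeps only as many experts as needed to cover a probability-mass threshold instead of a fixed top-k. It is a toy NumPy illustration with assumed names (`mass_threshold`, `max_experts`), not the routing algorithm from the DynaMoE paper.

```python
# Toy illustration of dynamic per-token expert activation (NOT the DynaMoE paper's algorithm):
# instead of a fixed top-k, each token keeps the smallest set of experts whose router
# probability mass reaches a threshold, so confidently routed tokens use fewer experts.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_expert_routing(router_logits, mass_threshold=0.7, max_experts=4):
    """router_logits: (num_tokens, num_experts) scores from a learned router.
    Returns, per token, the selected expert ids and their renormalized weights."""
    probs = softmax(router_logits, axis=-1)
    routes = []
    for p in probs:
        order = np.argsort(-p)                               # experts by decreasing probability
        k = int(np.searchsorted(np.cumsum(p[order]), mass_threshold)) + 1
        k = min(k, max_experts)                               # cap the per-token expert budget
        chosen = order[:k]
        routes.append((chosen, p[chosen] / p[chosen].sum()))  # renormalize over the active set
    return routes

# Example: 3 tokens, 8 experts; sharper router logits -> fewer active experts per token.
rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 8)) * np.array([[4.0], [1.0], [0.2]])
for i, (experts, weights) in enumerate(dynamic_expert_routing(logits)):
    print(f"token {i}: experts={experts.tolist()}, weights={np.round(weights, 2).tolist()}")
```

Layer-wise adaptive capacity would amount to giving each layer its own threshold or expert budget; the paper itself is the place to check the actual formulation.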
Alibaba just released the Qwen‑3.5‑Medium model as open‑source, delivering Sonnet 4.5‑level performance on a single GPU. It uses a Mixture‑of‑Experts architecture and a new “Thinking Mode” to boost AI inference efficiency while staying lightweight. Dive into the details and see how this could reshape open‑source LLM development. #Qwen3_5 #OpenSourceLLM #MixtureOfExperts #ModelEfficiency
🔗 https://aidailypost.com/news/alibaba-open-sources-qwen35-medium-models-sonnet-45-performance
Ivan Fioravanti ᯅ (@ivanfioravanti)
The article is back online and reports much faster results for both MoE (Mixture of Experts) variants. This looks like an update on MoE performance and speed improvements, most likely a meaningful change in benchmark numbers tied to model architecture or inference optimization.
https://x.com/ivanfioravanti/status/2026744295043002668
#moe #mixtureofexperts #ml #performance

Ivan Fioravanti ᯅ (@ivanfioravanti) on X
Article is back online with much faster results for both MoE. 🚀
NVIDIA’s new co‑design with Sarvam AI slashes time‑to‑first‑token to under a second for LLM inference. By marrying Mixture‑of‑Experts models with GPU acceleration, they boost throughput while trimming latency. This hardware‑software synergy could reshape how we deploy large language models at scale. Read more to see the numbers and tech behind the breakthrough. #NVIDIA #SarvamAI #MixtureOfExperts #TTFT
🔗 https://aidailypost.com/news/nvidia-co-design-boosts-sarvam-ai-inference-cuts-ttft-below-one-second
Alibaba's new Qwen 3.5 397B-A17 outperforms even larger rivals by using multi-token prediction and a sparse mixture-of-experts architecture. It cuts inference cost while keeping top-tier performance, hinting at a new era for multimodal AI. Curious how 397 billion parameters can be cheaper? Read the full story. #Qwen3_5 #AlibabaAI #MixtureOfExperts #MultiTokenPrediction
🔗 https://aidailypost.com/news/alibabas-qwen-35-397b-a17-beats-larger-model-via-multitoken
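The “multi-token prediction” mentioned above generally means adding extra output heads that predict tokens a few positions ahead from the same hidden state, so drafts can be verified and several tokens emitted per forward pass. The sketch below is only a generic illustration of that idea, with hypothetical names like `predict_future_tokens`, not Qwen 3.5’s implementation.

```python
# Generic sketch of multi-token prediction (NOT Qwen 3.5's implementation): extra output
# heads predict tokens at offsets +1, +2, +3 from the same hidden state, so a verifier
# pass can accept several drafted tokens per forward step and cut inference cost.
import numpy as np

rng = np.random.default_rng(0)
vocab, hidden, n_heads = 1000, 64, 3                # n_heads = how many future tokens to draft

# Hypothetical per-offset output projections (offset +1 is the usual next-token head).
heads = [rng.normal(scale=0.02, size=(hidden, vocab)) for _ in range(n_heads)]

def predict_future_tokens(h_last):
    """h_last: (hidden,) hidden state at the current position.
    Returns greedy token ids for positions t+1 .. t+n_heads."""
    return [int(np.argmax(h_last @ W)) for W in heads]

h = rng.normal(size=(hidden,))
print("drafted tokens for t+1..t+3:", predict_future_tokens(h))
# At inference, the full model verifies the draft and keeps the longest matching prefix,
# amortizing one forward pass over several output tokens.
```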
MiniMax's new M2.5 model slashes costs to 1/20 of Claude Opus while handling 30% of HQ tasks. Built on a Mixture‑of‑Experts sparse architecture, it delivers strong code‑generation and LLM performance—all open‑source. Discover how this AI agent could boost productivity in your projects. #MiniMaxM2_5 #MixtureOfExperts #OpenSourceAI #AIProductivity
🔗 https://aidailypost.com/news/minimaxs-m25-costs-120-claude-opus-covers-30-hq-tasks
Rohan Paul (@rohanpaul_ai)
Ant Open Source has released LLaDA2.1 Flash, a 100B-parameter language diffusion MoE (Mixture-of-Experts) model. It reportedly reaches a peak inference speed of 892 tokens/second, 2.5x faster than Qwen3-30B-A3B. The release emphasizes high real-time inference performance.
https://x.com/rohanpaul_ai/status/2021643743313756658
#llm #inferencespeed #mixtureofexperts #antopensource #modelperformance

Rohan Paul (@rohanpaul_ai) on X
Ant Open Source just dropped LLaDA2.1 Flash.
Insane inference speed for a 100B param language diffusion MoE model.
Achieved a peak speed of 892 tokens per second beating the much smaller Qwen3-30B-A3B by 2.5x.
The reason it could achieve this incredible speed is because it
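For context on where that kind of throughput can come from: masked-diffusion language models decode by filling in many masked positions per step rather than emitting one token at a time. The sketch below shows that generic unmasking loop with a stand-in `toy_model`; it is illustrative only and not taken from LLaDA2.1.

```python
# Generic sketch of masked-diffusion decoding (NOT LLaDA2.1's code): start from an
# all-masked sequence and, at each step, commit the positions the model is most
# confident about, so many tokens are decoded per forward pass instead of one.
import numpy as np

rng = np.random.default_rng(0)
MASK, vocab, seq_len, steps = -1, 50, 16, 4

def toy_model(tokens):
    """Stand-in for the denoiser: returns per-position logits over the vocabulary."""
    return rng.normal(size=(len(tokens), vocab))

tokens = np.full(seq_len, MASK)
for step in range(steps):
    logits = toy_model(tokens)
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)
    conf = probs.max(-1)                              # model confidence per position
    conf[tokens != MASK] = -np.inf                    # only consider still-masked positions
    n_fill = seq_len // steps                         # commit a batch of tokens each step
    fill = np.argsort(-conf)[:n_fill]
    tokens[fill] = probs[fill].argmax(-1)
    print(f"step {step}: {np.count_nonzero(tokens != MASK)} tokens decoded")
```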