🍹 Introducing Liquid AI's latest cocktail: an "Even Better onDevice MixtureofExperts" with a splash of #buzzwords and a twist of jargon 🍹 Just what you needed! Now you can "unlock" your business potential by drowning it in a sea of acronyms and overpriced "solutions" that claim to be the world's "most efficient" 🙄 Cheers to that! 🎉
https://www.liquid.ai/blog/lfm2-5-8b-a1b #LiquidAI #Cocktail #BusinessSolutions #MixtureofExperts #OverpricedSolutions #HackerNews #ngated
LFM2.5-8B-A1B: an Even Better on-Device Mixture-of-Experts | Liquid AI

Today, we’re releasing LFM2.5-8B-A1B, a high-throughput edge model optimized for fast, reliable tool calling and complex instruction following on consumer hardware, delivering compressed performance competitive with much larger models and day-one support across major inference frameworks.

Some cool work from allen institute where they trained a more understandable mixture of experts model. The content that the experts are experts at are more clustered and better resemble human topics. A big benefit of this is it looks like you can remove an expert from the model and performance degrades much more gracefully than a standard MoE.

https://bsky.app/profile/ai2.bsky.social/post/3mle56nehfz2w

#llm #mixtureofexperts #ai2

Ai2 (@ai2.bsky.social)

Today we’re releasing EMO, a new mixture-of-experts (MoE) model trained so modular structure emerges directly from data without human-defined priors. EMO can use a small subset of its experts for a given task while keeping near full-model performance. 🧵

Bluesky Social

Gemma 4: Google's Most Capable Open Models Are Here — and They Run on Your Laptop

https://techlife.blog/posts/gemma-4-google-open-models

#Gemma4 #GoogleAI #OpenSourceAI #LLM #OnDeviceAI #MixtureOfExperts

Gemma 4: Google's Most Capable Open Models Are Here — and They Run on Your Laptop

Google's Gemma 4 family brings frontier-level AI reasoning to devices ranging from Android phones to developer workstations, under a fully open Apache 2.0 license.

TechLife — AI, Software Engineering & Emerging Technology

Nemotron 3 Super pushes the frontier with 40 M supervised & alignment samples, leveraging a Mamba‑Transformer backbone and Mixture‑of‑Experts scaling. The model shows stronger agent reasoning, RL‑based fine‑tuning, and tighter AI alignment. Dive into the details to see how this LLM reshapes open‑source AI. #Nemotron3 #MixtureOfExperts #AIAlignment #SupervisedFineTuning

🔗 https://aidailypost.com/news/nemotron-3-super-incorporates-40-million-supervised-alignment-samples

Alibaba just released the Qwen‑3.5‑Medium model as open‑source, delivering Sonnet 4.5‑level performance on a single GPU. It uses a Mixture‑of‑Experts architecture and a new “Thinking Mode” to boost AI inference efficiency while staying lightweight. Dive into the details and see how this could reshape open‑source LLM development. #Qwen3_5 #OpenSourceLLM #MixtureOfExperts #ModelEfficiency

🔗 https://aidailypost.com/news/alibaba-open-sources-qwen35-medium-models-sonnet-45-performance

NVIDIA’s new co‑design with Sarvam AI slashes time‑to‑first‑token to under a second for LLM inference. By marrying Mixture‑of‑Experts models with GPU acceleration, they boost throughput while trimming latency. This hardware‑software synergy could reshape how we deploy large language models at scale. Read more to see the numbers and tech behind the breakthrough. #NVIDIA #SarvamAI #MixtureOfExperts #TTFT

🔗 https://aidailypost.com/news/nvidia-co-design-boosts-sarvam-ai-inference-cuts-ttft-below-one-second

Alibaba's new Qwen 3.5 397B-A17 outperforms even larger rivals by using multi-token prediction and a sparse mixture-of-experts architecture. It cuts inference cost while keeping top-tier performance, hinting at a new era for multimodal AI. Curious how 397 billion parameters can be cheaper? Read the full story. #Qwen3_5 #AlibabaAI #MixtureOfExperts #MultiTokenPrediction

🔗 https://aidailypost.com/news/alibabas-qwen-35-397b-a17-beats-larger-model-via-multitoken