57 Followers
216 Following
216 Posts
AIME develops AI-machines for deep learning: Multi-GPU workstations & HPC servers. We also provide GPU cloud compute, deep learning services & consulting.
Website: https://www.aime.info
LinkedIn: https://www.linkedin.com/company/a-i-m-e/
Blog: https://www.aime.info/blog/
Location: Berlin
Hume AI just dropped TADA (Text-Acoustic Dual Alignment) – an open-source framework for expressive speech generation.
✨ 1:1 text-audio token alignment
✨ Precise prosody & timing control
✨ Multilingual (DE/EN/FR/ES/JA/AR)
✨ Built on Llama 3.2 (1B/3B)
🔗 https://github.com/HumeAI/tada
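The 1:1 alignment idea can be pictured as one acoustic token per text token, which is what makes per-word prosody and timing control possible. The sketch below is a conceptual illustration only; TADA's real tokenizer, vocabulary, and sequence layout are not described in this post:

```python
# Conceptual sketch of 1:1 text-audio token alignment (inferred from the
# framework's name; the actual TADA token scheme may differ).
text_tokens = ["Hello", ",", "world", "!"]
audio_tokens = [101, 7, 245, 12]  # hypothetical acoustic token ids, one per text token

# Interleave into a single sequence a decoder-only LM could model,
# so every text token is immediately followed by its acoustic counterpart.
aligned = [tok for pair in zip(text_tokens, audio_tokens) for tok in pair]
print(aligned)  # ['Hello', 101, ',', 7, 'world', 245, '!', 12]
```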

Instant LLM adaptation via text prompts? 🧠⚡️

SakanaAI's new Text-to-LoRA (T2L) uses a hypernetwork to generate task-specific LoRAs from simple text descriptions—no expensive fine-tuning required.

✅ Compresses 100s of adapters
✅ Generalizes to unseen tasks
✅ ICML 2025 Paper & Code: https://github.com/SakanaAI/text-to-lora
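For context, a LoRA adapter is just a low-rank update W' = W + (alpha/r)·BA to a frozen weight matrix; T2L's hypernetwork generates the A and B factors from a text description instead of training them. A minimal numpy sketch of the update itself, with random factors standing in for the generated ones:

```python
import numpy as np

# LoRA update: W' = W + (alpha/r) * B @ A, with rank(B @ A) <= r.
rng = np.random.default_rng(0)

d_out, d_in, r, alpha = 8, 8, 2, 16      # r << d keeps the adapter tiny
W = rng.standard_normal((d_out, d_in))   # frozen base weight
A = rng.standard_normal((r, d_in))       # low-rank factors (the "LoRA")
B = rng.standard_normal((d_out, r))

delta = (alpha / r) * B @ A
W_adapted = W + delta

# The update has rank at most r, so storing (A, B) costs 2*d*r
# parameters instead of d*d -- which is why hundreds of adapters compress well.
assert np.linalg.matrix_rank(delta) <= r
print(W_adapted.shape)  # (8, 8)
```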

GitHub - SakanaAI/text-to-lora: Hypernetworks that adapt LLMs for specific benchmark tasks using only textual task description as the input

Voicebox: open-source, locally run TTS studio—no cloud, no subscriptions.
✅ Powered by Qwen3-TTS for expressive voice cloning
✅ Multi-track editor + inline audio editing
✅ Tauri/Rust app: 10× smaller than Electron
✅ MIT license, full privacy
A self-hosted alternative to ElevenLabs 👇
https://github.com/jamiepine/voicebox
GitHub - jamiepine/voicebox: The open-source voice synthesis studio powered by Qwen3-TTS.

AKI.IO is now live: curated open-source and open-weight models such as #MiniMax M2.5, #Apertus 70B, #Qwen Image Edit and many more, available as an API – hosted entirely in European data centers, without hyperscalers. Happy to get your feedback!
The playground is open; get an API key via free registration at aki.io.

https://www.aki.io/
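A hedged sketch of what a call could look like, assuming the API follows the common OpenAI-compatible chat-completions shape. The endpoint path, model id, and request schema here are placeholders; verify everything against the aki.io docs:

```python
import json

# Sketch of a token-based model API call (OpenAI-compatible shape is an
# assumption, not confirmed). The payload is built but not sent.
API_KEY = "YOUR_AKI_API_KEY"  # from free registration

payload = {
    "model": "apertus-70b",  # hypothetical model id
    "messages": [
        {"role": "user", "content": "Summarize the EU AI Act in one sentence."}
    ],
}
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

# Sending would be e.g.: requests.post(url, headers=headers, data=json.dumps(payload))
print(json.dumps(payload, indent=2))
```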

Home - AKI.IO

Token-based access to leading open-source AI models on EU infrastructure. Evaluate, build and scale your AI product without self-hosting or vendor lock-in.


DeepSeek OCR 2 is a 3B VLM that reads documents like humans do. "Visual Causal Flow" dynamically reorders tokens by semantic meaning, not left-to-right, unlocking 91.09% accuracy on complex layouts.
Invoice parsing • contract analysis • archival digitization • form extraction
Fully open source (Apache 2.0).

https://github.com/deepseek-ai/DeepSeek-OCR-2
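The post doesn't detail how "Visual Causal Flow" works internally. As a loose intuition only, the toy sketch below contrasts raster-order reading with reading image-patch tokens ranked by a relevance score; this is not DeepSeek's actual mechanism:

```python
# Toy illustration of semantic (non-raster) token ordering: instead of
# reading patches strictly left-to-right, rank them by a score.
patches = [
    {"pos": (0, 0), "text": "INVOICE", "score": 0.95},
    {"pos": (0, 1), "text": "logo",    "score": 0.10},
    {"pos": (1, 0), "text": "Total:",  "score": 0.80},
    {"pos": (1, 1), "text": "$1,240",  "score": 0.90},
]

# Classic OCR: row-major reading order.
raster_order = [p["text"] for p in sorted(patches, key=lambda p: p["pos"])]
# Semantic ordering: most relevant content first.
semantic_order = [p["text"] for p in sorted(patches, key=lambda p: -p["score"])]

print(raster_order)    # ['INVOICE', 'logo', 'Total:', '$1,240']
print(semantic_order)  # ['INVOICE', '$1,240', 'Total:', 'logo']
```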

GitHub - deepseek-ai/DeepSeek-OCR-2: Visual Causal Flow

Kimi released K2.5 — a native multimodal model trained on 15T visual-text tokens that generates full interactive UIs from prompts and orchestrates 100-agent swarms for complex tasks. 4.5× faster execution, 59% productivity boost. Open weights available now.
https://www.kimi.com/blog/kimi-k2-5.html
Kimi K2.5 Tech Blog: Visual Agentic Intelligence

Kimi K2.5 defines Visual Agentic Intelligence. Trained on 15T tokens, it introduces SOTA visual coding and autonomous agent swarm. Read the full tech blog.

Alibaba released Qwen3-TTS, a new text-to-speech model with a discrete multi-codebook LM architecture under the Apache license. It features 97 ms synthesis latency, 3-second voice cloning, and support for 10 languages including German. Available on Hugging Face and ModelScope.

https://github.com/QwenLM/Qwen3-TTS
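Multi-codebook audio LMs commonly use residual vector quantization: each audio frame is represented by one token id per codebook, and the frame embedding is decoded as the sum of the selected code vectors. Whether Qwen3-TTS follows this exact pattern is an assumption on my part; the sketch shows the general mechanism:

```python
import numpy as np

# Residual-VQ-style decoding sketch (a common multi-codebook pattern,
# assumed here, not confirmed for Qwen3-TTS): each codebook refines the
# residual left by the previous one, so a frame decodes as a sum of codes.
rng = np.random.default_rng(1)

n_codebooks, codebook_size, dim = 4, 16, 8
codebooks = rng.standard_normal((n_codebooks, codebook_size, dim))

# One frame of audio = one token id per codebook.
frame_tokens = [3, 7, 0, 12]

frame_embedding = sum(codebooks[q][tok] for q, tok in enumerate(frame_tokens))
print(frame_embedding.shape)  # (8,)
```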

GitHub - QwenLM/Qwen3-TTS: Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streaming speech generation, free-form voice design, and vivid voice cloning.

NVIDIA just dropped PersonaPlex - a speech-to-speech model that lets you control AI personas through text prompts AND voice conditioning! 🎙️✨

🔥 Real-time, full-duplex conversations with consistent character
🔥 Natural latency + multiple voice embeddings (NAT/VAR)
🔥 Perfect for customer service, assistants & immersive experiences

https://github.com/NVIDIA/personaplex

GitHub - NVIDIA/personaplex: PersonaPlex code.

Z.AI just released GLM-4.7-Flash - a 30B-A3B MoE model that dominates the 30B parameter class!

🔥 Key benchmarks:
✅ 91.6% on AIME 25 (beats GPT-OSS-20B)
✅ 75.2% on GPQA
✅ 59.2% on SWE-bench Verified (3x better than Qwen3!)

Perfect balance of power & efficiency for enterprise deployment. Supports vLLM, SGLang & native tool integration.
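"30B-A3B" means 30B total parameters with roughly 3B active per token: a router picks the top-k experts for each token and the rest stay idle, which is where the efficiency comes from. A generic top-2 routing sketch (standard MoE routing, not GLM's exact implementation):

```python
import numpy as np

# Minimal top-2 MoE router: only the selected experts run for this token,
# so most of the model's parameters do no work per step.
rng = np.random.default_rng(2)

n_experts, d, k = 8, 16, 2
x = rng.standard_normal(d)                    # one token's hidden state
router = rng.standard_normal((n_experts, d))  # router weights

logits = router @ x
top_k = np.argsort(logits)[-k:]               # indices of the k best experts
weights = np.exp(logits[top_k]) / np.exp(logits[top_k]).sum()  # softmax over chosen

# Output mixes just those k expert outputs; the other 6 experts stay idle.
experts = rng.standard_normal((n_experts, d, d))
y = sum(w * (experts[i] @ x) for w, i in zip(weights, top_k))
print(len(top_k), y.shape)  # 2 (16,)
```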

https://huggingface.co/zai-org/GLM-4.7-Flash

zai-org/GLM-4.7-Flash · Hugging Face


Z.AI released GLM-Image, an autoregressive image generation model with a hybrid architecture targeting dense-knowledge and high-fidelity image generation.

https://github.com/zai-org/GLM-Image

GitHub - zai-org/GLM-Image: Auto-regressive for Dense-knowledge and High-fidelity Image Generation.