57 Followers
216 Following
216 Posts
AIME develops AI-machines for deep learning: Multi-GPU workstations & HPC servers. We also provide GPU cloud compute, deep learning services & consulting.
Website: https://www.aime.info
LinkedIn: https://www.linkedin.com/company/a-i-m-e/
Blog: https://www.aime.info/blog/
Location: Berlin
Hume AI just dropped TADA (Text-Acoustic Dual Alignment) – an open-source framework for expressive speech generation.
✨ 1:1 text-audio token alignment
✨ Precise prosody & timing control
✨ Multilingual (DE/EN/FR/ES/JA/AR)
✨ Built on Llama 3.2 (1B/3B)
🔗 https://github.com/HumeAI/tada
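The 1:1 alignment idea can be pictured as one acoustic token per text token, which is what makes per-word prosody and timing control possible. The sketch below is a conceptual illustration only; TADA's real tokenizer, vocabulary, and sequence layout are not described in this post:

```python
# Conceptual sketch of 1:1 text-audio token alignment (inferred from the
# framework's name; the actual TADA token scheme may differ).
text_tokens = ["Hello", ",", "world", "!"]
audio_tokens = [101, 7, 245, 12]  # hypothetical acoustic token ids, one per text token

# Interleave into a single sequence a decoder-only LM could model,
# so every text token is immediately followed by its acoustic counterpart.
aligned = [tok for pair in zip(text_tokens, audio_tokens) for tok in pair]
print(aligned)  # ['Hello', 101, ',', 7, 'world', 245, '!', 12]
```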

Instant LLM adaptation via text prompts? 🧠⚡️

SakanaAI's new Text-to-LoRA (T2L) uses a hypernetwork to generate task-specific LoRAs from simple text descriptions—no expensive fine-tuning required.

✅ Compresses 100s of adapters
✅ Generalizes to unseen tasks
✅ ICML 2025 Paper & Code: https://github.com/SakanaAI/text-to-lora
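For context, a LoRA adapter is just a low-rank update W' = W + (alpha/r)·BA to a frozen weight matrix; T2L's hypernetwork generates the A and B factors from a text description instead of training them. A minimal numpy sketch of the update itself, with random factors standing in for the generated ones:

```python
import numpy as np

# LoRA update: W' = W + (alpha/r) * B @ A, with rank(B @ A) <= r.
rng = np.random.default_rng(0)

d_out, d_in, r, alpha = 8, 8, 2, 16      # r << d keeps the adapter tiny
W = rng.standard_normal((d_out, d_in))   # frozen base weight
A = rng.standard_normal((r, d_in))       # low-rank factors (the "LoRA")
B = rng.standard_normal((d_out, r))

delta = (alpha / r) * B @ A
W_adapted = W + delta

# The update has rank at most r, so storing (A, B) costs 2*d*r
# parameters instead of d*d -- which is why hundreds of adapters compress well.
assert np.linalg.matrix_rank(delta) <= r
print(W_adapted.shape)  # (8, 8)
```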

GitHub - SakanaAI/text-to-lora: Hypernetworks that adapt LLMs for specific benchmark tasks using only textual task description as the input

Voicebox: open-source, locally run TTS studio—no cloud, no subscriptions.
✅ Powered by Qwen3-TTS for expressive voice cloning
✅ Multi-track editor + inline audio editing
✅ Tauri/Rust app: 10× smaller than Electron
✅ MIT license, full privacy
A self-hosted alternative to ElevenLabs 👇
https://github.com/jamiepine/voicebox
GitHub - jamiepine/voicebox: The open-source voice synthesis studio powered by Qwen3-TTS.

AKI.IO is now live: curated open-source and open-weight models such as #MiniMax M2.5, #Apertus 70B, #Qwen Image Edit and many more, available as an API – hosted entirely in European data centers, without hyperscalers. Happy to get your feedback!
The playground is open; get an API key via free registration at aki.io.

https://www.aki.io/
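A hedged sketch of what a call could look like, assuming the API follows the common OpenAI-compatible chat-completions shape. The endpoint path, model id, and request schema here are placeholders; verify everything against the aki.io docs:

```python
import json

# Sketch of a token-based model API call (OpenAI-compatible shape is an
# assumption, not confirmed). The payload is built but not sent.
API_KEY = "YOUR_AKI_API_KEY"  # from free registration

payload = {
    "model": "apertus-70b",  # hypothetical model id
    "messages": [
        {"role": "user", "content": "Summarize the EU AI Act in one sentence."}
    ],
}
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

# Sending would be e.g.: requests.post(url, headers=headers, data=json.dumps(payload))
print(json.dumps(payload, indent=2))
```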

Home - AKI.IO

Token-based access to leading open-source AI models on EU infrastructure. Evaluate, build and scale your AI product without self-hosting or vendor lock-in.


DeepSeek OCR 2 is a 3B VLM that reads documents like humans do. "Visual Causal Flow" dynamically reorders tokens by semantic meaning, not left-to-right, unlocking 91.09% accuracy on complex layouts.
Invoice parsing • contract analysis • archival digitization • form extraction
Fully open source (Apache 2.0).

https://github.com/deepseek-ai/DeepSeek-OCR-2
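The post doesn't detail how "Visual Causal Flow" works internally. As a loose intuition only, the toy sketch below contrasts raster-order reading with reading image-patch tokens ranked by a relevance score; this is not DeepSeek's actual mechanism:

```python
# Toy illustration of semantic (non-raster) token ordering: instead of
# reading patches strictly left-to-right, rank them by a score.
patches = [
    {"pos": (0, 0), "text": "INVOICE", "score": 0.95},
    {"pos": (0, 1), "text": "logo",    "score": 0.10},
    {"pos": (1, 0), "text": "Total:",  "score": 0.80},
    {"pos": (1, 1), "text": "$1,240",  "score": 0.90},
]

# Classic OCR: row-major reading order.
raster_order = [p["text"] for p in sorted(patches, key=lambda p: p["pos"])]
# Semantic ordering: most relevant content first.
semantic_order = [p["text"] for p in sorted(patches, key=lambda p: -p["score"])]

print(raster_order)    # ['INVOICE', 'logo', 'Total:', '$1,240']
print(semantic_order)  # ['INVOICE', '$1,240', 'Total:', 'logo']
```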

GitHub - deepseek-ai/DeepSeek-OCR-2: Visual Causal Flow

Kimi released K2.5 — a native multimodal model trained on 15T visual-text tokens that generates full interactive UIs from prompts and orchestrates 100-agent swarms for complex tasks. 4.5× faster execution, 59% productivity boost. Open weights available now.
https://www.kimi.com/blog/kimi-k2-5.html
Kimi K2.5 Tech Blog: Visual Agentic Intelligence

Kimi K2.5 defines Visual Agentic Intelligence. Trained on 15T tokens, it introduces SOTA visual coding and autonomous agent swarm. Read the full tech blog.

Alibaba released Qwen3-TTS, a new text-to-speech model with a discrete multi-codebook LM architecture under the Apache license. It features 97 ms synthesis latency, 3-second voice cloning, and support for 10 languages including German. Available on Hugging Face and ModelScope.

https://github.com/QwenLM/Qwen3-TTS
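Multi-codebook audio LMs commonly use residual vector quantization: each audio frame is represented by one token id per codebook, and the frame embedding is decoded as the sum of the selected code vectors. Whether Qwen3-TTS follows this exact pattern is an assumption on my part; the sketch shows the general mechanism:

```python
import numpy as np

# Residual-VQ-style decoding sketch (a common multi-codebook pattern,
# assumed here, not confirmed for Qwen3-TTS): each codebook refines the
# residual left by the previous one, so a frame decodes as a sum of codes.
rng = np.random.default_rng(1)

n_codebooks, codebook_size, dim = 4, 16, 8
codebooks = rng.standard_normal((n_codebooks, codebook_size, dim))

# One frame of audio = one token id per codebook.
frame_tokens = [3, 7, 0, 12]

frame_embedding = sum(codebooks[q][tok] for q, tok in enumerate(frame_tokens))
print(frame_embedding.shape)  # (8,)
```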

GitHub - QwenLM/Qwen3-TTS: Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streaming speech generation, free-form voice design, and vivid voice cloning.

NVIDIA just dropped PersonaPlex - a speech-to-speech model that lets you control AI personas through text prompts AND voice conditioning! 🎙️✨

🔥 Real-time, full-duplex conversations with consistent character
🔥 Natural latency + multiple voice embeddings (NAT/VAR)
🔥 Perfect for customer service, assistants & immersive experiences

https://github.com/NVIDIA/personaplex

GitHub - NVIDIA/personaplex: PersonaPlex code.

Z.AI just released GLM-4.7-Flash - a 30B-A3B MoE model that dominates the 30B parameter class!

🔥 Key benchmarks:
✅ 91.6% on AIME 25 (beats GPT-OSS-20B)
✅ 75.2% on GPQA
✅ 59.2% on SWE-bench Verified (3x better than Qwen3!)

Perfect balance of power & efficiency for enterprise deployment. Supports vLLM, SGLang & native tool integration.
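"30B-A3B" means 30B total parameters with roughly 3B active per token: a router picks the top-k experts for each token and the rest stay idle, which is where the efficiency comes from. A generic top-2 routing sketch (standard MoE routing, not GLM's exact implementation):

```python
import numpy as np

# Minimal top-2 MoE router: only the selected experts run for this token,
# so most of the model's parameters do no work per step.
rng = np.random.default_rng(2)

n_experts, d, k = 8, 16, 2
x = rng.standard_normal(d)                    # one token's hidden state
router = rng.standard_normal((n_experts, d))  # router weights

logits = router @ x
top_k = np.argsort(logits)[-k:]               # indices of the k best experts
weights = np.exp(logits[top_k]) / np.exp(logits[top_k]).sum()  # softmax over chosen

# Output mixes just those k expert outputs; the other 6 experts stay idle.
experts = rng.standard_normal((n_experts, d, d))
y = sum(w * (experts[i] @ x) for w, i in zip(weights, top_k))
print(len(top_k), y.shape)  # 2 (16,)
```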

https://huggingface.co/zai-org/GLM-4.7-Flash

zai-org/GLM-4.7-Flash · Hugging Face


Z.AI released GLM-Image, an autoregressive image generation model with a hybrid architecture targeting dense-knowledge and high-fidelity image generation.

https://github.com/zai-org/GLM-Image

GitHub - zai-org/GLM-Image: Auto-regressive for Dense-knowledge and High-fidelity Image Generation.