bstn (@bstnxbt)

DFlash v0.1.4 has been released, adding custom Metal verify kernels for quantized Qwen3 hybrid models and reduced peak memory at long context. On an M5 Max it delivers a large token-throughput gain over the stock mlx_lm baseline, which looks like a significant step for inference optimization on Apple silicon and the MLX stack.

https://x.com/bstnxbt/status/2045591537661190209

#dflash #qwen3 #metal #quantization #mlx

bstn 👁️ (@bstnxbt) on X

DFlash v0.1.4: custom Metal verify kernels for quantized Qwen3 hybrid models, plus significant peak memory reduction at long context. M5 Max 40-core GPU, 64 GB, stock mlx_lm baseline, Qwen3.6-35B-A3B-4bit:
► @ 1024 · 138.3 → 300.3 tok/s (2.20x)
► @ 2048 · 135.6 → 246.4

X (formerly Twitter)
I've been stuck on a crappy UML diagram in #plantuml for 7 hours. My colleague needs it tomorrow.
Since I'm not allowed to use #cloud-based #ai because of our trade secrets, I use either our company's approved #chatgpt Enterprise version or #gemma4 / #qwen3 with #lmstudio.
For now she has the not-quite-perfect version, where you can still call it a "cosmetic flaw".

Qwen3.6-35B-A3B on my laptop drew me a better pelican than Claude Opus 4.7

https://simonwillison.net/2026/Apr/16/qwen-beats-opus/

#HackerNews #Qwen3.6 #A3B #ClaudeOpus #pelican #AIart

For anyone who has been (inadvisably) taking my pelican riding a bicycle benchmark seriously as a robust way to test models, here are pelicans from this morning’s two big model …

Simon Willison’s Weblog

Qwen3.6-35B-A3B: Agentic Coding Power, Now Open to All

https://qwen.ai/blog?id=qwen3.6-35b-a3b

#HackerNews #Qwen3.6 #AgenticCoding #OpenAI #TechNews

Qwen Studio

Qwen Studio offers comprehensive functionality spanning chatbot, image and video understanding, image generation, document processing, web search integration, tool utilization, and artifacts.

Project Babel Brief (The Chatter Digest)
The Babbel-Brief is my personal AI secretary for Telegram. It automatically reads out configured chats and channels, filters out the "chatter", and delivers a concise daily summary of the most important topics. You'll need a Telegram API key, an Ollama endpoint, and a Python environment.

https://github.com/o-valo/Babbel-Brief

Cheers, Olav

#ki #ai #automation #telegram #messenger #python #linux #ubuntu #server #github #ollama #llm #qwen3
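The read-filter-summarize loop the post describes can be sketched in a few lines of Python. This is a minimal illustration, not code from the repository: it assumes the default Ollama HTTP endpoint, an assumed model tag `qwen3`, and illustrative function names and prompt wording.

```python
# Minimal sketch of the Babbel-Brief idea: push a batch of collected
# Telegram messages through a local Ollama endpoint for summarization.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint
MODEL = "qwen3"  # assumed model tag; use whatever you have pulled locally

def build_digest_prompt(messages: list[str]) -> str:
    """Join raw chat messages into a single summarization prompt."""
    joined = "\n".join(f"- {m}" for m in messages)
    return (
        "Summarize the most important topics from these chat messages "
        "in a few concise bullet points. Ignore small talk.\n" + joined
    )

def summarize(messages: list[str]) -> str:
    """Send the prompt to Ollama (non-streaming) and return the summary."""
    payload = json.dumps({
        "model": MODEL,
        "prompt": build_digest_prompt(messages),
        "stream": False,
    }).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Fetching the messages themselves would go through a Telegram client library authenticated with the API key; the filtering step here is left to the prompt rather than done in code.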


Artificial Analysis (@ArtificialAnlys)

Artificial Analysis introduced a model-comparison page where Gemma 4, Qwen3.5, and other AI models can be compared. It's a useful resource for checking the performance of the latest open models in one place and for benchmark comparisons.

https://x.com/ArtificialAnlys/status/2043929887707451419

#artificialanalysis #gemma4 #qwen3.5 #benchmark #llm

Artificial Analysis (@ArtificialAnlys) on X

Compare Gemma 4, Qwen3.5, and other models at https://t.co/PQCRupCPta


New week, new update for the slides of my talk "Run LLMs Locally":

Now including Gemma4 and Qwen3-Omni with vision and audio support, plus new slides describing Llama.cpp server parameters.

https://codeberg.org/thbley/talks/raw/branch/main/Run_LLMs_Locally_2026_ThomasBley.pdf

#ai #llm #llamacpp #stablediffusion #gptoss #qwen3 #glm #localai #gemma4

Voicebox: #OpenSource alternative to #ElevenLabs based on #Qwen3-TTS • voice cloning and text-to-speech #KI 🔊🤖✨ https://tchgdns.de/?p=162038
levelup.gitconnected.com/i-tested-the... #PrismML: 8.2 billion parameters in 1.15 GB competes with #Llama3.1, #Qwen3, and #Gemma4 FP16 models that need 16 GB. PrismML's Bonsai 8B is 14x smaller. On an iPhone 17 Pro Max it clocks 44 tokens per second: real-time conversation speed on a phone, no cloud required.
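As a quick sanity check on those numbers (my own arithmetic, not from the article): 8.2B parameters packed into 1.15 GB works out to roughly 1.1 bits per parameter, which is consistent with the claimed 14x shrink versus 16-bit FP16 weights.

```python
# Back-of-envelope check of the PrismML figures from the post.
params = 8.2e9        # reported parameter count
size_bytes = 1.15e9   # reported file size (decimal GB assumed)

bits_per_param = size_bytes * 8 / params   # effective bits per weight
fp16_ratio = 16 / bits_per_param           # shrink factor vs FP16

print(f"{bits_per_param:.2f} bits/param")        # ~1.12
print(f"{fp16_ratio:.1f}x smaller than FP16")    # ~14.3
```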

I Tested the 1-Bit LLM That Fi...

Omar Khattab (@lateinteraction)

A new blog by @a1zhang covers the future of language models; the key result highlighted is that training RLM-Qwen3-4B with GRPO on easy 32k-token long-context tasks generalizes automatically to 1M-token, 8-needle long-context tasks with 100% reliability.

https://x.com/lateinteraction/status/2042668150185947627

#llm #grpo #longcontext #rl #qwen3

Omar Khattab (@lateinteraction) on X

New must-read blog by @a1zhang on the future of language models. Buried nugget: doing GRPO for RLM-Qwen3-4B on short (32k token) and easy (single-needle) MRCRv2 long-context tasks generalizes *automatically* and with perfect (100%) reliability to 1M-token, 8-needle tasks!!
