Sebastian Raschka (@rasbt)

Mamba-3 has been released. The author notes that the most interesting use case for Mamba and Mamba-like models is in the recent transformer-attention hybrid architectures (Qwen3.5, Kimi Linear, etc.), and proposes an experiment: swapping Gated DeltaNet for Mamba-3 (which now includes RoPE) in next-generation hybrids.

https://x.com/rasbt/status/2034088726997893168

#mamba3 #transformer #qwen3.5 #gateddeltanet #rope

Sebastian Raschka (@rasbt) on X

Oh wow, Mamba-3 is here! For me, the most interesting use case of Mamba and Mamba-likes are the recent transformer attention hybrid architectures (Qwen3.5, Kimi Linear, etc.) Would be interesting to swap Gated DeltaNet with Mamba-3 (which now also has RoPE) in next gen hybrids.

X (formerly Twitter)

Prince Canuma (@Prince_Canuma)

mlx-embeddings v0.1.0 released: new models include Alibaba's Qwen3 VL Embedding and Reranker, plus ColDefics3 (with LoRA adapters and a ColVision processor). NVFP4, MXFP4, and MXFP8 quantization support has been added, along with an embedding-quality fix for bidirectional Gemma3 models.

https://x.com/Prince_Canuma/status/2032890809847029896

#mlxembeddings #embeddings #qwen3 #coldefics3 #quantization

Prince Canuma (@Prince_Canuma) on X

mlx-embeddings v0.1.0 is out! 🔥 New models: → Qwen3 VL Embedding and Reranker by @Alibaba_Qwen → ColDefics3 with LoRA adapters & ColVision processor New features: → NVFP4, MXFP4 and MXFP8 quantization support → Gemma3 embedding quality fix for bidirectional models →

That's an excerpt of a #Qwen3.5 35B summary of Robin Wright, after I made it correct everything:
"Robin Wright is an acclaimed American actress, producer, and director. Born in 1966, she gained early fame as Princess Buttercup in The Princess Bride (1987) and earned a Golden Globe nomination for Forrest Gump (1994). She achieved global acclaim as Claire Underwood in the #Netflix series House of Cards (2013–2018), winning a Golden Globe ..."
Not a word on the Santa Barbara #TV show 🤷‍♂️
#80s

New update for the slides of my talk "Run LLMs Locally":

Now including Reranking, Qwen 3.5 (slower than Qwen 3, but includes Vision) and loading models with Direct I/O.

https://codeberg.org/thbley/talks/raw/branch/main/Run_LLMs_Locally_2025_ThomasBley.pdf

#llm #llamacpp #ollama #stablediffusion #gptoss #qwen3 #glm #opencode #localai #mcp

New vs. old MBP local-LLM showdown | M2 Max vs M5 Max | pushing large LLMs to the limit
When the most powerful Apple mobile platform of 2026, the M5 Max MacBook Pro, meets locally deployed Qwen 3. […]
#unwire TV #AI #Apple #M2 Max
https://unwire.hk/2026/03/13/m5-max-macbook-pro/unwire_podcast/?utm_source=rss&utm_medium=rss&utm_campaign=m5-max-macbook-pro

AISatoshi (@AiXsatoshi)

A post about an improved variant of Qwen3.5-27B trained to use a Claude 4.6 Opus-style organized-thinking approach; in other words, it introduces a Qwen3.5-27B variant tuned to reason in the style of Claude 4.6 Opus.

https://x.com/AiXsatoshi/status/2032129146181288298

#qwen #qwen3.5 #claude #llm #model

AI✖️Satoshi⏩️ (@AiXsatoshi) on X

An improved version of Qwen3.5-27B trained with a Claude 4.6 Opus-style organized thinking method


Sudo su (@sudoingX)

A tweet introducing the open-sourced Hermes Agent. It highlights 31 tools, 85 skills (file operations, terminal, browser, cron, delegation, code execution, etc.) and persistent memory across sessions, and reports that it runs Qwen 3.5 9B on a single RTX 3060 at 50 tok/s. By declaring "this is what open source looks like when it ships," the post positions it as a meaningful open-source achievement.

https://x.com/sudoingX/status/2031786899543822716

#opensource #hermesagent #qwen3.5 #llm #aiagents

Sudo su (@sudoingX) on X

absolute cinema. 31 tools. 85 skills. file ops, terminal, browser, cron, delegation, code execution. persistent memory across sessions. Hermes Agent running on a single RTX 3060 through Qwen 3.5 9B at 50 tok/s. this is what open source looks like when it ships.


One more update for the slides of my talk "Run LLMs Locally":

Now including text to speech with Qwen3-TTS and Model Context Protocol.

https://codeberg.org/thbley/talks/raw/branch/main/Run_LLMs_Locally_2025_ThomasBley.pdf

#llm #llamacpp #ollama #stablediffusion #gptoss #qwen3 #glm #opencode #localai #mcp

Snowflake's Arctic Long Sequence Training: How to Train LLMs on 15 Million Tokens Without Selling a Kidney

Snowflake AI Research just open-sourced Arctic Long Sequence Training (ALST), a framework that pushes LLM training from a measly 32K tokens to over 15 million — a 469x improvement — using standard Hugging Face models and H100 GPUs. Here's what it means for you.

TechLife

I updated the slides for my talk "Run LLMs Locally":

Now including image generation with Qwen3 and content classification from the Qwen3Guard Technical Report paper.

https://codeberg.org/thbley/talks/raw/branch/main/Run_LLMs_Locally_2025_ThomasBley.pdf

#llm #llamacpp #ollama #stablediffusion #gptoss #qwen3 #glm #opencode #localai