Z.ai (@Zai_org)

A notice that running GLM-5.1 under vLLM/SGLang with tool calling requires updating to the latest chat template. Issues can arise while these frameworks automatically convert plain-text tool calls, so applying the chat_template.jinja published on Hugging Face is important.

https://x.com/Zai_org/status/2044741938604093443

#llm #vllm #sglang #toolcalling #huggingface

Z.ai (@Zai_org) on X

GLM-5.1 Tool Calling Issue Fix & Chat Template Update If you are running GLM-5.1 with vLLM/SGLang and using tool calling, please update your chat template. https://t.co/YNi99exkB1 Issue When using tool calling, frameworks including vLLM automatically convert plain-text tool

X (formerly Twitter)
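
A minimal sketch of picking up the fix, assuming the updated chat_template.jinja ships in the model's Hugging Face repo (the zai-org/GLM-5.1 repo id below is illustrative, not confirmed by the post):

    # Download the updated chat template from Hugging Face.
    from huggingface_hub import hf_hub_download

    template_path = hf_hub_download(
        repo_id="zai-org/GLM-5.1",  # assumed repo id; use the one from the announcement
        filename="chat_template.jinja",
    )

    # Both vLLM's OpenAI-compatible server and SGLang's launch_server expose a
    # --chat-template flag, so the downloaded file can be applied at startup, e.g.:
    #   python -m sglang.launch_server --model-path zai-org/GLM-5.1 \
    #       --chat-template <template_path>
    print(template_path)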

RT @ZenMagnets: Minimax m2.7 nvfp4 runs at ~130 tok/s single-stream on 2x RTX 6k with sglang. Up to ~1500 tok/s at 64 concurrent fresh contexts. Steep performance drop-off at longer contexts. But much faster than my m2.5 vLLM setup from two months ago (read: 2 AI years), and I'm impressed by how much SGLang has caught up on high-concurrency performance, which used to be a vLLM specialty. Using the lukealonso/MiniMax-M2.7-NVFP4 config ➡️ image alt text

Zen Magnets (@ZenMagnets): HUGE EXCITEMENT: First Minimax m2.5 NVFP4 quant on Hugging Face. 83 tok/s single-stream vLLM on two RTX 6000s. Or roughly twice as fast as a 512GB Mac system that costs half as much. Except that the Mac also can't do 1000+ tok/s across 32+ concurrent connections. Power-limited to 550W per GPU for this test. The lukealonso/MiniMax-M2.5-NVFP4 vLLM recipe I used is in the image alt text: https://nitter.net/ZenMagnets/status/2022562893091475626#m

more at Arint.info

#AI #GPU #LLM #MachineLearning #NVIDIA #SGLang #arint_info

https://x.com/ZenMagnets/status/2044281284885958780#m
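
For reference, a minimal offline sketch of that setup using SGLang's Python Engine, assuming the lukealonso/MiniMax-M2.7-NVFP4 checkpoint loads as-is across two GPUs (the actual recipe lived in the image alt text and is not reproduced here):

    import sglang as sgl

    # Tensor-parallel across the two RTX 6000-class GPUs mentioned in the post.
    llm = sgl.Engine(model_path="lukealonso/MiniMax-M2.7-NVFP4", tp_size=2)

    # Submit many short, fresh contexts at once to exercise the high-concurrency
    # regime where the post reports up to ~1500 tok/s.
    prompts = [f"One surprising fact about the number {i}:" for i in range(64)]
    outputs = llm.generate(prompts, {"temperature": 0.7, "max_new_tokens": 64})
    print(outputs[0]["text"])

    llm.shutdown()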

Install SGLang with uv, pip, or Docker; configure YAML and server flags; then serve Hugging Face LLMs with an OpenAI-compatible API plus native /generate and offline Engine examples.

#Cheatsheet #SelfHosting #LLM #AI #AICoding #DevOps #Docker #sglang #openai

https://www.glukhov.org/llm-hosting/sglang/

SGLang QuickStart: Install, Configure, and Serve LLMs via OpenAI API

Rost Glukhov | Personal site and technical blog
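
A condensed sketch of the flow the cheatsheet describes, assuming a server was started with something like python -m sglang.launch_server --model-path Qwen/Qwen2.5-7B-Instruct --port 30000 (the model choice is illustrative):

    # Query a running SGLang server two ways: the OpenAI-compatible API
    # and the native /generate endpoint.
    import requests
    from openai import OpenAI

    BASE = "http://localhost:30000"

    # 1) OpenAI-compatible chat completions.
    client = OpenAI(base_url=f"{BASE}/v1", api_key="not-needed")
    chat = client.chat.completions.create(
        model="default",  # SGLang serves a single model; the name is not strictly checked
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    print(chat.choices[0].message.content)

    # 2) Native /generate endpoint with raw text and sampling params.
    resp = requests.post(
        f"{BASE}/generate",
        json={
            "text": "The capital of France is",
            "sampling_params": {"max_new_tokens": 16, "temperature": 0},
        },
    )
    print(resp.json()["text"])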

SGLang and vLLM Workshops Coming to GOSIM Paris 2026!

The GOSIM Workshops have long been known for their diversity, hands-on learning, and interactivity, making them one of the most popular segments of the conference.

This May, the SGLang Workshop and vLLM Workshop will arrive at GOSIM Paris 2026, bringing together AI infrastructure developers from around the world to explore the latest advances in LLM inference systems.

Ticket purchase link:
https://eventbrite.com/e/gosim-paris-2026-tickets-1984013840806?aff=oddtdtcreator

#SGLang #vLLM

🚀 Big news!
The SGLang Workshop & vLLM Workshop are coming to GOSIM Paris 2026! 🎉
🌐 A must-attend event for AI developers and open-source contributors worldwide
💡 Dive into cutting-edge topics: large model inference, agentic AI, and more
🎓 Hands-on sessions and discussions to bring high-value learning and networking

Get your early bird tickets now and enjoy the discount: https://eventbrite.com/e/gosim-paris-2026-tickets-1984013840806?aff=oddtdtcreator 🚀

#GOSIMParis2026 #SGLang #vLLM #AIWorkshop #OpenSourceAI

Qwen (@Alibaba_Qwen)

An announcement that FP8 weights for the Qwen 3.5 Medium model series are now open and ready for deployment. Native support for vLLM and SGLang is included, and example code is provided in the model card. FP8 precision can streamline deployment workflows, and the weights are available on Hugging Face.

https://x.com/Alibaba_Qwen/status/2026682179305275758

#qwen3.5 #fp8 #vllm #huggingface #sglang

Qwen (@Alibaba_Qwen) on X

🔥 Qwen 3.5 Medium Model Series FP8 weights are now open and ready for deployment! Native support for vLLM and SGLang. Check the model card for example code. ⚡️ Optimize your workflow with FP8 precision. 👇 Get the weights: Hugging Face:https://t.co/3MSb7miq68

X (formerly Twitter)
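
A minimal offline sketch with vLLM, assuming the FP8 checkpoint is published under a repo id like Qwen/Qwen3.5-Medium-FP8 (hypothetical; the post links to Hugging Face only through a shortened URL, so take the real id from the model card):

    from vllm import LLM, SamplingParams

    # vLLM reads the FP8 quantization from the checkpoint config, so no extra
    # quantization flag should be needed for a pre-quantized release like this.
    llm = LLM(model="Qwen/Qwen3.5-Medium-FP8")  # assumed repo id

    params = SamplingParams(temperature=0.7, max_tokens=64)
    outputs = llm.generate(["Explain FP8 inference in one sentence."], params)
    print(outputs[0].outputs[0].text)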

Qwen (@Alibaba_Qwen)

A post noting that the newly released model can be run immediately with SGLang. SGLang is an LLM inference and serving framework that has been getting a lot of attention recently, letting developers deploy and test models quickly.

https://x.com/Alibaba_Qwen/status/2026348924433477775

#sglang #ai #framework #deployment

Qwen (@Alibaba_Qwen) on X

✨ Run it now with SGLang! Chong!

X (formerly Twitter)

Qwen (@Alibaba_Qwen)

An announcement that the Qwen3.5-397B-A17B-FP8 model weights are now open. SGLang support has been merged and a vLLM PR has been submitted (to land in the vLLM repo shortly), so the model will soon be usable across the major inference frameworks. The model card includes example code.

https://x.com/Alibaba_Qwen/status/2024161147537232110

#qwen3.5 #openweights #vllm #sglang

Qwen (@Alibaba_Qwen) on X

🚀 Qwen3.5-397B-A17B-FP8 weights are now open! It took some time to adapt the inference frameworks, but here we are: ✅ SGLang support is merged 🔄 vLLM PR submitted → https://t.co/rJkuitOBWs Check the model card for example code. vLLM support landing in the next couple of days!

X (formerly Twitter)
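
Since SGLang support is already merged, a minimal offline sketch with SGLang's Python Engine, assuming the weights sit at Qwen/Qwen3.5-397B-A17B-FP8 (repo id inferred from the model name) and that tp_size=8 approximates the multi-GPU node a 397B MoE needs:

    import sglang as sgl

    # Assumptions: the repo id mirrors the announced model name, and eight
    # tensor-parallel ranks stand in for whatever node actually fits the weights.
    llm = sgl.Engine(model_path="Qwen/Qwen3.5-397B-A17B-FP8", tp_size=8)

    outputs = llm.generate(
        ["Summarize mixture-of-experts inference in two sentences."],
        {"temperature": 0.6, "max_new_tokens": 128},
    )
    print(outputs[0]["text"])

    llm.shutdown()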
