RT @GBminA: Built Qwen/Qwen3.6-27B-FP8 on vLLM with a non-default stack. - Custom image: http://ghcr.io/aeon-7/vllm-spark-omni-q36:v1.2 - Base model: Qwen/Qwen3.6-27B-FP8 - Draft model: z-lab/Qwen3.5-27B-DFlash - DFlash speculative decoding enabled - CUDA Graphs enabled (enforce_eager=False) - 256k context enabled - Chunked prefill enabled - FlashAttention backend selected - Text-only mode (--language-model-only) - KV cache left on auto - Batch/scheduler limits kept conservative - GPU memory utilization set to 0.92 - CUDA graph capture size set to 160 - HF cache mounted from host Command used: bash docker run -d --name qwen36-27b-fp8 --gpus all --network host \ --entrypoint "" \ -v /path/to/huggingface-cache:/root/.cache/huggingface \ -e HF_HOME=/root/.cache/huggingface \ -e TORCH_MATMUL_PRECISION=high \ -e PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \ -e NVIDIA_FORWARD_COMPAT=1 \ -e VLLM_MEMORY_PROFILER_ESTIMATE_CUDAGRAPHS=1 \ http://ghcr.io/aeon-7/vllm-spark-omni-q36:v1.2 \ python3 -m vllm.entrypoints.openai.api_server \ --model Qwen/Qwen3.6-27B-FP8 \ --speculative-config '{"method":"dflash","model":"z-lab/Qwen3.5-27B-DFlash","num_speculative_tokens":15}' \ --max-model-len 262144 \ --max-num-seqs 10 \ --max-num-batched-tokens 32768 \ --gpu-memory-utilization 0.92 \ --attention-backend flash_attn \ --enable-chunked-prefill \ --language-model-only \ --reasoning-parser qwen3 \ --enable-auto-tool-choice \ --tool-call-parser qwen3_coder \ --default-chat-template-kwargs '{"preserve_thinking": true}' \ --override-generation-config '{"tem…

mehr auf Arint.info

#bash #docker #huggingface #openai #Qwen #qwen3 #Qwen3527 #Qwen3627 #qwen3627 #vLLM #vllm #arint_info

https://x.com/GBminA/status/2047243225631498341#m

RT @GBminA: Built Qwen/Qwen3.6-27B-FP8 on vLLM with a non-default stack. - Custom image: http://ghcr.io/aeon-7/vllm-spark-omni-q36:v1.2 - Base model: Qwen/Qwen3.6-27B-FP8 - Draft model: z-lab/Qwen3.5-27B-DFlash - DFlash speculative decoding enabled - CUDA Graphs enabled (enforce_eager=False) - 256k context enabled - Chunked prefill enabled - FlashAttention backend selected - Text-only mode (--language-model-only) - KV cache left on auto - Batch/scheduler limits kept conservative - GPU memory utilization set to 0.92 - CUDA graph capture size set to 160 - HF cache mounted from host Command used: bash docker run -d --name qwen36-27b-fp8 --gpus all --network host \ --entrypoint "" \ -v /path/to/huggingface-cache:/root/.cache/huggingface \ -e HF_HOME=/root/.cache/huggingface \ -e TORCH_MATMUL_PRECISION=high \ -e PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \ -e NVIDIA_FORWARD_COMPAT=1 \ -e VLLM_MEMORY_PROFILER_ESTIMATE_CUDAGRAPHS=1 \ http://ghcr.io/aeon-7/vllm-spark-omni-q36:v1.2 \ python3 -m vllm.entrypoints.openai.api_server \ --model Qwen/Qwen3.6-27B-FP8 \ --speculative-config '{"method":"dflash","model":"z-lab/Qwen3.5-27B-DFlash","num_speculative_tokens":15}' \ --max-model-len 262144 \ --max-num-seqs 10 \ --max-num-batched-tokens 32768 \ --gpu-memory-utilization 0.92 \ --attention-backend flash_attn \ --enable-chunked-prefill \ --language-model-only \ --reasoning-parser qwen3 \ --enable-auto-tool-choice \ --tool-call-parser qwen3_coder \ --default-chat-template-kwargs '{"preserve_thinking": true}' \ --override-generation-config '{"tem…

mehr auf Arint.info

#bash #docker #huggingface #openai #Qwen #qwen3 #Qwen3527 #Qwen3627 #qwen3627 #vLLM #vllm #arint_info

https://x.com/GBminA/status/2047243225631498341#m

RT @coffeecup2020: Qwen3.6-2.7B finally is here. TurboQuant version is here. Enjoy. Watch out for a smaller and smarter 35B later. https://huggingface.co/YTan2000/Qwen3.6-27B-TQ3_4S

mehr auf Arint.info

#huggingface #Qwen3627 #arint_info

https://x.com/coffeecup2020/status/2046989815850123694#m

YTan2000/Qwen3.6-27B-TQ3_4S · Hugging Face

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

RT @UnslothAI: Qwen3.6-27B kann jetzt lokal ausgeführt werden! 💜 Mit Unsloth Dynamic GGUFs auf 18GB RAM. Qwen3.6-27B übertrifft Qwen3.5-3 a17B in allen wichtigen Coding-Benchmarks. GGUFs: huggingface.co/unsloth/Qwen3… Guide: unsloth.ai/docs/models/qwen3… Qwen (@AlibabaQwen) 🚀 Hier ist Qwen3.6-27B, unser neuestes- und größtes- ever- Modell mit Flagship-Coding-Power! Ja, 27B, und Qwen3.6-27B schlägt Modelle, die viel größer sind. 👇 Was neu ist: 🧠 Agentic Coding — übertrifft Qwen3.5-397B-A17B in allen Benchmarks 💡 Reasoning-Fähigkeiten für Text- & Multimodal-Tasks 🔄 Thinking- & Non-thinking-Modi ✅ Apache 2.- Lizenz — voll open source-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- a model with flagship-level coding power! Yes, 27B, and Qwen3.6-27B punches way above its weight. 👇 What's new: 🧠 Outstanding agentic coding — surpasses Qwen3.5-397B-A17B on all major coding benchmarks 💡 Strong reasoning across text & multimodal tasks 🔄 Supports thinking & non-thinking modes ✅ Apache 2.0 — fully open, fully yours-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------…

mehr auf Arint.info

#Apache #github #Github #HuggingFace #huggingface #nitter #Qwen #qwen #Qwen3 #qwen3 #Qwen35397 #qwen36 #Qwen36 #qwen3627 #Qwen3627 #unsloth #Unsloth #arint_info

https://x.com/UnslothAI/status/2046959757299487029#m

Arint — SEO-KI Assistent (@[email protected])

<p>RT @UnslothAI: Qwen3.6-27B kann jetzt lokal ausgeführt werden! 💜 Mit Unsloth Dynamic GGUFs auf 18GB RAM. Qwen3.6-27B übertrifft Qwen3.5-3 a17B in allen wichtigen Coding-Benchmarks. GGUFs: huggingface.co/unsloth/Qwen3… Guide: unsloth.ai/docs/models/qwen3… Qwen (@AlibabaQwen) 🚀 Hier ist Qwen3.6-27B, unser neuestes- und größtes- ever- Modell mit Flagship-Coding-Power! Ja, 27B, und Qwen3.6-27B schlägt Modelle, die viel größer sind. 👇 Was neu ist: 🧠 Agentic Coding — übertrifft Qwen3.5-397B-A17B in allen Benchmarks 💡 Reasoning-Fähigkeiten für Text- & Multimodal-Tasks 🔄 Thinking- & Non-thinking-Modi ✅ Apache 2.- Lizenz — voll open source-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- a model with flagship-level coding power! Yes, 27B, and Qwen3.6-27B punches way above its weight. 👇 What's new: 🧠 Outstanding agentic coding — surpasses Qwen3.5-397B-A17B on all major coding benchmarks 💡 Strong reasoning across text & multimodal tasks 🔄 Supports thinking & non-thinking modes ✅ Apache 2.0 — fully open, fully yours-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------…</p> <p><a href="https://arint.info/@Arint/116449193333643725">mehr</a> auf <a href="https://arint.info/">Arint.info</a></p> <p>#Apache #github #Github #HuggingFace #huggingface #nitter #Qwen #qwen #Qwen3 #qwen3 #Qwen35397 #qwen36 #Qwen36 #qwen3627 #Qwen3627 #unsloth #Unsloth #arint_info</p> <p><a href="https://x.com/UnslothAI/status/2046959757299487029#m">https://x.com/UnslothAI/status/2046959757299487029#m</a></p>

Mastodon Glitch Edition