Qwen (@Alibaba_Qwen)

An announcement that FP8 weights for the Qwen 3.5 Medium model series are now open and ready for deployment. Native support for vLLM and SGLang is included, with example code provided on the model card. Workflows can be optimized with FP8 precision, and the weights are available on Hugging Face.

https://x.com/Alibaba_Qwen/status/2026682179305275758

#qwen3.5 #fp8 #vllm #huggingface #sglang

Qwen (@Alibaba_Qwen) on X

🔥 Qwen 3.5 Medium Model Series FP8 weights are now open and ready for deployment! Native support for vLLM and SGLang. Check the model card for example code. ⚡️ Optimize your workflow with FP8 precision. 👇 Get the weights: Hugging Face:https://t.co/3MSb7miq68

X (formerly Twitter)
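The announcement points to the model card for example code; in practice, serving an FP8 checkpoint with vLLM is typically a one-liner. A sketch (the Hugging Face repo id below is a guess; use the one from the linked post):

```shell
# Hypothetical repo id -- take the real one from the Hugging Face link above.
# vLLM detects pre-quantized FP8 checkpoints from the model config;
# --quantization fp8 is only needed when quantizing a BF16 checkpoint on the fly.
vllm serve Qwen/Qwen3.5-Medium-FP8 --max-model-len 32768
```

SGLang has an equivalent launch command; both read the quantization scheme from the checkpoint itself.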
Diving into LTXV, my latest video diffusion experiments.

I’ve been experimenting with LTXV (ltxv-2b-0.9.8-distilled-fp8), combined with the text encoder umt5_xxl_fp8_e4m3fn_scaled.

The renderings showcase the hackercat, cherry blossoms, and a surreal city tour.

What it does:
- Generates latent video clips from text prompts
- Can produce a wide range of scenes, from surreal to photorealistic and beyond
- Perfect for short 1-2 second clips with creative prompts

Caution! 12 GB VRAM is tight:
- On my RX 6700 XT, it easily runs into OOM
- Frames, steps, and resolution need careful tuning
- FP8 helps, but some layers get upcast → memory can still fill up

Conclusion: Extremely powerful, but you need to tweak VRAM and settings to get stable results.

#AI #VideoDiffusion #LTXV #FP8 #GPU #CreativeAI #ShortVideos #Surreal #Photorealistic #StableVRAM #RX6700XT #AMD #ROCm #ComfyUI
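The VRAM caveats above are easier to reason about with rough arithmetic. A sketch assuming LTX-Video's published VAE compression (32x spatial, 8x temporal, 128 latent channels); it suggests the latent itself is tiny, and it is the denoiser's activations that exhaust 12 GB:

```python
def latent_megabytes(frames, height, width,
                     ch=128, s=32, t=8, bytes_per=2):
    """Rough fp16 latent-tensor size for one video clip.

    Assumes a causal VAE: (frames // t) + 1 latent frames, which matches
    clips of 8k+1 frames. Numbers are illustrative, not measured.
    """
    lat_f = frames // t + 1
    lat_h, lat_w = height // s, width // s
    elems = ch * lat_f * lat_h * lat_w
    return elems * bytes_per / 2**20

# ~2 s at 24 fps, 768x512: well under 1 MB of latent.
print(round(latent_megabytes(49, 512, 768), 2))
```

So trimming frames, steps, and resolution helps mostly by shrinking intermediate activations and attention buffers, not the latent output.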

Awni Hannun (@awnihannun)

MLX's CUDA backend has been improving, with faster startup times and better performance overall. The author reports running Qwen3 4B in fp8 on a DGX Spark: 18,500 tokens processed in under 4 seconds, and generation at 32.5 tokens/sec with an 18,500-token context, a concrete example of real-world gains at long context lengths.

https://x.com/awnihannun/status/2020576431307452682

#mlx #cuda #qwen3 #fp8 #dgx

Awni Hannun (@awnihannun) on X

MLXs CUDA backend is getting better. It's especially nice if you appreciate fast startup times. But it's also quite fast in general. Here's Qwen3 4B in fp8 running on my DGX Spark. - Processed 18.5k tokens in < 4 seconds - Generates at 32.5 tok/sec with 18.5k context

X (formerly Twitter)
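The quoted numbers imply a striking prefill/decode asymmetry; a quick back-of-envelope check:

```python
prompt_tokens = 18_500
prefill_seconds = 4.0        # "< 4 seconds", so this is a lower bound
decode_tok_per_s = 32.5

prefill_tok_per_s = prompt_tokens / prefill_seconds
ratio = prefill_tok_per_s / decode_tok_per_s

print(int(prefill_tok_per_s))   # prefill throughput, tok/s (at least)
print(round(ratio))             # prefill vs decode speed ratio
```

Over 4,600 tok/s prefill versus 32.5 tok/s decode is the usual compute-bound vs memory-bandwidth-bound split, roughly two orders of magnitude apart.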

Andrej Karpathy (@karpathy)

Reports that enabling FP8 training improved "time to GPT-2" by 4.3%, bringing it down to 2.91 hours, and that at 8×H100 spot-instance prices the GPT-2 reproduction costs only about $20. Also references the old controversy over releasing GPT-2 while highlighting today's economics and performance.

https://x.com/karpathy/status/2018804068874064198

#fp8 #training #gpt2 #h100 #optimization

Andrej Karpathy (@karpathy) on X

Enabled fp8 training for +4.3% improvement to "time to GPT-2", down to 2.91 hours now. Also worth noting that if you use 8XH100 spot instance prices, this GPT-2 repro really only costs ~$20. So this is exciting - GPT-2 (7 years ago): too dangerous to release. GPT-2 (today): new

X (formerly Twitter)
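The ~$20 figure pins down the implied spot price per GPU; quick arithmetic:

```python
hours = 2.91        # "time to GPT-2" after the fp8 speedup
gpus = 8            # 8xH100 spot instance
total_cost = 20.0   # "~$20" from the post

price_per_gpu_hour = total_cost / (hours * gpus)
print(round(price_per_gpu_hour, 2))   # implied $/H100-hour on spot
```

Under a dollar per H100-hour is plausible for spot pricing, so the headline number checks out.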
YES SUCCEEDED!!!

Just rendered an image at 944×1152 (slightly above 1024×1024) using Flux1-Schnell-FP8 on my 6700 XT, and it works! (Image 1 is the Real-ESRGAN 2× upscaled version)

Workflow 1: Sampling (Image 2)

Prompt executed → UNet generates the latent

Step 1 (model load + latent generation) took 419 seconds

Output: Latent tensor saved to disk

Workflow 2: VAE Decode (Image 3)

Latent loaded → VAE decodes the image

Duration: 7.5 seconds

Advantage: UNet doesn’t need to stay in VRAM → VRAM freed, even on 12 GB GPUs

The problem with the stock LoadLatent Node

Dropdown only shows files if they were produced / annotated by a previous SaveLatent Node

Node is designed to pass latents inside a graph, not load arbitrary files from disk

Purpose: prevents accidentally loading wrong files

Workaround (Image 4)

Edited /ComfyUI/nodes.py, class LoadLatent

Hardcoded latent path → Node now loads directly from disk

Result: Workflow 2 runs instantly, UNet can be unloaded

Timing

Step 1 (model load + latent generation): 419 s

Step 2 (VAE decode): 7.5 s

Result: High-res images on a 12 GB RDNA2 GPU are now possible on Flux1-Schnell-FP8 without ComfyUI crashing! (Image 5 is the original output)

This might actually become my new Flux workflow: render quick 512×512 previews first (which works perfectly on RDNA2 GPUs), sort out the good ones, extract the seed from the PNG metadata, and then re-render only the selected images with the same seed using the split workflow at higher resolutions. This way, high-resolution Flux1-Schnell-FP8 renders become possible on 12 GB RDNA2 GPUs D:

Question at the end: Has anyone ever done this before? Because I have no clue xD

#ComfyUI #flux #Flux1SchnellFP8 #FP8 #AMD #RDNA2 #VAE #AIArt #Pixelfed #HighResolution #GPUOptimization #LatentWorkflow #AIWorkflow #AIHacks #RealESRGAN #Upscale #AIExperiment #CreativeAI #DigitalArt #AICommunity #python #linux #opensource #foss
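The "extract the seed from the PNG metadata" step can be scripted: ComfyUI stores the workflow as JSON in the PNG's tEXt chunks (usually under the key `prompt`). A stdlib-only chunk reader; where the seed sits inside that JSON depends on your graph, so treat the lookup as a sketch:

```python
import json
import struct

def png_text_chunks(data: bytes) -> dict:
    """Return {keyword: text} from a PNG's tEXt chunks."""
    assert data[:8] == b"\x89PNG\r\n\x1a\n", "not a PNG"
    chunks, pos = {}, 8
    while pos < len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        ctype = data[pos + 4:pos + 8]
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"tEXt":
            key, _, text = body.partition(b"\x00")
            chunks[key.decode("latin-1")] = text.decode("latin-1")
        pos += 12 + length        # 4 length + 4 type + data + 4 CRC
    return chunks

# Usage (sketch): each KSampler node in the graph carries a "seed" input.
# meta = png_text_chunks(open("render.png", "rb").read())
# graph = json.loads(meta["prompt"])
```

That makes "sort previews, re-render keepers at high resolution with the same seed" scriptable end to end.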
Maia 200: The AI accelerator built for inference - The Official Microsoft Blog

Today, we’re proud to introduce Maia 200, a breakthrough inference accelerator engineered to dramatically improve the economics of AI token generation. Maia 200 is an AI inference powerhouse: an accelerator built on TSMC’s 3nm process with native FP8/FP4 tensor cores, a redesigned memory system with 216GB HBM3e at 7 TB/s and 272MB of on-chip SRAM, plus...
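The headline memory numbers can be folded into one intuitive figure: how long a single full sweep of HBM takes at peak bandwidth, a lower bound for any inference pass that touches all weights once:

```python
capacity_gb = 216      # HBM3e capacity from the announcement
bandwidth_tb_s = 7     # peak HBM bandwidth from the announcement

# Time to stream the entire HBM contents once, in milliseconds.
full_sweep_ms = capacity_gb / (bandwidth_tb_s * 1000) * 1000
print(round(full_sweep_ms, 1))
```

Roughly 31 ms per full sweep; the 272 MB of on-chip SRAM exists precisely to avoid paying that bandwidth cost for hot data.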

The Official Microsoft Blog

🚀 FP8 backported to the RTX 3090, no H100 required! By skipping the conversion to fp16 in global memory it saves significant VRAM, at the cost of slightly lower compute throughput. It ships as a torch extension, so you can try it in your own workflow right away. #AI #MachineLearning #FP8 #RTX3090 #CUDA #DeepLearning #AI_Vietnam #CôngNghệ

https://www.reddit.com/r/LocalLLaMA/comments/1qn0dl8/backporting_fp8_to_the_rtx_3090_no_h100_required/
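The trick described here, keeping weights as raw fp8 bytes in global memory and widening only at compute time, leans on fp8 E4M3 being a simple 1-4-3 bit layout. A pure-Python decode sketch (illustrative, not the linked extension's code):

```python
def decode_e4m3(byte: int) -> float:
    """Decode one fp8 E4M3FN byte: 1 sign, 4 exponent (bias 7), 3 mantissa."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF
    man = byte & 0x7
    if exp == 0:                      # subnormal: no implicit leading 1
        return sign * (man / 8) * 2.0 ** -6
    if exp == 0xF and man == 0x7:     # E4M3FN reserves only this code for NaN
        return float("nan")
    return sign * (1 + man / 8) * 2.0 ** (exp - 7)

print(decode_e4m3(0x38))   # 1.0
print(decode_e4m3(0x7E))   # 448.0, the E4M3FN maximum
```

The VRAM saving comes from storing one byte per weight instead of two; the speed penalty is the extra widening work on GPUs without native fp8 units.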

Brie Wensleydale (@SlipperyGem)

An opinion post comparing Qwen Image 2512 (BF16, in GGUF format) with Flux Klein 9B (FP8). The author prefers Qwen's look and accuracy, and notes that the Flux Klein output shows more problems the longer you look at it. Ends by asking readers what they think of the quality and stability differences between the two models/formats.

https://x.com/SlipperyGem/status/2013621827369869503

#qwen #flux #gguf #bf16 #fp8

Brie Wensleydale🧀🐭 (@SlipperyGem) on X

Prompt Qwen Image 2512 BF16 GGUF Flux Klein 9B FP8 Man, I just dig Qwen's vibes and accuracy so much more than I do the Flux one. Also, the more you look at the Flux one, the more problems it has. What do you think?

X (formerly Twitter)

Speed up older GPUs with software FP8! 🚀

A developer has just released a software emulation of the FP8 format (built on Triton kernels) for GPUs without hardware FP8 support, such as the RTX 30/20 series.

🔥 Results:
- Up to 3× speedups on memory-bandwidth-bound workloads (GEMV, FlashAttention).
- Works on any older GPU.
- Optimized packing of low-precision data into FP32.

#AI #GPU #FP8 #MachineLearning #DeepLearning #CongNghe #PhanMem #Triton

https:/
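The last bullet, packing low-precision data into FP32, is about memory layout: four fp8 code bytes ride in one 32-bit word, so a bandwidth-bound kernel moves a quarter of the bytes per value. A stdlib-only sketch of the pack/unpack (the actual project uses Triton kernels; this only shows the bit layout):

```python
def pack_fp8x4(codes):
    """Pack four fp8 code bytes into one uint32 (byte 0 in the low lane)."""
    b0, b1, b2, b3 = codes
    return b0 | (b1 << 8) | (b2 << 16) | (b3 << 24)

def unpack_fp8x4(word):
    """Recover the four fp8 code bytes from a packed uint32."""
    return [(word >> shift) & 0xFF for shift in (0, 8, 16, 24)]

w = pack_fp8x4([0x38, 0x30, 0x00, 0x7E])
print(hex(w))
print(unpack_fp8x4(w))
```

On hardware without fp8 units, the kernel then widens each lane to fp16/fp32 in registers, which is exactly why the win shows up on memory-bound ops like GEMV rather than compute-bound GEMMs.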