How much "cool" can there be in one photo?

Woody Harrelson & Kristen Stewart at the Photocall for "Full Phil" during the 79th annual Cannes Film Festival on Saturday 16 May 2026 at Palais des Festivals.

#WoodyHarrelson
#KristenStewart

#Cannes2026 @festival_cannes #monochrome #blackandwhite #fp8

Eric Cantona at the Photocall for "Cantona" during the 79th annual Cannes Film Festival on Saturday 16 May 2026 at Palais des Festivals.

#EricCantona
#Cantona
#ooahcantona

Shot on @fujifilm_uk at #Cannes2026 @festival_cannes #monochrome #blackandwhite #fp8 #filmpack8 #dxo

@enigma
Well, some AI does work with #FP8 or #FP4 ... ;)
With FP4, two floating-point values fit into one byte ... I wonder if something could be done with that? ;)
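
To make the "two values in one byte" remark concrete, here is a minimal sketch. It assumes the 4-bit E2M1 layout (1 sign bit, 2 exponent bits, 1 mantissa bit) that FP4 usually refers to; the helper names are made up for illustration and do not come from any library.

```python
# The eight non-negative magnitudes representable in FP4 (E2M1).
# The list index equals the 3-bit exponent/mantissa pattern.
FP4_E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def encode_fp4(x: float) -> int:
    """Round x to the nearest FP4 magnitude and return its 4-bit code."""
    sign = 0x8 if x < 0 else 0x0
    mag = abs(x)
    idx = min(range(8), key=lambda i: abs(FP4_E2M1_MAGNITUDES[i] - mag))
    return sign | idx


def decode_fp4(code: int) -> float:
    """Turn a 4-bit code back into a Python float."""
    mag = FP4_E2M1_MAGNITUDES[code & 0x7]
    return -mag if code & 0x8 else mag


def pack_pair(a: float, b: float) -> int:
    """Store two FP4 values in one byte: a in the high nibble, b in the low."""
    return (encode_fp4(a) << 4) | encode_fp4(b)


def unpack_pair(byte: int) -> tuple:
    """Recover both FP4 values from one packed byte."""
    return decode_fp4(byte >> 4), decode_fp4(byte & 0xF)
```

For example, `pack_pair(1.5, -3.0)` gives `0x3D`, and `unpack_pair(0x3D)` returns `(1.5, -3.0)`. Anything outside the eight magnitudes gets rounded, which is why real FP4 schemes pair the codes with a separate scale factor.
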
Diving into LTXV, my latest video diffusion experiments.

I’ve been experimenting with LTXV (ltxv-2b-0.9.8-distilled-fp8), combined with the text encoder umt5_xxl_fp8_e4m3fn_scaled.

The renderings showcase the hackercat, cherry blossoms, and a surreal city tour.

What it does:
- Generates latent video clips from text prompts
- Can produce a wide range of scenes, from surreal to photorealistic and beyond
- Perfect for short 1-2 second clips with creative prompts

Caution! 12 GB VRAM is tight:
- On my RX 6700 XT, it easily runs out of memory (OOM)
- Frames, steps, and resolution need careful tuning
- FP8 helps, but some layers get upcast to higher precision → memory can still fill up
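
A rough back-of-the-envelope sketch of the upcast effect mentioned above. It only counts weight storage (no activations, no text encoder, no VAE), and the parameter count of 2 billion is an assumption read off the "2b" in the model name. The function names and the upcast fraction are hypothetical.

```python
GIB = 2 ** 30


def weight_gib(n_params: float, bytes_per_param: float) -> float:
    """Weight memory in GiB if every parameter uses the same width."""
    return n_params * bytes_per_param / GIB


def mixed_weight_gib(n_params: float, upcast_fraction: float,
                     low_bytes: float = 1.0, high_bytes: float = 2.0) -> float:
    """Weight memory in GiB when a fraction of parameters is upcast.

    low_bytes=1 stands for FP8, high_bytes=2 for FP16/BF16 (assumed widths).
    """
    low = n_params * (1.0 - upcast_fraction) * low_bytes
    high = n_params * upcast_fraction * high_bytes
    return (low + high) / GIB


if __name__ == "__main__":
    params = 2e9  # assumption: "2b" means roughly 2 billion parameters
    for fraction in (0.0, 0.25, 1.0):  # hypothetical upcast fractions
        print(f"{fraction:.0%} upcast: {mixed_weight_gib(params, fraction):.2f} GiB")
```

The point is only the trend: every layer that gets upcast doubles its share of the weight memory, so the FP8 savings shrink as the upcast fraction grows.
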

Conclusion: Extremely powerful, but you need to tweak VRAM and settings to get stable results.

#AI #VideoDiffusion #LTXV #FP8 #GPU #CreativeAI #ShortVideos #Surreal #Photorealistic #StableVRAM #RX6700XT #AMD #ROCm #ComfyUI
YES SUCCEEDED!!!

Just rendered an image at 944×1152 (slightly above 1024×1024) using Flux1-Schnell-FP8 on my 6700 XT, and it works! (Image 1 is the Real-ESRGAN 2× upscaled version)

Workflow 1: Sampling (Image 2)

Prompt executed → UNet generates the latent

Step 1 (model load + latent generation) took 419 seconds

Output: Latent tensor saved to disk

Workflow 2: VAE Decode (Image 3)

Latent loaded → VAE decodes the image

Duration: 7.5 seconds

Advantage: UNet doesn’t need to stay in VRAM → VRAM freed, even on 12 GB GPUs

The problem with the stock LoadLatent Node

Dropdown only shows files if they were produced / annotated by a previous SaveLatent Node

Node is designed to pass latents inside a graph, not load arbitrary files from disk

Purpose: prevents accidentally loading wrong files

Workaround (Image 4)

Edited /ComfyUI/nodes.py, class LoadLatent

Hardcoded latent path → Node now loads directly from disk

Result: Workflow 2 starts right away, and the UNet can stay unloaded

Timing

Step 1 (model load + latent generation): 419 s

Step 2 (VAE decode): 7.5 s

Result: High-res images on a 12 GB RDNA2 GPU are now possible on Flux1-Schnell-FP8 without ComfyUI crashing! (Image 5 is the original output)

This might actually become my new Flux workflow: render quick 512×512 previews first (which works perfectly on RDNA2 GPUs), sort out the good ones, and extract the seed from the PNG metadata. Then re-render only the selected images with the same seed at higher resolutions using the split workflow. This way, high-resolution Flux1-Schnell-FP8 renders become possible on 12 GB RDNA2 GPUs :D
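
The seed-extraction step could look roughly like this. It is a sketch under assumptions, not a tested ComfyUI tool: it assumes the preview PNG carries the node graph as JSON in a `tEXt` chunk keyed `prompt`, and that samplers store their seed under an input named `seed` or `noise_seed`. The function names are made up, and chunk CRCs are not verified.

```python
import json
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_text_chunks(data: bytes) -> dict:
    """Return all tEXt chunks of a PNG as {keyword: text}. CRCs are not checked."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    chunks = {}
    pos = 8
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"tEXt":
            key, _, value = body.partition(b"\x00")
            chunks[key.decode("latin-1")] = value.decode("latin-1")
        if ctype == b"IEND":
            break
        pos += 12 + length  # 4 length + 4 type + body + 4 CRC
    return chunks


def find_seeds(png_bytes: bytes, chunk_key: str = "prompt") -> list:
    """Collect integer seeds from the embedded graph (assumed layout)."""
    text = read_text_chunks(png_bytes).get(chunk_key)
    if text is None:
        return []
    graph = json.loads(text)
    seeds = []
    for node in graph.values():
        inputs = node.get("inputs", {}) if isinstance(node, dict) else {}
        for name in ("seed", "noise_seed"):  # assumed input names
            if isinstance(inputs.get(name), int):
                seeds.append(inputs[name])
    return seeds
```

Usage would be something like `find_seeds(open("preview.png", "rb").read())`, where `preview.png` is a placeholder file name. If the list comes back empty, the image either has no embedded graph or uses different key names than assumed here.
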

One last question: has anyone done this before? Because I have no clue xD

#ComfyUI #flux #Flux1SchnellFP8 #FP8 #AMD #RDNA2 #VAE #AIArt #Pixelfed #HighResolution #GPUOptimization #LatentWorkflow #AIWorkflow #AIHacks #RealESRGAN #Upscale #AIExperiment #CreativeAI #DigitalArt #AICommunity #python #linux #opensource #foss
Maia 200: The AI accelerator built for inference - The Official Microsoft Blog

Today, we’re proud to introduce Maia 200, a breakthrough inference accelerator engineered to dramatically improve the economics of AI token generation. Maia 200 is an AI inference powerhouse: an accelerator built on TSMC’s 3nm process with native FP8/FP4 tensor cores, a redesigned memory system with 216GB HBM3e at 7 TB/s and 272MB of on-chip SRAM, plus...


🚀 FP8 has been backported to the RTX 3090, no H100 required! By skipping the conversion to fp16 in global memory, it saves a significant amount of VRAM, although compute performance drops slightly. A torch extension is already integrated, so you can try it in your own workflow right away. #AI #MachineLearning #FP8 #RTX3090 #CUDA #DeepLearning #AI_Vietnam #CôngNghệ

https://www.reddit.com/r/LocalLLaMA/comments/1qn0dl8/backporting_fp8_to_the_rtx_3090_no_h100_required/

Speed up older GPUs with a software FP8 solution! 🚀

A developer has just released a solution that emulates the FP8 format in software (using Triton kernels) for GPUs without hardware support, such as the RTX 30/20 series.

🔥 Results:
- 3× speedup for tasks limited by memory bandwidth (GEMV, FlashAttention).
- Works on any older GPU.
- Optimizes the packing of low-precision data into FP32.
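
What "emulating FP8 in software" means at the bit level can be sketched in plain Python. This is not the developer's Triton code; it is an illustration that assumes the E4M3FN variant (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, no infinities). Reading "packing into FP32" as four FP8 codes per 32-bit word is also an assumption, and the function names are hypothetical.

```python
import struct


def decode_e4m3fn(byte: int) -> float:
    """Decode one FP8 E4M3FN code (0..255) into a Python float."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF
    man = byte & 0x7
    if exp == 0xF and man == 0x7:
        return float("nan")  # the only NaN pattern; E4M3FN has no infinities
    if exp == 0:
        return sign * (man / 8.0) * 2.0 ** -6  # subnormal range
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7)


def pack_four(codes: list) -> int:
    """Pack four FP8 codes into one little-endian 32-bit word."""
    return struct.unpack("<I", bytes(codes))[0]


def unpack_four(word: int) -> list:
    """Split a 32-bit word back into its four FP8 codes."""
    return list(struct.pack("<I", word))
```

For example, `decode_e4m3fn(0x38)` is `1.0` and `decode_e4m3fn(0x7E)` is `448.0`, the largest finite E4M3FN value. A kernel along these lines would load one 32-bit word, unpack four codes, and decode them to a wider type in registers, which is where the bandwidth saving for memory-bound work would come from.
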

#AI #GPU #FP8 #MachineLearning #DeepLearning #CongNghe #PhanMem #Triton

https:/

SGLang has just solved FP8 stability for RL training, after finding that the problem lies in the quantization step. This is a big step forward for RLHF and local RL fine-tuning, and it makes mixed precision simpler to use.
#SGLang #FP8 #RLTraining #Quantization #AI #MachineLearning #HuấnLuyệnRL #TríTuệNhânTạo #HọcMáy

https://www.reddit.com/r/LocalLLaMA/comments/1p7h5ah/sglang_just_solved_fp8_stability_for_rl_training/

Great news for local LLM enthusiasts! You can now do FP8 reinforcement learning right on your own PC with as little as 5 GB of VRAM. It is faster and uses less VRAM than BF16/FP16. Try it now on the RTX 40/50 series!
#LocalLLM #AI #MachineLearning #hocmay #trituenhantao #fp8 #reinforcementlearning

https://www.reddit.com/r/LocalLLaMA/comments/1p6k0h2/you_can_now_do_fp8_reinforcement_learning_locally/