SANA-WM, a 2.6B open-source world model for 1-minute 720p video

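SANA-WM is an open-source world model with 2.6 billion parameters that generates one-minute, high-quality 720p video on a single GPU. It uses hybrid linear attention, 6-DoF camera control, and a two-stage generation pipeline to maintain consistency and quality over long sequences. After 15 days of training on 64 H100 GPUs, it supports real-time inference on a single GPU. Using 213,000 public video clips with precise camera pose annotations, it substantially improves efficiency and accuracy. It delivers quality comparable to existing large-scale industrial models while achieving 36x higher throughput, a practical step forward for AI video synthesis and simulation.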

https://nvlabs.github.io/Sana/WM/

#opensource #videogeneration #worldmodel #transformer #gpuinference

SANA-WM | Efficient Minute-Scale World Modeling

SANA-WM is an efficient minute-scale world model for camera-controlled 720p video generation.

Zero-Copy GPU Inference from WebAssembly on Apple Silicon

A WebAssembly module's linear memory can be shared directly with the Apple Silicon GPU: no copies, no serialization, no intermediate buffers. Here's how the zero-copy chain works, what we measured, and what it enables for stateful AI inference.

Abacus Noir
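
To make the "no copies" idea in the post above concrete, here is a minimal Python sketch of the difference between sharing a buffer and copying it. This is only an analogy: the `linear_memory` bytearray is a hypothetical stand-in for a WebAssembly module's linear memory, and no GPU or WebAssembly runtime is involved. It does not reproduce the article's actual mechanism.

```python
import struct

# Hypothetical stand-in for a WebAssembly module's linear memory.
linear_memory = bytearray(16)

# A memoryview exposes the same underlying bytes without copying them.
shared_view = memoryview(linear_memory)

# A copy, for contrast: later writes to linear_memory do not reach it.
copied = bytes(linear_memory)

# The "guest" writes four little-endian float32 values into linear memory.
struct.pack_into("<4f", linear_memory, 0, 1.0, 2.0, 3.0, 4.0)

# A consumer reading through the view sees the new values immediately;
# a consumer holding the copy still sees the old zeros.
seen_by_view = struct.unpack_from("<4f", shared_view, 0)
seen_by_copy = struct.unpack_from("<4f", copied, 0)
```

A zero-copy path behaves like `shared_view`: producer and consumer address the same bytes, so nothing has to be serialized or moved between them.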

🚀 The team behind continuous batching is urging operators to put idle GPUs to work on inference. Learn how this boosts token throughput, taps spot GPU markets, and why providers like CoreWeave, Lambda Labs, and RunPod are taking note. Could your workloads run cheaper and faster? Dive in for the details. #GPUInference #ContinuousBatching #SpotGPUMarkets #InferenceSense

🔗 https://aidailypost.com/news/team-behind-continuous-batching-urges-operators-run-inference-idle

vLLM now powers high‑throughput inference with its new PagedAttention engine, cutting latency and boosting GPU utilization. Continuous batching lets you serve OpenAI‑scale workloads in production without sacrificing cost. Dive into how this open‑source stack reshapes large‑model serving. #vLLM #PagedAttention #GPUInference #MLInference

🔗 https://aidailypost.com/news/vllm-boosts-production-inference-through-high-throughput
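
For readers new to continuous batching, mentioned in the two posts above, the following toy simulation may help. It is a simplified sketch, not vLLM's scheduler: the function names are hypothetical, each request is reduced to a positive number of decode steps, and prefill cost and memory limits are ignored.

```python
from collections import deque


def static_batching_steps(lengths, batch_size):
    """Decode steps when each batch runs until its longest request finishes."""
    steps = 0
    for i in range(0, len(lengths), batch_size):
        steps += max(lengths[i:i + batch_size])
    return steps


def continuous_batching_steps(lengths, batch_size):
    """Decode steps when a finished request's slot is refilled right away."""
    waiting = deque(lengths)
    running = []  # remaining decode steps for each active request
    steps = 0
    while waiting or running:
        # Fill any free slots from the queue before the next decode step.
        while waiting and len(running) < batch_size:
            running.append(waiting.popleft())
        steps += 1
        # Each active request emits one token; finished requests leave.
        running = [r - 1 for r in running if r > 1]
    return steps


# Hypothetical workload: output lengths in tokens, two slots on the GPU.
lengths = [2, 8, 3, 8]
static_steps = static_batching_steps(lengths, batch_size=2)
continuous_steps = continuous_batching_steps(lengths, batch_size=2)
```

With this workload the static schedule takes 16 steps and the continuous one takes 13, because short requests no longer hold a slot idle while a long neighbor finishes.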