IndexCache, a new sparse attention optimizer from Tsinghua University and Z.ai, delivers 1.82x faster inference on long-context AI models by cutting up to 75% of redundant computation. The training-free approach works with the DeepSeek Sparse Attention architecture. https://venturebeat.com/technology/indexcache-a-new-sparse-attention-optimizer-delivers-1-82x-faster-inference #AIagent #AI #GenAI #AIInfrastructure