Mastodawn

Google Research (@GoogleResearch)

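Google has released a new compression algorithm called TurboQuant. According to Google, it cuts LLM key-value cache memory by at least 6x and delivers up to 8x speedup with no loss of accuracy, substantially improving AI efficiency. It is a significant announcement for LLM inference optimization and memory savings.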

https://x.com/GoogleResearch/status/2036533564158910740

#llm #compression #kvcache #inference #ai

Google Research (@GoogleResearch) on X

Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: https://t.co/CDSQ8HpZoc
