This might be huge (esp. for future #Gemini versions):
#Google introduced #TurboQuant, a new compression algorithm that shrinks the #LLM key-value cache by at least 6x and delivers up to an 8x speedup with no loss in accuracy, redefining #AI efficiency.
TurboQuant: Redefining AI efficiency with extreme compression https://research.google/blog/turboquant-redefining-ai-efficiency-with-extreme-compression/

👾