Mastodawn

Avi Chawla (@_avichawla)

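Shares how to build a RAG (retrieval-augmented generation) system that queries more than 36 million vectors in under 30 ms using Binary Quantization. Tech stack: llama_index (orchestration), Milvus (vector DB), and the Kimi-K2 LLM (hosted on Groq). An example of a high-performance vector search and response pipeline.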
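The post does not include the thread's code, so here is only a rough sketch of the general idea behind binary quantization, written with plain NumPy rather than the Milvus or llama_index APIs. The function names `binary_quantize` and `hamming_search` are hypothetical, and thresholding each component at zero is an assumption about how the bits are chosen. Each float component becomes one bit, so a float32 vector shrinks by 32x, and search reduces to XOR plus bit counting.

```python
import numpy as np

# Number of set bits for every possible byte value (0..255).
_POPCOUNT = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)


def binary_quantize(vectors: np.ndarray) -> np.ndarray:
    """Turn float vectors (n, d) into packed bit codes (n, ceil(d / 8)).

    Assumption: a component maps to bit 1 when it is greater than zero.
    """
    bits = (vectors > 0).astype(np.uint8)
    return np.packbits(bits, axis=1)


def hamming_search(query_code: np.ndarray, codes: np.ndarray, k: int):
    """Return indices and Hamming distances of the k closest codes."""
    xor = np.bitwise_xor(codes, query_code)           # differing bits, per byte
    distances = _POPCOUNT[xor].sum(axis=1, dtype=np.int64)
    order = np.argsort(distances, kind="stable")[:k]  # brute-force ranking
    return order, distances[order]


# Small demo on random data (illustrative only, not a benchmark).
rng = np.random.default_rng(0)
docs = rng.standard_normal((1000, 128)).astype(np.float32)
doc_codes = binary_quantize(docs)                     # 16 bytes per vector
query_code = binary_quantize(docs[42:43])[0]
top_idx, top_dist = hamming_search(query_code, doc_codes, k=5)
```

A production system would typically let the vector database build the binary index and would often re-rank the top candidates with the original float vectors; this sketch only shows the encoding and the distance computation.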

https://x.com/_avichawla/status/2004077542136013052

#rag #vectordb #binaryquantization #milvus #llm

Avi Chawla (@_avichawla) on X

Today, let's build a RAG system that queries 36M+ vectors in <30ms using Binary Quantization.

Tech stack:
- @llama_index for orchestration
- @milvusio as the vector DB
- @Kimi_Moonshot Kimi-K2 as the LLM hosted on @GroqInc

Let's build it!
