🧠 From Simple Indexing to Semantic Understanding: Why I Layered Both Approaches
Finishing LLM Zoomcamp Module 2 felt like leveling up my RAG system. I was already doing agentic RAG in Module 1, but vector search opened a whole new layer of retrieval flexibility. Here's why the technical decisions matter:
- **Gained exposure to various vector databases including pgvector, sqlitesearch, and minsearch** - Each tool carries distinct tradeoffs: pgvector for PostgreSQL integration, SQLite for lightweight local workloads, minsearch for in-memory prototyping. Knowing which fits where matters more than the technology itself.
- **Embedding actual lesson content with the ONNX library** - Lightweight CPU inference means this stacks directly on existing infrastructure without GPU dependencies or scaling headaches
- **Chunking 72 lesson pages into ~300 chunks with 50% overlap** - Sliding window preserves context across topic boundaries while reducing prompt token usage compared to whole-page indexing
- **Building the same query against both vector and keyword indexes to compare scores** - Quantifies semantic vs lexical retrieval so you can decide when each method adds value
- **Using hybrid search (RRF fusion) to blend vector and keyword search results intelligently** - Captures both conceptual meaning and precise terminology, which matters when queries span multiple technical domains
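To make the chunking bullet concrete, here is a minimal sketch of a sliding window with 50% overlap. It assumes character-based windows; the function name `sliding_window`, the `size` default, and the dict keys are illustrative choices, not the exact code or settings from the project.

```python
def sliding_window(text: str, size: int = 2000, overlap: float = 0.5) -> list[dict]:
    """Split text into fixed-size character windows that overlap.

    `size` and `overlap` are illustrative defaults (assumptions),
    not the values used in the original project.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")

    # With 50% overlap, each window starts half a window after the previous one.
    step = max(1, int(size * (1 - overlap)))

    chunks = []
    for start in range(0, len(text), step):
        chunks.append({"start": start, "content": text[start:start + size]})
        # Stop once a window has reached the end of the text.
        if start + size >= len(text):
            break
    return chunks


page = "".join(chr(ord("a") + i % 26) for i in range(100))
chunks = sliding_window(page, size=40, overlap=0.5)
```

With 50% overlap, the second half of every chunk reappears as the first half of the next one, so a sentence that straddles a boundary still shows up whole in at least one chunk.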
One thing that stuck: even queries like "How do I store vectors in PostgreSQL?" returned meaningful results because I was comparing semantic similarity, not just matching words. That's the real difference between lexical and semantic search. It also shows that hybrid search isn't just a nice-to-have; it's practical engineering when you care about both retrieval precision and coverage.
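For anyone curious how RRF blends the two result lists, here is a minimal sketch of reciprocal rank fusion. It assumes each search returns an ordered list of document IDs; the function name `rrf_fuse`, the sample IDs, and `k=60` (a commonly used constant) are illustrative, not taken from the project.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs with reciprocal rank fusion.

    Each document scores 1 / (k + rank) per list it appears in (rank starts at 1);
    scores are summed across lists and sorted from highest to lowest.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical result lists from a vector index and a keyword index.
vector_hits = ["a", "b", "c"]
keyword_hits = ["b", "d", "a"]

fused = rrf_fuse([vector_hits, keyword_hits])
fused_ids = [doc_id for doc_id, _ in fused]
```

RRF only looks at rank positions, so the vector and keyword scores never need to be put on the same scale. Documents that rank well in both lists rise to the top.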
Project is live if you're curious to see how the pieces fit together: https://github.com/ammartin8/llm_zoomcamp_portfolio/blob/main/modules/02_vector_search/project_02/project_vector_search_case_study.md
Huge thanks again to Alexey Grigorev for putting this together. Open-source learning at this level matters more than most realize. Anyone else finishing up Module 2 or working with hybrid retrieval themselves?
#ai #localai #llm #mastodon #fediverse #buildinpublic #linux #github #aiengineering #DataEngineering #agentic #rag #vector #openai








