everyone is talking about RAG, so I went down the rabbit hole and found the boring part that actually makes it work: retrieval and chunking
over the last few months I've been studying and experimenting with vector databases and local LLMs, and I turned that into a three-part series on the Storyblok blog
today the last and most challenging piece is out: a step-by-step guide to building a fully local RAG pipeline with @weaviate + @ollama
in the article I show, with real code and a full repo linked:
• why hybrid search (BM25 + vectors) beats pure vector search
• how bad chunking quietly ruins most RAG systems
• why a smaller model + good retrieval often beats a huge (expensive) model with bad context
• how a structured CMS (like Storyblok) basically gives you chunking for free
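to make the hybrid search point concrete, here's a minimal sketch of one way to blend keyword and vector results. it is not Weaviate's actual implementation, though it follows a similar idea. each result list is min-max normalized to [0, 1], and the two lists are then combined with a weight alpha, where alpha = 1 means pure vector search and alpha = 0 means pure BM25 (the same convention Weaviate's hybrid query uses). the Scored shape and the normalize / hybridFuse names are hypothetical, and the raw scores are assumed to come from your own BM25 and vector queries.

```typescript
// Hypothetical result shape: a document id plus the raw score from one retriever.
interface Scored {
  id: string;
  score: number;
}

// Min-max normalize scores to [0, 1] so BM25 and vector scores become comparable.
function normalize(results: Scored[]): Map<string, number> {
  const out = new Map<string, number>();
  if (results.length === 0) return out;
  const scores = results.map((r) => r.score);
  const min = Math.min(...scores);
  const max = Math.max(...scores);
  for (const r of results) {
    // If every score is identical, give each hit full weight instead of dividing by zero.
    out.set(r.id, max === min ? 1 : (r.score - min) / (max - min));
  }
  return out;
}

// Blend keyword and vector hits: alpha = 1 is pure vector, alpha = 0 is pure BM25.
function hybridFuse(bm25: Scored[], vector: Scored[], alpha = 0.5): Scored[] {
  const kw = normalize(bm25);
  const vec = normalize(vector);
  const ids = new Set([...kw.keys(), ...vec.keys()]);
  const fused: Scored[] = [];
  for (const id of ids) {
    // A document missing from one list contributes 0 from that retriever.
    const score = alpha * (vec.get(id) ?? 0) + (1 - alpha) * (kw.get(id) ?? 0);
    fused.push({ id, score });
  }
  // Highest fused score first.
  return fused.sort((a, b) => b.score - a.score);
}
```

for example, a doc that ranks first on BM25 and a close second on vectors ends up on top after fusion, because both retrievers agree on it. that kind of agreement, especially when a query hinges on an exact term, is the usual argument for hybrid over pure vector search.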
stack:
• Weaviate for vectors and hybrid search
• Node.js for the glue
• Qwen 3.5 on Ollama running locally (but this works with cloud models too)
if you work on docs, DX, or AI features for content-heavy products, this might be a useful starting point
and since I'm still new to the topic, I'd really appreciate feedback from anyone who has built RAG systems in production
full article with code + repo:
How to build a RAG pipeline with Weaviate and Ollama – https://www.storyblok.com/mp/how-to-build-a-rag-pipeline-with-weaviate-and-ollama
#rag #vectordb #llm #semanticsearch