Interesting explainer from Databricks on vector databases.
🔑 **Key insight:** Traditional databases find exact matches, while vector databases capture *meaning* through embeddings, which is essential for semantic search and AI applications.
- Handles high-dimensional vectors efficiently
- Speeds up Retrieval-Augmented Generation (RAG)
- Supports complex similarity searches

#VectorDB #MachineLearning #KünstlicheIntelligenz #DataEngineering #Databricks

🔗 https://news.google.com/rss/articles/CBMiakFVX3lxTE9rWndNaVdDamV2VVlEQVFvWTVxaV92V0ZmODNvajdYdVFrZ1g1b1AtOFkwaW1QSzZuLWw0QTdUU1BmblVHOVBYazlzUXJDSHN4bmtYS0JaZkVCMjNBaGd6NFBrcHVFdjc1b0E?oc=5
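
A minimal sketch of the similarity-search idea from the post above, assuming toy 3-dimensional embeddings that are made up for illustration (real embeddings come from a model and have hundreds of dimensions). The names `cosine_similarity`, `top_k`, and `EMBEDDINGS` are hypothetical, and this brute-force scan is not how a vector database indexes data internally; it only shows what "nearest by meaning" computes.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # define similarity with a zero vector as 0
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=2):
    """Brute-force search: score every stored vector, return the k best."""
    scored = [(doc_id, cosine_similarity(query, vec))
              for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up embeddings: "cat" and "kitten" point in a similar direction,
# "car" points elsewhere.
EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    for doc_id, score in top_k([0.85, 0.15, 0.05], EMBEDDINGS):
        print(doc_id, round(score, 3))
```

An exact-match lookup for the query vector would return nothing here; the similarity ranking still surfaces the two related entries.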

GitHub - spotify/annoy: Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk

I've pushed the 0.9.5-dev branch of foxing to Codeberg ( https://codeberg.org/aenertia/foxing/src/branch/0.9.5-dev ) in preparation for release tagging. It brings a LOT of features and a couple of bug fixes now that the packet/file processing engine has stabilized, including Semantic Routing to Parsers for Metadata Extraction and in-path binary analysis using local ORT/BERT models, so you get semantic search powers for free when you copy something with foxingd/fxcp. #linux #filesystem #bert #vectordb #postgres #xfs #stratis #blake3 #localllm

`foxing` (formerly xfs-mirror) aspires to be a production-grade, eBPF-powered replication engine for Linux filesystems (XFS, Btrfs, F2FS, Ext4). It captures filesystem events in the kernel and replays them asynchronously on a target directory, providing near real-time mirroring with robust consis...

everyone is talking about RAG, so I went down the rabbit hole and found the boring part that actually makes it work: retrieval and chunking

over the last months I've been studying and experimenting with vector databases and local LLMs, and I turned that into a three-part series on the Storyblok blog

today the last and most challenging piece is out: a step-by-step guide to building a fully local RAG pipeline with @weaviate + @ollama

in the article I show, with real code and a full repo linked:

• why hybrid search (BM25 + vectors) beats pure vector search
• how bad chunking quietly ruins most RAG systems
• why a smaller model + good retrieval often beats a huge (expensive) model with bad context
• how a structured CMS (like Storyblok) basically gives you chunking for free

stack:
• Weaviate for vectors and hybrid search
• Node.js for the glue
• Qwen 3.5 on Ollama running locally (but this works with cloud models too)

if you work on docs, DX, or AI features for content-heavy products, this might be a useful starting point

and since I'm still new to the topic, I'd really like feedback from you if you've built RAG systems in production

full article with code + repo:

How to build a RAG pipeline with Weaviate and Ollama – https://www.storyblok.com/mp/how-to-build-a-rag-pipeline-with-weaviate-and-ollama

#rag #vectordb #llm #semanticsearch

Turn your vector database into an AI assistant. Build a local RAG pipeline with Weaviate and Ollama using hybrid search, smart chunking, and grounded answers.
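
To make the "hybrid search (BM25 + vectors)" point from the post concrete, here is a minimal sketch of reciprocal rank fusion (RRF), one common way to merge a keyword ranking with a vector ranking. This is an assumption for illustration: it is not necessarily the fusion method used in the linked article or by Weaviate. The function name, the document IDs, and the constant `k=60` (a conventional default for RRF) are all hypothetical choices.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one list.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by both retrievers rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending fused score; break ties by ID for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Made-up results from two retrievers for the same query.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
vector_ranking = ["doc_b", "doc_d", "doc_a"]

if __name__ == "__main__":
    print(reciprocal_rank_fusion([bm25_ranking, vector_ranking]))
    # prints ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

RRF only needs ranks, not raw scores, which sidesteps the problem that BM25 scores and vector distances live on different scales.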

@OpenSearchProj was named a Leader and Fast Mover in the 2025 GigaOm Radar for Vector Databases 🏆

My #OpenSearch report highlights:
✅ Platform play
✅ Search variety
✅ Business criteria
✅ Security
And I'd add - it's OPEN SOURCE @linuxfoundation !!
https://opensearch.org/gigaom-radar-vector-report-2025/

#gigaom #vectorDB

ICYMI, #OpenSearch 3.5 is here! 🤩
I shared some of my personal highlights in this short clip.
Hope you enjoy the format 👍
#OpenSearchAmbassador @OpenSearchProject @linuxfoundation #observability #analytics #search #vectorDB #kubecon #kubeconEU #o11yDay
Vector Search Made Simple: Getting Started with OpenSearch for AI Applications - Dotan Horovits

Stoked seeing the OpenSearch Project featured by Jensen Huang on #NVIDIA #GTC keynote! 😍

One of the innovations in #OpenSearch V3 is GPU acceleration based on NVIDIA's cuVS. Our #VectorSearch benchmarks, using the CAGRA algorithm integrated through Facebook's Faiss library, showed:
✅ 9.3x faster index builds
✅ 3.75x lower cost
✅ 2x higher throughput
✅ 2.5x lower CPU usage

https://www.linkedin.com/feed/update/urn:li:activity:7439600547852189697/

#OpenSearchAmbassador #opensource #gtc2026 #gtc26 #cuvs #vectordb

That's the power of bringing the best of #opensource in vector search together. Check out the comments for the full benchmark setup and results, more details on the architecture, and the RFC on GitHub. Well done to Navneet Verma, Corey Nolet, Kshitiz G., Dylan Tong, Nathan Stephens, Vamshi Vijay Nakkirtha, and all involved!

310% throughput increase and 300% latency reduction!
Great work by the AWS #opensearch engineers: bulk SIMD brings these performance gains to @OpenSearchProject's vector search 👏
And it's all #opensource under @linuxfoundation 🤩
https://opensearch.org/blog/accelerating-fp16-vector-search-performance-using-bulk-simd-in-opensearch-3-5/
#vectorDB #search #OpenSearchAmbassador

Chunking: an essential concept to understand for Retrieval-Augmented Generation (#RAG). It is the process of dividing large documents into smaller, manageable segments called “chunks.” Effective chunking preserves semantic meaning while ensuring content fits within model context limits.

Proper chunking is essential, as it directly affects retrieval quality. Well-structured chunks improve precision and support more accurate responses.
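
As a rough illustration of the idea, here is a minimal sketch of fixed-size chunking with overlap, counted in whitespace-separated words. This is the simplest baseline, not a recommendation: the function name `chunk_words` and its default sizes are hypothetical, and splitting on headings, paragraphs, or sentences usually preserves meaning better than a fixed word count.

```python
def chunk_words(text, chunk_size=50, overlap=10):
    """Split text into chunks of up to `chunk_size` words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    boundary still appears with some context in the next chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


if __name__ == "__main__":
    print(chunk_words("a b c d e f g h i j", chunk_size=4, overlap=1))
    # prints ['a b c d', 'd e f g', 'g h i j']
```

In practice the chunk size is usually measured in model tokens rather than words, so that each chunk fits the embedding model's and the LLM's context limits.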



#OpenSource #devops #vectordb #programming #vector #search