How I built a Graph RAG system with 96.7% accuracy in 5 days: from research papers to a production-ready pipeline
I implemented a Graph RAG system that combines 5 techniques from recent research papers (KET-RAG, HippoRAG 2, VectorCypher) into a single pipeline with a declarative Datalog reasoning engine, full provenance tracing, and a typed API. Result: 174/180 (96.7%) on a bilingual benchmark of 30 questions, each evaluated in 6 retrieval modes. Three modes reached 100%. The article covers the architecture, 10 optimization lessons, and the evolution from 38% to 96.7% over 10 iterations.
https://habr.com/ru/articles/1003064/
#GraphRAG #RAG #Neo4j #NLP #LLM #Python #Datalog #Knowledge_Graph #embeddings #PageRank

Skeleton Indexing (KDD 2025) + HippoRAG 2 (ICML 2025) + VectorCypher + Datalog Reasoning + 10 optimization iterations. TL;DR: I implemented a Graph RAG system that combines 5 techniques from recent research...
Even though I mostly use #Datalog databases these days (primarily #Datomic), many #PostgreSQL tidbits make me (unreasonably?) happy. Like this one: “Aggregate first - join later”
https://www.cybertec-postgresql.com/en/super-fast-aggregations-in-postgresql-19/
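For readers who have not met the phrase, here is a minimal Python sketch of the general idea behind "aggregate first, join later". It is not PostgreSQL's implementation, and the `sales`/`products` rows and both function names are hypothetical. It only shows that, for a decomposable aggregate such as a sum, collapsing rows per join key before the join gives the same result while joining far fewer rows.

```python
from collections import defaultdict

# Hypothetical data: (product_id, amount) rows and (product_id, category) rows.
SALES = [(1, 10), (1, 5), (2, 7), (3, 1), (3, 2), (3, 3)]
PRODUCTS = [(1, "books"), (2, "books"), (3, "games")]


def join_then_aggregate(sales, products):
    """Join every sales row to its product, then sum per category."""
    category_of = dict(products)
    totals = defaultdict(int)
    for product_id, amount in sales:
        # One join lookup per sales row.
        totals[category_of[product_id]] += amount
    return dict(totals)


def aggregate_then_join(sales, products):
    """Sum per product first, then join the much smaller partial result."""
    per_product = defaultdict(int)
    for product_id, amount in sales:
        per_product[product_id] += amount

    category_of = dict(products)
    totals = defaultdict(int)
    for product_id, subtotal in per_product.items():
        # One join lookup per distinct product instead of per sales row.
        totals[category_of[product_id]] += subtotal
    return dict(totals)


if __name__ == "__main__":
    print(join_then_aggregate(SALES, PRODUCTS))
    print(aggregate_then_join(SALES, PRODUCTS))
```

The two functions agree because a sum can be split into partial sums; an aggregate without that property could not be pushed below the join this way.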
Recursion in #Draupnir is getting closer, making it very nearly a proper #Datalog compiler. What would normally be a simple task is becoming considerably harder due to the need to support general monoid bases for the relations (which we want for cleaner aggregates than Souffle), as well as the need to handle batch scheduling to support disk.
The main challenge so far has been coming up with an execution plan that safely batches each iteration, while playing nicely with our push+pull scheduler, and simultaneously making sure that it maintains the correct arity of each tuple. Not hard... but very finicky.
We've come up with a pretty clean set of extensions to our logical pipeline DAG that seem to capture recursion elegantly, and compiling a simple "count the paths" query to the logical stage appears to be producing a sensible graph. This has revealed some bugs in the pipeline optimizer, and we still need to add support to the interpreter... but it's progressing.
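To make the "count the paths" query concrete, here is a rough Python sketch of how such a recursive query can be evaluated bottom-up, one batch per iteration (semi-naive style). It is an illustration under my own assumptions, not Draupnir's execution plan or syntax: the function name `count_paths`, the edge-list input, and the acyclic-graph restriction are all hypothetical. Counts are combined with addition, a simple commutative monoid, which is what lets each iteration's results be merged into the running total.

```python
from collections import defaultdict


def count_paths(edges):
    """Count paths between every connected pair of nodes in an acyclic graph.

    Roughly the recursive rules (informal notation, not any engine's syntax):
        paths(x, y) += 1            for each edge(x, y)
        paths(x, z) += paths(x, y)  for each edge(y, z)
    Repeated edges in the input are treated as parallel edges.
    """
    succ = defaultdict(list)
    for src, dst in edges:
        succ[src].append(dst)
    nodes = {node for edge in edges for node in edge}

    # Base case: every edge is one path of length 1.
    delta = defaultdict(int)
    for src, dst in edges:
        delta[(src, dst)] += 1

    total = defaultdict(int)
    rounds = 0
    while delta:
        rounds += 1
        if rounds > len(nodes):
            # Paths this long can only exist if the graph has a cycle.
            raise ValueError("graph has a cycle; path counts do not converge")

        # Merge this iteration's batch into the accumulated relation.
        for pair, count in delta.items():
            total[pair] += count

        # Extend only the newly derived paths by one more edge.
        next_delta = defaultdict(int)
        for (src, mid), count in delta.items():
            for dst in succ.get(mid, ()):
                next_delta[(src, dst)] += count
        delta = next_delta

    return dict(total)


if __name__ == "__main__":
    diamond = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
    for pair, count in sorted(count_paths(diamond).items()):
        print(pair, count)
```

Each pass through the loop handles paths exactly one edge longer than the previous pass, so every tuple keeps the same shape (source, destination, count) across iterations.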