#ragnar v0.2.0 is on CRAN #rstats!
It has a MUCH improved chunker: markdown_chunk() picks better boundaries, segments by headings, builds heading context for each chunk, and handles overlapping chunks. Oh, and ragnar_retrieve() can now de-overlap retrieved chunks.
Other highlights:
- read_as_markdown() handles HTML better and now supports selectors.
- The store inspector now displays chunk context and looks nicer.
- More embedding providers: Google, Databricks, and Bedrock.
Website got a big update too: https://ragnar.tidyverse.org
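What does de-overlapping mean in practice? When chunks are built with overlap, several retrieved chunks can cover the same stretch of a document. Here is a minimal sketch of the general idea. It is written in Python rather than R and is not taken from ragnar's source. It assumes each retrieved chunk is a character range within a document (the `Chunk` record and its fields are hypothetical) and merges ranges that overlap.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    doc: str    # hypothetical: identifier of the source document
    start: int  # character offset where the chunk begins (inclusive)
    end: int    # character offset where the chunk ends (exclusive)


def deoverlap(chunks: list[Chunk]) -> list[Chunk]:
    """Merge retrieved chunks from the same document whose ranges overlap."""
    merged: list[Chunk] = []
    # Sort by document, then by start offset, so overlapping chunks end up adjacent.
    for c in sorted(chunks, key=lambda c: (c.doc, c.start)):
        last = merged[-1] if merged else None
        if last is not None and last.doc == c.doc and c.start < last.end:
            # Overlap: extend the previous span instead of keeping duplicate text.
            last.end = max(last.end, c.end)
        else:
            # No overlap: start a new span (copied so the input is not mutated).
            merged.append(Chunk(c.doc, c.start, c.end))
    return merged


hits = [Chunk("a.md", 0, 100), Chunk("a.md", 80, 180),
        Chunk("b.md", 0, 50), Chunk("a.md", 300, 400)]
print(deoverlap(hits))
```

Sorting first means each chunk only has to be compared with the most recently merged span, so the merge is a single linear pass after the sort.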

Retrieval-Augmented Generation (RAG) Workflows
Provides tools for implementing Retrieval-Augmented Generation (RAG) workflows with Large Language Models (LLMs). Includes functions for document processing, text chunking, embedding generation, storage management, and content retrieval. Supports various document types and embedding providers (e.g., Ollama, OpenAI), with DuckDB as the default storage backend. Integrates with the ellmer package to equip chat objects with retrieval capabilities. Designed to offer both sensible defaults and customization options with transparent access to intermediate outputs. For a review of retrieval-augmented generation methods, see Gao et al. (2023) "Retrieval-Augmented Generation for Large Language Models: A Survey" <doi:10.48550/arXiv.2312.10997>.
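The retrieval step described above can be illustrated with a toy example. This is a minimal sketch of one common approach: ranking stored chunks by cosine similarity between embeddings. It assumes the embeddings are already computed (the two-dimensional vectors below are made up) and uses a plain Python list in place of a DuckDB store. The `retrieve()` helper is hypothetical and is not ragnar's `ragnar_retrieve()`.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: list[float], store: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the k stored texts whose embeddings are most similar to query_vec.

    `store` is a hypothetical list of (text, embedding) pairs standing in
    for a real vector store.
    """
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy, hand-written "embeddings" for illustration only.
store = [("cats", [1.0, 0.0]), ("dogs", [0.9, 0.1]), ("cars", [0.0, 1.0])]
print(retrieve([1.0, 0.0], store, k=2))
```

In a real workflow the query and the chunks would be embedded by the same provider, and the retrieved texts would then be passed to the LLM as context.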