I'm planning large behavioural changes to a production-grade AI agent (one that is sometimes long-running), so today I'm working with `pydantic-evals` to evaluate the agent before and after the changes. So far it looks very similar to Langfuse datasets and runs for evaluation, except that the data lives in your repository instead of on the Langfuse platform.

https://ai.pydantic.dev/evals/

#llms #pydantic #genai #agents #claude #langfuse
