I used #Pydantic Evals to evaluate a bunch of agents today. After running an evaluation, I'd like to inspect the SpanTree for each evaluation case, e.g. to check which tools were called and debug my custom Evaluators. My current approach is a custom Evaluator that captures the tree as a side effect into a module-level variable.
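
For reference, a minimal sketch of that workaround. It is stdlib-only so it runs on its own: `CaptureSpanTree`, `CAPTURED_TREES`, and the `HasSpanTree` protocol are hypothetical names, and the `name` / `span_tree` attributes on the context are my assumption about what the evaluator context exposes. In real use the class would subclass the Evaluator base class from pydantic_evals instead of standing alone.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Protocol

# Module-level store: case name -> captured span tree (the "global var").
CAPTURED_TREES: Dict[str, Any] = {}


class HasSpanTree(Protocol):
    """Stand-in for the parts of the evaluator context used here (assumed shape)."""

    name: Optional[str]
    span_tree: Any


@dataclass
class CaptureSpanTree:
    """Hypothetical evaluator whose only job is to capture the span tree."""

    def evaluate(self, ctx: HasSpanTree) -> Dict[str, bool]:
        # Side effect: stash the tree so it can be inspected after the run.
        CAPTURED_TREES[ctx.name or "unnamed"] = ctx.span_tree
        # An empty mapping is meant to contribute no scores or assertions.
        return {}
```

Once Dataset.evaluate() has finished, `CAPTURED_TREES` can be read per case name. Cases that share a name, or repeated runs, overwrite each other's entries.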

Storing the trees in a global var is not great, so let's see if we can come up with a better solution: https://github.com/pydantic/pydantic-ai/issues/4758

#llms #evals #foss

Pydantic Evals: optionally storing traces to ReportCase for inspection after Dataset.evaluate() · Issue #4758 · pydantic/pydantic-ai
