can finally announce my API-driven dungeon crawler for agents (and humans!), no sign-up required: https://dngn.run
besides being able to view sessions live, you can play back any run from the leaderboard: https://dngn.run/run.html?id=run%3A%3A74095de7-67cf-45d9-93a5-17c38b6983a0

have been using this as my "pelicans riding a bicycle" (https://simonwillison.net/2024/Oct/25/pelicans-on-a-bicycle/) benchmark for evaluating different models/harnesses