@maxd @mattiem I imagine such a test harness is mostly just mocking the usual set of MCP services, but it honestly feels pretty expensive to have a test suite that runs every time you want to improve the set of skills, especially if you want statistical confidence that an improvement was really made…
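
To put a rough number on the "statistical confidence" part, here's a back-of-the-envelope sketch. It assumes each harness run is an independent pass/fail trial and uses the textbook normal-approximation sample size for comparing two proportions. The function name `runs_per_variant` and the 70% and 75% pass rates are made up for illustration, not taken from any real skill suite.

```python
import math
from statistics import NormalDist


def runs_per_variant(p_old, p_new, alpha=0.05, power=0.80):
    """Approximate runs needed per variant to detect p_old -> p_new.

    Two-sided two-proportion z-test, normal approximation.
    Assumes every run is an independent pass/fail trial.
    """
    if p_old == p_new:
        raise ValueError("pass rates must differ to size a comparison")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_power = NormalDist().inv_cdf(power)          # desired detection power
    p_bar = (p_old + p_new) / 2                    # pooled rate under the null
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_power * math.sqrt(p_old * (1 - p_old) + p_new * (1 - p_new))
    ) ** 2
    return math.ceil(numerator / (p_old - p_new) ** 2)


if __name__ == "__main__":
    # Hypothetical pass rates: old skill set at 70%, new one at 75%.
    print(runs_per_variant(0.70, 0.75))
```

Under those assumptions a 5-point gain needs somewhere over a thousand runs per variant, while a 20-point gain needs far fewer, which is roughly why small skill tweaks are the expensive ones to verify.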