Mastodawn

AIagent.at 🤖 AI News

ServiceNow Research unveils EnterpriseOps-Gym, a new benchmark for evaluating AI agents in enterprise settings. The benchmark simulates 8 business domains with 164 database tables and 512 tools. Top models like Claude Opus 4.5 achieve just a 37.4% success rate, revealing planning as the key bottleneck. https://www.marktechpost.com/2026/03/18/servicenow-research-introduces-enterpriseops-gym-a-high-fidelity-benchmark-designed-to-evaluate-agentic-planning-in-realistic-enterprise-settings/ #AIagent #AI #GenAI #AgenticAI #ServiceNow

ServiceNow Research Introduces EnterpriseOps-Gym: A High-Fidelity Benchmark Designed to Evaluate Agentic Planning in Realistic Enterprise Settings

MarkTechPost