GEPA optimizes the prompts of a compound AI system by reading failed trajectories in natural language and editing the prompt of the module that caused the failure. Across six tasks it outperforms GRPO by 6% on average and by up to 20%, while using up to 35x fewer rollouts. The two methods differ in the learning signal they extract: reflection yields a per-module diagnosis from a trajectory, whereas GRPO collapses the same trajectory into a single scalar reward and spreads it across every token.
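
The contrast between the two learning signals can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not GEPA's or GRPO's actual implementation: the module names, the `ModuleStep` record, and the `reflect` and `apply_edit` helpers are hypothetical, the reflection step (performed by a language model in practice) is replaced by a stub that reads pre-written feedback, and the advantage uses a population standard deviation, which is only one of several conventions.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class ModuleStep:
    module: str    # hypothetical module name in the compound system
    output: str    # what the module produced on this rollout
    feedback: str  # natural-language note on the step; "" means no complaint


def grpo_token_advantages(rewards, token_counts, eps=1e-8):
    """Group-relative advantage per rollout, copied onto every token.

    Each rollout contributes one scalar reward; the normalized value is
    the same for all of its tokens, whichever module emitted them.
    """
    mu, sigma = mean(rewards), pstdev(rewards)
    advantages = [(r - mu) / (sigma + eps) for r in rewards]
    return [[a] * n for a, n in zip(advantages, token_counts)]


def reflect(trajectory):
    """Stub for the reflection step: collect a diagnosis per blamed module."""
    return {step.module: step.feedback for step in trajectory if step.feedback}


def apply_edit(prompts, diagnoses):
    """Toy prompt update: only the blamed module's prompt is changed."""
    updated = dict(prompts)
    for module, note in diagnoses.items():
        updated[module] = prompts[module] + "\nLesson: " + note
    return updated


# Hypothetical two-module system and one failed trajectory.
prompts = {
    "retriever": "Write a search query.",
    "answerer": "Answer from context.",
}
failed = [
    ModuleStep("retriever", "query: capital", "Query dropped the entity 'Peru'."),
    ModuleStep("answerer", "Paris", ""),
]

# Scalar signal: one number per rollout, broadcast over its tokens.
token_adv = grpo_token_advantages(rewards=[0.0, 1.0], token_counts=[3, 2])

# Reflective signal: a targeted edit to the module that failed.
new_prompts = apply_edit(prompts, reflect(failed))
```

In this sketch the failed rollout's three tokens all receive the same negative advantage, so nothing in the signal says which module was at fault, while the reflective path changes only the `retriever` prompt and leaves `answerer` untouched.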











