Evaluate AI agents with Unix-style pipeline commands. Schema-driven adapters for any CLI agent, trajectory capture, pass@k metrics, and multi-run comparison.
Updated Jan 26, 2026 - TypeScript
Qualitative benchmark suite for evaluating AI coding agents and orchestration paradigms on realistic, complex development tasks.