agent-comparison

Here are 2 public repositories matching this topic:

plaited / agent-eval-harness

Evaluate AI agents with Unix-style pipeline commands. Schema-driven adapters for any CLI agent, trajectory capture, pass@k metrics, and multi-run comparison.

cli typescript grader ai-agents bun jsonl llm-evaluation agent-evaluation unix-pipeline agent-comparison trajectory-capture eval-harness pass-at-k headless-adapter

Updated Jan 26, 2026
TypeScript

pyros-projects / agent-comparison

Qualitative benchmark suite for evaluating AI coding agents and orchestration paradigms on realistic, complex development tasks.

orchestration ai-agents ai-benchmarks qualitative-evaluation llm-agents coding-agents agentic-workflows agent-evaluation agent-testing ai-coding-assistants agent-comparison development-tasks

Updated Nov 25, 2025
Python
