Commit 6f74d5e

docs: add multi-agent workflow examples to AGENTS.md
- Document agent:codex, --runs flag, export:leaderboard commands
1 parent e62ab43 commit 6f74d5e

1 file changed: +24 -0 lines changed

AGENTS.md

Lines changed: 24 additions & 0 deletions
@@ -42,6 +42,30 @@ bun start --skills --model "claude-sonnet-4-5"
 BRAINTRUST_API_KEY=sk-... bun report:braintrust
 ```
 
+### Agent Evaluations (Multi-Agent)
+
+```bash
+# Run agent evals with Claude Code
+bun agent:claude
+bun agent:claude --eval add-auth --debug
+
+# Run agent evals with Codex
+bun agent:codex
+bun agent:codex --eval add-auth --debug
+
+# Multi-trial (3 runs per eval, pass@k metrics)
+bun agent:claude --runs 3
+bun agent:codex --runs 3
+
+# With skills or MCP
+bun agent:claude --skills
+bun agent:claude --skills --mcp
+
+# Cross-agent leaderboard export
+bun export:leaderboard
+bun export:leaderboard --since 2026-03-20
+```
+
 ## Project Structure & Module Organization
 `src/index.ts` wires providers, runners, reporters, and every folder under `src/evals`. Keep each evaluation in its own directory with `PROMPT.md`, `graders.ts`, and any fixtures it needs. Use descriptive, numeric-free slugs like `src/evals/new-eval`. Runner logic lives in `src/runners`, shared provider clients in `src/providers`, scoring helpers in `src/scorers`, and reusable utilities in `src/utils`. Diagrams intended for contributor onboarding belong in `docs/`, while transient artifacts like `scores.json` stay gitignored at the root.
 
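The `--runs 3` examples in the diff mention pass@k metrics, but this commit does not show how the repository computes them. As a rough illustration only, here is a minimal sketch assuming the commonly used unbiased estimator, pass@k = 1 - C(n-c, k) / C(n, k), where n is the number of runs and c the number that passed. The function name `passAtK` is hypothetical and does not come from the repository.

```typescript
// Hypothetical helper (not from the repository): unbiased pass@k estimate.
// n = total runs for one eval, c = runs that passed, k = sample size.
// Returns the probability that at least one of k runs, drawn without
// replacement from the n recorded runs, is a pass.
function passAtK(n: number, c: number, k: number): number {
  if (!Number.isInteger(n) || !Number.isInteger(c) || !Number.isInteger(k)) {
    throw new RangeError("n, c, and k must be integers");
  }
  if (n < 1 || k < 1 || k > n || c < 0 || c > n) {
    throw new RangeError("require 1 <= k <= n and 0 <= c <= n");
  }
  // Fewer than k failures exist, so any k-sample must contain a pass.
  if (n - c < k) return 1;
  // C(n-c, k) / C(n, k) expanded as a product to avoid large factorials.
  let allFail = 1;
  for (let i = 0; i < k; i++) {
    allFail *= (n - c - i) / (n - i);
  }
  return 1 - allFail;
}

// Example: 3 runs, 1 pass.
const passAt1 = passAtK(3, 1, 1);
const passAt3 = passAtK(3, 1, 3);
console.log({ passAt1, passAt3 });
```

With three runs and one pass, pass@1 works out to 1/3 and pass@3 to 1, which shows why multi-trial runs can report a higher pass@k than a single-run pass rate.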
