Migrate example eval suites to suite.yaml by christso · Pull Request #1618 · EntityProcess/agentv

christso · 2026-07-03T08:39:53Z

Summary

- Rename committed example suites from dataset.eval.yaml and generic eval.yaml to suite.yaml, renaming baseline files to suite.baseline.jsonl and external row files to cases.* where applicable.
- Migrate example authoring from test-level criteria to explicit assert entries, normalize file-backed prompt references to file://..., and update README/docs references.
- Add consistent suite.yaml discovery/validation support and baseline-checking coverage; migrate example target files from the removed name field to label.
- Add committed local OpenAI-compatible dogfood targets (pi-cli-openai, codex-sdk-openai, copilot-sdk-openai) and remove references to the hard-deprecated cc-mirror target.
- Update .agentv/config.yaml to discover the current suite names and auto-publish completed CLI/Dashboard result bundles to the agentv/results/v1 branch; remove the legacy/default trace and OTel sidecar outputs from the repo defaults.
- Align the project-local results: config with the project entries in the global ~/.agentv/config.yaml. Authored results config uses repo/path/branch/auto_push; validation now rejects the redundant results.mode key, while runtime loading still tolerates the legacy mode: github for compatibility.
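
A minimal TypeScript sketch of the results.mode rule above, assuming a flat results object holding the four authored keys. The ResultsConfig type and the validateResultsConfig/loadResultsConfig names are hypothetical, not AgentV's actual API. Rejecting mode values other than github at load time is also an assumption; the PR only says legacy mode: github is tolerated.

```typescript
// Hypothetical shape of a project-local `results:` block. Only the key names
// come from the PR description; the types are assumptions.
interface ResultsConfig {
  repo?: string;
  path?: string;
  branch?: string;
  auto_push?: boolean;
  mode?: string; // legacy key
}

// Authoring-time validation (sketch): the redundant `mode` key is an error.
function validateResultsConfig(cfg: ResultsConfig): string[] {
  const errors: string[] = [];
  if (cfg.mode !== undefined) {
    errors.push("results.mode is redundant; remove it");
  }
  return errors;
}

// Runtime loading (sketch): tolerate legacy `mode: github` by dropping it.
// Throwing on any other value is an assumption made for this illustration.
function loadResultsConfig(cfg: ResultsConfig): Omit<ResultsConfig, "mode"> {
  const { mode, ...rest } = cfg;
  if (mode !== undefined && mode !== "github") {
    throw new Error(`unsupported results.mode: ${mode}`);
  }
  return rest;
}

const legacy: ResultsConfig = { branch: "agentv/results/v1", auto_push: true, mode: "github" };
console.log(validateResultsConfig(legacy));
console.log(loadResultsConfig(legacy));
```

Here validation reports one error for the legacy file, while loading the same file succeeds and returns the config without mode. Keeping the two checks separate holds newly authored files to the new shape without breaking older configs.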

Verification

- bun run build
- bun test apps/cli/test/commands/eval/shared.test.ts packages/core/test/evaluation/validation/file-type.test.ts packages/core/test/evaluation/category.test.ts packages/core/test/evaluation/providers/targets.test.ts packages/core/test/evaluation/validation/targets-validator.test.ts
- bun test packages/core/test/evaluation/loaders/config-loader.test.ts packages/core/test/evaluation/validation/config-validator.test.ts packages/core/test/evaluation/results-repo.test.ts apps/cli/test/commands/results/remote-auto-export.test.ts (169 pass)
- bun apps/cli/src/cli.ts validate examples .agentv/targets.yaml apps/cli/src/templates/.agentv/targets.yaml (109 valid, 0 invalid)
- bun apps/cli/src/cli.ts validate .agentv/config.yaml (1 valid, 0 invalid)
- git diff --check

Dogfood

Model: gpt-5.3-codex-spark on a live local OpenAI-compatible endpoint, via LOCAL_OPENAI_PROXY_BASE_URL=http://127.0.0.1:10531/v1.
High-threshold live dogfood on the ignored temporary suite .agentv/results/suite-yaml-live-dogfood-pass.yaml with --threshold 1:
- pi-cli-openai: 100% PASS, bundle .agentv/results/suite-yaml-live-pass-pi-cli-openai
- codex-sdk-openai: 100% PASS, bundle .agentv/results/suite-yaml-live-pass-codex-sdk-openai
- copilot-sdk-openai: 100% PASS after using the default chat format, bundle .agentv/results/suite-yaml-live-pass-copilot-sdk-openai-chat
Results-branch publish proof: running pi-cli-openai on examples/features/readme-quickstart/evals/my-eval.eval.yaml with --threshold 1 --results-require-push and no explicit --output generated the timestamped run .agentv/results/2026-07-03T09-20-51-754Z, which passed at 100% and was pushed to agentv/results/v1:2026-07-03T09-20-51-754Z.
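
The run name 2026-07-03T09-20-51-754Z looks like an ISO-8601 UTC timestamp with its ':' and '.' separators replaced by '-', which keeps it usable as a directory name. The sketch below reproduces that naming under this assumption; runIdFromDate is a hypothetical helper, not AgentV's implementation.

```typescript
// Hypothetical helper: turn a Date into a run ID by replacing the ':' and '.'
// separators of its ISO-8601 UTC string with '-'. Inferred from the run name
// in this PR; not taken from AgentV's source.
function runIdFromDate(date: Date): string {
  return date.toISOString().replace(/[:.]/g, "-");
}

const runId = runIdFromDate(new Date("2026-07-03T09:20:51.754Z"));
console.log(`.agentv/results/${runId}`); // .agentv/results/2026-07-03T09-20-51-754Z
console.log(`agentv/results/v1:${runId}`); // agentv/results/v1:2026-07-03T09-20-51-754Z
```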
Explicit Pi + LLM grader proof: run 2026-07-03T09-20-51-754Z used the agent target pi-cli-openai; its manifest contains two live llm-rubric score entries with target=local-openai-grader, each scoring 1.0, with 1 and 2 rubric assertions respectively.
The broad example smoke run covered the renamed suites for suite-level input, basic JSONL/cases, external datasets, local CLI, batch CLI, tool trajectory, and workspace artifacts. batch-cli intentionally keeps its missing-output case, per its README; threshold-0 runs are recorded as smoke coverage only, not correctness dogfood.
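
Why threshold-0 runs count only as smoke can be shown with a small sketch, assuming --threshold sets the minimum pass rate (from 0 to 1) a run must reach to succeed. meetsThreshold is a hypothetical helper, not the CLI's actual code.

```typescript
// Assumption: a run succeeds when its pass rate (0..1) is at least the
// --threshold value. Hypothetical helper for illustration only.
function meetsThreshold(passRate: number, threshold: number): boolean {
  return passRate >= threshold;
}

// With --threshold 1, only a fully passing run succeeds.
console.log(meetsThreshold(1.0, 1)); // true
console.log(meetsThreshold(0.5, 1)); // false
// With --threshold 0, even a run where nothing passes succeeds, so it shows
// the suite executes but says nothing about correctness.
console.log(meetsThreshold(0.0, 0)); // true
```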

Notes

Private run bundles remain ignored under .agentv/results/ and are not committed.
Published run bundles go to the agentv/results/v1 branch through the existing Git-backed results publishing path.
The verification guide now states that threshold-0 execution is smoke coverage, not dogfood evidence.

Entire-Checkpoint: 4f8edb57e3d1

cloudflare-workers-and-pages · 2026-07-03T08:40:41Z

Deploying agentv with Cloudflare Pages

Latest commit:	`bdcd540`
Status:	✅ Deploy successful!
Preview URL:	https://edff7662.agentv.pages.dev
Branch Preview URL:	https://fix-suite-yaml-examples.agentv.pages.dev


fix(examples): migrate eval suites to suite yaml

648d03d

Entire-Checkpoint: 4f8edb57e3d1

christso added 5 commits July 3, 2026 10:59

chore(config): publish eval results to results branch

b679261

chore(config): align results config shape

0e425dd

chore(config): remove legacy trace file default

01f7b25

chore(config): simplify results and artifact defaults

7ea9699

ci: validate suite yaml examples directly

bdcd540

christso marked this pull request as ready for review July 3, 2026 11:56

christso merged commit f1b0c02 into main Jul 3, 2026
8 checks passed

christso deleted the fix/suite-yaml-examples branch July 3, 2026 11:56

christso merged 6 commits into main from fix/suite-yaml-examples