feat(eval): finalize promptfoo-aligned restructure by christso · Pull Request #1610 · EntityProcess/agentv

christso · 2026-07-02T23:36:00Z

Summary

This PR finalizes the promptfoo-aligned eval restructure branch and folds in the cleanup decisions from av-dkn5:

- replaces workspace lifetime with workspace.scope: suite | attempt
- removes pooled workspace mode and the workspace clean/list commands
- keeps --workspace-path / execution.workspace_path as the explicit static local override
- removes feedback.json and the Dashboard human-review overlay surface
- removes authored evaluators aliases, snake_case grader aliases, and root-level budget_usd
- keeps structured g-eval / rubric criteria with per-criterion assertion rows in grading.json
- updates README, public docs, ADRs, examples, and AI-facing skill references for the current contract
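
As a rough illustration of the first item, a parser could restrict workspace.scope to the two values named above. This is a sketch, not the code in this PR: `parseWorkspaceScope` is a hypothetical helper. The PR text does not say how an omitted value is handled, so the sketch simply rejects anything other than the two literals.

```typescript
// Hypothetical helper; only the two literal values come from the PR summary.
type WorkspaceScope = "suite" | "attempt";

function parseWorkspaceScope(value: unknown): WorkspaceScope {
  if (value === "suite" || value === "attempt") {
    return value;
  }
  // Assumption: anything else (including the removed pooled mode) is an error.
  throw new Error(
    `workspace.scope must be "suite" or "attempt", got ${JSON.stringify(value)}`
  );
}

const scope = parseWorkspaceScope("attempt");
```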

Target References

Eval YAML can reference configured targets by label:

target: codex-gpt5

And can define an eval-local variant:

target:
  extends: codex-gpt5
  label: codex-gpt5-high-reasoning
  reasoning_effort: high

Here, extends names a label defined in .agentv/targets.yaml or targets.yaml, and label is the eval-local name used for results and comparisons.
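
One way to picture how an eval-local variant could be resolved against the configured targets is sketched below. It assumes a plain shallow merge in which the variant's fields override the base target's fields. The `TargetConfig` and `TargetRef` types, the `resolveTarget` function, and the `provider` field are all hypothetical and are not taken from the AgentV codebase; the actual resolution rules may differ.

```typescript
// Hypothetical shapes: only `label`, `extends`, and `reasoning_effort`
// appear in the PR description. `provider` is made up for the example.
type TargetConfig = { label: string; [key: string]: unknown };
type TargetRef =
  | string
  | ({ extends: string; label?: string } & Record<string, unknown>);

function resolveTarget(ref: TargetRef, configured: TargetConfig[]): TargetConfig {
  const baseLabel = typeof ref === "string" ? ref : ref.extends;
  const base = configured.find((t) => t.label === baseLabel);
  if (!base) {
    throw new Error(`Unknown target label: ${baseLabel}`);
  }
  if (typeof ref === "string") {
    return { ...base };
  }
  // Assumption: shallow merge, variant fields win, `extends` is dropped.
  const overrides: Record<string, unknown> = { ...ref };
  delete overrides["extends"];
  return { ...base, ...overrides, label: ref.label ?? base.label };
}

const configured: TargetConfig[] = [
  { label: "codex-gpt5", provider: "codex", reasoning_effort: "medium" },
];

const variant = resolveTarget(
  {
    extends: "codex-gpt5",
    label: "codex-gpt5-high-reasoning",
    reasoning_effort: "high",
  },
  configured
);
```

In this sketch the variant keeps every field of the base target except the ones it overrides, and it reports under its own label rather than the base label.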

Validation

bun test packages/core/test/paths.test.ts packages/core/test/evaluation/workspace-config-parsing.test.ts packages/core/test/evaluation/workspace/setup.test.ts packages/core/test/evaluation/repo-schema-validation.test.ts packages/core/test/evaluation/orchestrator.test.ts packages/core/test/evaluation/extensions.test.ts apps/cli/test/eval.integration.test.ts apps/cli/test/commands/prepare/prepare.test.ts — 209 passed
bun test packages/core/test/evaluation/loaders/grader-parser.test.ts packages/core/test/evaluation/loaders/jsonl-parser.test.ts packages/core/test/evaluation/loaders/config-loader.test.ts packages/core/test/evaluation/eval-inline-experiment.test.ts packages/core/test/evaluation/validation/eval-validator.test.ts packages/core/test/evaluation/validation/eval-file-schema.test.ts apps/cli/test/commands/results/serve.test.ts apps/dashboard/src/lib/result-table.test.ts apps/cli/test/commands/trace/trace.test.ts — 564 passed
bun test packages/core/test/evaluation/loaders/config-loader.test.ts packages/core/test/evaluation/workspace/setup.test.ts packages/core/test/evaluation/loaders/eval-yaml-transpiler.test.ts — 135 passed
bun --filter @agentv/core typecheck
bun --filter agentv typecheck
bun --filter @agentv/dashboard build
bun --filter @agentv/web build
bun run validate:examples — 61 valid / 0 invalid
bun run lint
git diff --check

Live Dogfood

A live dogfood run against a local OpenAI-compatible endpoint (http://127.0.0.1:10531/v1) passed, with gpt-5.3-codex-spark on both the target and grader paths. The eval referenced target: local-openai from a temporary targets.yaml that used label: entries, and it included a type: g-eval assertion.

Result: PASS, 1/1 tests, mean score 100%.

Private evidence branch: EntityProcess/agentv-private:evidence/av-dkn5-eval-restructure-dogfood-2026-07-03
Commit: ceed547

The extracted grading-check.json shows g-eval ran through local-openai-grader and wrote criterion-level rows to grading.json.assertion_results and graders[].assertion_results.
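
To make the criterion-level rows easier to picture, here is a sketch of how such rows might be summarized. The PR only names grading.json.assertion_results and graders[].assertion_results. The row fields (`text`, `passed`), the grader `name` field, and the sample rows are assumptions made up for illustration, and the pass-rate calculation is not AgentV's actual scoring.

```typescript
// Assumed row shape: the PR does not show the fields inside each row.
interface AssertionRow {
  text: string;
  passed: boolean;
}

interface Grading {
  assertion_results: AssertionRow[];
  graders: { name: string; assertion_results: AssertionRow[] }[];
}

// Fraction of rows that passed; an empty list counts as 0 in this sketch.
function passRate(rows: AssertionRow[]): number {
  if (rows.length === 0) return 0;
  return rows.filter((r) => r.passed).length / rows.length;
}

// One pass rate for the top-level rows plus one per grader.
function summarize(grading: Grading): Record<string, number> {
  const summary: Record<string, number> = {
    overall: passRate(grading.assertion_results),
  };
  for (const grader of grading.graders) {
    summary[grader.name] = passRate(grader.assertion_results);
  }
  return summary;
}

// Made-up sample rows, not the dogfood output.
const rows: AssertionRow[] = [
  { text: "answer names the configured target", passed: true },
  { text: "answer stays under the length limit", passed: false },
];

const summary = summarize({
  assertion_results: rows,
  graders: [{ name: "local-openai-grader", assertion_results: rows }],
});
// summary is { overall: 0.5, "local-openai-grader": 0.5 }
```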

Deploying agentv with Cloudflare Pages

Latest commit: `baa3e20`
Status: ✅ Deploy successful!
Preview URL: https://47058a93.agentv.pages.dev
Branch Preview URL: https://docs-av-kfik-readme-docs.agentv.pages.dev

christso changed the title ~~feat(cli): adapt Agent Skills evals at CLI boundary~~ feat(eval): finalize promptfoo-aligned restructure Jul 3, 2026

christso added 2 commits July 3, 2026 04:29

feat(cli): adapt Agent Skills evals at CLI boundary

7c3cc91

style: format Agent Skills adapter changes

d1166a0

christso force-pushed the docs/av-kfik-readme-docs branch from 445c171 to 3a8228d Compare July 3, 2026 02:32

feat(eval): finalize promptfoo-aligned restructure

4272ac4

christso force-pushed the docs/av-kfik-readme-docs branch from 3a8228d to 4272ac4 Compare July 3, 2026 02:37

christso added 4 commits July 3, 2026 06:00

refactor(evals): fold structured rubrics into llm-rubric

2507f33

refactor(evals): align authored assert fields

c27e4b8

feat(eval): support ref-backed default tests

88dcb3b

fix(eval): migrate remaining assert fixtures

eb49cd4

christso force-pushed the docs/av-kfik-readme-docs branch from b352a9b to eb49cd4 Compare July 3, 2026 06:08

christso marked this pull request as ready for review July 3, 2026 06:13

fix eval assert validation and docs

baa3e20

christso merged commit 2feb6b9 into main Jul 3, 2026
8 checks passed

christso deleted the docs/av-kfik-readme-docs branch July 3, 2026 06:35

christso merged 8 commits into main from docs/av-kfik-readme-docs
