feat(eval): finalize promptfoo-aligned restructure #1610

Merged
christso merged 8 commits into main from docs/av-kfik-readme-docs on Jul 3, 2026
christso (Collaborator) commented Jul 2, 2026

Summary

This PR finalizes the promptfoo-aligned eval restructure branch and folds in the cleanup decisions from av-dkn5:

  • replaces workspace lifetime with workspace.scope: suite | attempt
  • removes pooled workspace mode and the workspace clean/list commands
  • keeps --workspace-path / execution.workspace_path as the explicit static local override
  • removes feedback.json and the Dashboard human-review overlay surface
  • removes authored evaluators aliases, snake_case grader aliases, and root-level budget_usd
  • keeps structured g-eval / rubric criteria with per-criterion assertion rows in grading.json
  • updates README, public docs, ADRs, examples, and AI-facing skill references for the current contract
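
The first bullet narrows the workspace setting to two values. A minimal sketch of how a loader might enforce that at the config boundary follows; the names `WorkspaceScope` and `parseWorkspaceScope` are hypothetical and are not taken from the AgentV codebase.

```typescript
// Hypothetical sketch: type and function names are illustrative, not AgentV's API.
type WorkspaceScope = "suite" | "attempt";

// The two values named in this PR for workspace.scope.
const WORKSPACE_SCOPES: readonly WorkspaceScope[] = ["suite", "attempt"];

// Narrow an untyped YAML value to one of the two allowed scopes,
// rejecting anything else (including modes this PR removes).
function parseWorkspaceScope(value: unknown): WorkspaceScope {
  if (
    typeof value === "string" &&
    (WORKSPACE_SCOPES as readonly string[]).includes(value)
  ) {
    return value as WorkspaceScope;
  }
  throw new Error(
    `workspace.scope must be one of ${WORKSPACE_SCOPES.join(" | ")}, got ${JSON.stringify(value)}`,
  );
}
```

Failing fast on unknown values is one way to surface stale configs that still use a removed mode, rather than silently falling back to a default.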

Target References

Eval YAML can reference configured targets by label:

target: codex-gpt5

And can define an eval-local variant:

target:
  extends: codex-gpt5
  label: codex-gpt5-high-reasoning
  reasoning_effort: high

Here, extends refers to a label defined in .agentv/targets.yaml or targets.yaml, and label is the name this variant goes by in local results and comparisons.
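
The two reference forms above can be read as a lookup followed by an override merge. The sketch below illustrates that reading only. The `TargetConfig` and `TargetRef` shapes and the `resolveTarget` function are hypothetical, and the real resolver may treat fields differently.

```typescript
// Hypothetical shapes; the real AgentV schema may differ.
type TargetConfig = { label: string; [key: string]: unknown };
type TargetRef =
  | string
  | ({ extends: string; label: string } & Record<string, unknown>);

// Resolve a `target:` value against the configured targets.
// - A plain string is looked up by label and returned unchanged.
// - An object form looks up `extends`, then layers its own fields on top,
//   so the variant's `label` replaces the base label.
function resolveTarget(ref: TargetRef, configured: TargetConfig[]): TargetConfig {
  const baseLabel = typeof ref === "string" ? ref : ref.extends;
  const base = configured.find((t) => t.label === baseLabel);
  if (!base) {
    throw new Error(`unknown target label: ${baseLabel}`);
  }
  if (typeof ref === "string") {
    return base;
  }
  // Drop `extends` from the result; everything else overrides the base.
  const { extends: _extended, ...overrides } = ref;
  return { ...base, ...overrides };
}
```

Under these assumptions, resolving the variant from this description against a configured `codex-gpt5` entry yields a target whose label is `codex-gpt5-high-reasoning` and whose `reasoning_effort` is `high`, with every other field inherited from the base entry.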

Validation

  • bun test packages/core/test/paths.test.ts packages/core/test/evaluation/workspace-config-parsing.test.ts packages/core/test/evaluation/workspace/setup.test.ts packages/core/test/evaluation/repo-schema-validation.test.ts packages/core/test/evaluation/orchestrator.test.ts packages/core/test/evaluation/extensions.test.ts apps/cli/test/eval.integration.test.ts apps/cli/test/commands/prepare/prepare.test.ts — 209 passed
  • bun test packages/core/test/evaluation/loaders/grader-parser.test.ts packages/core/test/evaluation/loaders/jsonl-parser.test.ts packages/core/test/evaluation/loaders/config-loader.test.ts packages/core/test/evaluation/eval-inline-experiment.test.ts packages/core/test/evaluation/validation/eval-validator.test.ts packages/core/test/evaluation/validation/eval-file-schema.test.ts apps/cli/test/commands/results/serve.test.ts apps/dashboard/src/lib/result-table.test.ts apps/cli/test/commands/trace/trace.test.ts — 564 passed
  • bun test packages/core/test/evaluation/loaders/config-loader.test.ts packages/core/test/evaluation/workspace/setup.test.ts packages/core/test/evaluation/loaders/eval-yaml-transpiler.test.ts — 135 passed
  • bun --filter @agentv/core typecheck
  • bun --filter agentv typecheck
  • bun --filter @agentv/dashboard build
  • bun --filter @agentv/web build
  • bun run validate:examples — 61 valid / 0 invalid
  • bun run lint
  • git diff --check

Live Dogfood

A live dogfood run against a local OpenAI-compatible endpoint (http://127.0.0.1:10531/v1) passed, with gpt-5.3-codex-spark used on both the target and grader paths. The eval used target: local-openai, taken from a temporary targets.yaml with label: entries, and a type: g-eval assertion.

Result: PASS, 1/1 tests, mean score 100%.

Private evidence branch: EntityProcess/agentv-private:evidence/av-dkn5-eval-restructure-dogfood-2026-07-03
Commit: ceed547

The extracted grading-check.json shows g-eval ran through local-openai-grader and wrote criterion-level rows to grading.json.assertion_results and graders[].assertion_results.
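
A check like the one described could be approximated by counting rows in both locations. This is a rough sketch: only the `assertion_results` and `graders[].assertion_results` paths come from this description, while the `GradingFile` type, the `countAssertionRows` helper, and the treatment of row contents as opaque objects are assumptions.

```typescript
// Row fields are not specified in this PR, so rows are treated as opaque objects.
type AssertionRow = Record<string, unknown>;

// Assumed partial shape of grading.json; only the two paths below are from the PR text.
interface GradingFile {
  assertion_results?: AssertionRow[];
  graders?: { assertion_results?: AssertionRow[] }[];
}

// Count criterion-level rows at the top level and under each grader.
// Missing arrays are counted as zero rather than treated as errors.
function countAssertionRows(
  grading: GradingFile,
): { topLevel: number; perGrader: number[] } {
  return {
    topLevel: grading.assertion_results?.length ?? 0,
    perGrader: (grading.graders ?? []).map(
      (grader) => grader.assertion_results?.length ?? 0,
    ),
  };
}
```

A non-zero count in both places would be consistent with the behavior reported above, though it says nothing about whether the rows themselves are correct.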

Related

  • Bead: av-dkn5
  • Related parent: av-kfik.16

Compound Engineering
GPT-5

cloudflare-workers-and-pages (bot) commented Jul 2, 2026

Deploying agentv with Cloudflare Pages

Latest commit: baa3e20
Status: ✅  Deploy successful!
Preview URL: https://47058a93.agentv.pages.dev
Branch Preview URL: https://docs-av-kfik-readme-docs.agentv.pages.dev

christso changed the title from "feat(cli): adapt Agent Skills evals at CLI boundary" to "feat(eval): finalize promptfoo-aligned restructure" on Jul 3, 2026
christso force-pushed the docs/av-kfik-readme-docs branch from 445c171 to 3a8228d on July 3, 2026 02:32
christso force-pushed the docs/av-kfik-readme-docs branch from 3a8228d to 4272ac4 on July 3, 2026 02:37
christso force-pushed the docs/av-kfik-readme-docs branch from b352a9b to eb49cd4 on July 3, 2026 06:08
christso marked this pull request as ready for review on July 3, 2026 06:13
christso merged commit 2feb6b9 into main on Jul 3, 2026
8 checks passed
christso deleted the docs/av-kfik-readme-docs branch on July 3, 2026 06:35