
Commit 2b331d3

Skill quality dashboard: path adherence scoring, token tracking, and CI pipeline (#1102)
* Add skill quality dashboard: trace parser, path adherence, and CI pipeline step
  - Add generate-quality-report.js: parses agent-metadata traces, computes path adherence scores, generates skill-quality-report.json
  - Update agent-runner.ts: capture token usage, LLM call metadata, and tool call formatting for both skill: and tool: code blocks
  - Add quality-report npm script to package.json
  - Add 'Generate quality report' step to GitHub Actions workflow
  - Define EXPECTED_PATHS for 10 skill areas based on real passing traces

* fix: add ESM __dirname polyfill to resource/create integration test

  The test used bare __dirname, which is not available in ESM modules. Added a fileURLToPath(import.meta.url) polyfill matching the pattern used in jest.setup.ts and other test files. Fixes 6 test failures: Workflow Documentation, Command Validation, and References Pattern tests.

* fix: resolve ESLint errors in generate-quality-report.js
  - Add eslint-disable for unused utility functions (loadPerTestTokenUsage, extractTestCase)
  - Use const instead of let for match variable
  - Use double quotes instead of single quotes
  - Remove unused testRunPath parameter from buildAreaSummaries

* fix: resolve CodeQL TOCTOU race condition in token summary write

  Replace the existsSync + readFileSync pattern with a try/catch around readFileSync to eliminate the time-of-check-to-time-of-use file system race condition flagged by CodeQL as high severity.

* fix: remove unused functions and token cost fields
  - Remove loadPerTestTokenUsage and extractTestCase functions from generate-quality-report.js (CodeQL unused-function alerts)
  - Remove totalCost and per-call cost fields from TokenUsage interface and all cost calculations in agent-runner.ts (not used in reports)
  - Keep token count tracking (inputTokens, outputTokens, cache tokens)

  Addresses all 3 CodeQL review comments from PR #1.

* fix: address Copilot PR review comments
  - Remove unused path/fileURLToPath imports from integration.test.ts (__dirname comes from jest.setup.ts global)
  - Guard against -1 index in path adherence node lookup
  - Wrap loadTokenSummary in try/catch for corrupted JSON resilience
  - Switch token-summary from JSON to JSONL for safe concurrent writes
  - Redact prompts in token-usage.json via redactSecrets()
  - Initialize model default from modelOverride env var

* Address all PR review comments from JasonYe and Copilot reviewer
  - Write agent-metadata.json for machine consumption (comment #7)
  - Rename apiCall to llmCall consistently (comment #8)
  - Quality report step only runs for microsoft-foundry (comment #9)
  - Add comprehensive JSDoc to buildTraces function (comment #10)
  - Fix passRate: simplify math, return null when no tests (comment #11)
  - Remove duplicate isSkillInvoked/getToolCalls, re-export from evaluate.ts (comments #12, #13)
  - All 6 Copilot reviewer comments already addressed in prior commits

* Address JasonYe review: remove markdown fallback, redact full JSON, fix event types
  - Remove extractToolCallsFromMarkdown() entirely; agent-metadata.json is the only source
  - Remove markdown fallback in extractToolCalls(); return empty if no JSON
  - Apply redactSecrets() to full JSON text instead of just the prompt field
  - Fix event type matching: use SDK's tool.execution_start with data.toolName
  - Fix testedAreas using a.name instead of a.area (property name bug)
  - Update JSDoc to remove stale markdown fallback reference

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent d4a201e commit 2b331d3
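The commit message says generate-quality-report.js computes path adherence scores against EXPECTED_PATHS and guards against a -1 index during node lookup. The sketch below shows one plausible way such a score could work, using greedy in-order matching of expected steps against observed calls. The EXPECTED_PATHS entry, the step strings, and `scorePathAdherence` are hypothetical illustrations, not the script's real table or algorithm.

```javascript
// Hypothetical expected-path table; the real EXPECTED_PATHS covers 10 skill
// areas and its contents are not shown in this commit.
const EXPECTED_PATHS = {
  "example-area": ["skill:example", "tool:list_resources", "tool:create_resource"],
};

/**
 * Fraction of expected steps found in `actualCalls` in the expected order,
 * using greedy in-order matching. Returns null when the area has no
 * expected path, so "unscored" is distinguishable from a score of 0.
 */
function scorePathAdherence(area, actualCalls) {
  const expected = EXPECTED_PATHS[area];
  if (!expected || expected.length === 0) return null;

  let cursor = 0; // search position just after the last matched step
  let matched = 0;
  for (const step of expected) {
    const idx = actualCalls.indexOf(step, cursor);
    // indexOf returns -1 for a missing step; never use it as a position
    if (idx === -1) continue;
    matched += 1;
    cursor = idx + 1;
  }
  return matched / expected.length;
}
```

For example, a trace that calls `skill:example` and then `tool:create_resource`, skipping the listing step, scores 2/3 under this sketch. Greedy matching is simple and order-aware, but it is a heuristic rather than a longest-common-subsequence score.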
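The final review round makes agent-metadata.json the only source of tool calls, matches the SDK's `tool.execution_start` events by `data.toolName`, and returns an empty list when there is no JSON. Here is a minimal sketch of that behavior. Only the event type string and the `data.toolName` field come from the commit message. The top-level `events` array and the function name `extractToolNamesFromMetadata` are assumptions made for illustration.

```javascript
/**
 * Hypothetical sketch: list tool names from agent-metadata JSON text.
 * Assumes the metadata looks like { events: [{ type, data: { toolName } }] }.
 * There is deliberately no markdown fallback: missing or invalid JSON
 * yields an empty list.
 */
function extractToolNamesFromMetadata(metadataText) {
  if (!metadataText) return [];
  let metadata;
  try {
    metadata = JSON.parse(metadataText);
  } catch {
    return []; // not JSON: treat as "no tool calls recorded"
  }
  if (!metadata || !Array.isArray(metadata.events)) return [];

  return metadata.events
    .filter(
      (e) =>
        e &&
        e.type === "tool.execution_start" &&
        e.data &&
        typeof e.data.toolName === "string"
    )
    .map((e) => e.data.toolName);
}
```

Filtering on a single event type counts each tool invocation once, even if the SDK emits several events per call.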
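Several fixes above concern the token summary. The commit replaces the existsSync + readFileSync check with a try/catch read, tolerates corrupted JSON, and switches the summary to JSONL. A minimal sketch of that pattern in Node.js follows, assuming one JSON record per line. The function names and the record fields other than `inputTokens`/`outputTokens` are hypothetical.

```javascript
const fs = require("node:fs");

// Append one record as a single JSON line. Each writer only appends; no
// writer does a read-modify-write of a shared JSON document.
function appendTokenRecord(filePath, record) {
  fs.appendFileSync(filePath, JSON.stringify(record) + "\n", "utf8");
}

// Read all records. There is no separate existsSync() check (the TOCTOU
// pattern CodeQL flags): attempt the read and treat ENOENT as "no data yet".
function loadTokenRecords(filePath) {
  let text;
  try {
    text = fs.readFileSync(filePath, "utf8");
  } catch (err) {
    if (err.code === "ENOENT") return [];
    throw err; // permissions or other I/O errors should still surface
  }

  const records = [];
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue;
    try {
      records.push(JSON.parse(line));
    } catch {
      // Skip a corrupted or partially written line instead of failing the report
    }
  }
  return records;
}
```

With one record per line, a single bad line costs only that record. A single JSON document can instead become unparseable as a whole.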
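The commit also moves redaction from the prompt field to the full serialized JSON, so a secret inside a tool argument or nested field is caught too. The sketch below illustrates that idea only. The key names and the Bearer-token pattern are hypothetical examples, not the patterns the real redactSecrets() uses, and `redactSecretsInJson` is an illustrative name.

```javascript
// Hypothetical patterns for illustration; the real redactSecrets() rules
// are not shown in this commit.
// 1) String values of sensitive-looking keys. The value pattern handles
//    JSON escapes (\" and friends), so the output stays valid JSON.
const SENSITIVE_KEY_VALUE = /("(?:api[_-]?key|token|password|secret)"\s*:\s*)"(?:[^"\\]|\\.)*"/gi;
// 2) Bearer tokens appearing anywhere inside a string value.
const BEARER_TOKEN = /Bearer\s+[A-Za-z0-9._~+\/-]+=*/g;

// Redact the whole serialized document, not one field, so secrets in
// nested objects or tool arguments are also covered.
function redactSecretsInJson(jsonText) {
  return jsonText
    .replace(SENSITIVE_KEY_VALUE, '$1"[REDACTED]"')
    .replace(BEARER_TOKEN, "Bearer [REDACTED]");
}
```

Usage would follow the shape `fs.writeFileSync(out, redactSecretsInJson(JSON.stringify(metadata, null, 2)))`. Redaction runs as the last step before writing, after all fields are in place.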

4 files changed, 1181 additions and 3 deletions


.github/workflows/test-all-integration.yml

Lines changed: 4 additions & 0 deletions
```diff
@@ -292,6 +292,10 @@ jobs:
             echo "No skill report found"
           fi
 
+      - name: Generate quality report
+        if: always() && matrix.skill == 'microsoft-foundry'
+        run: npm run quality-report || true
+
       - name: Export report
         if: always()
         id: export-report
```

tests/package.json

Lines changed: 1 addition & 0 deletions
```diff
@@ -16,6 +16,7 @@
     "coverage:grid": "node scripts/generate-coverage-grid.js",
     "report": "npx tsx scripts/generate-test-reports.ts",
     "results": "node scripts/show-test-results.js",
+    "quality-report": "node scripts/generate-quality-report.js",
     "update:snapshots": "node scripts/update-snapshots.js",
     "typecheck": "tsc --noEmit",
     "lint": "eslint",
```

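The new `quality-report` script runs generate-quality-report.js, which writes skill-quality-report.json. The skeleton below sketches a per-area summary and report writer. It includes a pass rate that returns null when an area has no tests, per the passRate fix in the commit message. The result shape (`area`, `passed`, `adherence`), the report fields, and the helper names are assumptions. The pass rate is also treated here as a fraction rather than a percentage, which is another assumption.

```javascript
const fs = require("node:fs");

// Pass rate as a fraction in [0, 1]; null (not 0) when there were no tests,
// so "no data" is not reported as "everything failed".
function passRate(passed, total) {
  return total > 0 ? passed / total : null;
}

// Hypothetical per-test result shape: { area, passed: boolean, adherence: number | null }
function summarizeByArea(results) {
  const groups = new Map();
  for (const r of results) {
    if (!groups.has(r.area)) groups.set(r.area, []);
    groups.get(r.area).push(r);
  }

  const areas = {};
  for (const [area, rows] of groups) {
    const passed = rows.filter((r) => r.passed).length;
    // Average only the tests that actually received an adherence score
    const scores = rows.map((r) => r.adherence).filter((s) => typeof s === "number");
    areas[area] = {
      total: rows.length,
      passed,
      passRate: passRate(passed, rows.length),
      meanAdherence:
        scores.length > 0 ? scores.reduce((a, b) => a + b, 0) / scores.length : null,
    };
  }
  return areas;
}

// Write the report as pretty-printed JSON and return it for further use.
function writeQualityReport(outPath, results) {
  const report = {
    generatedAt: new Date().toISOString(),
    areas: summarizeByArea(results),
  };
  fs.writeFileSync(outPath, JSON.stringify(report, null, 2) + "\n", "utf8");
  return report;
}
```

In the workflow diff, the step runs `npm run quality-report || true`, so a failure in a script like this would not fail the CI job. The step is also gated to the `microsoft-foundry` matrix entry.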