Commit 2b331d3
Skill quality dashboard: path adherence scoring, token tracking, and CI pipeline (#1102)
* Add skill quality dashboard: trace parser, path adherence, and CI pipeline step
- Add generate-quality-report.js: parses agent-metadata traces, computes
path adherence scores, generates skill-quality-report.json
- Update agent-runner.ts: capture token usage, LLM call metadata, and
tool call formatting for both skill: and tool: code blocks
- Add quality-report npm script to package.json
- Add 'Generate quality report' step to GitHub Actions workflow
- Define EXPECTED_PATHS for 10 skill areas based on real passing traces
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
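
The commit message does not state how a path adherence score is computed. As a rough illustration only, the sketch below assumes the score is the fraction of expected steps that appear in the observed trace in order. The function name `pathAdherence`, the step names, and the shape of an expected path are all hypothetical and are not taken from generate-quality-report.js.

```typescript
// Hypothetical shape: the ordered step names expected for one skill area.
type ExpectedPath = readonly string[];

/**
 * Fraction of expected steps found in the observed trace, in order.
 * Returns null when there are no expected steps, so callers can tell
 * "nothing to score" apart from a score of zero.
 */
function pathAdherence(
  expectedSteps: ExpectedPath,
  observedSteps: readonly string[],
): number | null {
  if (expectedSteps.length === 0) return null;
  let cursor = 0; // position in observedSteps to search from
  let matched = 0;
  for (const step of expectedSteps) {
    const idx = observedSteps.indexOf(step, cursor);
    if (idx === -1) continue; // step missing: skip it, leave the cursor alone
    matched += 1;
    cursor = idx + 1;
  }
  return matched / expectedSteps.length;
}

// Made-up step names, for illustration only.
const expectedSteps = ["skill:example", "tool:create", "tool:verify"];
const observedSteps = ["skill:example", "tool:list", "tool:verify"];
const adherenceScore = pathAdherence(expectedSteps, observedSteps);
console.log(adherenceScore);
```

Here two of the three expected steps are found in order, so the score is two thirds. Skipping a missing step without moving the cursor keeps one absent node from hiding later matches.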
* fix: add ESM __dirname polyfill to resource/create integration test
The test used bare __dirname which is not available in ESM modules.
Added fileURLToPath(import.meta.url) polyfill matching the pattern used
in jest.setup.ts and other test files.
Fixes 6 test failures: Workflow Documentation, Command Validation,
and References Pattern tests.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
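
For readers unfamiliar with the polyfill, a minimal sketch follows. It wraps the `fileURLToPath` call in a helper (`dirnameFromModuleUrl`, a hypothetical name) so the snippet can also run outside an ES module; in a real ESM test file the argument would be `import.meta.url`. The file path used in the demonstration is invented.

```typescript
import * as path from "node:path";
import { fileURLToPath, pathToFileURL } from "node:url";

// In an ES module, call this with import.meta.url to recover
// the directory that CommonJS exposes as __dirname.
function dirnameFromModuleUrl(moduleUrl: string): string {
  return path.dirname(fileURLToPath(moduleUrl));
}

// Demonstration with a made-up file location standing in for import.meta.url.
const fakeModulePath = path.resolve("tests", "example", "integration.test.ts");
const recoveredDir = dirnameFromModuleUrl(pathToFileURL(fakeModulePath).href);
console.log(recoveredDir);
```

Inside an ESM file the usual form is `const __dirname = path.dirname(fileURLToPath(import.meta.url));`.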
* fix: resolve ESLint errors in generate-quality-report.js
- Add eslint-disable for unused utility functions (loadPerTestTokenUsage, extractTestCase)
- Use const instead of let for match variable
- Use double quotes instead of single quotes
- Remove unused testRunPath parameter from buildAreaSummaries
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix: resolve CodeQL TOCTOU race condition in token summary write
Replace existsSync+readFileSync pattern with try/catch readFileSync
to eliminate time-of-check-time-of-use file system race condition
flagged by CodeQL as high severity.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
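
The pattern described above can be sketched as follows. This is an illustration of the general technique, not the code from the commit; `readTextIfPresent` and the temporary file name are hypothetical.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Racy form: the file can vanish or change between the check and the read.
//   if (fs.existsSync(p)) { text = fs.readFileSync(p, "utf8"); }

// Single-step form: attempt the read and treat "not found" as a normal case.
function readTextIfPresent(filePath: string): string | null {
  try {
    return fs.readFileSync(filePath, "utf8");
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === "ENOENT") return null;
    throw err; // other failures (e.g. permissions) should still surface
  }
}

// Demonstration in a fresh temporary directory.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "toctou-demo-"));
const summaryPath = path.join(demoDir, "summary.json");
const beforeWrite = readTextIfPresent(summaryPath);
fs.writeFileSync(summaryPath, '{"inputTokens":1}');
const afterWrite = readTextIfPresent(summaryPath);
console.log(beforeWrite, afterWrite);
```

Because there is no separate existence check, there is no window between check and use for another process to exploit.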
* fix: remove unused functions and token cost fields
- Remove loadPerTestTokenUsage and extractTestCase functions from
generate-quality-report.js (CodeQL unused-function alerts)
- Remove totalCost and per-call cost fields from TokenUsage interface
and all cost calculations in agent-runner.ts (not used in reports)
- Keep token count tracking (inputTokens, outputTokens, cache tokens)
Addresses all 3 CodeQL review comments from PR #1.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix: address Copilot PR review comments
- Remove unused path/fileURLToPath imports from integration.test.ts
(__dirname comes from jest.setup.ts global)
- Guard against -1 index in path adherence node lookup
- Wrap loadTokenSummary in try/catch for corrupted JSON resilience
- Switch token-summary from JSON to JSONL for safe concurrent writes
- Redact prompts in token-usage.json via redactSecrets()
- Initialize model default from modelOverride env var
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
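
To illustrate the JSON-to-JSONL switch and the corrupted-JSON guard, here is a minimal sketch under assumptions: the record fields, `appendRecord`, and `loadRecords` are hypothetical, and the real token-summary layout may differ. Each writer only appends one line, so no writer has to read, modify, and rewrite a shared JSON document.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical record shape, for illustration only.
interface TokenRecord {
  test: string;
  inputTokens: number;
  outputTokens: number;
}

// Append one record as a single line; earlier lines are never rewritten.
function appendRecord(file: string, rec: TokenRecord): void {
  fs.appendFileSync(file, JSON.stringify(rec) + "\n");
}

// Load every parseable line; a missing file or a corrupted line is tolerated.
function loadRecords(file: string): TokenRecord[] {
  let text: string;
  try {
    text = fs.readFileSync(file, "utf8");
  } catch {
    return [];
  }
  const records: TokenRecord[] = [];
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue;
    try {
      records.push(JSON.parse(line) as TokenRecord);
    } catch {
      // skip a corrupted or partially written line
    }
  }
  return records;
}

const jsonlDir = fs.mkdtempSync(path.join(os.tmpdir(), "jsonl-demo-"));
const jsonlFile = path.join(jsonlDir, "token-summary.jsonl");
appendRecord(jsonlFile, { test: "a", inputTokens: 10, outputTokens: 2 });
fs.appendFileSync(jsonlFile, "{not valid json\n"); // simulate corruption
appendRecord(jsonlFile, { test: "b", inputTokens: 7, outputTokens: 3 });
const loadedRecords = loadRecords(jsonlFile);
console.log(loadedRecords.length);
```

A damaged line costs only that one record; the rest of the file still loads.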
* Address all PR review comments from JasonYe and Copilot reviewer
- Write agent-metadata.json for machine consumption (comment #7)
- Rename apiCall to llmCall consistently (comment #8)
- Quality report step only runs for microsoft-foundry (comment #9)
- Add comprehensive JSDoc to buildTraces function (comment #10)
- Fix passRate: simplify math, return null when no tests (comment #11)
- Remove duplicate isSkillInvoked/getToolCalls, re-export from evaluate.ts (comments #12, #13)
- All 6 Copilot reviewer comments already addressed in prior commits
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
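
The passRate fix (comment #11) can be pictured with a small sketch. The signature below is an assumption; the commit only states that the math was simplified and that null is returned when there are no tests.

```typescript
// Hypothetical signature: counts in, ratio out, null when nothing was run.
function passRate(passed: number, total: number): number | null {
  if (total === 0) return null; // avoid 0/0 and distinguish "no tests" from 0%
  return passed / total;
}

console.log(passRate(3, 4));
console.log(passRate(0, 0));
```

Returning null rather than 0 lets a report show "no data" instead of a misleading zero percent.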
* Address JasonYe review: remove markdown fallback, redact full JSON, fix event types
- Remove extractToolCallsFromMarkdown() entirely; agent-metadata.json is the only source
- Remove markdown fallback in extractToolCalls(); return empty if no JSON
- Apply redactSecrets() to full JSON text instead of just the prompt field
- Fix event type matching: use SDK's tool.execution_start with data.toolName
- Fix testedAreas using a.name instead of a.area (property name bug)
- Update JSDoc to remove stale markdown fallback reference
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
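
The event-type fix and the "return empty if no JSON" behaviour could look roughly like the sketch below. Only the event type `tool.execution_start` and the field `data.toolName` come from the commit message; the top-level `events` array, the function name `extractToolNames`, and the sample tool names are assumptions made for illustration.

```typescript
// Assumed event shape: only `type` and `data.toolName` are named in the commit.
interface AgentEvent {
  type: string;
  data?: { toolName?: string };
}

// Returns tool names from the metadata JSON, or an empty list if the JSON
// is absent or unparseable (no fallback to any other source).
function extractToolNames(metadataJson: string | null): string[] {
  if (metadataJson === null) return [];
  let parsed: { events?: AgentEvent[] } | null;
  try {
    parsed = JSON.parse(metadataJson);
  } catch {
    return [];
  }
  const eventList: AgentEvent[] =
    parsed !== null && Array.isArray(parsed.events) ? parsed.events : [];
  const toolNames: string[] = [];
  for (const ev of eventList) {
    const toolName = ev.data?.toolName;
    if (ev.type === "tool.execution_start" && typeof toolName === "string") {
      toolNames.push(toolName);
    }
  }
  return toolNames;
}

// Made-up sample metadata.
const sampleMetadata = JSON.stringify({
  events: [
    { type: "tool.execution_start", data: { toolName: "example_tool" } },
    { type: "assistant.message", data: {} },
    { type: "tool.execution_start", data: { toolName: "another_tool" } },
  ],
});
const sampleToolNames = extractToolNames(sampleMetadata);
console.log(sampleToolNames);
```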
---------
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent d4a201e commit 2b331d3
4 files changed: +1181 -3 lines changed
- .github/workflows
- tests
- scripts
- utils