Commit d38095e
Fix eval results page to show all 60 tests across 12 skills

Previously only 36 runs (9 skills) were embedded. Now includes all 120
runs (60 prompts × 2 configs) across all 12 GRC skills. Stats are
recalculated from the expectations arrays, bypassing the buggy summary
counts: 94% pass rate with the skill (282/300 assertions) vs. 83% baseline
(250/300), a gain of +11 percentage points (32 additional assertions passed).
Uses iteration-2 for ISO 27701 (improved skill, +20% delta).
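
The recalculation described above can be sketched roughly as follows. This is a minimal illustration, not the page's actual code: it assumes a hypothetical run schema where each run has a "config" field ("with_skill" or "baseline") and an "expectations" list of objects with a boolean "passed" key. The function name summarize is also hypothetical.

```python
from collections import defaultdict


def summarize(runs):
    """Recompute pass rates per config directly from expectations arrays.

    Hypothetical schema: each run is {"config": str, "expectations": [{"passed": bool}, ...]}.
    Any precomputed summary counts on the run are deliberately ignored.
    """
    passed = defaultdict(int)
    total = defaultdict(int)
    for run in runs:
        cfg = run["config"]  # assumed to be "with_skill" or "baseline"
        for exp in run["expectations"]:
            total[cfg] += 1
            if exp["passed"]:
                passed[cfg] += 1

    with_rate = passed["with_skill"] / total["with_skill"]
    base_rate = passed["baseline"] / total["baseline"]
    return {
        "with_rate": with_rate,
        "baseline_rate": base_rate,
        # Difference in percentage points, not a relative percentage.
        "delta_pp": (with_rate - base_rate) * 100,
        # Extra assertions passed with the skill vs. baseline.
        "additional": passed["with_skill"] - passed["baseline"],
    }


runs = [
    {"config": "with_skill", "expectations": [{"passed": True}, {"passed": True}]},
    {"config": "baseline", "expectations": [{"passed": True}, {"passed": False}]},
]
print(summarize(runs))
```

Counting from the raw per-assertion results means a wrong cached total on any run cannot skew the aggregate. With the figures quoted above, 282/300 vs. 250/300 gives 94.0% vs. 83.3%, a delta of about 10.7 points (rounded to +11) and 32 additional passes.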

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

1 parent c8ff42a, commit d38095e
1 file changed: +15921 / -13812 lines