Eval-driven improvements to technical-patterns skills#120

Merged
hua7450 merged 1 commit into PolicyEngine:main from hua7450:eval-driven-skill-improvements
Mar 9, 2026

hua7450 (Collaborator) commented Mar 9, 2026

Summary

Improves 3 technical-patterns skills based on a full eval run of the encode-policy-v2 command on Rhode Island CCAP (Child Care Assistance Program).

Eval Methodology

  1. Researched RI CCAP regulations (218-RICR-20-00-4) and provider payment rates
  2. Spawned rules-engineer agents via agent teams to create parameters (55 files) and variables (15 files)
  3. Reviewed output against reference branch and regulatory sources
  4. Identified 12 issues, traced each to a skill gap, and fixed them

Issues Found and Fixed

| # | Issue | Skill Fixed |
|---|-------|-------------|
| 1 | Wrong PDF page numbers; should prefer online refs (Cornell LII) | parameter-patterns |
| 2 | Missing 0.0001 bracket boundary shift despite being documented | parameter-patterns |
| 3 | Orphaned parameters (created but unused by variables) | parameter-patterns |
| 4 | Unnecessary parameter (min_age_weeks, which has no simulation value) | parameter-patterns |
| 5 | Star rating as int instead of Enum | variable-patterns |
| 6 | Missing child age category Enum variable | variable-patterns |
| 7 | weeks_per_month parameter instead of framework constants | code-style |
| 8 | Over-granular rate files (45 files → 3 with Enum breakdowns) | parameter-patterns |
| 9 | Helper functions in rate lookup (sign of bad param structure) | variable-patterns |
| 10 | Didn't follow reference implementation patterns deeply enough | parameter-patterns |
| 11 | Authorization level should be Enum breakdown dimension | parameter-patterns + variable-patterns |
| 12 | Descriptions too literal: copied regulatory language verbatim | parameter-patterns |

Changes by Skill

parameter-patterns (+121 lines):

  • Reference source hierarchy: online statute > government HTML > PDF with verified page number
  • Strengthened bracket boundary 0.0001 shift with explicit 4-step BOUNDARY CHECK
  • New "Multi-Dimensional Rate Tables" section with full Enum breakdown example
  • New "Don't Create Unnecessary Parameters" section
  • Strengthened "Study Reference Implementations First" with broad search patterns
  • Added "Keep Descriptions Practical" guidance
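
To illustrate why the 0.0001 boundary shift matters, here is a minimal sketch in plain Python. It does not use the PolicyEngine parameter API; `bracket_amount`, the thresholds, and the rates are all hypothetical, and the lookup rule (a bracket applies once the value reaches its threshold) is an assumption made for the illustration. The hypothetical rule is "2% up to and including 100% of the poverty line, 5% above it".

```python
from bisect import bisect_right

def bracket_amount(value, thresholds, amounts):
    """Return the amount of the last bracket whose threshold is <= value."""
    index = bisect_right(thresholds, value) - 1
    return amounts[index]

# Hypothetical rates, not taken from any regulation.
RATES = [0.02, 0.05]

# Unshifted: a household at exactly 1.0 already lands in the upper bracket.
UNSHIFTED = [0.0, 1.0]
# Shifted by 0.0001: exactly 1.0 stays in the lower bracket, as "above 100%" requires.
SHIFTED = [0.0, 1.0001]

for ratio in (0.99, 1.0, 1.01):
    print(ratio, bracket_amount(ratio, UNSHIFTED, RATES), bracket_amount(ratio, SHIFTED, RATES))
```

Only the value sitting exactly on the boundary changes bracket between the two threshold lists, which is the case the boundary check is meant to catch.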

variable-patterns (+45 lines):

  • New "Rate Table Dimensions: One Enum Variable Per Dimension" section
  • Guidance on separate Enums when dimensions differ by category
  • Enum vs int: use Enum for constrained categories, int only for arithmetic
  • Helper function warning: restructure parameters instead
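
The "one Enum per dimension" idea can be sketched with the standard-library `enum` module. This is an illustration only: the Enum member names, the nested-dict layout, and every rate below are hypothetical placeholders, and a real implementation would use the framework's own Enum and parameter classes rather than a dict.

```python
from enum import Enum

class StarRating(Enum):
    # A constrained category, so an Enum rather than an int.
    STAR_1 = "1 star"
    STAR_2 = "2 stars"

class ChildAgeCategory(Enum):
    INFANT = "Infant"
    PRESCHOOL = "Preschool"

class AuthorizationLevel(Enum):
    FULL_TIME = "Full time"
    HALF_TIME = "Half time"

# One table keyed by all three dimensions (placeholder weekly rates).
WEEKLY_RATES = {
    StarRating.STAR_1: {
        ChildAgeCategory.INFANT: {AuthorizationLevel.FULL_TIME: 200.0, AuthorizationLevel.HALF_TIME: 100.0},
        ChildAgeCategory.PRESCHOOL: {AuthorizationLevel.FULL_TIME: 160.0, AuthorizationLevel.HALF_TIME: 80.0},
    },
    StarRating.STAR_2: {
        ChildAgeCategory.INFANT: {AuthorizationLevel.FULL_TIME: 210.0, AuthorizationLevel.HALF_TIME: 105.0},
        ChildAgeCategory.PRESCHOOL: {AuthorizationLevel.FULL_TIME: 170.0, AuthorizationLevel.HALF_TIME: 85.0},
    },
}

def weekly_rate(stars, age, level):
    """Direct lookup by Enum members; no helper logic to pick a file or key."""
    return WEEKLY_RATES[stars][age][level]

print(weekly_rate(StarRating.STAR_2, ChildAgeCategory.INFANT, AuthorizationLevel.HALF_TIME))
```

When the table is shaped this way, the lookup is a plain index by each dimension, which is why a need for helper functions in a rate lookup suggests the parameters should be restructured.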

code-style (+14 lines):

  • Framework constants guidance: use WEEKS_IN_YEAR / MONTHS_IN_YEAR, not custom parameters
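
A minimal sketch of the constants-based conversion follows. The two constants are defined locally as stand-ins for the framework's `WEEKS_IN_YEAR` and `MONTHS_IN_YEAR`; the values 52 and 12 are assumptions here, and `weekly_to_monthly` is a hypothetical helper name.

```python
# Local stand-ins for the framework constants (assumed values).
WEEKS_IN_YEAR = 52
MONTHS_IN_YEAR = 12

def weekly_to_monthly(weekly_amount):
    """Convert a weekly amount to a monthly one without a weeks_per_month parameter."""
    return weekly_amount * WEEKS_IN_YEAR / MONTHS_IN_YEAR

print(weekly_to_monthly(120.0))  # 120 * 52 / 12 = 520.0
```

Deriving the factor from shared constants keeps the conversion consistent across programs and avoids a parameter file with no policy content.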

Test plan

  • Verify skill files parse correctly (valid YAML frontmatter)
  • Review new examples for accuracy against PolicyEngine patterns
  • Consider a re-run of RI CCAP eval to verify improvements
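
As a rough first pass at the frontmatter check, the sketch below only confirms that a skill file opens with a `---` line and has a closing `---` line. It does not validate the YAML itself, which would need a YAML parser. The function name and the sample text are hypothetical.

```python
def has_frontmatter_delimiters(text: str) -> bool:
    """True if the text starts with '---' and a later line closes the block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return False
    return any(line.strip() == "---" for line in lines[1:])

# Hypothetical skill file content for illustration.
SAMPLE_SKILL = """---
name: parameter-patterns
description: Hypothetical description text
---
# Body of the skill
"""

print(has_frontmatter_delimiters(SAMPLE_SKILL))
print(has_frontmatter_delimiters("# No frontmatter here"))
```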

🤖 Generated with Claude Code

Based on a full RI CCAP eval run through rules-engineer agents,
identified and fixed 12 skill gaps:

parameter-patterns:
- Add reference source hierarchy (online > PDF)
- Strengthen bracket boundary 0.0001 shift with checklist
- Add multi-dimensional Enum breakdown pattern for rate tables
- Add guidance against unnecessary parameters
- Strengthen reference implementation study instructions
- Add practical description guidance

variable-patterns:
- Add rate table Enum dimension pattern
- Add Enum vs int guidance for constrained categories
- Add helper function warning (restructure params instead)

code-style:
- Add framework constants guidance (WEEKS_IN_YEAR, MONTHS_IN_YEAR)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
hua7450 merged commit a08b612 into PolicyEngine:main on Mar 9, 2026
2 checks passed