Merged
3 changes: 3 additions & 0 deletions agents/country-models/edge-case-generator.md
@@ -149,6 +149,9 @@ For time-based rules:
benefit: 0 # Or reduced amount
```

### Bracket Boundary Consistency
When testing bracket boundaries, you do NOT need to test every threshold — test a few representative ones (first, one in the middle, last). But if you find that a boundary uses "above X%" (exclusive) semantics and needs a 0.0001 shift (see `/policyengine-parameter-patterns` — "Above X%" bracket boundaries), flag ALL thresholds in the same bracket — the boundary semantics applies consistently across the whole bracket, not just the one you tested.

## Auto-Generation Process

### Phase 1: Code Analysis
9 changes: 7 additions & 2 deletions agents/country-models/implementation-validator.md
@@ -61,7 +61,7 @@ This ensures you have the complete patterns and standards loaded for reference t
## Critical Violations (Automatic Rejection)

### 1. Hard-Coded Numeric Values
Any numeric literal (except 0, 1 for basic operations) must come from parameters:
Any numeric literal (except 0, 1, 2 for basic operations) must come from parameters:
- Thresholds, limits, amounts
- Percentages, rates, factors
- Dates, months, periods
@@ -171,6 +171,9 @@ Validate that:
- All variables with formulas have tests
- References trace to real documents
- No orphaned files
- No empty directories in the program folder (leftover from branch switches or restructuring).
Run: `find policyengine_us/{parameters,variables}/gov/states/{ST}/ -type d -empty`
Delete any found — git doesn't track empty directories and they cause confusion.
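
A cross-platform sketch of the same empty-directory check, for environments where `find` is unavailable. The helper name `find_empty_dirs` and the throwaway directory layout are assumptions made for illustration, not part of the validator.

```python
import tempfile
from pathlib import Path


def find_empty_dirs(root: Path) -> list[Path]:
    """Return every directory at or below root that contains no entries."""
    if not root.is_dir():
        return []
    candidates = [root, *(p for p in root.rglob("*") if p.is_dir())]
    return sorted(d for d in candidates if not any(d.iterdir()))


# Demo on a throwaway tree (hypothetical layout, not the real repository).
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "parameters" / "leftover").mkdir(parents=True)  # empty leftover
    (base / "variables").mkdir()
    (base / "variables" / "benefit.py").write_text("# placeholder\n")
    empty = [p.relative_to(base).as_posix() for p in find_empty_dirs(base)]

print(empty)
```

Pointing `find_empty_dirs` at the program's parameter and variable folders in turn should list the same directories as the `find` command above.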

**CRITICAL: Parameter Usage Validation**
- Every parameter file MUST be used by at least one variable
@@ -349,7 +352,9 @@ The validator produces a **structured report with specific fixes** that ci-fixer
return where(eligible, benefit_amount, 0)
```

### Hard-Coded Values (need parameters)
### Hard-Coded Values — CRITICAL (need parameters)
Every hard-coded numeric value (except 0, 1, 2) is CRITICAL severity — no exceptions for ages,
thresholds, rates, or counts. Do NOT downgrade to moderate/warning.
| File | Line | Value | Create Parameter |
|------|------|-------|------------------|
| benefit.py | 23 | 0.3 | `benefit_rate.yaml` with value 0.3 |
9 changes: 9 additions & 0 deletions agents/country-models/parameter-architect.md
@@ -306,6 +306,15 @@ Before finalizing, validate your work against ALL loaded skills:

Run through each skill's Quick Checklist if available.

## Scope Boundary

**You create PARAMETER YAML files ONLY.** Do NOT create:
- Variable `.py` files — the rules-engineer agent handles these
- Test `.yaml` files — the test-creator agent handles these
- Enum classes or any Python code

Even if you know what the variables and tests should look like, stay in your lane. Other agents are specialized for those tasks and will produce higher-quality output. If you create files outside your scope, the orchestrator may skip those specialized agents, degrading overall quality.

## Quality Standards

Parameters must have:
3 changes: 2 additions & 1 deletion agents/country-models/test-creator.md
@@ -147,4 +147,5 @@ Tests must:
- Include edge cases at thresholds
- Document calculation steps in comments
- Cover all eligibility paths
- Use only existing PolicyEngine variables
- Use only existing PolicyEngine variables
- NOT exhaustively test every entry in a lookup table — for brackets indexed by household size, FPL tier, etc., test a few representative points (first, middle, last) not every value
11 changes: 10 additions & 1 deletion agents/reference-validator.md
@@ -112,9 +112,18 @@ reference:

**Flag as CRITICAL if:**
- Clicking link doesn't show the value
- Section number too vague (missing subsections)
- Section number is WRONG (title cites a section that doesn't contain the parameter value — e.g., citing `4.3.1(B)` when the value is actually in `4.3.1(A)(4)`). A wrong section citation is worse than a missing one because it points readers to incorrect regulatory text.
- PDF missing page number

**When proposing a corrected citation**, verify the replacement against the PDF text or
extracted text file. Search for the exact section/definition heading in the text — do not
guess the correct number from memory or nearby context. If you cannot confirm the correct
citation from the source text, flag it as "WRONG — correct section unknown, manual lookup
required" rather than proposing an unverified replacement.
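
The heading search described above could be sketched as follows. The function names, the sample text, and the returned status strings are hypothetical. The check only confirms that a heading appears in the extracted text, not that the parameter value sits under it.

```python
import re


def heading_in_text(extracted_text: str, heading: str) -> bool:
    """True if the exact heading occurs and is not part of a longer label."""
    # The lookarounds stop "Definition 18" from matching "Definition 180"
    # and "4.3.1" from matching "14.3.1" or "2.4.3.1".
    pattern = r"(?<![\w.])" + re.escape(heading) + r"(?!\w)"
    return re.search(pattern, extracted_text) is not None


def check_replacement(extracted_text: str, proposed: str) -> str:
    """Accept a proposed citation only when its heading is in the text."""
    if heading_in_text(extracted_text, proposed):
        return f"heading found: {proposed}"
    return "WRONG: correct section unknown, manual lookup required"


# Hypothetical extracted text, for illustration only.
sample = "4.3.1(A)(4) Family share of cost\nDefinition 180 Provider\n"
print(check_replacement(sample, "4.3.1(A)(4)"))
print(check_replacement(sample, "4.3.1(B)"))
```

A "heading found" result still needs a manual read of the section to confirm that it contains the value.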

**Flag as WARNING if:**
- Section number too vague (missing subsections) but still within the correct provision

### Phase 3: Check Corroboration

**The reference must explicitly support the value.**
1 change: 1 addition & 0 deletions changelog.d/ri-ccap-lessons.changed.md
@@ -0,0 +1 @@
Improve severity classification, agent scope boundaries, and test quality rules based on RI CCAP implementation lessons
27 changes: 25 additions & 2 deletions commands/encode-policy-v2.md
@@ -127,16 +127,32 @@ Find a similar program in the codebase (e.g., CO CCAP for RI CCAP, DC TANF for O
Search with: Glob 'policyengine_us/variables/gov/states/*/[agency]/{prog}/*.py'
Read 3-5 variable files and 3-5 parameter files from the reference implementation.

STEP 2: Write THREE files:
STEP 2: Discover existing reusable variables.
For each key concept in the program (income, hours, age, household size, childcare, etc.),
Grep the codebase for related variables:
Grep 'class.*{concept}.*Variable' policyengine_us/variables/
For example, a childcare program should search for 'childcare', 'hours', 'provider', 'care'.
List all discovered variables in the impl-spec under '## Existing Variables to Reuse' so
downstream agents know what's already available and don't recreate them as bare inputs.

STEP 3: Verify citations against PDF text.
You already have the full documentation loaded. Before writing the impl-spec, cross-check
every section citation (statute number, manual section, definition number) against the
actual PDF text. Search the extracted text for the exact section/definition heading.
If a citation doesn't match (e.g., text says 'Definition 18' but you wrote 'Definition 19'),
correct it NOW — downstream agents will copy these citations verbatim into parameter files.

STEP 4: Write THREE files:

FILE 1: /tmp/{PREFIX}-impl-spec.md (FULL — for implementation agents)
- Every requirement from documentation, numbered (REQ-001, REQ-002, ...)
- Each requirement tagged: ELIGIBILITY, INCOME, BENEFIT, EXEMPTION, DEMOGRAPHIC, IMMIGRATION, RESOURCE, etc.
- Suggested variable and parameter structure (based on reference impl patterns)
- Existing variables to reuse (from Step 2 discovery — variable name, entity, description)
- Income sources list (for sources.yaml parameter, NOT inline adds)
- Reference implementation paths to study
- For TANF: note simplified vs full approach recommendation
- For each requirement: cite the source (statute, manual section, page)
- For each requirement: cite the source (statute, manual section, page), verified against PDF text in Step 3

FILE 2: /tmp/{PREFIX}-requirements-checklist.md (SHORT — for orchestrator, max 40 lines)
- One line per requirement:
@@ -322,6 +338,8 @@ RULES:

### Step 3B: Create Variables and Tests (Parallel)

**ORCHESTRATOR RULE: Always spawn BOTH agents below, even if the parameter-architect created variable or test files.** Each agent is specialized for its task. If a previous agent went out of scope and created files that aren't its responsibility, the specialized agent will overwrite or improve them. Never skip an agent because "the files already exist."

After parameters are complete, spawn both in parallel — they work on different folders.

**Agent: Rules Engineer**
@@ -642,6 +660,11 @@ Closes #{ISSUE_NUMBER}
## Not Modeled
{Excluded requirements with reasons}

## Historical Notes
If the regulatory reviewer or document-collector noted historical changes to any parameter
values (e.g., threshold increases, rate changes), document what changed and when, with
source links. This helps reviewers understand why parameters only start at a recent date.

## Files Added
{Tree structure of new files}

9 changes: 6 additions & 3 deletions commands/review-program.md
@@ -756,9 +756,12 @@ TASK:
AND visual verification (Step 5D). Note REJECTED mismatches as 'investigated and cleared'.
4. Classify each finding:
- CRITICAL (Must Fix): regulatory mismatches, value mismatches (code-path confirmed + 600 DPI verified),
hard-coded values, missing/non-corroborating references, CI failures, incorrect formulas
- SHOULD ADDRESS: code pattern violations, missing edge case tests, naming conventions,
period usage errors, formatting issues (params & vars)
hard-coded values, missing/non-corroborating references, incorrect section citations
(reference title cites wrong section/subsection), CI failures, incorrect formulas,
formula variables with zero test coverage (no unit test at all),
non-functional tests (e.g., absolute_error_margin >= 1 on boolean outputs)
- SHOULD ADDRESS: code pattern violations, missing edge case tests for already-tested variables,
naming conventions, period usage errors, formatting issues (params & vars)
- SUGGESTIONS: documentation improvements, performance optimizations, code style

5. Write FULL report to /tmp/{PREFIX}-review-full-report.md (for archival/posting)
@@ -467,6 +467,61 @@ brackets:

**Real-world example:** Hawaii Food/Excise Tax Credit uses AGI brackets. The first threshold must be `-.inf` to correctly handle taxpayers with negative AGI (e.g., business losses).

### Bracket Boundary: "Above X%" Regulations

**CRITICAL: PolicyEngine's `single_amount` bracket uses "at or above threshold" logic.** A value exactly at the threshold gets that bracket's rate, not the previous bracket's. When a regulation says "above X%" (meaning X% itself belongs to the lower bracket), shift the threshold by `0.0001` to match.

**Example — co-payment tiers by FPL:**
```
Regulation says:
Level 0: Less than or equal to 100% → $0
Level 1: Above 100% up to and including 125% → 2%
Level 2: Above 125% up to and including 150% → 5%
Level 3: Above 150% up to and including 261% → 7%
```

**❌ WRONG — family at exactly 100% FPL gets 2% instead of 0%:**
```yaml
brackets:
  - threshold:
      2024-01-01: 0
    amount:
      2024-01-01: 0
  - threshold:
      2024-01-01: 1.0 # ❌ At 100% FPL → hits this bracket
    amount:
      2024-01-01: 0.02
```

**✅ CORRECT — shift by 0.0001 to encode "above X%":**
```yaml
brackets:
  - threshold:
      2024-01-01: 0
    amount:
      2024-01-01: 0
  - threshold:
      2024-01-01: 1.0001 # ✅ "Above 100%" → 100% stays in previous bracket
    amount:
      2024-01-01: 0.02
  - threshold:
      2024-01-01: 1.2501 # ✅ "Above 125%"
    amount:
      2024-01-01: 0.05
  - threshold:
      2024-01-01: 1.5001 # ✅ "Above 150%"
    amount:
      2024-01-01: 0.07
```

**When to apply the 0.0001 shift:**
- Regulation says "above X%" or "more than X%" (exclusive of the boundary)
- Apply consistently to ALL thresholds in the bracket, not just the first

**When NOT to shift:**
- Regulation says "at or above X%" or "X% or more" (inclusive — matches PolicyEngine's default)
- Regulation says "at least X%" (inclusive)
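
The effect of the shift can be checked with a small plain-Python model of the "at or above threshold" rule. This is a sketch of the lookup semantics described above, not PolicyEngine's implementation. The function name `single_amount_lookup` is an assumption, and the thresholds and amounts are taken from the co-payment example.

```python
from bisect import bisect_right


def single_amount_lookup(thresholds, amounts, value):
    """Return the amount of the last bracket whose threshold is <= value."""
    index = bisect_right(thresholds, value) - 1
    if index < 0:
        raise ValueError("value is below the first threshold")
    return amounts[index]


amounts = [0.0, 0.02, 0.05, 0.07]
unshifted = [0.0, 1.0, 1.25, 1.5]        # encodes "at or above X%"
shifted = [0.0, 1.0001, 1.2501, 1.5001]  # encodes "above X%"

# A family at exactly 100% FPL:
print(single_amount_lookup(unshifted, amounts, 1.0))  # lands in the 2% tier
print(single_amount_lookup(shifted, amounts, 1.0))    # stays in the 0% tier
```

In this model, a ratio strictly between 1.0 and 1.0001 still falls in the lower tier, so the shift approximates "above X%" only to four decimal places.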

### Parameter Structure Transitions (Flat → Bracket)

**When a parameter changes structure over time** (e.g., a flat rate becomes a tiered/marginal rate in a later year), you CANNOT put both structures in a single YAML file. Instead, split into separate files with a boolean toggle.
@@ -24,9 +24,11 @@ policyengine_us/tests/policy/baseline/gov/states/[state]/[agency]/[program]/
- ❌ `2024-01-01` - Full dates NOT supported

### Error Margin
- Always use `absolute_error_margin: 0.1` after period line
- Allows for small floating-point differences in calculations
- **Never use 1** - a margin of 1 makes `true` (1) and `false` (0) indistinguishable
Choose the margin based on the output type:
- **Boolean outputs** (`true`/`false`, eligibility, flags): **no error margin at all** — booleans are exact, no rounding. Omit `absolute_error_margin` entirely.
- **Currency outputs** (benefits, income, amounts): `absolute_error_margin: 0.01`
- **Rate/percentage outputs**: `absolute_error_margin: 0.001`
- **Never use 1** — a margin of 1 makes `true` (1) and `false` (0) indistinguishable, rendering the test meaningless
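
The reason a margin of 1 is meaningless for booleans can be seen in a short sketch. It assumes the test runner passes a case when the absolute difference is less than or equal to the margin. That comparison rule, and the names `MARGIN_BY_OUTPUT`, `within_margin`, and `passes`, are assumptions made for illustration.

```python
# Margins from the list above; None means "omit absolute_error_margin".
MARGIN_BY_OUTPUT = {"boolean": None, "currency": 0.01, "rate": 0.001}


def within_margin(expected, actual, margin):
    """Assumed comparison rule: pass when |expected - actual| <= margin."""
    return abs(float(expected) - float(actual)) <= margin


def passes(expected, actual, output_type):
    """Compare exactly for booleans, within the listed margin otherwise."""
    margin = MARGIN_BY_OUTPUT[output_type]
    if margin is None:
        return expected == actual
    return within_margin(expected, actual, margin)


# With a margin of 1, a wrong boolean still "passes": |1 - 0| <= 1.
print(within_margin(True, False, 1))
# With no margin, the same mismatch is caught.
print(passes(True, False, "boolean"))
```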

### Naming Convention
- Files: `variable_name.yaml` (matches variable exactly)
@@ -674,6 +676,7 @@ When creating tests:
5. **Follow naming conventions** exactly
6. **Include edge cases** at thresholds
7. **Test realistic scenarios** not placeholders
8. **Don't exhaustively test lookup tables** — when a parameter is a bracket indexed by household size, FPL tier, or similar, test a few representative points (first, middle, last/max) not every entry. If the bracket mechanism works for sizes 1, 4, and 10, it works for size 7 too. The same applies to income brackets, age brackets, and any other parameterized lookup.
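
One way to pick the representative points from item 8 is sketched below. The helper name `representative_points` is hypothetical, and any entry near the middle serves equally well.

```python
def representative_points(keys):
    """Pick the first, a middle, and the last key of a lookup table."""
    ordered = sorted(keys)
    if len(ordered) <= 3:
        return ordered  # small tables: test every entry
    return [ordered[0], ordered[len(ordered) // 2], ordered[-1]]


# Household sizes 1 through 10 reduce to three test cases.
print(representative_points(range(1, 11)))
```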

When optimizing test suites:
1. **Identify slow tests** - Profile with `pytest --durations=10`
@@ -759,6 +759,57 @@ class il_tanf_assistance_unit_size(Variable):
]
```

### Enum Variables with Numeric Ranges Must Have Formulas

**When an enum's labels describe numeric ranges (e.g., "30+ hours/week", "20-29 hours/week"), it MUST be a formula variable, not a bare input.** The numeric breakpoints should be a bracket parameter, and the formula should derive the category from an existing numeric input variable.

**❌ WRONG — bare input enum, user picks manually:**
```python
class program_time_category(Variable):
    value_type = Enum
    possible_values = TimeCategory  # "30+ hrs", "20-29 hrs", etc.
    default_value = TimeCategory.FULL_TIME
    # No formula: the user must select a category manually
```

**✅ CORRECT — derived from a numeric input via bracket parameter:**
```yaml
# time_category.yaml — single_amount bracket maps numeric input to category index
brackets:
  - threshold:
      2024-01-01: 0
    amount:
      2024-01-01: 4 # QUARTER_TIME
  - threshold:
      2024-01-01: 10
    amount:
      2024-01-01: 3 # HALF_TIME
  - threshold:
      2024-01-01: 20
    amount:
      2024-01-01: 2 # THREE_QUARTER_TIME
  - threshold:
      2024-01-01: 30
    amount:
      2024-01-01: 1 # FULL_TIME
```
```python
class program_time_category(Variable):
    value_type = Enum
    possible_values = TimeCategory
    default_value = TimeCategory.FULL_TIME

    def formula(person, period, parameters):
        hours = person("childcare_hours_per_week", period)
        p = parameters(period).gov.states.xx.agency.program
        return p.time_category.calc(hours)
```

Before creating any enum variable, check:
1. Do the labels describe numeric ranges or thresholds?
2. Does an existing input variable provide the numeric value? (Grep the codebase)
3. If yes to both → use a bracket parameter + formula, not a bare input
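
The derivation in the checklist can be illustrated without the framework. The sketch below is plain Python, not PolicyEngine code: the enum labels and the name `time_category_for` are hypothetical, and the breakpoints are inlined only for the demo (in a real implementation they belong in the bracket parameter).

```python
from bisect import bisect_right
from enum import Enum


class TimeCategory(Enum):
    # Hypothetical labels, for illustration only.
    QUARTER_TIME = "Under 10 hours/week"
    HALF_TIME = "10-19 hours/week"
    THREE_QUARTER_TIME = "20-29 hours/week"
    FULL_TIME = "30+ hours/week"


# Breakpoints copied from the example bracket parameter above.
THRESHOLDS = [0, 10, 20, 30]
CATEGORIES = [
    TimeCategory.QUARTER_TIME,
    TimeCategory.HALF_TIME,
    TimeCategory.THREE_QUARTER_TIME,
    TimeCategory.FULL_TIME,
]


def time_category_for(hours: float) -> TimeCategory:
    """Derive the category from weekly hours instead of asking the user."""
    index = bisect_right(THRESHOLDS, hours) - 1
    return CATEGORIES[max(index, 0)]


print(time_category_for(29))
print(time_category_for(30))
```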

#### State Variables to AVOID Creating

For TANF implementations: