Merge pull request #106 from hua7450/ri-ccap-pipeline-lessons

hua7450 · web-flow · commit 7eab067f379d · 2026-03-05T16:02:31.000-05:00
Fix severity classification and agent scope from RI CCAP lessons
diff --git a/agents/country-models/edge-case-generator.md b/agents/country-models/edge-case-generator.md
@@ -149,6 +149,9 @@ For time-based rules:
     benefit: 0  # Or reduced amount
 ```
 
+### Bracket Boundary Consistency
+When testing bracket boundaries, you do NOT need to test every threshold — test a few representative ones (first, one in the middle, last). But if you find that a boundary uses "above X%" (exclusive) semantics and needs a 0.0001 shift (see `/policyengine-parameter-patterns` — "Above X%" bracket boundaries), flag ALL thresholds in the same bracket — the boundary semantics applies consistently across the whole bracket, not just the one you tested.
+
 ## Auto-Generation Process
 
 ### Phase 1: Code Analysis
diff --git a/agents/country-models/implementation-validator.md b/agents/country-models/implementation-validator.md
@@ -61,7 +61,7 @@ This ensures you have the complete patterns and standards loaded for reference t
 ## Critical Violations (Automatic Rejection)
 
 ### 1. Hard-Coded Numeric Values
-Any numeric literal (except 0, 1 for basic operations) must come from parameters:
+Any numeric literal (except 0, 1, 2 for basic operations) must come from parameters:
 - Thresholds, limits, amounts
 - Percentages, rates, factors
 - Dates, months, periods
@@ -171,6 +171,9 @@ Validate that:
 - All variables with formulas have tests
 - References trace to real documents
 - No orphaned files
+- No empty directories in the program folder (leftover from branch switches or restructuring).
+  Run: `find policyengine_us/{parameters,variables}/gov/states/{ST}/ -type d -empty`
+  Delete any found — git doesn't track empty directories and they cause confusion.
 
 **CRITICAL: Parameter Usage Validation**
 - Every parameter file MUST be used by at least one variable
@@ -349,7 +352,9 @@ The validator produces a **structured report with specific fixes** that ci-fixer
   return where(eligible, benefit_amount, 0)
   ```
 
-### Hard-Coded Values (need parameters)
+### Hard-Coded Values — CRITICAL (need parameters)
+Every hard-coded numeric value (except 0, 1, 2) is CRITICAL severity — no exceptions for ages,
+thresholds, rates, or counts. Do NOT downgrade to moderate/warning.
 | File | Line | Value | Create Parameter |
 |------|------|-------|------------------|
 | benefit.py | 23 | 0.3 | `benefit_rate.yaml` with value 0.3 |
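
One way to pre-screen for the hard-coded values this rule targets is to walk a formula's syntax tree and report every numeric literal other than 0, 1, and 2. The sketch below is illustrative only: `find_hard_coded_numbers` and the sample formula are hypothetical, and the source does not say how the validator actually detects these values.

```python
import ast

# Literals the rule permits for basic operations.
ALLOWED = {0, 1, 2}


def find_hard_coded_numbers(source):
    """Return sorted (line, value) pairs for numeric literals outside ALLOWED."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Constant):
            continue
        value = node.value
        # bool is a subclass of int, so skip True/False explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        if value not in ALLOWED:
            findings.append((node.lineno, value))
    return sorted(findings)


# Hypothetical formula body used only to exercise the scan.
sample = '''
def formula(person, period, parameters):
    income = person("income", period)
    return income * 0.3 / 2
'''

print(find_hard_coded_numbers(sample))  # [(4, 0.3)]
```

A scan like this cannot tell a policy value from a legitimate literal, so each finding would still need a human or agent review before it is reported as CRITICAL.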
diff --git a/agents/country-models/parameter-architect.md b/agents/country-models/parameter-architect.md
@@ -306,6 +306,15 @@ Before finalizing, validate your work against ALL loaded skills:
 
 Run through each skill's Quick Checklist if available.
 
+## Scope Boundary
+
+**You create PARAMETER YAML files ONLY.** Do NOT create:
+- Variable `.py` files — the rules-engineer agent handles these
+- Test `.yaml` files — the test-creator agent handles these
+- Enum classes or any Python code
+
+Even if you know what the variables and tests should look like, stay in your lane. Other agents are specialized for those tasks and will produce higher-quality output. If you create files outside your scope, the orchestrator may skip those specialized agents, degrading overall quality.
+
 ## Quality Standards
 
 Parameters must have:
diff --git a/agents/country-models/test-creator.md b/agents/country-models/test-creator.md
@@ -147,4 +147,5 @@ Tests must:
 - Include edge cases at thresholds
 - Document calculation steps in comments
 - Cover all eligibility paths
-- Use only existing PolicyEngine variables
+- Use only existing PolicyEngine variables
+- NOT exhaustively test every entry in a lookup table — for brackets indexed by household size, FPL tier, etc., test a few representative points (first, middle, last) not every value
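
The "first, middle, last" guidance can be sketched as a small helper that picks test points from a lookup table's keys. `representative_points` is a hypothetical name, and taking the upper-middle element for even-length tables is an arbitrary choice, not something the source prescribes.

```python
def representative_points(keys):
    """Pick the first, middle, and last key of a lookup table."""
    ordered = sorted(keys)
    if len(ordered) <= 3:
        # Small tables are cheap enough to test in full.
        return ordered
    return [ordered[0], ordered[len(ordered) // 2], ordered[-1]]


# Household sizes 1 through 10, as in a size-indexed bracket.
print(representative_points(range(1, 11)))  # [1, 6, 10]
```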
diff --git a/agents/reference-validator.md b/agents/reference-validator.md
@@ -112,9 +112,18 @@ reference:
 
 **Flag as CRITICAL if:**
 - Clicking link doesn't show the value
-- Section number too vague (missing subsections)
+- Section number is WRONG (title cites a section that doesn't contain the parameter value — e.g., citing `4.3.1(B)` when the value is actually in `4.3.1(A)(4)`). A wrong section citation is worse than a missing one because it points readers to incorrect regulatory text.
 - PDF missing page number
 
+**When proposing a corrected citation**, verify the replacement against the PDF text or
+extracted text file. Search for the exact section/definition heading in the text — do not
+guess the correct number from memory or nearby context. If you cannot confirm the correct
+citation from the source text, flag it as "WRONG — correct section unknown, manual lookup
+required" rather than proposing an unverified replacement.
+
+**Flag as WARNING if:**
+- Section number too vague (missing subsections) but still within the correct provision
+
 ### Phase 3: Check Corroboration
 
 **The reference must explicitly support the value.**
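
The citation check described in this file's diff (confirm the cited section against the extracted text instead of guessing) might be sketched as follows. Everything here is an assumption for illustration: the `Section <label>:` heading format, the helper names, and the sample regulation text are all hypothetical, and real extracted PDFs are rarely this regular.

```python
import re

# Hypothetical heading format; real documents will need their own pattern.
HEADING = re.compile(r"^Section (\S+):", re.MULTILINE)


def split_sections(text):
    """Map each section label to the text that follows its heading."""
    matches = list(HEADING.finditer(text))
    sections = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[match.group(1)] = text[match.end():end]
    return sections


def sections_containing(value, text):
    """Return the labels of every section whose body mentions value."""
    return [label for label, body in split_sections(text).items() if value in body]


# Invented sample text, not taken from any real regulation.
extracted = (
    "Section 4.3.1(A)(4): Income may not exceed 200% of the FPL.\n"
    "Section 4.3.1(B): Applications are processed within 30 days.\n"
)

confirmed = sections_containing("200%", extracted)
print(confirmed)                # ['4.3.1(A)(4)']
print("4.3.1(B)" in confirmed)  # False, so a 4.3.1(B) citation is wrong
```

If the search returns no section at all, the safe outcome is the one the rule already names: report the citation as wrong with the correct section unknown, and leave the lookup to a human.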
diff --git a/changelog.d/ri-ccap-lessons.changed.md b/changelog.d/ri-ccap-lessons.changed.md
@@ -0,0 +1 @@
+Improve severity classification, agent scope boundaries, and test quality rules based on RI CCAP implementation lessons
diff --git a/commands/encode-policy-v2.md b/commands/encode-policy-v2.md
@@ -127,16 +127,32 @@ Find a similar program in the codebase (e.g., CO CCAP for RI CCAP, DC TANF for O
 Search with: Glob 'policyengine_us/variables/gov/states/*/[agency]/{prog}/*.py'
 Read 3-5 variable files and 3-5 parameter files from the reference implementation.
 
-STEP 2: Write THREE files:
+STEP 2: Discover existing reusable variables.
+For each key concept in the program (income, hours, age, household size, childcare, etc.),
+Grep the codebase for related variables:
+  Grep 'class.*{concept}.*Variable' policyengine_us/variables/
+For example, a childcare program should search for 'childcare', 'hours', 'provider', 'care'.
+List all discovered variables in the impl-spec under '## Existing Variables to Reuse' so
+downstream agents know what's already available and don't recreate them as bare inputs.
+
+STEP 3: Verify citations against PDF text.
+You already have the full documentation loaded. Before writing the impl-spec, cross-check
+every section citation (statute number, manual section, definition number) against the
+actual PDF text. Search the extracted text for the exact section/definition heading.
+If a citation doesn't match (e.g., text says 'Definition 18' but you wrote 'Definition 19'),
+correct it NOW — downstream agents will copy these citations verbatim into parameter files.
+
+STEP 4: Write THREE files:
 
 FILE 1: /tmp/{PREFIX}-impl-spec.md (FULL — for implementation agents)
 - Every requirement from documentation, numbered (REQ-001, REQ-002, ...)
 - Each requirement tagged: ELIGIBILITY, INCOME, BENEFIT, EXEMPTION, DEMOGRAPHIC, IMMIGRATION, RESOURCE, etc.
 - Suggested variable and parameter structure (based on reference impl patterns)
+- Existing variables to reuse (from Step 2 discovery — variable name, entity, description)
 - Income sources list (for sources.yaml parameter, NOT inline adds)
 - Reference implementation paths to study
 - For TANF: note simplified vs full approach recommendation
-- For each requirement: cite the source (statute, manual section, page)
+- For each requirement: cite the source (statute, manual section, page), verified against PDF text in Step 3
 
 FILE 2: /tmp/{PREFIX}-requirements-checklist.md (SHORT — for orchestrator, max 40 lines)
 - One line per requirement:
@@ -322,6 +338,8 @@ RULES:
 
 ### Step 3B: Create Variables and Tests (Parallel)
 
+**ORCHESTRATOR RULE: Always spawn BOTH agents below, even if the parameter-architect created variable or test files.** Each agent is specialized for its task. If a previous agent went out of scope and created files that aren't its responsibility, the specialized agent will overwrite or improve them. Never skip an agent because "the files already exist."
+
 After parameters are complete, spawn both in parallel — they work on different folders.
 
 **Agent: Rules Engineer**
@@ -642,6 +660,11 @@ Closes #{ISSUE_NUMBER}
 ## Not Modeled
 {Excluded requirements with reasons}
 
+## Historical Notes
+If the regulatory reviewer or document-collector noted historical changes to any parameter
+values (e.g., threshold increases, rate changes), document what changed and when, with
+source links. This helps reviewers understand why parameters only start at a recent date.
+
 ## Files Added
 {Tree structure of new files}
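
The variable discovery in STEP 2 of this file's diff could be approximated outside the Grep tool with a regular expression over file contents. This is a sketch under assumptions: `find_variables` is a hypothetical helper, the pattern is stricter than the `class.*{concept}.*Variable` pattern quoted in the step, and the file names below are made up.

```python
import re


def find_variables(concept, sources):
    """Return {path: [class names]} for Variable classes whose name mentions concept."""
    pattern = re.compile(
        rf"^class\s+(\w*{re.escape(concept)}\w*)\s*\(\s*Variable\s*\)",
        re.MULTILINE | re.IGNORECASE,
    )
    hits = {}
    for path, text in sources.items():
        names = pattern.findall(text)
        if names:
            hits[path] = names
    return hits


# Hypothetical file contents keyed by a made-up path.
sources = {
    "childcare_hours_per_week.py": "class childcare_hours_per_week(Variable):\n    value_type = float\n",
    "age.py": "class age(Variable):\n    value_type = int\n",
}

print(find_variables("childcare", sources))
```

Running one search per concept ("childcare", "hours", "provider", "care") and merging the results gives the list that belongs under '## Existing Variables to Reuse'.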
 
diff --git a/commands/review-program.md b/commands/review-program.md
@@ -756,9 +756,12 @@ TASK:
    AND visual verification (Step 5D). Note REJECTED mismatches as 'investigated and cleared'.
 4. Classify each finding:
    - CRITICAL (Must Fix): regulatory mismatches, value mismatches (code-path confirmed + 600 DPI verified),
-     hard-coded values, missing/non-corroborating references, CI failures, incorrect formulas
-   - SHOULD ADDRESS: code pattern violations, missing edge case tests, naming conventions,
-     period usage errors, formatting issues (params & vars)
+     hard-coded values, missing/non-corroborating references, incorrect section citations
+     (reference title cites wrong section/subsection), CI failures, incorrect formulas,
+     formula variables with zero test coverage (no unit test at all),
+     non-functional tests (e.g., absolute_error_margin >= 1 on boolean outputs)
+   - SHOULD ADDRESS: code pattern violations, missing edge case tests for already-tested variables,
+     naming conventions, period usage errors, formatting issues (params & vars)
    - SUGGESTIONS: documentation improvements, performance optimizations, code style
 
 5. Write FULL report to /tmp/{PREFIX}-review-full-report.md (for archival/posting)
diff --git a/skills/technical-patterns/policyengine-parameter-patterns-skill/SKILL.md b/skills/technical-patterns/policyengine-parameter-patterns-skill/SKILL.md
@@ -467,6 +467,61 @@ brackets:
 
 **Real-world example:** Hawaii Food/Excise Tax Credit uses AGI brackets. The first threshold must be `-.inf` to correctly handle taxpayers with negative AGI (e.g., business losses).
 
+### Bracket Boundary: "Above X%" Regulations
+
+**CRITICAL: PolicyEngine's `single_amount` bracket uses "at or above threshold" logic.** A value exactly at the threshold gets that bracket's rate, not the previous bracket's. When a regulation says "above X%" (meaning X% itself belongs to the lower bracket), shift the threshold by `0.0001` to match.
+
+**Example — co-payment tiers by FPL:**
+```
+Regulation says:
+  Level 0: Less than or equal to 100% → $0
+  Level 1: Above 100% up to and including 125% → 2%
+  Level 2: Above 125% up to and including 150% → 5%
+  Level 3: Above 150% up to and including 261% → 7%
+```
+
+**❌ WRONG — family at exactly 100% FPL gets 2% instead of 0%:**
+```yaml
+brackets:
+  - threshold:
+      2024-01-01: 0
+    amount:
+      2024-01-01: 0
+  - threshold:
+      2024-01-01: 1.0    # ❌ At 100% FPL → hits this bracket
+    amount:
+      2024-01-01: 0.02
+```
+
+**✅ CORRECT — shift by 0.0001 to encode "above X%":**
+```yaml
+brackets:
+  - threshold:
+      2024-01-01: 0
+    amount:
+      2024-01-01: 0
+  - threshold:
+      2024-01-01: 1.0001  # ✅ "Above 100%" → 100% stays in previous bracket
+    amount:
+      2024-01-01: 0.02
+  - threshold:
+      2024-01-01: 1.2501  # ✅ "Above 125%"
+    amount:
+      2024-01-01: 0.05
+  - threshold:
+      2024-01-01: 1.5001  # ✅ "Above 150%"
+    amount:
+      2024-01-01: 0.07
+```
+
+**When to apply the 0.0001 shift:**
+- Regulation says "above X%" or "more than X%" (exclusive of the boundary)
+- Apply consistently to ALL thresholds in the bracket, not just the first
+
+**When NOT to shift:**
+- Regulation says "at or above X%" or "X% or more" (inclusive — matches PolicyEngine's default)
+- Regulation says "at least X%" (inclusive)
+
 ### Parameter Structure Transitions (Flat → Bracket)
 
 **When a parameter changes structure over time** (e.g., a flat rate becomes a tiered/marginal rate in a later year), you CANNOT put both structures in a single YAML file. Instead, split into separate files with a boolean toggle.
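
The "at or above threshold" behavior and the effect of the 0.0001 shift can be reproduced with a plain-Python stand-in. This is not PolicyEngine's implementation: the `single_amount` function below is a hypothetical helper that only mimics the lookup rule described in the section above, using the thresholds and rates from its co-payment example.

```python
from bisect import bisect_right


def single_amount(thresholds, amounts, value):
    """Return the amount of the last bracket whose threshold is <= value."""
    index = bisect_right(thresholds, value) - 1
    if index < 0:
        raise ValueError("value is below the first threshold")
    return amounts[index]


# Thresholds copied from the regulation text without a shift.
unshifted = [0, 1.0, 1.25, 1.5]
# Thresholds shifted by 0.0001 to encode "above X%".
shifted = [0, 1.0001, 1.2501, 1.5001]
copay_rates = [0, 0.02, 0.05, 0.07]

print(single_amount(unshifted, copay_rates, 1.0))  # 0.02: wrong for "above 100%"
print(single_amount(shifted, copay_rates, 1.0))    # 0: 100% stays in the lower tier
print(single_amount(shifted, copay_rates, 1.25))   # 0.02: 125% stays in its lower tier
```

The third call shows why the shift has to be applied to every threshold: leaving 1.25 unshifted would push a family at exactly 125% FPL into the 5% tier.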
diff --git a/skills/technical-patterns/policyengine-testing-patterns-skill/SKILL.md b/skills/technical-patterns/policyengine-testing-patterns-skill/SKILL.md
@@ -24,9 +24,11 @@ policyengine_us/tests/policy/baseline/gov/states/[state]/[agency]/[program]/
 - ❌ `2024-01-01` - Full dates NOT supported
 
 ### Error Margin
-- Always use `absolute_error_margin: 0.1` after period line
-- Allows for small floating-point differences in calculations
-- **Never use 1** - a margin of 1 makes `true` (1) and `false` (0) indistinguishable
+Choose the margin based on the output type:
+- **Boolean outputs** (`true`/`false`, eligibility, flags): **no error margin at all** — booleans are exact, no rounding. Omit `absolute_error_margin` entirely.
+- **Currency outputs** (benefits, income, amounts): `absolute_error_margin: 0.01`
+- **Rate/percentage outputs**: `absolute_error_margin: 0.001`
+- **Never use 1** — a margin of 1 makes `true` (1) and `false` (0) indistinguishable, rendering the test meaningless
 
 ### Naming Convention
 - Files: `variable_name.yaml` (matches variable exactly)
@@ -674,6 +676,7 @@ When creating tests:
 5. **Follow naming conventions** exactly
 6. **Include edge cases** at thresholds
 7. **Test realistic scenarios** not placeholders
+8. **Don't exhaustively test lookup tables** — when a parameter is a bracket indexed by household size, FPL tier, or similar, test a few representative points (first, middle, last/max) not every entry. If the bracket mechanism works for sizes 1, 4, and 10, it works for size 7 too. The same applies to income brackets, age brackets, and any other parameterized lookup.
 
 When optimizing test suites:
 1. **Identify slow tests** - Profile with `pytest --durations=10`
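
The reasoning behind the per-type error margins can be shown with a small comparison helper. The sketch assumes a test passes when the absolute difference is less than or equal to the margin; that comparison rule, the helper names, and the `MARGIN_BY_OUTPUT` table are assumptions for illustration, not the test runner's actual code.

```python
# Margins from the Error Margin guidance; None means "omit the margin".
MARGIN_BY_OUTPUT = {"boolean": None, "currency": 0.01, "rate": 0.001}


def within_margin(actual, expected, margin=0.0):
    """Assumed rule: pass when |actual - expected| <= margin."""
    return abs(float(actual) - float(expected)) <= margin


def passes(actual, expected, output_type):
    """Compare using the margin recommended for the output type."""
    margin = MARGIN_BY_OUTPUT[output_type]
    return within_margin(actual, expected, 0.0 if margin is None else margin)


print(within_margin(True, False, margin=1))  # True: a wrong boolean still passes
print(passes(True, False, "boolean"))        # False: exact comparison catches it
print(passes(100.004, 100.0, "currency"))    # True: within one cent
print(passes(0.07, 0.072, "rate"))           # False: off by more than 0.001
```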
diff --git a/skills/technical-patterns/policyengine-variable-patterns-skill/SKILL.md b/skills/technical-patterns/policyengine-variable-patterns-skill/SKILL.md
@@ -759,6 +759,57 @@ class il_tanf_assistance_unit_size(Variable):
     ]
 ```
 
+### Enum Variables with Numeric Ranges Must Have Formulas
+
+**When an enum's labels describe numeric ranges (e.g., "30+ hours/week", "20-29 hours/week"), it MUST be a formula variable, not a bare input.** The numeric breakpoints should be a bracket parameter, and the formula should derive the category from an existing numeric input variable.
+
+**❌ WRONG — bare input enum, user picks manually:**
+```python
+class program_time_category(Variable):
+    value_type = Enum
+    possible_values = TimeCategory  # "30+ hrs", "20-29 hrs", etc.
+    default_value = TimeCategory.FULL_TIME
+    # No formula — user must select
+```
+
+**✅ CORRECT — derived from a numeric input via bracket parameter:**
+```yaml
+# time_category.yaml — single_amount bracket maps numeric input to category index
+brackets:
+  - threshold:
+      2024-01-01: 0
+    amount:
+      2024-01-01: 4  # QUARTER_TIME
+  - threshold:
+      2024-01-01: 10
+    amount:
+      2024-01-01: 3  # HALF_TIME
+  - threshold:
+      2024-01-01: 20
+    amount:
+      2024-01-01: 2  # THREE_QUARTER_TIME
+  - threshold:
+      2024-01-01: 30
+    amount:
+      2024-01-01: 1  # FULL_TIME
+```
+```python
+class program_time_category(Variable):
+    value_type = Enum
+    possible_values = TimeCategory
+    default_value = TimeCategory.FULL_TIME
+
+    def formula(person, period, parameters):
+        hours = person("childcare_hours_per_week", period)
+        p = parameters(period).gov.states.xx.agency.program
+        return p.time_category.calc(hours)
+```
+
+Before creating any enum variable, check:
+1. Do the labels describe numeric ranges or thresholds?
+2. Does an existing input variable provide the numeric value? (Grep the codebase)
+3. If yes to both → use a bracket parameter + formula, not a bare input
+
 #### State Variables to AVOID Creating
 
 For TANF implementations:
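
The idea behind the enum rule above (derive the category from a numeric input through bracket thresholds) can be tried in isolation with the standard library. This is a stand-in, not PolicyEngine code: the `TimeCategory` members use the integer indices from the YAML comments, and `time_category` is a hypothetical function that mimics an "at or above threshold" bracket lookup.

```python
from bisect import bisect_right
from enum import Enum


class TimeCategory(Enum):
    FULL_TIME = 1
    THREE_QUARTER_TIME = 2
    HALF_TIME = 3
    QUARTER_TIME = 4


# Thresholds (hours per week) and category indices from the YAML example.
THRESHOLDS = [0, 10, 20, 30]
CATEGORY_INDEX = [4, 3, 2, 1]


def time_category(hours):
    """Derive the category from weekly hours instead of asking the user."""
    if hours < THRESHOLDS[0]:
        raise ValueError("hours must be non-negative")
    position = bisect_right(THRESHOLDS, hours) - 1
    return TimeCategory(CATEGORY_INDEX[position])


print(time_category(30))  # TimeCategory.FULL_TIME
print(time_category(29))  # TimeCategory.THREE_QUARTER_TIME
print(time_category(9))   # TimeCategory.QUARTER_TIME
```

Because the breakpoints live in one table, a change to the hour ranges touches only the thresholds and never the formula.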
