Fix microsim skill: warn against np.array() and entity-level mismatches (#107)
Real bug encountered: np.array() on MicroSeries strips entity context,
allowing silent mismatches between tax_unit (23K rows) and household
(15K rows) arrays. Boolean mask from one entity applied to weights from
another gives silently wrong counts (showed 1K losers instead of 719K).
Changes:
- Expand CRITICAL section to warn against np.array() specifically (not
just .values), explaining entity mismatch as the primary danger
- Add entity-level matching section with wrong/right examples
- Note that household_net_income includes state tax effects and add
federal-only pattern using income_tax for scoring federal bills
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
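
The weighting half of the bug described above can be illustrated with plain NumPy, without PolicyEngine or microdf. This is a minimal sketch on made-up toy data: the arrays, names, and numbers are hypothetical and are not taken from the commit. It shows that once weights are detached from the values, a row count or a plain `.mean()` is no longer a population estimate. The `same_entity` helper is also hypothetical: a cheap length check one might run before combining arrays that came from different entities.

```python
import numpy as np

# Hypothetical toy data: 4 sampled households, where each row stands in
# for `weights[i]` real households in the population.
net_income_change = np.array([-100.0, 50.0, -20.0, 0.0])
weights = np.array([1000.0, 200.0, 5000.0, 300.0])

# Boolean mask computed at the household level.
losers = net_income_change < 0

# Unweighted: counts sample rows, not households in the population.
unweighted_losers = int(losers.sum())

# Weighted: sums the weights of the rows the mask selects.
weighted_losers = float(weights[losers].sum())

# The same gap appears for means.
unweighted_mean = float(net_income_change.mean())
weighted_mean = float(np.average(net_income_change, weights=weights))


def same_entity(a, b):
    """Hypothetical guard: arrays from one entity must have equal length."""
    return len(a) == len(b)


# A tax-unit-level array (6 rows here) must not be combined with
# household-level weights (4 rows).
tax_unit_values = np.zeros(6)
can_combine = same_entity(tax_unit_values, weights)

print(unweighted_losers, weighted_losers)
print(unweighted_mean, weighted_mean)
print(can_combine)
```

In this toy sample the unweighted count is 2 rows while the weighted count is 6,000 households, which mirrors in miniature the kind of undercount the commit message reports. Keeping results as MicroSeries, as the skill recommends, avoids doing this bookkeeping by hand.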
-## CRITICAL: Use calc() with MicroSeries - No Manual Weights Ever
+## CRITICAL: Use calc() with MicroSeries — never use np.array() or manual weights

-**MicroSeries handles all weighting automatically. Never access .weights or do manual weight math.**
+**MicroSeries handles all weighting automatically. Never convert to numpy or do manual weight math.**

-### NEVER strip weights with .values
+### NEVER convert MicroSeries to numpy arrays

-`calc()` and `calculate()` return MicroSeries with embedded weights. Calling `.values` strips them and returns a plain numpy array where `.mean()` is **unweighted**.
+`calc()` and `calculate()` return MicroSeries with embedded weights AND entity context. Converting to numpy via `np.array()`, `.values`, or `.to_numpy()` strips both, causing:
+1. **Unweighted results** — `.mean()` on a numpy array is unweighted
+2. **Entity-level mismatches** — mixing arrays from different entities (e.g., 23K tax units vs 15K households) gives silently wrong results. Numpy won't error because boolean masks still index, but the mask from one entity applied to values from another is garbage.

 ```python
-# ❌ WRONG - .values strips weights, .mean() is UNWEIGHTED
+# ❌ WRONG - np.array() strips weights AND entity context