Commit f7c9d8d

Boilerplate update, move to AGENTS.md

1 parent 199bda4, commit f7c9d8d

6 files changed (+317, -22191 lines)


.pre-commit-config.yaml

Lines changed: 4 additions & 4 deletions
```diff
@@ -6,7 +6,7 @@ repos:
 - id: check-useless-excludes
 # - id: identity # Prints all files passed to pre-commits. Debugging.
 - repo: https://github.com/tox-dev/pyproject-fmt
-rev: v2.16.2
+rev: v2.18.1
 hooks:
 - id: pyproject-fmt
 - repo: https://github.com/lyz-code/yamlfix
@@ -19,11 +19,13 @@ repos:
 - id: check-added-large-files
 args:
 - --maxkb=10000
+- id: check-ast
 - id: check-case-conflict
+- id: check-docstring-first
 - id: check-merge-conflict
+- id: check-toml
 - id: check-vcs-permalinks
 - id: check-yaml
-- id: check-toml
 - id: debug-statements
 - id: end-of-file-fixer
 - id: fix-byte-order-marker
@@ -38,8 +40,6 @@ repos:
 - --branch
 - main
 - id: trailing-whitespace
-- id: check-ast
-- id: check-docstring-first
 - repo: https://github.com/adrienverge/yamllint.git
 rev: v1.38.0
 hooks:
```

AGENTS.md

Lines changed: 277 additions & 0 deletions
@@ -0,0 +1,277 @@
@.ai-instructions/profiles/tier-a.md @.ai-instructions/modules/jax.md
@.ai-instructions/modules/pandas.md

# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in
this repository.

## Project Overview

GETTSIM (German Taxes and Transfers SIMulator) is a Python microsimulation model for the
German tax and transfer system. It enables research applications from dynamic
programming models to detailed microsimulation studies.

The core computation engine is provided by `ttsim-backend`. GETTSIM contains the policy
definitions, parameters, and tests specific to Germany.

Never work around limitations in `ttsim-backend`; any such changes should be made there.

## Common Commands

```bash
# Run all tests
pixi run -e py314-jax tests -n 7

# Run a single test file
pixi run -e py314-jax tests src/gettsim/tests_germany/test_policy_cases.py

# Run tests for a specific policy area (by test ID pattern)
pixi run -e py314-jax tests -k "kindergeld"

# Type checking
pixi run ty
pixi run ty-jax

# Quality checks (linting, formatting)
pixi run prek run --all-files

# Build documentation
pixi run docs
```

Before finishing any task that modifies code, always run these three verification steps
in order:

1. `pixi run ty` and `pixi run ty-jax` (type checker)
1. `pixi run prek run --all-files` (quality checks: linting, formatting, yaml, etc.)
1. `pixi run -e py314-jax tests -n 7` (full test suite)

## Architecture

### Source Layout

- `src/gettsim/germany/` - Policy implementations organized by area (einkommensteuer,
  kindergeld, bürgergeld, etc.)
- `src/gettsim/tests_germany/policy_cases/` - Test cases organized by policy area and
  date
- `src/gettsim/tt/` - Re-exports from ttsim-backend (decorators, types)

### Two-Level DAG System (GEP 4, 7)

1. **Interface DAG**: High-level orchestration connecting inputs to outputs via
   `main()`. Key concepts:

   - `policy_date`: Date for which the policy environment is set up
   - `policy_environment`: Functions/parameters relevant at the policy date
   - `input_data`: User-provided data (via DataFrame + mapper or direct pytree)
   - `tt_targets`: Which outputs to compute
   - `results`: Final outputs in user-requested format

1. **TT DAG**: The core computation layer. Contains policy functions that operate on
   data columns.

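The TT DAG idea can be illustrated with a toy, self-contained sketch. It assumes, in the spirit of GEP 4, that a function's argument names identify the columns or functions it depends on, so the graph follows from the signatures alone. The functions `einkommen_m`, `steuer_m`, `netto_m`, the flat 20 % rate, and the `compute` helper are all hypothetical. The real graph construction lives in `ttsim-backend`.

```python
import inspect
from collections.abc import Callable
from graphlib import TopologicalSorter


def einkommen_m(lohn_m: float, zinsen_m: float) -> float:
    # Hypothetical policy function: total monthly income.
    return lohn_m + zinsen_m


def steuer_m(einkommen_m: float) -> float:
    # Hypothetical flat 20 % tax, for illustration only.
    return 0.2 * einkommen_m


def netto_m(einkommen_m: float, steuer_m: float) -> float:
    return einkommen_m - steuer_m


def compute(functions: dict[str, Callable], data: dict[str, float], target: str) -> float:
    """Evaluate `target`, resolving dependencies from argument names."""
    # Each function node depends on the nodes named by its parameters.
    graph = {
        name: set(inspect.signature(func).parameters)
        for name, func in functions.items()
    }
    results = dict(data)
    for node in TopologicalSorter(graph).static_order():
        if node in results:
            continue  # input column, already provided
        func = functions[node]
        args = {arg: results[arg] for arg in inspect.signature(func).parameters}
        results[node] = func(**args)
    return results[target]


functions = {"einkommen_m": einkommen_m, "steuer_m": steuer_m, "netto_m": netto_m}
print(compute(functions, {"lohn_m": 3000.0, "zinsen_m": 100.0}, "netto_m"))
```

The sketch evaluates every function it is given. A real implementation would first prune the graph to the ancestors of the requested targets.
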
### Entry Point (GEP 7)

`gettsim.main()` is the single entry point. Users specify:

- `main_target` or `main_targets`: What to compute (use `MainTarget` for autocompletion)
- `policy_date_str`: Date for the policy environment (ISO format `YYYY-MM-DD`)
- `input_data`: User data (via `InputData` helper classes)
- `tt_targets`: Which tax/transfer outputs to compute (via `TTTargets`)

```python
from gettsim import InputData, MainTarget, TTTargets, main

outputs_df = main(
    main_target=MainTarget.results.df_with_mapper,
    policy_date_str="2025-01-01",
    input_data=InputData.df_and_mapper(df=inputs_df, mapper=inputs_map),
    tt_targets=TTTargets(tree=targets_tree),
)
```

### Policy Functions (GEP 4, 6)

Policy functions use decorators from `gettsim.tt`:

```python
from gettsim.tt import policy_function


@policy_function(start_date="2023-01-01", leaf_name="betrag_m")
def betrag_ohne_staffelung_m(anzahl_ansprüche: int, satz: float) -> float:
    return satz * anzahl_ansprüche
```

Key decorators:

- `@policy_function` - Main policy calculation functions with date ranges (`start_date`,
  `end_date`, `leaf_name`)
- `@policy_input` - Input column definitions (no implementation body)
- `@param_function` - Functions that transform raw parameters into usable forms
- `@agg_by_p_id_function` - Aggregation functions by person ID (e.g., summing children's
  claims to parent)
- `@agg_by_group_function` - Aggregation functions by group (e.g., sum to household
  level)
- `@group_creation_function` - Functions that create group IDs (e.g., fg_id, bg_id)

Additional `@policy_function` parameters:

- `vectorization_strategy="not_required"` - For functions that operate on full columns
  using `xnp`
- `rounding_spec=RoundingSpec(...)` - Optional rounding (GEP 5)
- `fail_msg_if_included="..."` - Error message if function is included in DAG (for
  unimplemented periods)

### Automatic DAG Features (GEP 4)

**Auto-aggregation**: If `my_col` exists and `my_col_hh` is requested, a sum aggregation
is auto-generated.

**Time conversion**: Automatic conversion between `_y`, `_q`, `_m`, `_w`, `_d` suffixes
using these numbers of periods per year: 1, 4, 12, 365.25/7, 365.25.

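The conversion rule fits in a single function: scale the amount to a yearly value, then divide by the target unit's periods per year. This is a sketch of the arithmetic only. `convert` and `PERIODS_PER_YEAR` are illustrative names, not backend identifiers.

```python
# Periods per year for each time-unit suffix (factors as listed above).
PERIODS_PER_YEAR = {"y": 1.0, "q": 4.0, "m": 12.0, "w": 365.25 / 7, "d": 365.25}


def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Convert an amount per `from_unit` period into an amount per `to_unit` period."""
    yearly = value * PERIODS_PER_YEAR[from_unit]
    return yearly / PERIODS_PER_YEAR[to_unit]


print(convert(250.0, "m", "y"))  # monthly amount -> yearly amount
print(convert(1200.0, "y", "q"))  # yearly amount -> quarterly amount
```
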
### Parameters (GEP 3)

Policy parameters are in YAML files alongside the Python code. Each parameter has:

- Date-keyed values (e.g., `2023-01-01:`)
- Metadata: `name` (de/en), `description` (de/en), `unit`, `reference_period`, `type`
- Legal references in each date entry
- Schema: `docs/geps/params-schema.json`

Parameter types:

- `scalar` - Single value (accessed via `value` key)
- `dict` - Homogeneous dictionary with string/int keys
- `piecewise_constant`, `piecewise_linear`, `piecewise_quadratic`, `piecewise_cubic` -
  For `piecewise_polynomial` function
- `birth_year_based_phase_inout`, `birth_month_based_phase_inout` - Age threshold
  lookups by birth cohort
- `require_converter` - Complex structures needing a `@param_function` converter

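A date-keyed `scalar` parameter can be resolved for a given policy date by taking the latest entry that starts on or before that date. The sketch assumes the YAML has already been loaded into a dict. YAML loaders such as PyYAML turn unquoted keys like `2023-01-01` into `datetime.date` objects. The dates and amounts are placeholders, not actual law. `value_at` is a hypothetical helper, not the GETTSIM loader, and it ignores any mechanism for ending a parameter.

```python
from datetime import date

# Placeholder data shaped like a loaded `scalar` parameter (not real values).
satz = {
    date(2023, 1, 1): {"value": 100.0},
    date(2025, 1, 1): {"value": 110.0},
}


def value_at(param: dict[date, dict], policy_date: date) -> float:
    """Return the `value` of the latest entry starting on or before `policy_date`."""
    valid_from = [d for d in param if d <= policy_date]
    if not valid_from:
        raise ValueError(f"No entry in force on {policy_date.isoformat()}")
    return param[max(valid_from)]["value"]


print(value_at(satz, date(2024, 6, 30)))
```
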
### Rounding (GEP 5)

```python
@policy_function(
    rounding_spec=RoundingSpec(
        base=0.0001,
        direction="nearest",  # or "up", "down"
        reference="§76g SGB VI Abs. 4 Nr. 4",
    ),
    start_date="2021-01-01",
)
def höchstbetrag_m(...) -> float: ...
```

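The three `direction` values can be illustrated as rounding to a multiple of `base`. This is a sketch of the semantics under stated assumptions, not the `RoundingSpec` implementation. It assumes "up" and "down" mean toward plus and minus infinity, and that "nearest" breaks exact ties to even; the backend may differ on both points. `round_to_base` is a hypothetical helper. Decimal arithmetic avoids binary floating-point surprises such as `1.2345 / 0.0001` landing just below 12345.

```python
from decimal import ROUND_CEILING, ROUND_FLOOR, ROUND_HALF_EVEN, Decimal

# Assumed mapping of `direction` values to decimal rounding modes.
_MODES = {"up": ROUND_CEILING, "down": ROUND_FLOOR, "nearest": ROUND_HALF_EVEN}


def round_to_base(value: float, base: float, direction: str) -> float:
    """Round `value` to a multiple of `base` in the given direction."""
    d_base = Decimal(repr(base))
    # Work on decimal strings so that exact multiples stay exact.
    steps = Decimal(repr(value)) / d_base
    rounded_steps = steps.to_integral_value(rounding=_MODES[direction])
    return float(rounded_steps * d_base)


for direction in ("nearest", "up", "down"):
    print(direction, round_to_base(1.23456, 0.0001, direction))
```
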
## Naming Conventions (GEP 1, 6)

### Language

- **German** for policy-specific code (law names: Kindergeld, Bürgergeld,
  Einkommensteuer)
- **English** for infrastructure code
- **UTF-8** characters allowed (ä, ö, ü, ß)

### Namespaces and Qualified Names (GEP 6)

- Directory structure defines namespaces (e.g., `germany/kindergeld/` → namespace
  `kindergeld`)
- Within a namespace, use local names: `betrag_m`, `satz`
- Cross-namespace references use qualified names with double underscores:
  `arbeitslosengeld_2__einkommen_m_bg`
- `betrag` is the convention for monetary amounts of a tax/transfer

### Column/Function Name Suffixes (GEP 1)

**Time units** (appear before aggregation):

- `_y` (year), `_q` (quarter), `_m` (month), `_w` (week), `_d` (day)

**Aggregation levels**:

- `_sn` (Steuernummer - tax unit)
- `_hh` (Haushalt - household)
- `_fg` (Familiengemeinschaft)
- `_bg` (Bedarfsgemeinschaft)
- `_eg` (Einstandsgemeinschaft)
- `_ehe` (Ehegemeinschaft)

Example: `arbeitslosengeld_2__betrag_m_bg` = monthly ALG2 amount at Bedarfsgemeinschaft
level

### Special Column Types (GEP 2)

- `p_id` - Primary person identifier (required)
- `[x]_id` - Group identifiers (e.g., `hh_id`, `bg_id`) - same value for all group
  members
- `p_id_[y]` - Person-to-person pointers (e.g., `p_id_elternteil_1`, `p_id_empfänger`).
  Value -1 = no link.

## Test Cases

Tests use YAML files in `tests_germany/policy_cases/{area}/{date}/`:

```yaml
inputs:
  provided:
    alter: [35, 35, 12]
    p_id: [0, 1, 2]
    hh_id: [0, 0, 0]
    # Nested paths use double underscore in code, but nested dicts in YAML
    kindergeld:
      in_ausbildung: [false, false, true]
      p_id_empfänger: [-1, -1, 0]
outputs:
  kindergeld:
    betrag_m: [250, 0, 0]
```

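The comment in the YAML block describes a one-to-one mapping between nested dicts and `__`-joined qualified names. Below is a minimal sketch of that mapping. It assumes the test file has already been parsed into plain Python dicts, so the `provided` section is reproduced by hand. `flatten` is a hypothetical helper, not a GETTSIM function.

```python
def flatten(tree: dict, prefix: str = "") -> dict[str, list]:
    """Turn nested dicts into a flat dict keyed by `__`-joined qualified names."""
    flat: dict[str, list] = {}
    for key, value in tree.items():
        name = f"{prefix}__{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


provided = {
    "alter": [35, 35, 12],
    "p_id": [0, 1, 2],
    "hh_id": [0, 0, 0],
    "kindergeld": {
        "in_ausbildung": [False, False, True],
        "p_id_empfänger": [-1, -1, 0],
    },
}
print(sorted(flatten(provided)))
```
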
## Code Restrictions for Vectorization

Functions must follow these rules for automatic vectorization:

1. **If-else blocks**: Only one operation per branch, no return inside single if (must
   have else)
1. **Function calls**: `sum`, `any`, `all` require iterable arguments; `min`, `max` take
   exactly 2 args or 1 iterable
1. **No elif after else**: Use nested if-else instead

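The toy function below is written to respect these rules. Each branch performs a single assignment, every `if` has an `else`, a nested if-else replaces `elif`, and `max` receives exactly two arguments. The function name, its parameters, and the age threshold are invented for illustration and do not encode actual law. Whether a given function vectorizes is ultimately decided by `ttsim-backend`.

```python
def toy_betrag_m(alter: int, in_ausbildung: bool, satz: float, abzug: float) -> float:
    # Nested if-else instead of elif; exactly one operation per branch.
    if alter < 18:
        out = satz
    else:
        if in_ausbildung:
            out = satz
        else:
            out = 0.0
    # max with exactly two arguments is allowed.
    return max(out - abzug, 0.0)


print([toy_betrag_m(a, b, 250.0, 0.0) for a, b in [(12, False), (20, True), (35, False)]])
```
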
## Useful Imports from gettsim.tt

```python
from gettsim.tt import (
    # Decorators
    policy_function,
    policy_input,
    param_function,
    agg_by_group_function,
    agg_by_p_id_function,
    group_creation_function,
    # Types
    AggType,  # SUM, COUNT, MEAN, MAX, MIN, ANY, ALL
    RoundingSpec,
    ConsecutiveIntLookupTableParamValue,
    PiecewisePolynomialParamValue,
    # Functions
    piecewise_polynomial,
    join,  # For person-to-person lookups
    get_consecutive_int_lookup_table_param_value,
    get_piecewise_parameters,
    intervals_to_thresholds,
    merge_piecewise_intervals,
    PiecewisePolynomialInterval,
)
```

## Relevant GEPs

The [GETTSIM Enhancement Protocols](docs/geps/) define conventions:

- **GEP 0**: Purpose and process for GEPs
- **GEP 1**: Naming conventions (identifiers, German names, time/unit suffixes)
- **GEP 2**: Internal data representation (1-d arrays, group identifiers, person
  pointers)
- **GEP 3**: Parameters of the taxes and transfers system (YAML structure, types)
- **GEP 4**: DAG-based computational backend (core architecture)
- **GEP 5**: Optional rounding via `RoundingSpec`
- **GEP 6**: Unified architecture (namespaces, qualified names, `start_date`/`end_date`)
- **GEP 7**: User interface (`main()` function, `MainTarget`, input/output handling)
