82 | 82 | "source": [ |
83 | 83 | "Next, we need to find out which input data we actually need to calculate the targets we\n", |
84 | 84 | "are interested in. We can do this by specifying a template as the `main_target` of\n", |
85 | | - "`gettsim.main`.\n", |
| 85 | + "`gettsim.main`. The template returns the input variables needed to compute the specified\n", |
| 86 | + "`tt_targets`.\n", |
86 | 87 | "\n", |
87 | | - "Because we are interested in social insurance contributions paid when being in regular\n", |
88 | | - "employment, we are not interested in retirees or in households which depend on social\n", |
89 | | - "assistance. We can override these transfers when requesting the template. This removes\n", |
90 | | - "the input data needed to compute these transfers from the template." |
| 88 | + "Some of these inputs are computed from other inputs. If you already know the value of\n", |
| 89 | + "such a computed input, you can provide it directly in the template call. GETTSIM will\n", |
| 90 | + "then exclude its upstream dependencies from the template, giving you a shorter list of\n", |
| 91 | + "remaining inputs to fill. For example, the old-age pension benefit\n", |
| 92 | + "(`sozialversicherung__rente__altersrente__betrag_m`) depends on many pension-related\n", |
| 93 | + "inputs (entitlement points, contribution months, etc.). Since nobody in our scenario is\n", |
| 94 | + "retired, we provide it as 0, which removes all of those upstream inputs from the\n", |
| 95 | + "template." |
91 | 96 | ] |
92 | 97 | }, |
93 | 98 | { |
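
The notebook refers to the same variable in two ways: by its qualified name (`sozialversicherung__rente__altersrente__betrag_m`) in the prose, and as a nested dictionary path in the template call. The sketch below illustrates that correspondence, assuming a qualified name is simply the path keys joined by a double underscore. `flatten_tree` is a hypothetical helper written for this illustration, not part of GETTSIM's API, and a plain `0` stands in for the `pd.Series([0])` used in the notebook.

```python
from typing import Any


def flatten_tree(tree: dict[str, Any], sep: str = "__") -> dict[str, Any]:
    """Flatten a nested dict into {qualified_name: leaf}. Hypothetical helper."""
    flat: dict[str, Any] = {}
    for key, value in tree.items():
        if isinstance(value, dict):
            # Recurse into the subtree and prefix its names with the current key.
            for sub_key, leaf in flatten_tree(value, sep).items():
                flat[f"{key}{sep}{sub_key}"] = leaf
        else:
            flat[key] = value
    return flat


# Plain 0 stands in for the pd.Series([0]) passed in the notebook.
overrides = {"sozialversicherung": {"rente": {"altersrente": {"betrag_m": 0}}}}
flat_overrides = flatten_tree(overrides)
print(flat_overrides)
```

Reading the override tree this way makes it easier to check that the nested path in the template call really targets the variable named in the text.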
|
107 | 112 | " \"rente\": {\n", |
108 | 113 | " \"altersrente\": {\"betrag_m\": pd.Series([0])},\n", |
109 | 114 | " },\n", |
110 | | - " \"arbeitslosen\": {\"betrag_m\": pd.Series([0])},\n", |
111 | 115 | " },\n", |
112 | | - " \"wohngeld\": {\"betrag_m_wthh\": pd.Series([0])},\n", |
113 | | - " \"kinderzuschlag\": {\"betrag_m_bg\": pd.Series([0])},\n", |
114 | | - " \"elterngeld\": {\"betrag_m\": pd.Series([0])},\n", |
115 | | - " \"arbeitslosengeld_2\": {\"betrag_m_bg\": pd.Series([0])},\n", |
116 | 116 | " }\n", |
117 | 117 | " ),\n", |
118 | 118 | " include_warn_nodes=False,\n", |
|
123 | 123 | "cell_type": "markdown", |
124 | 124 | "metadata": {}, |
125 | 125 | "source": [ |
126 | | - "Now, we create some example data. Here, we use a pandas DataFrame with column names that are different from the ones GETTSIM expects." |
| 126 | + "The output above is a nested dictionary whose leaves are dtype hints. Each leaf\n", |
| 127 | + "corresponds to an input variable that GETTSIM needs. To build the mapper (below), we\n", |
| 128 | + "replace each dtype hint with a column name from our input DataFrame." |
| 129 | + ] |
| 130 | + }, |
| 131 | + { |
| 132 | + "cell_type": "markdown", |
| 133 | + "metadata": {}, |
| 134 | + "source": [ |
| 135 | + "Now, we create some example data. Our example household consists of a married couple\n", |
| 136 | + "(both 30 years old, both employed) with a 10-year-old child. Here, we use a pandas\n", |
| 137 | + "DataFrame with column names that are different from the ones GETTSIM expects." |
127 | 138 | ] |
128 | 139 | }, |
129 | 140 | { |
|
151 | 162 | " \"childcare_expenses\": [0, 0, 0],\n", |
152 | 163 | " \"person_that_pays_childcare_expenses\": [-1, -1, 0],\n", |
153 | 164 | " \"joint_taxation\": [True, True, False],\n", |
154 | | - " \"amount_private_pension_income\": [0, 0, 0],\n", |
155 | 165 | " \"contribution_private_health_insurance\": [0, 0, 0],\n", |
156 | 166 | " \"has_children\": [True, True, False],\n", |
157 | 167 | " \"single_parent\": [False, False, False],\n", |
|
169 | 179 | "cell_type": "markdown", |
170 | 180 | "metadata": {}, |
171 | 181 | "source": [ |
172 | | - "Next, we define a mapping from GETTSIM's expected input structure to your data. Note\n", |
173 | | - "that the paths are the union of the input_data for `main` and the result from calling it\n", |
174 | | - "above (with `main_target=MainTarget.templates.input_data_dtypes.tree`).\n", |
175 | | - "\n", |
176 | | - "Just the leaves are different; we have replaced the dtype hints by the column names in\n", |
177 | | - "the data." |
| 182 | + "Next, we define a mapping from GETTSIM's expected input structure to our data. At each\n", |
| 183 | + "leaf, we put either a column name from `DATA` or a constant value."
178 | 184 | ] |
179 | 185 | }, |
180 | 186 | { |
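
To illustrate the "column name or constant" rule for mapper leaves, here is a minimal sketch. It assumes that a string leaf names a column and that any other leaf is a constant repeated for every row. `resolve_mapper` is a hypothetical helper, the plain dict `data` stands in for the `DATA` DataFrame, the column values are made up, and the mapper fragment abbreviates the real nesting. GETTSIM's own handling may differ.

```python
from typing import Any


def resolve_mapper(mapper: dict[str, Any], data: dict[str, list]) -> dict[str, Any]:
    """Replace each mapper leaf with a list of per-row values. Hypothetical helper."""
    n_rows = len(next(iter(data.values())))
    resolved: dict[str, Any] = {}
    for key, leaf in mapper.items():
        if isinstance(leaf, dict):
            # Subtree: resolve its leaves recursively.
            resolved[key] = resolve_mapper(leaf, data)
        elif isinstance(leaf, str):
            # String leaf: look up the column of that name (KeyError if missing).
            resolved[key] = list(data[leaf])
        else:
            # Any other leaf: a constant, repeated once per row.
            resolved[key] = [leaf] * n_rows
    return resolved


# Made-up values for a three-person household (two adults, one child).
data = {"in_training": [False, False, True]}
mapper = {
    "kindergeld": {"in_ausbildung": "in_training"},
    "sozialversicherung": {"rente": {"jahr_renteneintritt": 2080}},
}
resolved = resolve_mapper(mapper, data)
print(resolved)
```

Under these assumptions, a constant such as `2080` needs no column in the input data, which is why placeholder inputs can be written directly into the mapper.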
|
226 | 232 | " \"gemeinsam_veranlagt\": \"joint_taxation\",\n", |
227 | 233 | " },\n", |
228 | 234 | " \"sozialversicherung\": {\n", |
229 | | - " \"arbeitslosen\": {\"betrag_m\": 0.0},\n", |
230 | 235 | " \"rente\": {\n", |
231 | 236 | " \"jahr_renteneintritt\": 2080,\n", |
232 | 237 | " \"altersrente\": {\n", |
|
248 | 253 | " \"p_id_elternteil_1\": \"parent_id_1\",\n", |
249 | 254 | " \"p_id_elternteil_2\": \"parent_id_2\",\n", |
250 | 255 | " },\n", |
251 | | - " \"wohngeld\": {\n", |
252 | | - " \"betrag_m_wthh\": 0.0,\n", |
253 | | - " },\n", |
254 | | - " \"kinderzuschlag\": {\n", |
255 | | - " \"betrag_m_bg\": 0.0,\n", |
256 | | - " },\n", |
257 | | - " \"elterngeld\": {\n", |
258 | | - " \"betrag_m\": 0.0,\n", |
259 | | - " },\n", |
260 | | - " \"arbeitslosengeld_2\": {\n", |
261 | | - " \"betrag_m_bg\": 0.0,\n", |
262 | | - " },\n", |
263 | 256 | " \"kindergeld\": {\n", |
264 | 257 | " \"in_ausbildung\": \"in_training\",\n", |
265 | 258 | " \"p_id_empfänger\": \"id_recipient_child_allowance\",\n", |
|
271 | 264 | "cell_type": "markdown", |
272 | 265 | "metadata": {}, |
273 | 266 | "source": [ |
274 | | - "In practice, you would probably want to save the template above to disk (e.g. as a yaml\n", |
275 | | - "file) and edit it there. Then you can read in the file and use its content as the\n", |
276 | | - "mapper.\n", |
277 | | - "\n", |
278 | | - "Note: When writing the template to your disk, don't forget to allow for unicode\n", |
279 | | - "characters. This is important because many transfers have Umlaute in their names. \n", |
280 | | - "\n", |
281 | | - "An example could look like this:\n", |
| 267 | + "In practice, you would probably save the template to disk (e.g. as a YAML file), edit\n", |
| 268 | + "the leaves there, and read it back in as the mapper. Remember to allow Unicode\n",
| 269 | + "characters when writing the file, since many variable names contain umlauts.\n",
282 | 270 | "\n", |
283 | 271 | "```python\n", |
284 | 272 | "import yaml\n", |
285 | 273 | "\n", |
286 | | - "# Write the template to your disk...\n", |
287 | 274 | "with PATH_FOR_TEMPLATE.open(\"w\") as f:\n", |
288 | 275 | " yaml.dump(TEMPLATE, f, allow_unicode=True)\n", |
289 | 276 | "\n", |
290 | | - "# Edit the leafs in the template and then read it back in\n", |
| 277 | + "# Edit the leaves in the template, then read it back in\n", |
291 | 278 | "with PATH_FOR_TEMPLATE.open(\"r\") as f:\n", |
292 | 279 | " MAPPER = yaml.safe_load(f)\n", |
293 | 280 | "```\n", |
294 | 281 | "\n", |
295 | | - "Some inputs in the mapper may not be directly relevant to an individual’s current\n", |
296 | | - "situation, as they depend on future circumstances or choices. For example, variables\n", |
297 | | - "such as `jahr_renteneintritt` or `alter_beginn_leistungsbezug_sonstige_private_vorsorge`\n", |
298 | | - "are only meaningful for individuals who actually receive pensions. However, since\n", |
299 | | - "GETTSIM represents the tax and transfer system as a static directed acyclic graph (DAG),\n", |
300 | | - "these inputs are still required, even when they are not applicable. \n", |
301 | | - "\n", |
302 | | - "In such cases, you can safely assign a reasonable default value. The exact number\n", |
303 | | - "usually does not matter (as long as the corresponding benefit, such as the pension, is\n", |
304 | | - "set to zero), but it must still be a valid input. For instance, `jahr_renteneintritt`\n", |
305 | | - "should not be set to 0 or `None`, but any plausible year value will work. \n", |
306 | | - "\n" |
| 282 | + "Some inputs may not be directly relevant to the scenario at hand. For example,\n", |
| 283 | + "`jahr_renteneintritt` and `alter_beginn_leistungsbezug_sonstige_private_vorsorge` only\n", |
| 284 | + "matter for people who actually receive pensions. Because GETTSIM's DAG is static, these\n", |
| 285 | + "inputs are still required even when the corresponding benefit is zero. In such cases,\n", |
| 286 | + "assign a reasonable default value; the exact number does not matter (as long as the\n",
| 287 | + "resulting benefit itself is zero), but it must be a valid input (e.g. a plausible year,\n", |
| 288 | + "not 0 or `None`)." |
307 | 289 | ] |
308 | 290 | }, |
309 | 291 | { |
|
312 | 294 | "source": [ |
313 | 295 | "## Calculating taxes and transfers\n", |
314 | 296 | "\n", |
315 | | - "Just as for taxes and transfers, GETTSIM's `main` function is powered by a DAG. This\n", |
316 | | - "comes with the advantages that seasoned GETTSIM users already know from the DAG\n", |
317 | | - "representing the taxes and transfers system:\n", |
318 | | - "- Users can select any part of the DAG as a target. This means that users can access\n", |
319 | | - " any intermediate objects.\n", |
320 | | - "- Users can feed any part of the DAG as input. This means that users can overwrite\n", |
321 | | - " specific parts of the DAG (e.g. the policy environment).\n", |
322 | | - "- Users can decide which parts of the DAG not to compute. For example, users can choose\n", |
323 | | - " not to perform safety checks on the input data. This means that GETTSIM is quicker in\n", |
324 | | - " computing the result (at the expense of informative errors).\n", |
| 297 | + "GETTSIM's `main` function is powered by a DAG. This has several advantages:\n", |
| 298 | + "- You can select any part of the DAG as a target, giving access to intermediate results.\n", |
| 299 | + "- You can feed any part of the DAG as input, overwriting specific parts (e.g. the\n", |
| 300 | + " policy environment).\n", |
| 301 | + "- You can skip parts of the DAG (e.g. safety checks on input data) to speed up\n", |
| 302 | + " computation, at the expense of less informative error messages.\n", |
325 | 303 | "\n", |
326 | | - "First, we look at the one-stop shop: computing the targets defined above using the input\n", |
327 | | - "data. In a second example, we manipulate the policy environment to see why the interface\n", |
328 | | - "DAG is useful.\n", |
| 304 | + "First, we compute the targets defined above using the input data. In a second example,\n", |
| 305 | + "we manipulate the policy environment to see why the interface DAG is useful.\n", |
329 | 306 | "\n", |
330 | 307 | "### Simple computation\n", |
331 | 308 | "\n", |
|