Commit e0d468e

Improve introductory tutorial (#1138)
### What problem do you want to solve?

Fixes some tutorial inconsistencies that I discovered in a quick GETTSIM tutoring session.
1 parent 25fc793 commit e0d468e

2 files changed (+44 additions, -66 deletions)


CHANGES.md

Lines changed: 1 addition & 0 deletions
@@ -15,6 +15,7 @@ All releases are available on [Anaconda.org](https://anaconda.org/conda-forge/ge
 
 ## Unreleased
 
+- {gh}`1138` Improve introductory tutorial. ({ghuser}`MImmesberger`)
 - {gh}`1128` Add documentation for glob-style pattern matching in DAG plot colormaps,
 improve consistency of visualization tutorial. ({ghuser}`hmgaudecker`)
 
docs/tutorials/simple_example.ipynb

Lines changed: 43 additions & 66 deletions
@@ -82,12 +82,17 @@
 "source": [
 "Next, we need to find out which input data we actually need to calculate the targets we\n",
 "are interested in. We can do this by specifying a template as the `main_target` of\n",
-"`gettsim.main`.\n",
+"`gettsim.main`. The template returns the input variables needed to compute the specified\n",
+"`tt_targets`.\n",
 "\n",
-"Because we are interested in social insurance contributions paid when being in regular\n",
-"employment, we are not interested in retirees or in households which depend on social\n",
-"assistance. We can override these transfers when requesting the template. This removes\n",
-"the input data needed to compute these transfers from the template."
+"Some of these inputs are computed from other inputs. If you already know the value of\n",
+"such a computed input, you can provide it directly in the template call. GETTSIM will\n",
+"then exclude its upstream dependencies from the template, giving you a shorter list of\n",
+"remaining inputs to fill. For example, the old-age pension benefit\n",
+"(`sozialversicherung__rente__altersrente__betrag_m`) depends on many pension-related\n",
+"inputs (entitlement points, contribution months, etc.). Since nobody in our scenario is\n",
+"retired, we provide it as 0, which removes all of those upstream inputs from the\n",
+"template."
 ]
 },
 {
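The revised cell says that providing a computed input in the template call drops its upstream dependencies from the template. A toy dependency graph makes that mechanism concrete. This is a minimal sketch in plain Python, not GETTSIM's implementation: the node names, the `DEPENDENCIES` dict and the `required_inputs` helper are all hypothetical and only loosely mirror the pension example.

```python
from collections.abc import Mapping, Set

# Hypothetical toy DAG: each computed node lists the nodes it depends on.
# Nodes that never appear as keys are raw inputs.
DEPENDENCIES: dict[str, list[str]] = {
    "contribution": ["income", "pension_benefit"],
    "pension_benefit": ["entitlement_points", "contribution_months"],
}


def required_inputs(
    targets: list[str],
    deps: Mapping[str, list[str]],
    provided: Set[str] = frozenset(),
) -> set[str]:
    """Return the nodes whose values must be supplied to compute `targets`.

    Nodes in `provided` count as given, so their upstream dependencies are
    never visited.
    """
    needed: set[str] = set()
    seen: set[str] = set()
    stack = list(targets)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if node in provided or node not in deps:
            # Either supplied directly or a raw input without a producer.
            needed.add(node)
        else:
            stack.extend(deps[node])
    return needed


print(sorted(required_inputs(["contribution"], DEPENDENCIES)))
# ['contribution_months', 'entitlement_points', 'income']
print(sorted(required_inputs(["contribution"], DEPENDENCIES, {"pension_benefit"})))
# ['income', 'pension_benefit']
```

In this toy, the provided node itself stays in the result because its value still has to be supplied, much like the tutorial passes the pension benefit as 0.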
@@ -107,12 +112,7 @@
 " \"rente\": {\n",
 " \"altersrente\": {\"betrag_m\": pd.Series([0])},\n",
 " },\n",
-" \"arbeitslosen\": {\"betrag_m\": pd.Series([0])},\n",
 " },\n",
-" \"wohngeld\": {\"betrag_m_wthh\": pd.Series([0])},\n",
-" \"kinderzuschlag\": {\"betrag_m_bg\": pd.Series([0])},\n",
-" \"elterngeld\": {\"betrag_m\": pd.Series([0])},\n",
-" \"arbeitslosengeld_2\": {\"betrag_m_bg\": pd.Series([0])},\n",
 " }\n",
 " ),\n",
 " include_warn_nodes=False,\n",
@@ -123,7 +123,18 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Now, we create some example data. Here, we use a pandas DataFrame with column names that are different from the ones GETTSIM expects."
+"The output above is a nested dictionary whose leaves are dtype hints. Each leaf\n",
+"corresponds to an input variable that GETTSIM needs. To build the mapper (below), we\n",
+"replace each dtype hint with a column name from our input DataFrame."
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"Now, we create some example data. Our example household consists of a married couple\n",
+"(both 30 years old, both employed) with a 10-year-old child. Here, we use a pandas\n",
+"DataFrame with column names that are different from the ones GETTSIM expects."
 ]
 },
 {
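The new cell explains that the template is a nested dictionary of dtype hints and that the mapper is obtained by replacing each leaf with a column name. A small recursive walk over the tree illustrates that step. The sketch assumes a hypothetical, trimmed template: the dtype-hint strings and the `familie` grouping are placeholders rather than GETTSIM's actual output, while the column names are taken from the tutorial's mapper.

```python
from typing import Any


def map_leaves(
    tree: dict[str, Any],
    replacements: dict[tuple[str, ...], Any],
    path: tuple[str, ...] = (),
) -> dict[str, Any]:
    """Copy `tree`, swapping every leaf for the value stored under its path.

    A missing path raises KeyError, so no dtype hint is silently left behind.
    """
    result: dict[str, Any] = {}
    for key, value in tree.items():
        current = (*path, key)
        if isinstance(value, dict):
            result[key] = map_leaves(value, replacements, current)
        else:
            result[key] = replacements[current]
    return result


# Hypothetical, trimmed template: the dtype-hint strings are placeholders.
TEMPLATE = {
    "familie": {"p_id_elternteil_1": "int", "p_id_elternteil_2": "int"},
    "kindergeld": {"in_ausbildung": "bool"},
}
COLUMN_NAMES = {
    ("familie", "p_id_elternteil_1"): "parent_id_1",
    ("familie", "p_id_elternteil_2"): "parent_id_2",
    ("kindergeld", "in_ausbildung"): "in_training",
}
MAPPER = map_leaves(TEMPLATE, COLUMN_NAMES)
print(MAPPER)
```

Raising on a missing path is a deliberate choice in this sketch: it surfaces forgotten leaves immediately instead of passing a dtype hint through as if it were a column name.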
@@ -151,7 +162,6 @@
 " \"childcare_expenses\": [0, 0, 0],\n",
 " \"person_that_pays_childcare_expenses\": [-1, -1, 0],\n",
 " \"joint_taxation\": [True, True, False],\n",
-" \"amount_private_pension_income\": [0, 0, 0],\n",
 " \"contribution_private_health_insurance\": [0, 0, 0],\n",
 " \"has_children\": [True, True, False],\n",
 " \"single_parent\": [False, False, False],\n",
@@ -169,12 +179,8 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Next, we define a mapping from GETTSIM's expected input structure to your data. Note\n",
-"that the paths are the union of the input_data for `main` and the result from calling it\n",
-"above (with `main_target=MainTarget.templates.input_data_dtypes.tree`).\n",
-"\n",
-"Just the leaves are different; we have replaced the dtype hints by the column names in\n",
-"the data."
+"Next, we define a mapping from GETTSIM's expected input structure to our data. At each\n",
+"leaf, we either put a column name from `DATA` or a constant value."
 ]
 },
 {
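The revised mapper description allows two kinds of leaves: a column name from `DATA` or a constant. The sketch below shows one way such a mapper could be turned into flat, per-person columns, with constants repeated for every row. It is an illustration under stated assumptions, not how GETTSIM processes the mapper internally: `resolve_mapper` is hypothetical, `DATA` is a plain dict of lists instead of a pandas DataFrame, the nesting is simplified, and the `__` separator only imitates qualified names such as `sozialversicherung__rente__altersrente__betrag_m`.

```python
from typing import Any


def resolve_mapper(
    mapper: dict[str, Any],
    data: dict[str, list[Any]],
    prefix: str = "",
) -> dict[str, list[Any]]:
    """Flatten `mapper` into one list per qualified input name.

    String leaves naming a column in `data` pull that column; every other leaf is
    treated as a constant and repeated once per row. Assumes `data` is non-empty.
    """
    n_rows = len(next(iter(data.values())))
    resolved: dict[str, list[Any]] = {}
    for key, leaf in mapper.items():
        name = f"{prefix}__{key}" if prefix else key
        if isinstance(leaf, dict):
            resolved.update(resolve_mapper(leaf, data, name))
        elif isinstance(leaf, str) and leaf in data:
            resolved[name] = list(data[leaf])
        else:
            resolved[name] = [leaf] * n_rows
    return resolved


# Trimmed, hypothetical versions of the tutorial's DATA and mapper.
DATA = {"joint_taxation": [True, True, False]}
MAPPER = {
    "gemeinsam_veranlagt": "joint_taxation",  # column from DATA
    "sozialversicherung": {"rente": {"jahr_renteneintritt": 2080}},  # constant
}
RESOLVED = resolve_mapper(MAPPER, DATA)
print(RESOLVED)
```

One caveat of this toy rule: a string constant that happens to match a column name would be read as a column, so a real implementation needs an unambiguous way to tell the two apart.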
@@ -226,7 +232,6 @@
 " \"gemeinsam_veranlagt\": \"joint_taxation\",\n",
 " },\n",
 " \"sozialversicherung\": {\n",
-" \"arbeitslosen\": {\"betrag_m\": 0.0},\n",
 " \"rente\": {\n",
 " \"jahr_renteneintritt\": 2080,\n",
 " \"altersrente\": {\n",
@@ -248,18 +253,6 @@
 " \"p_id_elternteil_1\": \"parent_id_1\",\n",
 " \"p_id_elternteil_2\": \"parent_id_2\",\n",
 " },\n",
-" \"wohngeld\": {\n",
-" \"betrag_m_wthh\": 0.0,\n",
-" },\n",
-" \"kinderzuschlag\": {\n",
-" \"betrag_m_bg\": 0.0,\n",
-" },\n",
-" \"elterngeld\": {\n",
-" \"betrag_m\": 0.0,\n",
-" },\n",
-" \"arbeitslosengeld_2\": {\n",
-" \"betrag_m_bg\": 0.0,\n",
-" },\n",
 " \"kindergeld\": {\n",
 " \"in_ausbildung\": \"in_training\",\n",
 " \"p_id_empfänger\": \"id_recipient_child_allowance\",\n",
@@ -271,39 +264,28 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"In practice, you would probably want to save the template above to disk (e.g. as a yaml\n",
-"file) and edit it there. Then you can read in the file and use its content as the\n",
-"mapper.\n",
-"\n",
-"Note: When writing the template to your disk, don't forget to allow for unicode\n",
-"characters. This is important because many transfers have Umlaute in their names. \n",
-"\n",
-"An example could look like this:\n",
+"In practice, you would probably save the template to disk (e.g. as a YAML file), edit\n",
+"the leaves there, and read it back in as the mapper. Remember to allow for unicode\n",
+"characters, since many variable names contain Umlaute.\n",
 "\n",
 "```python\n",
 "import yaml\n",
 "\n",
-"# Write the template to your disk...\n",
 "with PATH_FOR_TEMPLATE.open(\"w\") as f:\n",
 " yaml.dump(TEMPLATE, f, allow_unicode=True)\n",
 "\n",
-"# Edit the leafs in the template and then read it back in\n",
+"# Edit the leaves in the template, then read it back in\n",
 "with PATH_FOR_TEMPLATE.open(\"r\") as f:\n",
 " MAPPER = yaml.safe_load(f)\n",
 "```\n",
 "\n",
-"Some inputs in the mapper may not be directly relevant to an individual’s current\n",
-"situation, as they depend on future circumstances or choices. For example, variables\n",
-"such as `jahr_renteneintritt` or `alter_beginn_leistungsbezug_sonstige_private_vorsorge`\n",
-"are only meaningful for individuals who actually receive pensions. However, since\n",
-"GETTSIM represents the tax and transfer system as a static directed acyclic graph (DAG),\n",
-"these inputs are still required, even when they are not applicable. \n",
-"\n",
-"In such cases, you can safely assign a reasonable default value. The exact number\n",
-"usually does not matter (as long as the corresponding benefit, such as the pension, is\n",
-"set to zero), but it must still be a valid input. For instance, `jahr_renteneintritt`\n",
-"should not be set to 0 or `None`, but any plausible year value will work. \n",
-"\n"
+"Some inputs may not be directly relevant to the scenario at hand. For example,\n",
+"`jahr_renteneintritt` and `alter_beginn_leistungsbezug_sonstige_private_vorsorge` only\n",
+"matter for people who actually receive pensions. Because GETTSIM's DAG is static, these\n",
+"inputs are still required even when the corresponding benefit is zero. In such cases,\n",
+"assign a reasonable default value — the exact number does not matter (as long as the\n",
+"resulting benefit itself is zero), but it must be a valid input (e.g. a plausible year,\n",
+"not 0 or `None`)."
 ]
 },
 {
@@ -312,20 +294,15 @@
 "source": [
 "## Calculating taxes and transfers\n",
 "\n",
-"Just as for taxes and transfers, GETTSIM's `main` function is powered by a DAG. This\n",
-"comes with the advantages that seasoned GETTSIM users already know from the DAG\n",
-"representing the taxes and transfers system:\n",
-"- Users can select any part of the DAG as a target. This means that users can access\n",
-" any intermediate objects.\n",
-"- Users can feed any part of the DAG as input. This means that users can overwrite\n",
-" specific parts of the DAG (e.g. the policy environment).\n",
-"- Users can decide which parts of the DAG not to compute. For example, users can choose\n",
-" not to perform safety checks on the input data. This means that GETTSIM is quicker in\n",
-" computing the result (at the expense of informative errors).\n",
+"GETTSIM's `main` function is powered by a DAG. This has several advantages:\n",
+"- You can select any part of the DAG as a target, giving access to intermediate results.\n",
+"- You can feed any part of the DAG as input, overwriting specific parts (e.g. the\n",
+" policy environment).\n",
+"- You can skip parts of the DAG (e.g. safety checks on input data) to speed up\n",
+" computation, at the expense of less informative error messages.\n",
 "\n",
-"First, we look at the one-stop shop: computing the targets defined above using the input\n",
-"data. In a second example, we manipulate the policy environment to see why the interface\n",
-"DAG is useful.\n",
+"First, we compute the targets defined above using the input data. In a second example,\n",
+"we manipulate the policy environment to see why the interface DAG is useful.\n",
 "\n",
 "### Simple computation\n",
 "\n",
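The first two advantages listed in the revised cell, selecting any node as a target and feeding any node as input, can be mimicked with a tiny DAG evaluator. In this hypothetical sketch (not GETTSIM's `main`), each node is a function whose parameter names identify the nodes it depends on. Any node can be requested as a target, and any node found in `inputs` is taken as given instead of being computed. The node names and numbers are made up.

```python
import inspect
from typing import Any, Callable, Optional

# Hypothetical toy DAG: parameter names double as the names of upstream nodes.
FUNCTIONS: dict[str, Callable[..., float]] = {
    "gross_income": lambda wage, hours: wage * hours,
    "contribution": lambda gross_income, rate: gross_income * rate,
}


def compute(
    target: str,
    functions: dict[str, Callable[..., Any]],
    inputs: dict[str, Any],
    cache: Optional[dict[str, Any]] = None,
) -> Any:
    """Recursively evaluate `target`, preferring values supplied in `inputs`."""
    if cache is None:
        cache = {}
    if target in inputs:
        # Any node, raw or computed, can be fed directly; its function is skipped.
        return inputs[target]
    if target in cache:
        return cache[target]
    if target not in functions:
        raise KeyError(f"No function or input for node {target!r}")
    func = functions[target]
    kwargs = {
        name: compute(name, functions, inputs, cache)
        for name in inspect.signature(func).parameters
    }
    cache[target] = func(**kwargs)
    return cache[target]


INPUTS = {"wage": 20.0, "hours": 100.0, "rate": 0.25}
print(compute("gross_income", FUNCTIONS, INPUTS))  # intermediate target: 2000.0
print(compute("contribution", FUNCTIONS, INPUTS))  # final target: 500.0
# Feeding an intermediate node directly: wage and hours are no longer needed.
print(compute("contribution", FUNCTIONS, {"gross_income": 3000.0, "rate": 0.25}))  # 750.0
```

The last call never evaluates the `gross_income` function and needs neither `wage` nor `hours`. This is the toy counterpart of overwriting part of the DAG and not computing what lies upstream of it.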
