Improve introductory tutorial (#1138)

MImmesberger · web-flow · commit e0d468ebafc9 · 2026-03-11T21:52:33.000+01:00
### What problem do you want to solve?

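Fixes several inconsistencies in the introductory tutorial that I discovered during a
GETTSIM tutoring session. The template call and the mapper now override only the old-age
pension, the unused `amount_private_pension_income` column is dropped from the example
data, and the surrounding explanations are shortened and clarified.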
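
For reviewers, here is a minimal sketch of the step the revised text describes: replacing
each dtype-hint leaf of the template with a column name or a constant. It is not part of
the notebook. The helper `fill_template`, the `columns` lookup, and the placeholder hint
strings (`"bool"`, `"int"`) are assumptions made for illustration; only the two paths and
their mapper values are taken from the tutorial.

```python
from typing import Any


def fill_template(
    template: dict[str, Any],
    columns: dict[tuple[str, ...], Any],
    path: tuple[str, ...] = (),
) -> dict[str, Any]:
    """Return a nested dict shaped like `template` with every leaf replaced.

    Hypothetical helper: `columns` maps the path of each leaf to either a
    column name in the user's data or a constant value.
    """
    mapper: dict[str, Any] = {}
    for key, value in template.items():
        current = (*path, key)
        if isinstance(value, dict):
            # Inner node: recurse and keep the nesting intact.
            mapper[key] = fill_template(value, columns, current)
        else:
            # Leaf (a dtype hint): a missing entry raises KeyError, so
            # forgotten inputs are noticed immediately.
            mapper[key] = columns[current]
    return mapper


# Heavily shortened stand-in for the template; the hint strings are placeholders.
template = {
    "kindergeld": {"in_ausbildung": "bool"},
    "sozialversicherung": {"rente": {"jahr_renteneintritt": "int"}},
}
columns = {
    ("kindergeld", "in_ausbildung"): "in_training",  # column name in the data
    ("sozialversicherung", "rente", "jahr_renteneintritt"): 2080,  # constant
}
mapper = fill_template(template, columns)
```

Keying the lookup by full path keeps leaves with the same name in different branches
apart, which matters because names such as `betrag_m` recur across the tree.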
diff --git a/CHANGES.md b/CHANGES.md
@@ -15,6 +15,7 @@ All releases are available on [Anaconda.org](https://anaconda.org/conda-forge/ge
 
 ## Unreleased
 
+- {gh}`1138` Improve introductory tutorial. ({ghuser}`MImmesberger`)
 - {gh}`1128` Add documentation for glob-style pattern matching in DAG plot colormaps,
   improve consistency of visualization tutorial. ({ghuser}`hmgaudecker`)
 
diff --git a/docs/tutorials/simple_example.ipynb b/docs/tutorials/simple_example.ipynb
@@ -82,12 +82,17 @@
    "source": [
     "Next, we need to find out which input data we actually need to calculate the targets we\n",
     "are interested in. We can do this by specifying a template as the `main_target` of\n",
-    "`gettsim.main`.\n",
+    "`gettsim.main`. The template returns the input variables needed to compute the specified\n",
+    "`tt_targets`.\n",
     "\n",
-    "Because we are interested in social insurance contributions paid when being in regular\n",
-    "employment, we are not interested in retirees or in households which depend on social\n",
-    "assistance. We can override these transfers when requesting the template. This removes\n",
-    "the input data needed to compute these transfers from the template."
+    "Some of these inputs are computed from other inputs. If you already know the value of\n",
+    "such a computed input, you can provide it directly in the template call. GETTSIM will\n",
+    "then exclude its upstream dependencies from the template, giving you a shorter list of\n",
+    "remaining inputs to fill. For example, the old-age pension benefit\n",
+    "(`sozialversicherung__rente__altersrente__betrag_m`) depends on many pension-related\n",
+    "inputs (entitlement points, contribution months, etc.). Since nobody in our scenario is\n",
+    "retired, we provide it as 0, which removes all of those upstream inputs from the\n",
+    "template."
    ]
   },
   {
@@ -107,12 +112,7 @@
     "                \"rente\": {\n",
     "                    \"altersrente\": {\"betrag_m\": pd.Series([0])},\n",
     "                },\n",
-    "                \"arbeitslosen\": {\"betrag_m\": pd.Series([0])},\n",
     "            },\n",
-    "            \"wohngeld\": {\"betrag_m_wthh\": pd.Series([0])},\n",
-    "            \"kinderzuschlag\": {\"betrag_m_bg\": pd.Series([0])},\n",
-    "            \"elterngeld\": {\"betrag_m\": pd.Series([0])},\n",
-    "            \"arbeitslosengeld_2\": {\"betrag_m_bg\": pd.Series([0])},\n",
     "        }\n",
     "    ),\n",
     "    include_warn_nodes=False,\n",
@@ -123,7 +123,18 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Now, we create some example data. Here, we use a pandas DataFrame with column names that are different from the ones GETTSIM expects."
+    "The output above is a nested dictionary whose leaves are dtype hints. Each leaf\n",
+    "corresponds to an input variable that GETTSIM needs. To build the mapper (below), we\n",
+    "replace each dtype hint with a column name from our input DataFrame."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now, we create some example data. Our example household consists of a married couple\n",
+    "(both 30 years old, both employed) with a 10-year-old child. Here, we use a pandas\n",
+    "DataFrame with column names that are different from the ones GETTSIM expects."
    ]
   },
   {
@@ -151,7 +162,6 @@
     "        \"childcare_expenses\": [0, 0, 0],\n",
     "        \"person_that_pays_childcare_expenses\": [-1, -1, 0],\n",
     "        \"joint_taxation\": [True, True, False],\n",
-    "        \"amount_private_pension_income\": [0, 0, 0],\n",
     "        \"contribution_private_health_insurance\": [0, 0, 0],\n",
     "        \"has_children\": [True, True, False],\n",
     "        \"single_parent\": [False, False, False],\n",
@@ -169,12 +179,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Next, we define a mapping from GETTSIM's expected input structure to your data. Note\n",
-    "that the paths are the union of the input_data for `main` and the result from calling it\n",
-    "above (with `main_target=MainTarget.templates.input_data_dtypes.tree`).\n",
-    "\n",
-    "Just the leaves are different; we have replaced the dtype hints by the column names in\n",
-    "the data."
+    "Next, we define a mapping from GETTSIM's expected input structure to our data. At each\n",
+    "leaf, we put either a column name from `DATA` or a constant value."
    ]
   },
   {
@@ -226,7 +232,6 @@
     "        \"gemeinsam_veranlagt\": \"joint_taxation\",\n",
     "    },\n",
     "    \"sozialversicherung\": {\n",
-    "        \"arbeitslosen\": {\"betrag_m\": 0.0},\n",
     "        \"rente\": {\n",
     "            \"jahr_renteneintritt\": 2080,\n",
     "            \"altersrente\": {\n",
@@ -248,18 +253,6 @@
     "        \"p_id_elternteil_1\": \"parent_id_1\",\n",
     "        \"p_id_elternteil_2\": \"parent_id_2\",\n",
     "    },\n",
-    "    \"wohngeld\": {\n",
-    "        \"betrag_m_wthh\": 0.0,\n",
-    "    },\n",
-    "    \"kinderzuschlag\": {\n",
-    "        \"betrag_m_bg\": 0.0,\n",
-    "    },\n",
-    "    \"elterngeld\": {\n",
-    "        \"betrag_m\": 0.0,\n",
-    "    },\n",
-    "    \"arbeitslosengeld_2\": {\n",
-    "        \"betrag_m_bg\": 0.0,\n",
-    "    },\n",
     "    \"kindergeld\": {\n",
     "        \"in_ausbildung\": \"in_training\",\n",
     "        \"p_id_empfänger\": \"id_recipient_child_allowance\",\n",
@@ -271,39 +264,28 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "In practice, you would probably want to save the template above to disk (e.g. as a yaml\n",
-    "file) and edit it there. Then you can read in the file and use its content as the\n",
-    "mapper.\n",
-    "\n",
-    "Note: When writing the template to your disk, don't forget to allow for unicode\n",
-    "characters. This is important because many transfers have Umlaute in their names. \n",
-    "\n",
-    "An example could look like this:\n",
+    "In practice, you would probably save the template to disk (e.g. as a YAML file), edit\n",
+    "the leaves there, and read it back in as the mapper. Remember to allow for Unicode\n",
+    "characters, since many variable names contain umlauts.\n",
     "\n",
     "```python\n",
     "import yaml\n",
     "\n",
-    "# Write the template to your disk...\n",
     "with PATH_FOR_TEMPLATE.open(\"w\") as f:\n",
     "    yaml.dump(TEMPLATE, f, allow_unicode=True)\n",
     "\n",
-    "# Edit the leafs in the template and then read it back in\n",
+    "# Edit the leaves in the template, then read it back in\n",
     "with PATH_FOR_TEMPLATE.open(\"r\") as f:\n",
     "    MAPPER = yaml.safe_load(f)\n",
     "```\n",
     "\n",
-    "Some inputs in the mapper may not be directly relevant to an individual’s current\n",
-    "situation, as they depend on future circumstances or choices. For example, variables\n",
-    "such as `jahr_renteneintritt` or `alter_beginn_leistungsbezug_sonstige_private_vorsorge`\n",
-    "are only meaningful for individuals who actually receive pensions. However, since\n",
-    "GETTSIM represents the tax and transfer system as a static directed acyclic graph (DAG),\n",
-    "these inputs are still required, even when they are not applicable.  \n",
-    "\n",
-    "In such cases, you can safely assign a reasonable default value. The exact number\n",
-    "usually does not matter (as long as the corresponding benefit, such as the pension, is\n",
-    "set to zero), but it must still be a valid input. For instance, `jahr_renteneintritt`\n",
-    "should not be set to 0 or `None`, but any plausible year value will work.  \n",
-    "\n"
+    "Some inputs may not be directly relevant to the scenario at hand. For example,\n",
+    "`jahr_renteneintritt` and `alter_beginn_leistungsbezug_sonstige_private_vorsorge` only\n",
+    "matter for people who actually receive pensions. Because GETTSIM's DAG is static, these\n",
+    "inputs are still required even when the corresponding benefit is zero. In such cases,\n",
+    "assign a reasonable default value. The exact number does not matter (as long as the\n",
+    "resulting benefit itself is zero), but it must be a valid input (e.g. a plausible year,\n",
+    "not 0 or `None`)."
    ]
   },
   {
@@ -312,20 +294,15 @@
    "source": [
     "## Calculating taxes and transfers\n",
     "\n",
-    "Just as for taxes and transfers, GETTSIM's `main` function is powered by a DAG. This\n",
-    "comes with the advantages that seasoned GETTSIM users already know from the DAG\n",
-    "representing the taxes and transfers system:\n",
-    "- Users can select any part of the DAG as a target. This means that users can access\n",
-    "  any intermediate objects.\n",
-    "- Users can feed any part of the DAG as input. This means that users can overwrite\n",
-    "  specific parts of the DAG (e.g. the policy environment).\n",
-    "- Users can decide which parts of the DAG not to compute. For example, users can choose\n",
-    "  not to perform safety checks on the input data. This means that GETTSIM is quicker in\n",
-    "  computing the result (at the expense of informative errors).\n",
+    "GETTSIM's `main` function is powered by a DAG. This has several advantages:\n",
+    "- You can select any part of the DAG as a target, giving access to intermediate results.\n",
+    "- You can feed any part of the DAG as input, overwriting specific parts (e.g. the\n",
+    "  policy environment).\n",
+    "- You can skip parts of the DAG (e.g. safety checks on input data) to speed up\n",
+    "  computation, at the expense of less informative error messages.\n",
     "\n",
-    "First, we look at the one-stop shop: computing the targets defined above using the input\n",
-    "data. In a second example, we manipulate the policy environment to see why the interface\n",
-    "DAG is useful.\n",
+    "First, we compute the targets defined above using the input data. In a second example,\n",
+    "we manipulate the policy environment to see why the interface DAG is useful.\n",
     "\n",
     "### Simple computation\n",
     "\n",