
Commit c00faed

add figure
1 parent f89d70c commit c00faed

File tree

1 file changed: +9 -2 lines changed


notebooks/22_NLP_2_tokenization.ipynb

Lines changed: 9 additions & 2 deletions
@@ -99,7 +99,14 @@
     "source": [
     "## NLP Preprocessing Workflow\n",
     "\n",
-    "We usually work with text in various formats and sizes, for instance, from `.txt`, `.html`, or other structured or unstructured text file formats. \n",
+    "We usually work with text in various formats and sizes, for instance, from `.txt`, `.html`, or other structured or unstructured text file formats. For a later systematic data analysis or the training of machine-learning models, we first have to preprocess the text data consistently, typically done as sketched in {numref}`fig_nlp_processing_workflow`.\n",
+    "\n",
+    "```{figure} ../images/fig_nlp_processing_workflow.png\n",
+    ":name: fig_nlp_processing_workflow\n",
+    "\n",
+    "Typically, an NLP preprocessing workflow consists of several stages, including raw text cleaning, tokenization, token cleaning, and token normalization. This is often the basis for later analysis or modeling steps.\n",
+    "```\n",
+    "\n",
     "\n",
     "### Raw Text Cleaning\n",
     "\n",
@@ -900,7 +907,7 @@
     "name": "python",
     "nbconvert_exporter": "python",
     "pygments_lexer": "ipython3",
-    "version": "3.9.18"
+    "version": "3.12.9"
     }
    },
    "nbformat": 4,

0 commit comments
