biosustain · sambra95 · Mar 1, 2026 · Feb 28, 2026
diff --git a/docs/tutorial/preprocessing.ipynb b/docs/tutorial/preprocessing.ipynb
@@ -3,74 +3,214 @@
   {
    "cell_type": "markdown",
    "metadata": {},
-   "source": "# Preprocess growth data\n\nThis tutorial demonstrates the preprocessing functions in `growthcurves.preprocessing`:\n\n- **`path_correct(N, path_length_cm)`**\n- **`blank_subtraction(N, blank)`**\n- **`out_of_iqr_window(values, factor, position)`**\n- **`out_of_iqr(N, window_size, factor)`**\n\nUse this workflow before model fitting when measurements require optical corrections or outlier screening."
+   "source": [
+    "# Preprocess growth data\n",
+    "\n",
+    "This tutorial demonstrates the preprocessing functions in\n",
+    "`growthcurves.preprocessing`:\n",
+    "\n",
+    "- **`path_correct(N, path_length_cm)`**\n",
+    "- **`blank_subtraction(N, blank)`**\n",
+    "- **`out_of_iqr_window(values, factor, position)`** — single-window helper\n",
+    "- **`detect_outliers(N, method, **kwargs)`** — main outlier detection entry point\n",
+    "  - `method=\"iqr\"` — sliding-window IQR (kwargs: `window_size`, `factor`)\n",
+    "  - `method=\"ecod\"` — ECOD anomaly detection (kwargs: `factor`)\n",
+    "\n",
+    "Use this workflow before model fitting when measurements require optical corrections\n",
+    "or outlier screening."
+   ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "metadata": {},
    "outputs": [],
-   "source": "import numpy as np\n\nimport growthcurves as gc\nfrom growthcurves import preprocessing as prep"
+   "source": [
+    "import numpy as np\n",
+    "\n",
+    "import growthcurves as gc\n",
+    "from growthcurves import preprocessing as prep"
+   ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
-   "source": "## Path length correction"
+   "source": [
+    "## Path length correction"
+   ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "metadata": {},
    "outputs": [],
-   "source": "# Measurements taken at 0.5 cm path length\nraw_od = np.array([0.25, 0.30, 0.35, 0.40])\nod_1cm = gc.path_correct(raw_od, path_length_cm=0.5)\n\nprint(f'Raw OD (0.5 cm): {raw_od}')\nprint(f'Corrected OD (1.0 cm): {od_1cm}')"
+   "source": [
+    "# Measurements taken at 0.5 cm path length\n",
+    "raw_od = np.array([0.25, 0.30, 0.35, 0.40])\n",
+    "od_1cm = gc.path_correct(raw_od, path_length_cm=0.5)\n",
+    "\n",
+    "print(f\"Raw OD (0.5 cm): {raw_od}\")\n",
+    "print(f\"Corrected OD (1.0 cm): {od_1cm}\")"
+   ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
-   "source": "## Blank subtraction"
+   "source": [
+    "## Blank subtraction"
+   ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "metadata": {},
    "outputs": [],
-   "source": "sample_od = np.array([0.50, 0.60, 0.70, 0.80])\nblank_od = np.array([0.05, 0.052, 0.048, 0.051])\ncorrected_od = gc.blank_subtraction(sample_od, blank_od)\n\nprint(f'Sample OD:   {sample_od}')\nprint(f'Blank OD:    {blank_od}')\nprint(f'Corrected OD:{corrected_od}')"
+   "source": [
+    "sample_od = np.array([0.50, 0.60, 0.70, 0.80])\n",
+    "blank_od = np.array([0.05, 0.052, 0.048, 0.051])\n",
+    "corrected_od = gc.blank_subtraction(sample_od, blank_od)\n",
+    "\n",
+    "print(f\"Sample OD:   {sample_od}\")\n",
+    "print(f\"Blank OD:    {blank_od}\")\n",
+    "print(f\"Corrected OD:{corrected_od}\")"
+   ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
-   "source": "## Outlier detection in a single window"
+   "source": [
+    "## Outlier detection in a single window"
+   ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "metadata": {},
    "outputs": [],
-   "source": "window = np.array([0.10, 0.12, 0.65, 0.11, 0.13])\ncenter_is_outlier = prep.out_of_iqr_window(window, factor=1.5, position='center')\nfirst_is_outlier = prep.out_of_iqr_window(window, factor=1.5, position='first')\nlast_is_outlier = prep.out_of_iqr_window(window, factor=1.5, position='last')\n\nprint(f'Window: {window}')\nprint(f'Center value outlier? {center_is_outlier}')\nprint(f'First value outlier?  {first_is_outlier}')\nprint(f'Last value outlier?   {last_is_outlier}')"
+   "source": [
+    "window = np.array([0.10, 0.12, 0.65, 0.11, 0.13])\n",
+    "center_is_outlier = prep.out_of_iqr_window(window, factor=1.5, position=\"center\")\n",
+    "first_is_outlier = prep.out_of_iqr_window(window, factor=1.5, position=\"first\")\n",
+    "last_is_outlier = prep.out_of_iqr_window(window, factor=1.5, position=\"last\")\n",
+    "\n",
+    "print(f\"Window: {window}\")\n",
+    "print(f\"Center value outlier? {center_is_outlier}\")\n",
+    "print(f\"First value outlier?  {first_is_outlier}\")\n",
+    "print(f\"Last value outlier?   {last_is_outlier}\")"
+   ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
-   "source": "## Outlier detection across a full time series"
+   "source": [
+    "## Outlier detection across a full time series with `detect_outliers`\n",
+    "\n",
+    "`detect_outliers(N, method=..., **kwargs)` is the main entry point. Pass\n",
+    "`method=\"iqr\"` for the sliding-window IQR approach:\n",
+    "\n",
+    "- A value that can be centred in a window is tested against the IQR of that window.\n",
+    "- The first and last values cannot be centred in a window, so their IQR status is\n",
+    "  calculated at the first and last positions of their respective windows.\n",
+    "  This is especially useful for catching outliers at the start of a series.\n",
+    "\n",
+    "Example with a centre outlier:"
+   ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "metadata": {},
    "outputs": [],
-   "source": "od_series = np.array([0.08, 0.11, 0.14, 0.19, 0.23, 0.95, 0.31, 0.36, 0.41])\nmask = prep.out_of_iqr(od_series, window_size=5, factor=1.5)\n\nprint(f'OD series: {od_series}')\nprint(f'Outlier mask: {mask}')\nprint(f'Outlier indices: {np.where(mask)[0]}')\nprint(f'Outlier values: {od_series[mask]}')"
+   "source": [
+    "od_series = np.array([0.08, 0.11, 0.14, 0.19, 0.23, 0.25, 0.95, 0.31, 0.36, 0.41])\n",
+    "mask = prep.detect_outliers(od_series, method=\"iqr\", window_size=5, factor=1.5)\n",
+    "\n",
+    "print(f\"OD series: {od_series}\")\n",
+    "print(f\"Outlier mask: {mask}\")\n",
+    "print(f\"Outlier indices: {np.where(mask)[0]}\")\n",
+    "print(f\"Outlier values: {od_series[mask]}\")"
+   ]
   },
   {
    "cell_type": "markdown",
+   "id": "93a3bc8a",
    "metadata": {},
-   "source": "## Combined preprocessing pipeline"
+   "source": [
+    "Example with a centre outlier and an outlier near the start of the series:"
+   ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
+   "id": "42b10ae5",
    "metadata": {},
    "outputs": [],
-   "source": "raw = np.array([0.10, 0.12, 0.14, 0.16, 0.48, 0.20, 0.22])\nblank = np.full_like(raw, 0.02)\npath_length_cm = 0.5\n\nraw_1cm = gc.path_correct(raw, path_length_cm=path_length_cm)\nblank_1cm = gc.path_correct(blank, path_length_cm=path_length_cm)\nbaseline_corrected = gc.blank_subtraction(raw_1cm, blank_1cm)\noutlier_mask = prep.out_of_iqr(baseline_corrected, window_size=5, factor=1.5)\n\nprint(f'Raw OD ({path_length_cm} cm): {raw}')\nprint(f'Path-corrected OD (1 cm): {raw_1cm}')\nprint(f'Blank-subtracted OD: {baseline_corrected}')\nprint(f'Outlier mask: {outlier_mask}')"
+   "source": [
+    "od_series = np.array([0.08, 0.99, 0.14, 0.19, 0.23, 0.25, 0.95, 0.31, 0.36, 0.41])\n",
+    "mask = prep.detect_outliers(od_series, method=\"iqr\", window_size=5, factor=1.5)\n",
+    "\n",
+    "print(f\"OD series: {od_series}\")\n",
+    "print(f\"Outlier mask: {mask}\")\n",
+    "print(f\"Outlier indices: {np.where(mask)[0]}\")\n",
+    "print(f\"Outlier values: {od_series[mask]}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e4a168ed",
+   "metadata": {},
+   "source": [
+    "If several outliers occur at the start of a time series, detecting all of them\n",
+    "requires a sufficiently large window and possibly several passes. Here the first\n",
+    "value is not flagged because the second value falls in the same window and widens\n",
+    "the IQR."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "4e416f91",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "od_series = np.array([0.99, 0.99, 0.14, 0.19, 0.23, 0.25, 0.95, 0.31, 0.36, 0.41])\n",
+    "mask = prep.detect_outliers(od_series, method=\"iqr\", window_size=5, factor=1.5)\n",
+    "\n",
+    "print(f\"OD series: {od_series}\")\n",
+    "print(f\"Outlier mask: {mask}\")\n",
+    "print(f\"Outlier indices: {np.where(mask)[0]}\")\n",
+    "print(f\"Outlier values: {od_series[mask]}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Combined preprocessing pipeline"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "raw = np.array([0.10, 0.12, 0.14, 0.16, 0.48, 0.20, 0.22])\n",
+    "blank = np.full_like(raw, 0.02)\n",
+    "path_length_cm = 0.5\n",
+    "\n",
+    "raw_1cm = gc.path_correct(raw, path_length_cm=path_length_cm)\n",
+    "blank_1cm = gc.path_correct(blank, path_length_cm=path_length_cm)\n",
+    "baseline_corrected = gc.blank_subtraction(raw_1cm, blank_1cm)\n",
+    "outlier_mask = prep.detect_outliers(\n",
+    "    baseline_corrected, method=\"iqr\", window_size=5, factor=1.5\n",
+    ")\n",
+    "\n",
+    "print(f\"Raw OD ({path_length_cm} cm): {raw}\")\n",
+    "print(f\"Path-corrected OD (1 cm): {raw_1cm}\")\n",
+    "print(f\"Blank-subtracted OD: {baseline_corrected}\")\n",
+    "print(f\"Outlier mask: {outlier_mask}\")"
+   ]
   }
  ],
  "metadata": {