Commit 8ad358e

Merge pull request #47 from biosustain/add-preprocessing-notebook
move preprocessing function demonstrations to a new notebook
2 parents c5e2237 + cb8907b commit 8ad358e

3 files changed: 120 additions & 131 deletions

docs/index.md

Lines changed: 1 addition & 0 deletions
@@ -12,6 +12,7 @@
 :hidden: true

 tutorial/analysis
+tutorial/preprocessing
 tutorial/plotting
 ```


docs/tutorial/analysis.ipynb

Lines changed: 22 additions & 131 deletions
@@ -3,23 +3,7 @@
 {
 "cell_type": "markdown",
 "metadata": {},
-"source": [
-"# Fit growth models and extract growth statistics\n",
-"\n",
-"This tutorial demonstrates how to fit growth models and extract growth statistics\n",
-"using the growthcurves package.\n",
-"\n",
-"The analysis workflow includes:\n",
-"1. Generating or loading growth data\n",
-"2. Fitting **mechanistic** models (ODE-based, parametric)\n",
-"3. Fitting **phenomenological** models (parametric and non-parametric)\n",
-"4. Extracting growth statistics from all fits\n",
-"5. Saving results for visualization\n",
-"\n",
-"For visualization of the results, see the companion notebook:\n",
-"[`plotting.ipynb`](plotting.ipynb) (Visualize fitted growth curves, derivatives,\n",
-" and growth statistics)"
-]
+"source": "# Fit growth models and extract growth statistics\n\nThis tutorial demonstrates how to fit growth models and extract growth statistics\nusing the growthcurves package.\n\nThe analysis workflow includes:\n1. Generating or loading growth data\n2. Fitting **mechanistic** models (ODE-based, parametric)\n3. Fitting **phenomenological** models (parametric and non-parametric)\n4. Extracting growth statistics from all fits\n5. Saving results for visualization\n\n\nFor preprocessing examples (blank subtraction, outlier detection, path length correction), see the companion notebook:\n[`preprocessing.ipynb`](preprocessing.ipynb).\n\nFor visualization of the results, see the companion notebook:\n[`plotting.ipynb`](plotting.ipynb) (Visualize fitted growth curves, derivatives,\n and growth statistics)"
 },
 {
 "cell_type": "code",
@@ -35,99 +19,6 @@
 "import growthcurves as gc"
 ]
 },
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"## Data Preprocessing Functions\n",
-"\n",
-"The growthcurves package provides preprocessing utilities for common data corrections:\n",
-"\n",
-"- **`path_correct(N, path_length_cm)`**: Normalize OD measurements to 1 cm path length\n",
-"- **`blank_subtraction(N, blank)`**: Subtract blank/background measurements from data"
-]
-},
-{
-"cell_type": "code",
-"execution_count": 2,
-"metadata": {
-"tags": [
-"hide-input"
-]
-},
-"outputs": [
-{
-"name": "stdout",
-"output_type": "stream",
-"text": [
-"Path Length Correction Example:\n",
-" Raw OD (0.5 cm path): [0.25 0.3 0.35 0.4 ]\n",
-" Corrected OD (1 cm path): [0.5 0.6 0.7 0.8]\n",
-"\n",
-"Blank Subtraction Example:\n",
-" Sample OD: [0.5 0.6 0.7 0.8]\n",
-" Blank OD: [0.05 0.052 0.048 0.051]\n",
-" Corrected: [0.45 0.548 0.652 0.749]\n",
-"\n",
-"Combined Preprocessing Pipeline:\n",
-" Raw measurements (0.5 cm): [0.125 0.15 0.175 0.2 ]\n",
-" After path correction (1 cm): [0.25 0.3 0.35 0.4 ]\n",
-" Blank (corrected to 1 cm):[0.05 0.05 0.05 0.05]\n",
-" Final corrected OD: [0.2 0.25 0.3 0.35]\n",
-"\n"
-]
-}
-],
-"source": [
-"# Example 1: Path length correction\n",
-"# Measurements taken with a 0.5 cm path length, normalized to 1 cm\n",
-"raw_od_measurements = np.array([0.25, 0.30, 0.35, 0.40])\n",
-"path_length = 0.5 # cm\n",
-"\n",
-"od_corrected = gc.path_correct(raw_od_measurements, path_length)\n",
-"\n",
-"print(\"Path Length Correction Example:\")\n",
-"print(f\" Raw OD (0.5 cm path): {raw_od_measurements}\")\n",
-"print(f\" Corrected OD (1 cm path): {od_corrected}\")\n",
-"print()\n",
-"\n",
-"# Example 2: Blank subtraction\n",
-"# Typical workflow: subtract blank measurements from sample data\n",
-"sample_data = np.array([0.500, 0.600, 0.700, 0.800])\n",
-"blank_data = np.array([0.050, 0.052, 0.048, 0.051])\n",
-"\n",
-"corrected_data = gc.blank_subtraction(sample_data, blank_data)\n",
-"\n",
-"print(\"Blank Subtraction Example:\")\n",
-"print(f\" Sample OD: {sample_data}\")\n",
-"print(f\" Blank OD: {blank_data}\")\n",
-"print(f\" Corrected: {corrected_data}\")\n",
-"print()\n",
-"\n",
-"# Example 3: Combined preprocessing workflow\n",
-"# Simulate a typical preprocessing pipeline\n",
-"raw_measurements = np.array([0.125, 0.150, 0.175, 0.200])\n",
-"blank_measurements = np.array([0.025, 0.025, 0.025, 0.025])\n",
-"path_length_cm = 0.5\n",
-"\n",
-"# Step 1: Path correction\n",
-"od_1cm = gc.path_correct(raw_measurements, path_length_cm)\n",
-"\n",
-"# Step 2: Blank subtraction\n",
-"od_corrected = gc.blank_subtraction(\n",
-" od_1cm, gc.path_correct(blank_measurements, path_length_cm)\n",
-")\n",
-"\n",
-"print(\"Combined Preprocessing Pipeline:\")\n",
-"print(f\" Raw measurements (0.5 cm): {raw_measurements}\")\n",
-"print(f\" After path correction (1 cm): {od_1cm}\")\n",
-"print(\n",
-" f\" Blank (corrected to 1 cm):{gc.path_correct(blank_measurements, path_length_cm)}\"\n",
-")\n",
-"print(f\" Final corrected OD: {od_corrected}\")\n",
-"print()"
-]
-},
 {
 "cell_type": "markdown",
 "metadata": {},
@@ -137,7 +28,7 @@
 "This cell generates synthetic growth data from a clean logistic function.\n",
 "- time is modeled in hours, with measurements every 12 minutes (0.2 hours) for\n",
 " a total of 440 points (88 hours).\n",
-"- We assume a lag of 30 hours, an intrinsic growth rate of 0.15 hour⁻¹,\n",
+"- We assume a lag of 30 hours, an intrinsic growth rate of 0.15 hour\u207b\u00b9,\n",
 " and a carrying capacity of 0.45 OD."
 ]
 },
@@ -297,11 +188,11 @@
 "| Output key | Meaning | How it is calculated |\n",
 "|---|---|---|\n",
 "| `max_od` | Maximum observed/fitted OD | Maximum OD over the valid data range |\n",
-"| `mu_max` | Maximum specific growth rate (μ_max) | Maximum of `d(ln N)/dt` from the fitted model (or local fit for non-parametric) |\n",
-"| `intrinsic_growth_rate` | Intrinsic model rate parameter | For mechanistic models: fitted intrinsic `μ`; for phenomenological/non-parametric: `None` |\n",
+"| `mu_max` | Maximum specific growth rate (\u03bc_max) | Maximum of `d(ln N)/dt` from the fitted model (or local fit for non-parametric) |\n",
+"| `intrinsic_growth_rate` | Intrinsic model rate parameter | For mechanistic models: fitted intrinsic `\u03bc`; for phenomenological/non-parametric: `None` |\n",
 "| `doubling_time` | Doubling time in hours | `ln(2) / mu_max` |\n",
 "| `time_at_umax` | Time at maximum specific growth | Time where `mu_max` reaches its maximum |\n",
-"| `od_at_umax` | OD at time of μ_max | Model-predicted OD at `time_at_umax` |\n",
+"| `od_at_umax` | OD at time of \u03bc_max | Model-predicted OD at `time_at_umax` |\n",
 "| `exp_phase_start`, `exp_phase_end` | Exponential phase boundaries | From threshold or tangent phase-boundary method in `extract_stats()` |\n",
 "| `model_rmse` | Fit error | RMSE between observed OD and model-predicted OD over the model fit window |\n",
 "\n",
@@ -319,15 +210,15 @@
 "The `extract_stats_from_fit()` function calculates these key metrics:\n",
 "\n",
 "- `max_od`: Maximum OD value within the fitted window\n",
-"- `mu_max`: **Observed** maximum specific growth rate μ_max (hour⁻¹) - calculated\n",
+"- `mu_max`: **Observed** maximum specific growth rate \u03bc_max (hour\u207b\u00b9) - calculated\n",
 " from the fitted curve\n",
 "- `intrinsic_growth_rate`: **Model parameter** for intrinsic growth rate\n",
 " (parametric models only, `None` for non-parametric)\n",
 "- `doubling_time`: Time to double the population at peak growth (hours)\n",
 "- `exp_phase_start`: When exponential phase begins (hours)\n",
 "- `exp_phase_end`: When exponential phase ends (hours)\n",
-"- `time_at_umax`: Time when μ reaches its maximum (hours)\n",
-"- `od_at_umax`: OD value at time of maximum μ\n",
+"- `time_at_umax`: Time when \u03bc reaches its maximum (hours)\n",
+"- `od_at_umax`: OD value at time of maximum \u03bc\n",
 "- `fit_t_min`: Start of fitting window (hours)\n",
 "- `fit_t_max`: End of fitting window (hours)\n",
 "- `fit_method`: Identifier for the method used\n",
@@ -339,29 +230,29 @@
 "\n",
 "### MECHANISTIC MODELS\n",
 "\n",
-"| Name | Model | Equation | Exp Start | Exp End | Intrinsic μ | μ max | Carrying Capacity | Fit |\n",
+"| Name | Model | Equation | Exp Start | Exp End | Intrinsic \u03bc | \u03bc max | Carrying Capacity | Fit |\n",
 "|------|-------|----------|-----------|---------|-------------|-------|-------------------|-----|\n",
-"| Logistic | parametric | `dN/dt = μ * (1 - N(t) / K) * N(t)` | threshold/<br>tangent | threshold/<br>tangent | μ | max dln(N)/dt | K | entire curve |\n",
-"| Gompertz | parametric | `dN/dt = μ * math.log(K / N(t)) * N(t)` | threshold/<br>tangent | threshold/<br>tangent | μ | max dln(N)/dt | K | entire curve |\n",
-"| Richards | parametric | `dN/dt = μ * (1 - (N(t) / K)**beta) * N(t)` | threshold/<br>tangent | threshold/<br>tangent | μ | max dln(N)/dt | A | entire curve |\n",
-"| Baranyi | parametric | `dN/dt= μ * math.exp(μ * t) / (math.exp(h0) - 1 + math.exp(μ * t)) * (1 - N(t) / K) * N(t)` | threshold/<br>tangent | threshold/<br>tangent | μ | max dln(N)/dt | K | entire curve |\n",
+"| Logistic | parametric | `dN/dt = \u03bc * (1 - N(t) / K) * N(t)` | threshold/<br>tangent | threshold/<br>tangent | \u03bc | max dln(N)/dt | K | entire curve |\n",
+"| Gompertz | parametric | `dN/dt = \u03bc * math.log(K / N(t)) * N(t)` | threshold/<br>tangent | threshold/<br>tangent | \u03bc | max dln(N)/dt | K | entire curve |\n",
+"| Richards | parametric | `dN/dt = \u03bc * (1 - (N(t) / K)**beta) * N(t)` | threshold/<br>tangent | threshold/<br>tangent | \u03bc | max dln(N)/dt | A | entire curve |\n",
+"| Baranyi | parametric | `dN/dt= \u03bc * math.exp(\u03bc * t) / (math.exp(h0) - 1 + math.exp(\u03bc * t)) * (1 - N(t) / K) * N(t)` | threshold/<br>tangent | threshold/<br>tangent | \u03bc | max dln(N)/dt | K | entire curve |\n",
 "\n",
 "### PHENOMENOLOGICAL MODELS\n",
 "\n",
-"| Name | Model | Equation | Exp Start | Exp End | Intrinsic μ | μ max | Max OD | Fit |\n",
+"| Name | Model | Equation | Exp Start | Exp End | Intrinsic \u03bc | \u03bc max | Max OD | Fit |\n",
 "|------|-------|----------|-----------|---------|-------------|-------|--------|-----|\n",
 "| Linear | non-parametric | `ln(N(t)) = N0 + b * t` | threshold/<br>tangent | threshold/<br>tangent | n.a. | b | max OD raw | only window |\n",
 "| Spline | non-parametric | `ln(N(t)) = spline(t)` | threshold/<br>tangent | threshold/<br>tangent | n.a. | max of derivative of spline | max OD raw | only log phase |\n",
-"| Logistic (phenom) | parametric | `ln(N(t)/N0) = A / (1 + exp(4 * μ_max * (λ - t) / A + 2))` | λ | threshold/<br>tangent | n.a. | μ_max | K | entire curve |\n",
-"| Gompertz (phenom) | parametric | `ln(N(t)/N0) = A * exp(-exp(μ_max * exp(1) * (λ - t) / A + 1))` | λ | threshold/<br>tangent | n.a. | μ_max | K | entire curve |\n",
-"| Gompertz (modified) | parametric | `ln(N(t)/N0) = A * exp(-exp(μ_max * exp(1) * (λ - t) / A + 1)) + A * exp(α * (t - t_shift))` | λ | threshold/<br>tangent | n.a. | μ_max | K | entire curve |\n",
-"| Richards (phenom) | parametric | `ln(N(t)/N0) = A * (1 + ν * exp(1 + ν + μ_max * (1 + ν)**(1/ν) * (λ - t) / A))**(-1/ν)` | λ | threshold/<br>tangent | n.a. | μ_max | K | entire curve |\n",
+"| Logistic (phenom) | parametric | `ln(N(t)/N0) = A / (1 + exp(4 * \u03bc_max * (\u03bb - t) / A + 2))` | \u03bb | threshold/<br>tangent | n.a. | \u03bc_max | K | entire curve |\n",
+"| Gompertz (phenom) | parametric | `ln(N(t)/N0) = A * exp(-exp(\u03bc_max * exp(1) * (\u03bb - t) / A + 1))` | \u03bb | threshold/<br>tangent | n.a. | \u03bc_max | K | entire curve |\n",
+"| Gompertz (modified) | parametric | `ln(N(t)/N0) = A * exp(-exp(\u03bc_max * exp(1) * (\u03bb - t) / A + 1)) + A * exp(\u03b1 * (t - t_shift))` | \u03bb | threshold/<br>tangent | n.a. | \u03bc_max | K | entire curve |\n",
+"| Richards (phenom) | parametric | `ln(N(t)/N0) = A * (1 + \u03bd * exp(1 + \u03bd + \u03bc_max * (1 + \u03bd)**(1/\u03bd) * (\u03bb - t) / A))**(-1/\u03bd)` | \u03bb | threshold/<br>tangent | n.a. | \u03bc_max | K | entire curve |\n",
 "\n",
 "### Understanding Growth Rates: Intrinsic vs. Observed\n",
 "\n",
 "**Important distinction:**\n",
 "\n",
-"- **`mu_max`** (μ_max): The **observed** maximum specific growth rate calculated\n",
+"- **`mu_max`** (\u03bc_max): The **observed** maximum specific growth rate calculated\n",
 " from the fitted curve as max(d(ln N)/dt). This is what you measure from the data.\n",
 "\n",
 "- **`intrinsic_growth_rate`**: The **model parameter** representing intrinsic growth\n",
@@ -1014,9 +905,9 @@
 "Two methods are available for determining exponential phase boundaries:\n",
 "\n",
 "### 1. **Threshold Method**\n",
-"- Tracks the instantaneous specific growth rate μ(t)\n",
-"- `exp_phase_start`: First time when μ exceeds a fraction of μ_max (default: 15%)\n",
-"- `exp_phase_end`: First time after peak when μ drops below the threshold\n",
+"- Tracks the instantaneous specific growth rate \u03bc(t)\n",
+"- `exp_phase_start`: First time when \u03bc exceeds a fraction of \u03bc_max (default: 15%)\n",
+"- `exp_phase_end`: First time after peak when \u03bc drops below the threshold\n",
 "\n",
 "### 2. **Tangent Method**\n",
 "- Constructs a tangent line in log space at the point of maximum growth rate\n",

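Note on the threshold method described in the last hunk of `analysis.ipynb`: the sketch below is one way to read those three bullets, written in plain NumPy. It is not the package's `extract_stats()` implementation. The helper name `threshold_phase_bounds` and the curve parameters (`A`, `n0`) are assumptions made for illustration; the curve itself follows the "Logistic (phenom)" equation from the model table, and `ln(2) / mu_max` is the doubling-time formula from the output-key table.

```python
import numpy as np


def threshold_phase_bounds(t, n, fraction=0.15):
    """Estimate mu_max, doubling time and exponential-phase bounds.

    Hypothetical helper for illustration only; not part of growthcurves.
    """
    # Instantaneous specific growth rate mu(t) = d(ln N)/dt (finite differences).
    mu = np.gradient(np.log(n), t)
    i_peak = int(np.argmax(mu))
    mu_max = float(mu[i_peak])
    threshold = fraction * mu_max

    # First time mu reaches the threshold on the way up.
    exp_start = float(t[np.argmax(mu >= threshold)])

    # First time after the peak at which mu falls below the threshold;
    # fall back to the last time point if that never happens.
    below = np.flatnonzero(mu[i_peak:] < threshold)
    exp_end = float(t[i_peak + below[0]]) if below.size else float(t[-1])

    doubling_time = float(np.log(2) / mu_max)
    return mu_max, doubling_time, exp_start, exp_end


# Synthetic curve: ln(N/N0) = A / (1 + exp(4 * mu_max * (lag - t) / A + 2)).
# A and n0 are assumed values; lag and mu_true follow the tutorial's text.
t = np.arange(0, 88, 0.2)
A, mu_true, lag, n0 = 5.0, 0.15, 30.0, 0.003
n = n0 * np.exp(A / (1 + np.exp(4 * mu_true * (lag - t) / A + 2)))

mu_max, doubling_time, exp_start, exp_end = threshold_phase_bounds(t, n)
print(f"mu_max: {mu_max:.3f} per hour")
print(f"doubling time: {doubling_time:.2f} hours")
print(f"exponential phase: {exp_start:.1f} to {exp_end:.1f} hours")
```

Because the rate is taken from finite differences, noisy measurements would normally need smoothing or a fitted model before this kind of thresholding is meaningful.
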
docs/tutorial/preprocessing.ipynb

Lines changed: 97 additions & 0 deletions
@@ -0,0 +1,97 @@
+{
+"cells": [
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": "# Preprocess growth data\n\nThis tutorial demonstrates the preprocessing functions in `growthcurves.preprocessing`:\n\n- **`path_correct(N, path_length_cm)`**\n- **`blank_subtraction(N, blank)`**\n- **`out_of_iqr_window(values, factor, position)`**\n- **`out_of_iqr(N, window_size, factor)`**\n\nUse this workflow before model fitting when measurements require optical corrections or outlier screening."
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": "import numpy as np\n\nimport growthcurves as gc\nfrom growthcurves import preprocessing as prep"
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": "## Path length correction"
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": "# Measurements taken at 0.5 cm path length\nraw_od = np.array([0.25, 0.30, 0.35, 0.40])\nod_1cm = gc.path_correct(raw_od, path_length_cm=0.5)\n\nprint(f'Raw OD (0.5 cm): {raw_od}')\nprint(f'Corrected OD (1.0 cm): {od_1cm}')"
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": "## Blank subtraction"
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": "sample_od = np.array([0.50, 0.60, 0.70, 0.80])\nblank_od = np.array([0.05, 0.052, 0.048, 0.051])\ncorrected_od = gc.blank_subtraction(sample_od, blank_od)\n\nprint(f'Sample OD: {sample_od}')\nprint(f'Blank OD: {blank_od}')\nprint(f'Corrected OD:{corrected_od}')"
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": "## Outlier detection in a single window"
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": "window = np.array([0.10, 0.12, 0.65, 0.11, 0.13])\ncenter_is_outlier = prep.out_of_iqr_window(window, factor=1.5, position='center')\nfirst_is_outlier = prep.out_of_iqr_window(window, factor=1.5, position='first')\nlast_is_outlier = prep.out_of_iqr_window(window, factor=1.5, position='last')\n\nprint(f'Window: {window}')\nprint(f'Center value outlier? {center_is_outlier}')\nprint(f'First value outlier? {first_is_outlier}')\nprint(f'Last value outlier? {last_is_outlier}')"
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": "## Outlier detection across a full time series"
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": "od_series = np.array([0.08, 0.11, 0.14, 0.19, 0.23, 0.95, 0.31, 0.36, 0.41])\nmask = prep.out_of_iqr(od_series, window_size=5, factor=1.5)\n\nprint(f'OD series: {od_series}')\nprint(f'Outlier mask: {mask}')\nprint(f'Outlier indices: {np.where(mask)[0]}')\nprint(f'Outlier values: {od_series[mask]}')"
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": "## Combined preprocessing pipeline"
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": "raw = np.array([0.10, 0.12, 0.14, 0.16, 0.48, 0.20, 0.22])\nblank = np.full_like(raw, 0.02)\npath_length_cm = 0.5\n\nraw_1cm = gc.path_correct(raw, path_length_cm=path_length_cm)\nblank_1cm = gc.path_correct(blank, path_length_cm=path_length_cm)\nbaseline_corrected = gc.blank_subtraction(raw_1cm, blank_1cm)\noutlier_mask = prep.out_of_iqr(baseline_corrected, window_size=5, factor=1.5)\n\nprint(f'Raw OD ({path_length_cm} cm): {raw}')\nprint(f'Path-corrected OD (1 cm): {raw_1cm}')\nprint(f'Blank-subtracted OD: {baseline_corrected}')\nprint(f'Outlier mask: {outlier_mask}')"
+}
+],
+"metadata": {
+"kernelspec": {
+"display_name": "growthcurves_env",
+"language": "python",
+"name": "python3"
+},
+"language_info": {
+"codemirror_mode": {
+"name": "ipython",
+"version": 3
+},
+"file_extension": ".py",
+"mimetype": "text/x-python",
+"name": "python",
+"nbconvert_exporter": "python",
+"pygments_lexer": "ipython3",
+"version": "3.11.14"
+}
+},
+"nbformat": 4,
+"nbformat_minor": 4
+}
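
Note on the new notebook: its cells are committed without outputs (`"execution_count": null`), so the arithmetic behind each step is not visible in the diff. The sketch below reproduces the steps in plain NumPy so it runs without `growthcurves`. The division by path length and the elementwise subtraction are inferred from the printed output of the cell removed from `analysis.ipynb`. The rolling IQR rule is only an assumption about what `out_of_iqr` might do, since the diff does not show its implementation. All three `*_sketch` names are hypothetical.

```python
import numpy as np


def path_correct_sketch(od, path_length_cm):
    """Scale OD to a 1 cm path length (inferred: 0.25 at 0.5 cm becomes 0.5)."""
    return np.asarray(od, dtype=float) / path_length_cm


def blank_subtraction_sketch(od, blank):
    """Subtract blank readings elementwise."""
    return np.asarray(od, dtype=float) - np.asarray(blank, dtype=float)


def rolling_iqr_mask_sketch(od, window_size=5, factor=1.5):
    """Flag points outside [Q1 - factor*IQR, Q3 + factor*IQR] of a centred window.

    Assumed behaviour for illustration; the real out_of_iqr may treat
    window edges or quartiles differently.
    """
    od = np.asarray(od, dtype=float)
    half = window_size // 2
    mask = np.zeros(od.size, dtype=bool)
    for i in range(od.size):
        # Window is truncated at the ends of the series.
        window = od[max(0, i - half): min(od.size, i + half + 1)]
        q1, q3 = np.percentile(window, [25, 75])
        iqr = q3 - q1
        mask[i] = (od[i] < q1 - factor * iqr) or (od[i] > q3 + factor * iqr)
    return mask


# Values taken from the notebook cells in this commit.
od_1cm = path_correct_sketch([0.25, 0.30, 0.35, 0.40], path_length_cm=0.5)
corrected = blank_subtraction_sketch([0.50, 0.60, 0.70, 0.80],
                                     [0.05, 0.052, 0.048, 0.051])
od_series = np.array([0.08, 0.11, 0.14, 0.19, 0.23, 0.95, 0.31, 0.36, 0.41])
mask = rolling_iqr_mask_sketch(od_series, window_size=5, factor=1.5)

print(f"Path-corrected OD: {od_1cm}")
print(f"Blank-subtracted OD: {corrected}")
print(f"Outlier indices: {np.where(mask)[0]}")
```

Under this assumed rule only the 0.95 reading in `od_series` is flagged. Committing the notebook with executed outputs would let readers compare the package's actual result.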
