Commit bfce5c6 (parent f37eef9)

minor restructure of notebook for clarity, readability, and flow
File tree

1 file changed: +88 −69 lines


docs/source/notebooks/graded_intervention_time_series_single_channel_ols.ipynb

Lines changed: 88 additions & 69 deletions
@@ -35,6 +35,8 @@
  "\n",
  "### The Autocorrelation Challenge\n",
  "\n",
+ "#### Introduction: Understanding the Problem\n",
+ "\n",
  "**Autocorrelation** occurs when observations in a time series are correlated with their own past values. In causal inference with time series data, this creates a fundamental challenge:\n",
  "\n",
  "**What is autocorrelation?**\n",
@@ -66,50 +68,69 @@
  "4. **Bayesian time series models**: Full posterior inference with temporal dependencies\n",
  "5. **Bootstrap methods**: Resample with preserved temporal structure\n",
  "\n",
- "**This implementation uses HAC standard errors** because they offer an excellent balance of:\n",
+ "This implementation provides both **HAC** and **ARIMAX** approaches, each with distinct advantages for different use cases.\n",
+ "\n",
+ "#### Approach 1: HAC Standard Errors (Default)\n",
+ "\n",
+ "**HAC (Heteroskedasticity and Autocorrelation Consistent) standard errors** — also known as **Newey-West standard errors** {cite:p}`newey1987simple` — provide robust inference by correcting standard errors without requiring specification of the autocorrelation structure.\n",
+ "\n",
+ "**Advantages:**\n",
  "- **Simplicity**: No need to specify autocorrelation structure (order of AR/MA terms)\n",
  "- **Robustness**: Works with any autocorrelation pattern (not just AR or MA)\n",
  "- **Computational efficiency**: Fast OLS with corrected standard errors\n",
- "- **Proven reliability**: Well-established method with strong theoretical properties {cite:p}`newey1987simple`\n",
+ "- **Proven reliability**: Well-established method with strong theoretical properties\n",
  "\n",
  "With HAC (see detailed explanation in the admonition box below):\n",
  "- ✅ **Causal estimates remain valid**: Treatment effect coefficients are unbiased\n",
  "- ✅ **Inference is corrected**: Standard errors, confidence intervals, and p-values account for autocorrelation\n",
  "- ✅ **No model specification required**: Don't need to guess AR order or lag structure\n",
  "- ✅ **Honest uncertainty quantification**: Confidence intervals reflect true uncertainty\n",
  "\n",
- "The tradeoff: HAC standard errors are **wider** (more conservative) than naive OLS, but they provide **trustworthy inference** even when residuals show complex autocorrelation patterns.\n",
+ "**Tradeoff:** HAC standard errors are **wider** (more conservative) than naive OLS, but they provide **trustworthy inference** even when residuals show complex autocorrelation patterns.\n",
+ "\n",
+ "**This notebook demonstrates HAC inference** in the main analysis sections, showing how it compares to naive OLS and why it matters for valid causal inference.\n",
+ "\n",
+ "::::{admonition} Understanding HAC Standard Errors\n",
+ ":class: note\n",
+ "\n",
+ "Time series data typically violates OLS assumptions because:\n",
+ "- **Autocorrelation**: Past values influence current values (e.g., yesterday's weather affects today's, habits persist over weeks)\n",
+ "- **Heteroskedasticity**: Variance changes over time (e.g., more volatility in certain seasons)\n",
+ "\n",
+ "When these violations occur, OLS **coefficient estimates remain unbiased**, but **standard errors are incorrect** — typically too small, leading to overconfident inference (narrow confidence intervals, artificially low p-values).\n",
+ "\n",
+ "**HAC (Heteroskedasticity and Autocorrelation Consistent) standard errors** — also known as **Newey-West standard errors** {cite:p}`newey1987simple` — provide robust inference by correcting standard errors for these violations. This gives reliable confidence intervals and hypothesis tests even when residuals are correlated.\n",
+ "\n",
+ "**Key Parameter:**\n",
+ "- `hac_maxlags`: Controls how many periods of autocorrelation to account for. CausalPy uses the Newey-West rule of thumb: `floor(4*(n/100)^(2/9))`. For our 104-week dataset, this gives `hac_maxlags=4`, accounting for up to 4 weeks of residual correlation.\n",
+ "\n",
+ "**Tradeoff:** HAC standard errors are wider (more conservative) than naive OLS, but provide honest uncertainty quantification for time series data.\n",
+ "::::\n",
+ "\n",
+ "#### Approach 2: ARIMAX Models\n",
+ "\n",
+ "**ARIMAX (ARIMA with eXogenous variables)** explicitly models the autocorrelation structure of residuals using ARIMA(p,d,q) processes, following the classical Box & Tiao (1975) intervention analysis framework {cite:p}`box1975intervention`.\n",
+ "\n",
+ "**Advantages:**\n",
+ "- **Efficiency**: Smaller standard errors when ARIMA structure is correctly specified\n",
+ "- **Classical methodology**: Follows the original intervention analysis approach\n",
+ "- **Explicit error modeling**: Can characterize and forecast residual dynamics\n",
  "\n",
- "**Future extensions** could add ARIMAX or Bayesian models for explicit temporal structure modeling, but HAC provides an excellent starting point for practical causal inference with time series data.\n",
+ "**Tradeoffs:**\n",
+ "- **Requires specification**: Must choose p, d, q orders (typically via ACF/PACF plots)\n",
+ "- **Misspecification risk**: Wrong orders can lead to biased or inefficient inference\n",
+ "- **Less robust**: More sensitive to outliers and structural breaks\n",
  "\n",
- "This notebook demonstrates why autocorrelation-robust inference matters by:\n",
- "1. Simulating data with realistic autocorrelation (AR(1) errors)\n",
- "2. Fitting the TF-ITS model with HAC standard errors\n",
- "3. Comparing naive vs HAC inference to show the concrete difference\n",
- "\n"
+ "**Section 5 of this notebook demonstrates ARIMAX** as an alternative error model, comparing it to HAC and providing guidance on when to use each approach."
  ]
  },
  {
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "## Example Scenario: Water Restrictions Policy\n",
- "\n",
- "A regional water authority in a dry climate implements a **drought-responsive communications policy**. Communication intensity (0-10 scale) varies based on cumulative rainfall deficit over the past 6 weeks. During sustained drought, messaging ramps up to encourage conservation. Most of the time, communications are zero (no routine messaging).\n",
- "\n",
- "Why this example demonstrates TF-ITS strengths:\n",
- "- **Graded intervention**: Communication intensity varies from 0-10, not on/off\n",
- "- **Sparse activation**: Policy activates only during drought (realistic, cost-effective)\n",
- "- **Saturation**: Repeated messages have diminishing returns as people become desensitized\n",
- "- **Adstock**: Behavioral habits change gradually and persist over weeks\n",
- "- **Confounders**: Temperature and rainfall directly affect water consumption and must be controlled\n",
+ "## Notebook Overview\n",
  "\n",
- "While we use water policy, this method applies to any domain with graded interventions and carryover effects:\n",
- "- Public health campaigns (vaccination messaging, smoking cessation)\n",
- "- Marketing mix modeling (advertising spend, promotions) \n",
- "- Environmental policy (emissions reduction programs)\n",
- "- Traffic management (congestion pricing communications)\n",
- "- Education interventions (remediation program intensity)"
+ "This notebook demonstrates Graded Intervention Time Series (Transfer Function ITS) analysis using a simulated water consumption dataset. We'll walk through data simulation, model fitting with transform parameter estimation, diagnostic checks, counterfactual analysis, and a comparison of different approaches to handling autocorrelation in time series data (HAC vs ARIMAX error models)."
  ]
  },
  {
@@ -119,35 +140,12 @@
119140
":::{admonition} Implementation notes\n",
120141
":class: warning\n",
121142
"This notebook demonstrates the **non-Bayesian implementation** using:\n",
122-
"- OLS regression with HAC standard errors (fast, robust inference)\n",
143+
"- OLS regression first with with HAC standard errors (fast, robust inference), then with ARIMAX.\n",
123144
"- Automated transform parameter estimation via grid search or continuous optimization\n",
124145
"- Point estimates only (future: bootstrap confidence intervals, Bayesian uncertainty quantification)\n",
125146
":::"
126147
]
127148
},
128-
{
129-
"cell_type": "markdown",
130-
"metadata": {},
131-
"source": [
132-
"::::{admonition} Understanding HAC Standard Errors\n",
133-
":class: note\n",
134-
"\n",
135-
"Time series data typically violates OLS assumptions because:\n",
136-
"- **Autocorrelation**: Past values influence current values (e.g., yesterday's weather affects today's, habits persist over weeks)\n",
137-
"- **Heteroskedasticity**: Variance changes over time (e.g., more volatility in certain seasons)\n",
138-
"\n",
139-
"When these violations occur, OLS **coefficient estimates remain unbiased**, but **standard errors are incorrect** — typically too small, leading to overconfident inference (narrow confidence intervals, artificially low p-values).\n",
140-
"\n",
141-
"**HAC (Heteroskedasticity and Autocorrelation Consistent) standard errors** — also known as **Newey-West standard errors** {cite:p}`newey1987simple` — provide robust inference by correcting standard errors for these violations. This gives reliable confidence intervals and hypothesis tests even when residuals are correlated.\n",
142-
"\n",
143-
"**Key Parameter:**\n",
144-
"- `hac_maxlags`: Controls how many periods of autocorrelation to account for. CausalPy uses the Newey-West rule of thumb: `floor(4*(n/100)^(2/9))`. For our 104-week dataset, this gives `hac_maxlags=4`, accounting for up to 4 weeks of residual correlation.\n",
145-
"\n",
146-
"**Tradeoff:** HAC standard errors are wider (more conservative) than naive OLS, but provide honest uncertainty quantification for time series data.\n",
147-
"\n",
148-
"::::\n"
149-
]
150-
},
151149
{
152150
"cell_type": "code",
153151
"execution_count": 1,
@@ -176,6 +174,29 @@
176174
"%config InlineBackend.figure_format = 'retina'"
177175
]
178176
},
177+
{
178+
"cell_type": "markdown",
179+
"metadata": {},
180+
"source": [
181+
"## Example Scenario: Water Restrictions Policy\n",
182+
"\n",
183+
"A regional water authority in a dry climate implements a **drought-responsive communications policy**. Communication intensity (0-10 scale) varies based on cumulative rainfall deficit over the past 6 weeks. During sustained drought, messaging ramps up to encourage conservation. Most of the time, communications are zero (no routine messaging).\n",
184+
"\n",
185+
"Why this example demonstrates TF-ITS strengths:\n",
186+
"- **Graded intervention**: Communication intensity varies from 0-10, not on/off\n",
187+
"- **Sparse activation**: Policy activates only during drought (realistic, cost-effective)\n",
188+
"- **Saturation**: Repeated messages have diminishing returns as people become desensitized\n",
189+
"- **Adstock**: Behavioral habits change gradually and persist over weeks\n",
190+
"- **Confounders**: Temperature and rainfall directly affect water consumption and must be controlled\n",
191+
"\n",
192+
"While we use water policy, this method applies to any domain with graded interventions and carryover effects:\n",
193+
"- Public health campaigns (vaccination messaging, smoking cessation)\n",
194+
"- Marketing mix modeling (advertising spend, promotions) \n",
195+
"- Environmental policy (emissions reduction programs)\n",
196+
"- Traffic management (congestion pricing communications)\n",
197+
"- Education interventions (remediation program intensity)\n"
198+
]
199+
},
179200
{
180201
"cell_type": "markdown",
181202
"metadata": {},
@@ -374,8 +395,6 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "## 2. Visualize the Data\n",
- "\n",
  "Let's look at the water consumption and communication intensity time series. Notice:\n",
  "- **Very dry climate** with extended zero-rainfall periods in summer\n",
  "- **Communications are zero most of the time** - only activated during sustained drought\n",
@@ -471,7 +490,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "## 3. Model Fitting\n",
+ "## 2. Model Fitting\n",
  "\n",
  "Fitting a transfer function model involves finding both the optimal transform parameters and the regression coefficients. This is accomplished through a nested optimization procedure. In the outer loop, the algorithm searches for the best saturation and adstock parameters—either by exhaustively evaluating all combinations on a discrete grid, or by using continuous optimization to search more efficiently through the parameter space. For each candidate set of transform parameters, the inner loop applies these transformations to the raw treatment variable and fits a regression model (OLS or ARIMAX) to the data. The root mean squared error (RMSE) of each fitted model is computed, and the parameter combination that minimizes this error is selected.\n",
  "\n",
@@ -533,7 +552,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "## 4. Visualize Estimated vs True Transform Parameters\n",
+ "## 3. Visualize Estimated vs True Transform Parameters\n",
  "\n",
  "Since we know the true parameters used to generate the data, we can compare the estimated transforms to the true transforms. This helps us assess **parameter recovery** - how well the estimation procedure identifies the true data-generating process.\n",
  "\n",
@@ -626,11 +645,11 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "## 5. Model Methods and Diagnostics\n",
+ "## 4. Model Methods and Diagnostics\n",
  "\n",
  "Now that we have a fitted model with estimated transforms, let's explore the available methods for analysis and diagnostics.\n",
  "\n",
- "### 5.1 Model Summary\n",
+ "### Model Summary\n",
  "\n",
  "View the fitted model coefficients and their **HAC standard errors** (robust to autocorrelation and heteroskedasticity):\n"
  ]
@@ -674,7 +693,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "### 5.2 Model Fit Visualization\n",
+ "### Model Fit Visualization\n",
  "\n",
  "Plot observed vs fitted values and residuals:\n"
  ]
@@ -710,7 +729,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "### 5.3 Residual Diagnostics\n",
+ "### Residual Diagnostics\n",
  "\n",
  "Check for autocorrelation in residuals using ACF/PACF plots and Ljung-Box test. \n",
  "\n",
@@ -769,7 +788,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "### 5.4 Why HAC Standard Errors Matter\n",
+ "### Why HAC Standard Errors Matter\n",
  "\n",
  "The diagnostics above show **significant residual autocorrelation** (Ljung-Box test p-values < 0.05). This is **expected and realistic** even with a well-specified model! \n",
  "\n",
@@ -998,7 +1017,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "### 5.5 Impulse Response Function\n",
+ "### Impulse Response Function\n",
  "\n",
  "Visualize how communication effects persist over time through the adstock transformation:\n"
  ]
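[Editor's note] For a geometric adstock (an assumption; the notebook may use a different adstock parameterization), the impulse response has a closed form: a one-unit pulse at week 0 decays as `alpha**t`:

```python
import numpy as np

alpha = 0.6      # assumed adstock retention rate (illustrative, not the notebook's estimate)
horizon = 10     # weeks to trace the response

# Response of the adstocked treatment to a one-unit pulse at t=0
irf = alpha ** np.arange(horizon)

print(irf[:4])   # 1.0, 0.6, 0.36, 0.216 — each week retains 60% of the previous
print(irf.sum()) # cumulative effect approaches 1/(1-alpha) = 2.5 as horizon grows
```

The cumulative sum is why adstock matters for effect totals: a single week of messaging contributes up to `1/(1-alpha)` units of transformed exposure over time.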
@@ -1034,7 +1053,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "### 5.6 Counterfactual Effect Estimation\n",
+ "### Counterfactual Effect Estimation\n",
  "\n",
  "Estimate the effect of the communications policy by comparing observed outcomes to a counterfactual where communications were never implemented:\n"
  ]
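[Editor's note] The counterfactual logic described here reduces to predicting the outcome twice, once with the observed (transformed) treatment path and once with it zeroed, then differencing. The coefficients and treatment values below are hypothetical, purely to show the arithmetic:

```python
import numpy as np

# Hypothetical fitted coefficients (illustrative, not the notebook's estimates)
beta0, beta_z = 100.0, -5.0

# Hypothetical transformed treatment path (zero outside the drought episode)
z_obs = np.array([0, 0, 3.2, 4.1, 2.0, 0.5, 0, 0])

y_hat = beta0 + beta_z * z_obs                 # fitted under the observed policy
y_cf = beta0 + beta_z * np.zeros_like(z_obs)   # counterfactual: no communications
effect = y_hat - y_cf                          # per-period causal effect

print(effect.sum())  # cumulative effect of the policy: -49.0 (water saved)
```

In the real model the treatment must be re-passed through the adstock/saturation transforms before zeroing, since carryover means the counterfactual differs even in weeks with no messaging.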
@@ -1076,7 +1095,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "## 6. Alternative Error Model: ARIMAX\n",
+ "## 5. Alternative Error Model: ARIMAX\n",
  "\n",
  "So far we've used **HAC (Newey-West) standard errors**, which provide robust inference without requiring us to specify the autocorrelation structure. This is the recommended default approach.\n",
  "\n",
@@ -1104,7 +1123,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "### 6.1 Fit Model with ARIMAX Errors\n",
+ "### Fit Model with ARIMAX Errors\n",
  "\n",
  "Since we generated the data with AR(1) errors (`rho_error = 0.5`), we know the true error structure is ARIMA(1,0,0). Let's fit the model with ARIMAX using this specification:\n"
  ]
@@ -1194,7 +1213,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "### 6.2 Model Fit Visualization\n",
+ "### Model Fit Visualization\n",
  "\n",
  "Let's visualize the ARIMAX model fit to see how well it captures the data patterns:\n"
  ]
@@ -1236,7 +1255,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "### 6.3 Residual Diagnostics\n",
+ "### Residual Diagnostics\n",
  "\n",
  "A key advantage of ARIMAX is that by explicitly modeling the autocorrelation structure, the residuals should exhibit less autocorrelation. Let's check this with ACF/PACF plots and the Ljung-Box test:\n"
  ]
@@ -1297,7 +1316,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "### 6.4 Impulse Response Function\n",
+ "### Impulse Response Function\n",
  "\n",
  "The impulse response function visualizes how a one-unit increase in communication intensity affects water consumption dynamically over time, accounting for the adstock effect:\n"
  ]
@@ -1339,7 +1358,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "### 6.5 Counterfactual Effect Estimation\n",
+ "### Counterfactual Effect Estimation\n",
  "\n",
  "We can estimate the total causal effect of the communications policy by comparing observed outcomes to a counterfactual scenario where communications were never implemented:\n"
  ]
@@ -1455,9 +1474,9 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "### 6.2 Comparison: HAC vs ARIMAX\n",
+ "## Comparison: HAC vs ARIMAX\n",
  "\n",
- "Let's compare the two approaches side-by-side to understand their differences:\n"
+ "Let's compare the two approaches side-by-side to understand their differences:"
  ]
  },
  {
@@ -1644,7 +1663,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "### 6.3 Decision Guide: Which Error Model to Use?\n",
+ "### Decision Guide: Which Error Model to Use?\n",
  "\n",
  "Here's a practical guide for choosing between HAC and ARIMAX:\n",
  "\n",
