first stab at demo of optimization

drbenvincent · drbenvincent · commit 97e5bc93eaf8 · 2025-11-10T09:47:01.000Z
diff --git a/docs/source/notebooks/graded_intervention_time_series_single_channel_ols.ipynb b/docs/source/notebooks/graded_intervention_time_series_single_channel_ols.ipynb
@@ -1762,6 +1762,235 @@
     "**Bottom Line:** HAC is the recommended default for most applications. Use ARIMAX only when you have strong evidence for a specific ARIMA structure and are comfortable with the added complexity and assumptions.\n"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Continuous Optimization for Parameter Estimation\n",
+    "\n",
+    "So far we've used **grid search** (`estimation_method=\"grid\"`) to estimate transform parameters by evaluating a fixed set of discrete parameter values. CausalPy also supports **continuous optimization** (`estimation_method=\"optimize\"`), which searches the continuous parameter space within the specified bounds using a gradient-based optimizer.\n",
+    "\n",
+    "**Advantages of optimization:**\n",
+    "- Explores continuous parameter space (not limited to grid points)\n",
+    "- Can find more precise parameter estimates\n",
+    "- Often needs fewer objective evaluations than a fine-grained grid, since it does not evaluate every combination\n",
+    "- Better suited when you have good initial guesses\n",
+    "\n",
+    "**Tradeoffs:**\n",
+    "- May converge to local optima (depends on starting point)\n",
+    "- Less exhaustive than grid search (might miss global optimum if poorly initialized)\n",
+    "- Uses `scipy.optimize.minimize` with the bounded L-BFGS-B method\n",
+    "\n",
+    "We'll demonstrate optimization using the ARIMAX error model and compare its parameter recovery against the earlier ARIMAX grid-search fit (`result_arimax`).\n"
+   ]
+  },
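+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The next cell is a self-contained toy sketch of the difference between the two search strategies; it is not CausalPy's implementation. It simulates hypothetical data from a normalized geometric adstock with a true half-life of 1.5, fits the intercept and slope by OLS for each candidate half-life, and minimizes the resulting RMSE in two ways: a 30-point grid over [0.5, 3.0], and `scipy.optimize.minimize` with `method='L-BFGS-B'` and the same bounds. The helper names (`toy_adstock`, `toy_rmse`) and the `_toy` variables are illustrative assumptions, chosen so they do not overwrite the notebook's own variables.\n",
+    "\n",
+    "No analytic gradient is supplied, so L-BFGS-B approximates it with finite differences; the evaluation count it reports therefore includes those extra objective calls.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Hypothetical toy example (synthetic data, not CausalPy's implementation):\n",
+    "# recover a geometric adstock half-life by grid search and by bounded L-BFGS-B.\n",
+    "import numpy as np\n",
+    "from scipy.optimize import minimize\n",
+    "\n",
+    "rng_toy = np.random.default_rng(0)\n",
+    "n_toy, hl_true_toy = 200, 1.5\n",
+    "\n",
+    "\n",
+    "def toy_adstock(x, half_life, l_max=8):\n",
+    "    # Normalized geometric decay: the weight halves every `half_life` periods\n",
+    "    alpha = 0.5 ** (1 / half_life)\n",
+    "    w = alpha ** np.arange(l_max + 1)\n",
+    "    w = w / w.sum()\n",
+    "    # Causal convolution: out[t] = sum_k w[k] * x[t - k]\n",
+    "    return np.convolve(x, w)[: len(x)]\n",
+    "\n",
+    "\n",
+    "x_toy = rng_toy.gamma(2.0, 1.0, size=n_toy)\n",
+    "y_toy = 10 + 3 * toy_adstock(x_toy, hl_true_toy) + rng_toy.normal(0, 0.3, size=n_toy)\n",
+    "\n",
+    "\n",
+    "def toy_rmse(half_life):\n",
+    "    # For a fixed half-life the model is linear, so fit intercept and slope by OLS\n",
+    "    X = np.column_stack([np.ones(n_toy), toy_adstock(x_toy, half_life)])\n",
+    "    beta, *_ = np.linalg.lstsq(X, y_toy, rcond=None)\n",
+    "    return float(np.sqrt(np.mean((y_toy - X @ beta) ** 2)))\n",
+    "\n",
+    "\n",
+    "# Grid search: 30 evenly spaced candidates in [0.5, 3.0]\n",
+    "grid_toy = np.linspace(0.5, 3.0, 30)\n",
+    "hl_grid_toy = float(grid_toy[np.argmin([toy_rmse(h) for h in grid_toy])])\n",
+    "\n",
+    "# Continuous search: L-BFGS-B constrained to the same bounds\n",
+    "res_toy = minimize(\n",
+    "    lambda p: toy_rmse(p[0]), x0=[1.0], method='L-BFGS-B', bounds=[(0.5, 3.0)]\n",
+    ")\n",
+    "hl_opt_toy = float(res_toy.x[0])\n",
+    "\n",
+    "print(f'Grid:     half-life {hl_grid_toy:.3f}, RMSE {toy_rmse(hl_grid_toy):.4f}, {len(grid_toy)} evaluations')\n",
+    "print(f'L-BFGS-B: half-life {hl_opt_toy:.3f}, RMSE {float(res_toy.fun):.4f}, {res_toy.nfev} evaluations')"
+   ]
+  },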
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "model_arimax_opt = cp.skl_models.TransferFunctionOLS(\n",
+    "    saturation_type=None,  # No saturation - adstock only\n",
+    "    adstock_bounds={\n",
+    "        \"half_life\": (0.5, 3.0),  # Continuous range (same as grid: 0.5 to 3.0)\n",
+    "    },\n",
+    "    estimation_method=\"optimize\",  # Continuous optimization\n",
+    "    error_model=\"arimax\",\n",
+    "    arima_order=(1, 0, 0),\n",
+    ")\n",
+    "\n",
+    "result_arimax_opt = cp.GradedInterventionTimeSeries(\n",
+    "    data=df,\n",
+    "    y_column=\"water_consumption\",\n",
+    "    treatment_names=[\"comm_intensity\"],\n",
+    "    base_formula=\"1 + t + temperature + rainfall\",\n",
+    "    model=model_arimax_opt,\n",
+    ")\n",
+    "\n",
+    "print(\"Optimization complete!\")\n",
+    "print(f\"Best RMSE: {result_arimax_opt.transform_estimation_results['best_score']:.2f}\")\n",
+    "print(\n",
+    "    f\"Estimated parameters: {result_arimax_opt.transform_estimation_results['best_params']}\"\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Compare Transform Parameter Recovery\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Extract estimated parameters\n",
+    "# Grid search\n",
+    "half_life_grid = result_arimax.transform_estimation_results[\"best_params\"][\"half_life\"]\n",
+    "rmse_grid = result_arimax.transform_estimation_results[\"best_score\"]\n",
+    "\n",
+    "# Optimization\n",
+    "half_life_opt = result_arimax_opt.transform_estimation_results[\"best_params\"][\n",
+    "    \"half_life\"\n",
+    "]\n",
+    "rmse_opt = result_arimax_opt.transform_estimation_results[\"best_score\"]\n",
+    "\n",
+    "# True value\n",
+    "half_life_true = 1.5\n",
+    "\n",
+    "# Create comparison table\n",
+    "comparison_data = {\n",
+    "    \"Method\": [\"True Value\", \"ARIMAX Grid\", \"ARIMAX Optimize\"],\n",
+    "    \"Half-life\": [\n",
+    "        f\"{half_life_true:.3f}\",\n",
+    "        f\"{half_life_grid:.3f}\",\n",
+    "        f\"{half_life_opt:.3f}\",\n",
+    "    ],\n",
+    "    \"Error\": [\n",
+    "        \"-\",\n",
+    "        f\"{abs(half_life_grid - half_life_true):.3f}\",\n",
+    "        f\"{abs(half_life_opt - half_life_true):.3f}\",\n",
+    "    ],\n",
+    "    \"RMSE\": [\"-\", f\"{rmse_grid:.2f}\", f\"{rmse_opt:.2f}\"],\n",
+    "}\n",
+    "\n",
+    "param_comparison_df = pd.DataFrame(comparison_data)\n",
+    "\n",
+    "print(\"=\" * 70)\n",
+    "print(\"PARAMETER RECOVERY: GRID vs OPTIMIZATION\")\n",
+    "print(\"=\" * 70)\n",
+    "print(param_comparison_df.to_string(index=False))\n",
+    "print(\"=\" * 70)\n",
+    "print()\n",
+    "print(\"KEY OBSERVATIONS:\")\n",
+    "print(f\"• True half-life: {half_life_true:.3f} weeks\")\n",
+    "print(\n",
+    "    f\"• Grid search estimate: {half_life_grid:.3f} (error: {abs(half_life_grid - half_life_true):.3f})\"\n",
+    ")\n",
+    "print(\n",
+    "    f\"• Optimization estimate: {half_life_opt:.3f} (error: {abs(half_life_opt - half_life_true):.3f})\"\n",
+    ")\n",
+    "print(\n",
+    "    f\"• RMSE improvement: {rmse_grid - rmse_opt:.2f} ({(1 - rmse_opt / rmse_grid) * 100:.2f}%)\"\n",
+    ")\n",
+    "if abs(half_life_opt - half_life_true) < abs(half_life_grid - half_life_true):\n",
+    "    print(\"✓ Optimization achieved better parameter recovery\")\n",
+    "else:\n",
+    "    print(\"• Grid search achieved comparable or better parameter recovery\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Visualize adstock function comparison\n",
+    "fig, ax = plt.subplots(1, 1, figsize=(10, 5))\n",
+    "\n",
+    "# Normalized geometric adstock weights: alpha = 0.5 ** (1 / half_life)\n",
+    "l_max = 8\n",
+    "lags = np.arange(l_max + 1)\n",
+    "\n",
+    "# True weights\n",
+    "alpha_true = np.power(0.5, 1 / half_life_true)\n",
+    "weights_true = alpha_true**lags\n",
+    "weights_true = weights_true / weights_true.sum()\n",
+    "\n",
+    "# Grid weights\n",
+    "alpha_grid = np.power(0.5, 1 / half_life_grid)\n",
+    "weights_grid = alpha_grid**lags\n",
+    "weights_grid = weights_grid / weights_grid.sum()\n",
+    "\n",
+    "# Optimize weights\n",
+    "alpha_opt = np.power(0.5, 1 / half_life_opt)\n",
+    "weights_opt = alpha_opt**lags\n",
+    "weights_opt = weights_opt / weights_opt.sum()\n",
+    "\n",
+    "# Plot\n",
+    "width = 0.25\n",
+    "ax.bar(\n",
+    "    lags - width,\n",
+    "    weights_true,\n",
+    "    width,\n",
+    "    alpha=0.8,\n",
+    "    label=f\"True (half-life={half_life_true:.2f})\",\n",
+    "    color=\"black\",\n",
+    ")\n",
+    "ax.bar(\n",
+    "    lags,\n",
+    "    weights_grid,\n",
+    "    width,\n",
+    "    alpha=0.8,\n",
+    "    label=f\"Grid (half-life={half_life_grid:.2f})\",\n",
+    "    color=\"C0\",\n",
+    ")\n",
+    "ax.bar(\n",
+    "    lags + width,\n",
+    "    weights_opt,\n",
+    "    width,\n",
+    "    alpha=0.8,\n",
+    "    label=f\"Optimize (half-life={half_life_opt:.2f})\",\n",
+    "    color=\"C2\",\n",
+    ")\n",
+    "\n",
+    "ax.set_xlabel(\"Lag (periods)\", fontsize=11)\n",
+    "ax.set_ylabel(\"Adstock Weight\", fontsize=11)\n",
+    "ax.set_title(\n",
+    "    \"Adstock Parameter Recovery: Grid vs Optimization\", fontsize=12, fontweight=\"bold\"\n",
+    ")\n",
+    "ax.legend(fontsize=10, framealpha=0.9)\n",
+    "ax.grid(True, alpha=0.3, axis=\"y\")\n",
+    "\n",
+    "plt.tight_layout()\n",
+    "plt.show()\n",
+    "\n",
+    "print(\"\\n📊 INTERPRETATION:\")\n",
+    "print(\n",
+    "    \"Continuous optimization can find parameter values between grid points, potentially\"\n",
+    ")\n",
+    "print(\n",
+    "    \"achieving a better fit (lower RMSE) and more accurate parameter recovery. The tradeoff\"\n",
+    ")\n",
+    "print(\n",
+    "    \"is that optimization may converge to a local optimum, whereas grid search exhaustively\"\n",
+    ")\n",
+    "print(\n",
+    "    \"evaluates all specified combinations. Here, optimization searches the continuous\"\n",
+    ")\n",
+    "print(\"range [0.5, 3.0] instead of being limited to 30 discrete grid points.\")"
+   ]
+  },
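+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The weights in the adstock plot follow directly from the half-life definition used in that code: the raw weight at lag $k$ is $\\alpha^k$, and the half-life $h$ is the lag at which the weight has fallen to half of its lag-0 value, so\n",
+    "\n",
+    "$$\n",
+    "\\alpha^{h} = \\tfrac{1}{2} \\quad\\Longrightarrow\\quad \\alpha = 0.5^{1/h}, \\qquad w_k = \\frac{\\alpha^k}{\\sum_{j=0}^{L} \\alpha^j}, \\quad k = 0, \\dots, L,\n",
+    "$$\n",
+    "\n",
+    "where $L$ is `l_max` and the denominator normalizes the weights to sum to one. For the true half-life $h = 1.5$, $\\alpha = 0.5^{2/3} \\approx 0.630$, so each lag carries about 63% of the previous lag's raw weight.\n"
+   ]
+  },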
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Summary: Grid Search vs Optimization\n",
+    "\n",
+    "**When to use grid search:**\n",
+    "- You want exhaustive evaluation of discrete parameter combinations\n",
+    "- Parameter space is small enough to evaluate densely\n",
+    "- You want to visualize the full search landscape\n",
+    "- Robustness to local optima is critical\n",
+    "\n",
+    "**When to use optimization:**\n",
+    "- You want fine-grained continuous parameter estimates\n",
+    "- Parameter space is large (many parameters or wide ranges)\n",
+    "- You have good intuition for reasonable parameter ranges\n",
+    "- Computational efficiency matters for large datasets\n",
+    "\n",
+    "**Best practice:** Start with a coarse grid search to understand the landscape, then use optimization to refine the estimates if needed.\n"
+   ]
+  },
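+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The best-practice recommendation above can be sketched as a small coarse-to-fine routine: evaluate a coarse grid, then start L-BFGS-B from the best grid point within the same bounds. With `n_coarse` values per parameter, the grid costs `n_coarse ** k` objective evaluations for `k` parameters, so refining a coarse grid is usually much cheaper than making the grid dense. The names `coarse_to_fine`, `demo_objective`, and its second parameter `slope` are hypothetical and not part of CausalPy; the quadratic objective only stands in for the RMSE surface a real fit would produce.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Hypothetical helper (not a CausalPy function): coarse grid search followed\n",
+    "# by L-BFGS-B refinement started from the best grid point.\n",
+    "import itertools\n",
+    "\n",
+    "import numpy as np\n",
+    "from scipy.optimize import minimize\n",
+    "\n",
+    "\n",
+    "def coarse_to_fine(objective, bounds, n_coarse=5):\n",
+    "    # 1) Coarse grid: n_coarse evenly spaced values per parameter,\n",
+    "    #    i.e. n_coarse ** len(bounds) objective evaluations\n",
+    "    axes = [np.linspace(lo, hi, n_coarse) for lo, hi in bounds]\n",
+    "    candidates = [np.array(c) for c in itertools.product(*axes)]\n",
+    "    scores = [objective(c) for c in candidates]\n",
+    "    start = candidates[int(np.argmin(scores))]\n",
+    "    # 2) Continuous refinement from the best grid point, within the same bounds\n",
+    "    result = minimize(objective, x0=start, method='L-BFGS-B', bounds=bounds)\n",
+    "    return start, result\n",
+    "\n",
+    "\n",
+    "def demo_objective(params):\n",
+    "    # Stand-in for an RMSE surface with a known minimum at (1.7, 0.8);\n",
+    "    # `slope` is a hypothetical second transform parameter\n",
+    "    half_life, slope = params\n",
+    "    return (half_life - 1.7) ** 2 + 2.0 * (slope - 0.8) ** 2\n",
+    "\n",
+    "\n",
+    "start_c2f, res_c2f = coarse_to_fine(demo_objective, bounds=[(0.5, 3.0), (0.1, 2.0)])\n",
+    "print('Coarse-grid start:', np.round(start_c2f, 3))\n",
+    "print('Refined estimate: ', np.round(res_c2f.x, 3), 'objective:', round(float(res_c2f.fun), 6))"
+   ]
+  },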
   {
    "cell_type": "markdown",
    "metadata": {},