Clarify HAC standard errors in docs and code

drbenvincent · drbenvincent · commit fc5e6ce5eabd · 2025-11-04T17:28:41.000Z
Expanded documentation and code comments to better explain HAC (Newey-West) standard errors, their purpose, and the hac_maxlags parameter. Added a detailed explanation and citation in the notebook, and improved docstrings and print output in transfer_function_its.py. Added the Newey-West reference to references.bib.
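
For reviewers who want to check the default quoted in the new docstring, here is a minimal sketch of the rule of thumb, together with the Bartlett-kernel weights that the Newey-West estimator applies to each lag. The helper names `newey_west_maxlags` and `bartlett_weights` are hypothetical and only for illustration; they are not part of the CausalPy API, and the sketch assumes the formula is applied exactly as written in the docstring.

```python
import math


def newey_west_maxlags(n_obs: int) -> int:
    """Rule-of-thumb lag truncation: floor(4 * (n / 100) ** (2 / 9)).

    Hypothetical helper, not a CausalPy function.
    """
    if n_obs <= 0:
        raise ValueError("n_obs must be positive")
    return math.floor(4 * (n_obs / 100) ** (2 / 9))


def bartlett_weights(maxlags: int) -> list[float]:
    """Bartlett-kernel weights for lags 1..maxlags: w_j = 1 - j / (maxlags + 1).

    Hypothetical helper, not a CausalPy function.
    """
    return [1 - lag / (maxlags + 1) for lag in range(1, maxlags + 1)]


maxlags = newey_west_maxlags(104)
print(maxlags)  # 4
print([round(w, 2) for w in bartlett_weights(maxlags)])  # [0.8, 0.6, 0.4, 0.2]
```

The weights decline linearly with the lag, so residual autocovariances at longer lags contribute less to the corrected covariance matrix, and lags beyond `hac_maxlags` contribute nothing.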
diff --git a/causalpy/experiments/transfer_function_its.py b/causalpy/experiments/transfer_function_its.py
@@ -293,7 +293,11 @@ def with_estimated_transforms(
         coef_constraint : str, default="nonnegative"
             Constraint on treatment coefficient ("nonnegative" or "unconstrained").
         hac_maxlags : int, optional
-            Maximum lags for HAC standard errors. If None, uses rule of thumb.
+            Maximum lags for HAC (Newey-West) standard errors, which correct for
+            autocorrelation and heteroskedasticity in residuals. Higher values account
+            for longer-range dependence but make the variance estimate noisier. If
+            None, uses the Newey-West rule of thumb: floor(4*(n/100)^(2/9)). For
+            example, with n=104 observations, the default is hac_maxlags=4.
         **estimation_kwargs
             Additional keyword arguments for the estimation method:
 
@@ -936,7 +940,10 @@ def summary(self, round_to: Optional[int] = None) -> None:
         print(f"Outcome variable: {self.y_column}")
         print(f"Number of observations: {len(self.y)}")
         print(f"R-squared: {round_num(self.score, round_to)}")
-        print(f"HAC max lags: {self.hac_maxlags}")
+        print(
+            f"HAC max lags: {self.hac_maxlags} "
+            f"(robust SEs accounting for autocorrelation up to lag {self.hac_maxlags})"
+        )
         print("-" * 80)
         print("Baseline coefficients:")
         for label, coef, se in zip(
diff --git a/docs/source/notebooks/tfits_single_channel.ipynb b/docs/source/notebooks/tfits_single_channel.ipynb
@@ -30,7 +30,13 @@
     "Compare to related methods:\n",
     "- **Classic {term}`Interrupted Time Series`**: Binary on/off intervention (no dose-response modeling)\n",
     "- **{term}`Synthetic Control`**: Multiple control units available for comparison\n",
-    "- **{term}`Difference in Differences`**: Panel data with treatment/control groups\n",
+    "- **{term}`Difference in Differences`**: Panel data with treatment/control groups"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
     "\n",
     "## Example Scenario: Water Restrictions Policy\n",
     "\n",
@@ -57,13 +63,36 @@
    "source": [
     ":::{admonition} Implementation notes\n",
     ":class: warning\n",
-    "This notebook demonstrates the **MVP (non-Bayesian) implementation** using:\n",
+    "This notebook demonstrates the **non-Bayesian implementation** using:\n",
     "- OLS regression with HAC standard errors (fast, robust inference)\n",
-    "- User-specified transform parameters (future: parameter estimation)\n",
+    "- Automated transform parameter estimation via grid search or continuous optimization\n",
     "- Point estimates only (future: bootstrap confidence intervals, Bayesian uncertainty quantification)\n",
     ":::"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "::::{admonition} Understanding HAC Standard Errors\n",
+    ":class: note\n",
+    "\n",
+    "Time series data typically violates OLS assumptions because:\n",
+    "- **Autocorrelation**: Past values influence current values (e.g., yesterday's weather affects today's, habits persist over weeks)\n",
+    "- **Heteroskedasticity**: Variance changes over time (e.g., more volatility in certain seasons)\n",
+    "\n",
+    "When these violations occur, OLS **coefficient estimates remain unbiased** (provided the regressors are exogenous), but the usual **standard errors are incorrect**. With positive autocorrelation they are typically too small, leading to overconfident inference (confidence intervals that are too narrow, p-values that are too low).\n",
+    "\n",
+    "**HAC (Heteroskedasticity and Autocorrelation Consistent) standard errors**, of which the **Newey-West estimator** {cite:p}`newey1987simple` is the best-known example, provide robust inference by correcting standard errors for these violations. This gives asymptotically valid confidence intervals and hypothesis tests even when residuals are autocorrelated or heteroskedastic.\n",
+    "\n",
+    "**Key Parameter:**\n",
+    "- `hac_maxlags`: Controls how many periods of autocorrelation to account for. If it is not set, CausalPy uses the Newey-West rule of thumb: `floor(4*(n/100)^(2/9))`. For our 104-week dataset, this gives `hac_maxlags=4`, accounting for up to 4 weeks of residual correlation.\n",
+    "\n",
+    "**Tradeoff:** HAC standard errors are typically larger (more conservative) than naive OLS standard errors, so confidence intervals are wider, but they provide honest uncertainty quantification for time series data.\n",
+    "\n",
+    "::::\n"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 1,
@@ -801,7 +830,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 11,
+   "execution_count": 12,
    "metadata": {},
    "outputs": [
     {
@@ -878,6 +907,13 @@
     "    \"reduction in consumption.\"\n",
     ")"
    ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
   }
  ],
  "metadata": {
diff --git a/docs/source/references.bib b/docs/source/references.bib
@@ -204,3 +204,14 @@ @article{box1975intervention
   year={1975},
   publisher={Taylor \& Francis}
 }
+
+@article{newey1987simple,
+  title={A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix},
+  author={Newey, Whitney K and West, Kenneth D},
+  journal={Econometrica},
+  volume={55},
+  number={3},
+  pages={703--708},
+  year={1987},
+  publisher={JSTOR}
+}