pymc-labs
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
Lines changed: 9 additions & 2 deletions
diff --git a/AGENTS.md b/AGENTS.md
Lines changed: 52 additions & 0 deletions
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
Lines changed: 3 additions & 0 deletions
diff --git a/Makefile b/Makefile
Lines changed: 30 additions & 11 deletions
diff --git a/causalpy/data/datasets.py b/causalpy/data/datasets.py
Lines changed: 17 additions & 4 deletions
@@ -22,10 +22,10 @@ repos:
         exclude_types: [svg]
       - id: check-yaml
       - id: check-added-large-files
-        exclude: &exclude_pattern 'iv_weak_instruments.ipynb'
+        exclude: &exclude_pattern '(iv_weak_instruments|its_lift_test)\.ipynb'
         args: ["--maxkb=1500"]
   - repo: https://github.com/astral-sh/ruff-pre-commit
-    rev: v0.14.1
+    rev: v0.14.4
     hooks:
       # Run the linter
       - id: ruff
@@ -48,3 +48,10 @@ repos:
         additional_dependencies:
           # Support pyproject.toml configuration
           - tomli
+  - repo: https://github.com/pre-commit/mirrors-mypy
+    rev: v1.18.2
+    hooks:
+      - id: mypy
+        args: [--ignore-missing-imports]
+        files: ^causalpy/
+        additional_dependencies: [numpy>=1.20, pandas-stubs]
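To illustrate what the new mypy hook enforces, here is a minimal sketch of two patterns that mypy rejects under its default settings, each followed by the form this diff adopts in `causalpy/data/datasets.py`. The function names are hypothetical, and the rejected versions are kept as comments so the snippet itself runs.

```python
from __future__ import annotations

import pathlib


# Rejected: mypy does not treat a `None` default as making the type Optional.
#     def dataset_filename(name: str = None) -> str: ...
# Accepted: declare that None is allowed, then narrow before using str methods.
def dataset_filename(name: str | None = None) -> str:
    if name is None:
        raise ValueError("A dataset name is required")
    return f"{name.lower()}.csv"  # `name` is narrowed to str here


# Rejected: pathlib.Path(...) is typed as Path, not PosixPath
# (and on Windows it really is a WindowsPath at runtime).
#     def data_home() -> pathlib.PosixPath:
#         return pathlib.Path("causalpy") / "data"
# Accepted: annotate with the platform-neutral base class.
def data_home() -> pathlib.Path:
    return pathlib.Path("causalpy") / "data"
```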
@@ -0,0 +1,52 @@
+# AGENTS
+
+## Testing preferences
+
+- Write all Python tests as `pytest`-style functions, not `unittest` classes
+- Use descriptive function names starting with `test_`
+- Prefer fixtures over setup/teardown methods
+- Use plain `assert` statements, not `self.assertEqual`
+
+## Testing approach
+
+- Never create throwaway test scripts or ad hoc verification files
+- If you need to test functionality, write a proper test in the test suite
+- All tests go in the `causalpy/tests/` directory following the project structure
+- Tests should be runnable with the rest of the suite (`python -m pytest`)
+- Even for quick verification, write it as a real test that provides ongoing value
+- Prefer integration tests; unit tests are acceptable for core functionality where they help maintain high code coverage.
+- Tests should remain quick to run. Tests involving MCMC sampling with PyMC should use custom `sample_kwargs` to minimize the computational load.
+
+## Documentation
+
+- **Structure**: Notebooks (how-to examples) go in `docs/source/notebooks/`, knowledgebase (educational content) goes in `docs/source/knowledgebase/`
+- **Notebook naming**: Use pattern `{method}_{model}.ipynb` (e.g., `did_pymc.ipynb`, `rd_skl.ipynb`), organized by causal method
+- **MyST directives**: Use `:::{note}` and other MyST features for callouts and formatting
+- **Glossary linking**: Link to glossary terms (defined in `glossary.rst`) on first mention in a file:
+  - In Markdown files (`.md`, `.ipynb`): Use MyST syntax ``{term}`glossary term` ``
+  - In RST files (`.rst`): Use Sphinx syntax ``:term:`glossary term` ``
+- **Cross-references**: For other cross-references in Markdown files, use MyST role syntax with curly braces (e.g., ``{doc}`path/to/doc` ``, ``{ref}`label-name` ``)
+- **Citations**: Use `references.bib` for citations and cite sources in example notebooks where possible. Include a references section at the bottom of notebooks using the `:::{bibliography}` directive with `:filter: docname in docnames`
+- **API documentation**: Auto-generated from docstrings via Sphinx autodoc, no manual API docs needed
+- **Build**: Use `make html` to build documentation
+- **Doctest**: Use `make doctest` to test that Python examples in doctests work
+
+## Code structure and style
+
+- **Experiment classes**: All experiment classes inherit from `BaseExperiment` in `causalpy/experiments/`. Each class must declare `supports_ols` and `supports_bayes` class attributes. Only implement the abstract methods for supported model types (e.g., if only Bayesian models are supported, implement `_bayesian_plot()` and `get_plot_data_bayesian()`; if only OLS is supported, implement `_ols_plot()` and `get_plot_data_ols()`)
+- **Model-agnostic design**: Experiment classes should work with both PyMC and scikit-learn models. Use `isinstance(self.model, PyMCModel)` vs `isinstance(self.model, RegressorMixin)` to dispatch to appropriate implementations
+- **Model classes**: PyMC models inherit from `PyMCModel` (extends `pm.Model`). Scikit-learn models use `RegressorMixin` and are made compatible via `create_causalpy_compatible_class()`. Common interface: `fit()`, `predict()`, `score()`, `calculate_impact()`, `print_coefficients()`
+- **Data handling**: PyMC models use `xarray.DataArray` with coords (keys like "coeffs", "obs_ind", "treated_units"). Scikit-learn models use numpy arrays. Data index should be named "obs_ind"
+- **Formulas**: Use patsy for formula parsing (via `dmatrices()`)
+- **Custom exceptions**: Use project-specific exceptions from `causalpy.custom_exceptions`: `FormulaException`, `DataException`, `BadIndexException`
+- **File organization**: Experiments in `causalpy/experiments/`, PyMC models in `causalpy/pymc_models.py`, scikit-learn models in `causalpy/skl_models.py`
+
+## Type checking
+
+- **Tool**: MyPy
+- **Configuration**: Integrated as a pre-commit hook.
+- **Scope**: Checks Python files within the `causalpy/` directory.
+- **Settings**:
+    - `ignore-missing-imports`: Enabled to allow for gradual adoption of type hints without requiring all third-party libraries to have stubs.
+    - `additional_dependencies`: Includes `numpy` and `pandas-stubs` to provide type information for these libraries.
+- **Execution**: Runs automatically on each commit once the hooks are installed (`pre-commit install`); run it manually across the codebase with `pre-commit run --all-files`.
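The testing guidelines above might translate into something like the following sketch: plain `pytest` functions, a fixture instead of setup methods, and deliberately tiny MCMC settings. It assumes CausalPy's `load_data`, `DifferenceInDifferences` and `pymc_models.LinearRegression(sample_kwargs=...)` as used in the project's documentation; the fixture name and sampler values are illustrative, and `pytest.importorskip` is used only so the snippet degrades gracefully where CausalPy is not installed (inside the real test suite a plain `import causalpy as cp` would do).

```python
import pytest

# Illustrative settings: just enough MCMC to exercise the code path,
# far too few draws for real inference.
FAST_SAMPLE_KWARGS = {
    "tune": 20,
    "draws": 20,
    "chains": 2,
    "progressbar": False,
    "random_seed": 42,
}


@pytest.fixture
def fast_sample_kwargs():
    """Provide cheap sampler settings to any test that requests them."""
    # Return a copy so one test cannot mutate another test's settings.
    return dict(FAST_SAMPLE_KWARGS)


def test_fast_sample_kwargs_keep_sampling_cheap(fast_sample_kwargs):
    # Plain assert statements, no unittest.TestCase methods.
    assert fast_sample_kwargs["draws"] <= 100
    assert fast_sample_kwargs["chains"] >= 2


def test_did_pymc_runs_with_fast_sampling(fast_sample_kwargs):
    # Skip cleanly if CausalPy (and hence PyMC) is not installed.
    cp = pytest.importorskip("causalpy")
    df = cp.load_data("did")
    result = cp.DifferenceInDifferences(
        df,
        formula="y ~ 1 + group*post_treatment",
        time_variable_name="t",
        group_variable_name="group",
        model=cp.pymc_models.LinearRegression(sample_kwargs=fast_sample_kwargs),
    )
    assert isinstance(result, cp.DifferenceInDifferences)
```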
@@ -17,6 +17,9 @@ We appreciate being notified of problems with the existing CausalPy code. We pre
 
 Please verify that your issue is not being currently addressed by other issues or pull requests by using the GitHub search tool to look for key words in the project issue tracker.
 
+## Use of agents
+PRs with agent-generated code are fine, but don't spam us with code you don't understand. See [AGENTS.md](./AGENTS.md) for how we use LLMs in this repo.
+
 ## Contributing code via pull requests
 
 While issue reporting is valuable, we strongly encourage users who are inclined to do so to submit patches for new or existing issues via pull requests. This is particularly the case for simple fixes, such as typos or tweaks to documentation, which do not require a heavy investment of time and attention.
 
@@ -1,31 +1,50 @@
-.PHONY: init lint check_lint test uml html cleandocs doctest
+#################################################################################
+# GLOBALS                                                                       #
+#################################################################################
 
-init:
+PACKAGE_DIR = causalpy
+
+#################################################################################
+# COMMANDS                                                                      #
+#################################################################################
+
+.PHONY: init lint check_lint test uml html cleandocs doctest help
+
+init: ## Install the package in editable mode
 	python -m pip install -e . --no-deps
 
-lint:
+lint: ## Run ruff linter and formatter
 	ruff check --fix .
 	ruff format .
 
-check_lint:
+check_lint: ## Check code formatting and linting without making changes
 	ruff check .
 	ruff format --diff --check .
 	interrogate .
 
-doctest:
+doctest: ## Run doctests for the causalpy module
 	python -m pytest --doctest-modules --ignore=causalpy/tests/ causalpy/ --config-file=causalpy/tests/conftest.py
 
-test:
+test: ## Run all tests with pytest
 	python -m pytest
 
-uml:
+uml: ## Generate UML diagrams from code
 	pyreverse -o png causalpy --output-directory docs/source/_static --ignore tests
 
-# Docs build commands
-
-html:
+html: ## Build HTML documentation with Sphinx
 	sphinx-build -b html docs/source docs/_build
 
-cleandocs:
+cleandocs: ## Clean the documentation build directories
 	rm -rf docs/_build
 	rm -rf docs/source/api/generated
+
+
+#################################################################################
+# Self Documenting Commands                                                     #
+#################################################################################
+
+.DEFAULT_GOAL := help
+
+help: ## Show this help message
+	@grep -E '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST) | \
+	awk 'BEGIN {FS = ":.*?## "}; {printf "\033[36m%-15s\033[0m %s\n", $$1, $$2}'
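The `help` target relies on every documented target carrying a trailing `## description` comment on its rule line. As an illustration only, the Python sketch below reproduces the same idea: it keeps lines matching the `grep` pattern, splits each on the `:.*?## ` separator that awk uses as its field separator, and pads target names to a 15-character column. `parse_help` is a hypothetical helper, and the ANSI colour codes are omitted.

```python
import re

# Same pattern the Makefile passes to `grep -E` (after make turns `$$` into `$`).
HELP_LINE = re.compile(r"^[a-zA-Z_-]+:.*?## .*$")
# Same separator awk uses as its field separator (FS).
SEPARATOR = re.compile(r":.*?## ")


def parse_help(makefile_text: str) -> list[str]:
    """Return one formatted 'target  description' line per documented target."""
    lines = []
    for line in makefile_text.splitlines():
        # Recipe lines start with a tab, so they never match HELP_LINE.
        if HELP_LINE.match(line):
            target, description = SEPARATOR.split(line, maxsplit=1)
            # Equivalent of printf "%-15s %s": target left-aligned in 15 columns.
            lines.append(f"{target:<15} {description}")
    return lines


example = """\
init: ## Install the package in editable mode
\tpython -m pip install -e . --no-deps
check_lint: ## Check code formatting and linting without making changes
\truff check .
"""

for entry in parse_help(example):
    print(entry)
```

Because `.DEFAULT_GOAL := help` is set, running a bare `make` prints this listing; a target without a `## ` comment simply does not appear in it.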
@@ -43,15 +43,28 @@
 }
 
 
-def _get_data_home() -> pathlib.PosixPath:
+def _get_data_home() -> pathlib.Path:
     """Return the path of the data directory"""
     return pathlib.Path(cp.__file__).parents[1] / "causalpy" / "data"
 
 
-def load_data(dataset: str = None) -> pd.DataFrame:
-    """Loads the requested dataset and returns a pandas DataFrame.
+def load_data(dataset: str | None = None) -> pd.DataFrame:
+    """Load the requested dataset and return a pandas DataFrame.
 
-    :param dataset: The desired dataset to load
+    Parameters
+    ----------
+    dataset : str, optional
+        The desired dataset to load. If None, raises ValueError.
+
+    Returns
+    -------
+    pd.DataFrame
+        The loaded dataset as a pandas DataFrame.
+
+    Raises
+    ------
+    ValueError
+        If the requested dataset is not found.
     """
 
     if dataset in DATASETS:
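The new docstring describes a simple contract: a known name loads a dataset, while `None` or an unknown name raises `ValueError`. The sketch below shows one way to express that contract so mypy can narrow `str | None` to `str`. The `DATASETS` mapping, the file names and the `resolve_dataset_path` helper are hypothetical stand-ins, not CausalPy's code, and the sketch stops at the file path instead of reading it with pandas so it stays dependency-free.

```python
from __future__ import annotations

import pathlib

# Hypothetical registry: dataset name -> CSV file name (not CausalPy's real list).
DATASETS = {"did": "did.csv", "its": "its.csv"}


def resolve_dataset_path(dataset: str | None, data_home: pathlib.Path) -> pathlib.Path:
    """Return the CSV path for ``dataset``; raise ValueError if it is None or unknown."""
    if dataset is None or dataset not in DATASETS:
        raise ValueError(f"Dataset {dataset!r} not found")
    # After the check above, mypy knows `dataset` is a str.
    return data_home / DATASETS[dataset]
```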