Commit 2a01933

Merge branch 'main' into vs_module
Signed-off-by: Nathaniel <[email protected]>
2 parents 1f4f695 + efde473 commit 2a01933

51 files changed: 15199 additions, 2723 deletions

.pre-commit-config.yaml

Lines changed: 9 additions & 2 deletions
@@ -22,10 +22,10 @@ repos:
         exclude_types: [svg]
       - id: check-yaml
       - id: check-added-large-files
-        exclude: &exclude_pattern 'iv_weak_instruments.ipynb'
+        exclude: &exclude_pattern '(iv_weak_instruments|its_lift_test)\.ipynb'
         args: ["--maxkb=1500"]
   - repo: https://github.com/astral-sh/ruff-pre-commit
-    rev: v0.14.1
+    rev: v0.14.4
     hooks:
       # Run the linter
       - id: ruff
@@ -48,3 +48,10 @@ repos:
         additional_dependencies:
           # Support pyproject.toml configuration
           - tomli
+  - repo: https://github.com/pre-commit/mirrors-mypy
+    rev: v1.18.2
+    hooks:
+      - id: mypy
+        args: [--ignore-missing-imports]
+        files: ^causalpy/
+        additional_dependencies: [numpy>=1.20, pandas-stubs]
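
The `exclude` change above is easier to see with a quick check. This is a minimal sketch, assuming pre-commit treats `exclude` as a Python regular expression searched against each file path. The two patterns are copied from the diff; the file paths and the `excluded` helper are hypothetical and exist only for illustration.

```python
import re

# Patterns copied from the diff above.
OLD_PATTERN = r"iv_weak_instruments.ipynb"
NEW_PATTERN = r"(iv_weak_instruments|its_lift_test)\.ipynb"

# Hypothetical paths, used only for illustration.
paths = [
    "docs/source/notebooks/iv_weak_instruments.ipynb",
    "docs/source/notebooks/its_lift_test.ipynb",
    "docs/source/notebooks/did_pymc.ipynb",
]


def excluded(pattern: str, path: str) -> bool:
    """Return True if the pattern matches anywhere in the path (re.search semantics)."""
    return re.search(pattern, path) is not None


# The old pattern only covers one notebook; the new one covers both.
old_hits = [p for p in paths if excluded(OLD_PATTERN, p)]
new_hits = [p for p in paths if excluded(NEW_PATTERN, p)]
print(old_hits)
print(new_hits)
```

The new pattern also escapes the dot, so it matches a literal `.ipynb` suffix, whereas the unescaped `.` in the old pattern matched any character in that position.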

AGENTS.md

Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
+# AGENTS
+
+## Testing preferences
+
+- Write all Python tests as `pytest` style functions, not unittest classes
+- Use descriptive function names starting with `test_`
+- Prefer fixtures over setup/teardown methods
+- Use assert statements directly, not self.assertEqual
+
+## Testing approach
+
+- Never create throwaway test scripts or ad hoc verification files
+- If you need to test functionality, write a proper test in the test suite
+- All tests go in the `causalpy/tests/` directory following the project structure
+- Tests should be runnable with the rest of the suite (`python -m pytest`)
+- Even for quick verification, write it as a real test that provides ongoing value
+- Preference should be given to integration tests, but unit tests are acceptable for core functionality to maintain high code coverage.
+- Tests should remain quick to run. Tests involving MCMC sampling with PyMC should use custom `sample_kwargs` to minimize the computational load.
+
+## Documentation
+
+- **Structure**: Notebooks (how-to examples) go in `docs/source/notebooks/`, knowledgebase (educational content) goes in `docs/source/knowledgebase/`
+- **Notebook naming**: Use pattern `{method}_{model}.ipynb` (e.g., `did_pymc.ipynb`, `rd_skl.ipynb`), organized by causal method
+- **MyST directives**: Use `:::{note}` and other MyST features for callouts and formatting
+- **Glossary linking**: Link to glossary terms (defined in `glossary.rst`) on first mention in a file:
+  - In Markdown files (`.md`, `.ipynb`): Use MyST syntax `{term}glossary term``
+  - In RST files (`.rst`): Use Sphinx syntax `:term:`glossary term``
+- **Cross-references**: For other cross-references in Markdown files, use MyST role syntax with curly braces (e.g., `{doc}path/to/doc`, `{ref}label-name`)
+- **Citations**: Use `references.bib` for citations, cite sources in example notebooks where possible. Include reference section at bottom of notebooks using `:::{bibliography}` directive with `:filter: docname in docnames`
+- **API documentation**: Auto-generated from docstrings via Sphinx autodoc, no manual API docs needed
+- **Build**: Use `make html` to build documentation
+- **Doctest**: Use `make doctest` to test that Python examples in doctests work
+
+## Code structure and style
+
+- **Experiment classes**: All experiment classes inherit from `BaseExperiment` in `causalpy/experiments/`. Must declare `supports_ols` and `supports_bayes` class attributes. Only implement abstract methods for supported model types (e.g., if only Bayesian is supported, implement `_bayesian_plot()` and `get_plot_data_bayesian()`; if only OLS is supported, implement `_ols_plot()` and `get_plot_data_ols()`)
+- **Model-agnostic design**: Experiment classes should work with both PyMC and scikit-learn models. Use `isinstance(self.model, PyMCModel)` vs `isinstance(self.model, RegressorMixin)` to dispatch to appropriate implementations
+- **Model classes**: PyMC models inherit from `PyMCModel` (extends `pm.Model`). Scikit-learn models use `RegressorMixin` and are made compatible via `create_causalpy_compatible_class()`. Common interface: `fit()`, `predict()`, `score()`, `calculate_impact()`, `print_coefficients()`
+- **Data handling**: PyMC models use `xarray.DataArray` with coords (keys like "coeffs", "obs_ind", "treated_units"). Scikit-learn models use numpy arrays. Data index should be named "obs_ind"
+- **Formulas**: Use patsy for formula parsing (via `dmatrices()`)
+- **Custom exceptions**: Use project-specific exceptions from `causalpy.custom_exceptions`: `FormulaException`, `DataException`, `BadIndexException`
+- **File organization**: Experiments in `causalpy/experiments/`, PyMC models in `causalpy/pymc_models.py`, scikit-learn models in `causalpy/skl_models.py`
+
+## Type Checking
+
+- **Tool**: MyPy
+- **Configuration**: Integrated as a pre-commit hook.
+- **Scope**: Checks Python files within the `causalpy/` directory.
+- **Settings**:
+  - `ignore-missing-imports`: Enabled to allow for gradual adoption of type hints without requiring all third-party libraries to have stubs.
+  - `additional_dependencies`: Includes `numpy` and `pandas-stubs` to provide type information for these libraries.
+- **Execution**: Run automatically via `pre-commit run --all-files` or on commit.
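
The model-agnostic dispatch described under "Code structure and style" in AGENTS.md could look roughly like the sketch below. It is an illustration only: `PyMCModel` and `RegressorMixin` here are empty stand-in classes so the example runs without PyMC or scikit-learn, and `ToyExperiment` with its `plot()` method is a hypothetical name, not CausalPy's actual `BaseExperiment`.

```python
class PyMCModel:
    """Stand-in for CausalPy's PyMCModel (assumption: used only for isinstance checks)."""


class RegressorMixin:
    """Stand-in for sklearn.base.RegressorMixin (assumption: used only for isinstance checks)."""


class ToyExperiment:
    """Hypothetical experiment that supports both model families."""

    # Class attributes that AGENTS.md says every experiment must declare.
    supports_ols = True
    supports_bayes = True

    def __init__(self, model):
        self.model = model

    def plot(self) -> str:
        # Dispatch on the model type, as AGENTS.md recommends.
        if isinstance(self.model, PyMCModel):
            return self._bayesian_plot()
        if isinstance(self.model, RegressorMixin):
            return self._ols_plot()
        raise ValueError("Model type not recognized")

    def _bayesian_plot(self) -> str:
        return "bayesian plot"

    def _ols_plot(self) -> str:
        return "ols plot"


print(ToyExperiment(PyMCModel()).plot())
print(ToyExperiment(RegressorMixin()).plot())
```

An experiment that supports only one family would set the other flag to `False` and implement only the matching private method.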

CONTRIBUTING.md

Lines changed: 3 additions & 0 deletions
@@ -17,6 +17,9 @@ We appreciate being notified of problems with the existing CausalPy code. We pre
 
 Please verify that your issue is not being currently addressed by other issues or pull requests by using the GitHub search tool to look for key words in the project issue tracker.
 
+## Use of agents
+PR's with agent-generated code are fine. But don't spam us with code you don't understand. See [AGENTS.md](./AGENTS.md) for how we use LLMs in this repo.
+
 ## Contributing code via pull requests
 
 While issue reporting is valuable, we strongly encourage users who are inclined to do so to submit patches for new or existing issues via pull requests. This is particularly the case for simple fixes, such as typos or tweaks to documentation, which do not require a heavy investment of time and attention.

Makefile

Lines changed: 30 additions & 11 deletions
@@ -1,31 +1,50 @@
-.PHONY: init lint check_lint test uml html cleandocs doctest
+#################################################################################
+# GLOBALS #
+#################################################################################
 
-init:
+PACKAGE_DIR = causalpy
+
+#################################################################################
+# COMMANDS #
+#################################################################################
+
+.PHONY: init lint check_lint test uml html cleandocs doctest help
+
+init: ## Install the package in editable mode
 	python -m pip install -e . --no-deps
 
-lint:
+lint: ## Run ruff linter and formatter
 	ruff check --fix .
 	ruff format .
 
-check_lint:
+check_lint: ## Check code formatting and linting without making changes
 	ruff check .
 	ruff format --diff --check .
 	interrogate .
 
-doctest:
+doctest: ## Run doctests for the causalpy module
 	python -m pytest --doctest-modules --ignore=causalpy/tests/ causalpy/ --config-file=causalpy/tests/conftest.py
 
-test:
+test: ## Run all tests with pytest
 	python -m pytest
 
-uml:
+uml: ## Generate UML diagrams from code
 	pyreverse -o png causalpy --output-directory docs/source/_static --ignore tests
 
-# Docs build commands
-
-html:
+html: ## Build HTML documentation with Sphinx
 	sphinx-build -b html docs/source docs/_build
 
-cleandocs:
+cleandocs: ## Clean the documentation build directories
 	rm -rf docs/_build
 	rm -rf docs/source/api/generated
+
+
+#################################################################################
+# Self Documenting Commands #
+#################################################################################
+
+.DEFAULT_GOAL := help
+
+help: ## Show this help message
+	@grep -E '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST) | \
+		awk 'BEGIN {FS = ":.*?## "}; {printf "\033[36m%-15s\033[0m %s\n", $$1, $$2}'
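
To see what the new `help` target extracts, here is a rough Python equivalent of the `grep`/`awk` pipeline. It is a sketch under assumptions: the regex mirrors the `grep` pattern, the sample Makefile text is abbreviated from the diff, the names `parse_help` and `format_help` are made up for illustration, and the ANSI colour codes are left out.

```python
import re

# Same shape as the grep pattern: "<target>: ... ## <description>".
HELP_LINE = re.compile(r"^([a-zA-Z_-]+):.*?## (.*)$")

# A few lines in the style of the Makefile above (abbreviated).
MAKEFILE_TEXT = """\
.PHONY: init lint test help
init: ## Install the package in editable mode
\tpython -m pip install -e . --no-deps
test: ## Run all tests with pytest
\tpython -m pytest
help: ## Show this help message
"""


def parse_help(text: str) -> list[tuple[str, str]]:
    """Collect (target, description) pairs from lines carrying a '## ' comment."""
    entries = []
    for line in text.splitlines():
        match = HELP_LINE.match(line)
        if match:
            entries.append((match.group(1), match.group(2)))
    return entries


def format_help(entries: list[tuple[str, str]]) -> str:
    """Left-align targets in a 15-character column, like awk's %-15s."""
    return "\n".join(f"{target:<15} {description}" for target, description in entries)


print(format_help(parse_help(MAKEFILE_TEXT)))
```

Only targets annotated with a `## ` comment show up in the listing, which is why each target in the diff gained one. The `.PHONY` line and recipe lines are skipped because they do not start with a plain target name.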

causalpy/data/datasets.py

Lines changed: 17 additions & 4 deletions
@@ -43,15 +43,28 @@
 }
 
 
-def _get_data_home() -> pathlib.PosixPath:
+def _get_data_home() -> pathlib.Path:
     """Return the path of the data directory"""
     return pathlib.Path(cp.__file__).parents[1] / "causalpy" / "data"
 
 
-def load_data(dataset: str = None) -> pd.DataFrame:
-    """Loads the requested dataset and returns a pandas DataFrame.
+def load_data(dataset: str | None = None) -> pd.DataFrame:
+    """Load the requested dataset and return a pandas DataFrame.
 
-    :param dataset: The desired dataset to load
+    Parameters
+    ----------
+    dataset : str, optional
+        The desired dataset to load. If None, raises ValueError.
+
+    Returns
+    -------
+    pd.DataFrame
+        The loaded dataset as a pandas DataFrame.
+
+    Raises
+    ------
+    ValueError
+        If the requested dataset is not found.
     """
 
     if dataset in DATASETS:
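
The two annotation fixes in this hunk can be illustrated in isolation. `pathlib.Path(...)` produces a `PosixPath` on POSIX systems and a `WindowsPath` on Windows, so `pathlib.Path` is the portable return annotation. Likewise, `str = None` relies on an implicit Optional that recent mypy versions reject by default, while `str | None` states it explicitly. The sketch below is hypothetical: `DATASETS`, `get_data_home`, and `resolve_dataset` are simplified stand-ins, not the module's real code, and the data directory is invented.

```python
from __future__ import annotations  # lets `str | None` parse on Python < 3.10

import pathlib

# Hypothetical registry standing in for the module's DATASETS mapping.
DATASETS = {"example": "example.csv"}


def get_data_home() -> pathlib.Path:
    """Return a data directory path; the location here is made up for illustration."""
    return pathlib.Path.cwd() / "data"


def resolve_dataset(dataset: str | None = None) -> pathlib.Path:
    """Map a dataset name to a file path, raising ValueError for None or unknown names."""
    if dataset in DATASETS:
        return get_data_home() / DATASETS[dataset]
    raise ValueError(f"Dataset {dataset!r} not found")


# The concrete class depends on the platform, but it is always a pathlib.Path.
print(type(get_data_home()).__name__)
print(resolve_dataset("example").name)
```

Calling `resolve_dataset()` with no argument raises `ValueError`, matching the behaviour the new docstring documents for `None`.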
