Commit 5204c85

Merge pull request #84 from CITCOM-project/tidying
Odd jobs to clean up package
2 parents cb3c93e + f3c3a24

File tree: 3 files changed (43 additions, 37 deletions)

README.md

Lines changed: 16 additions & 13 deletions

The causal testing framework has three core components: Specification, Testing, and Data Collection. For more information on each of these steps, follow the links to their respective documentation.

## Causal Inference Terminology

Here are some explanations of the causal inference terminology used above.

- Causal inference (CI) is a family of statistical techniques designed to quantify and establish **causal** relationships in data. In contrast to purely statistical techniques, which are driven by associations in data, CI incorporates knowledge about the data-generating mechanisms behind those relationships to derive causal conclusions.
- One of the key advantages of CI is that it makes it possible to answer causal questions using **observational data**, that is, data which has been passively observed rather than collected from an experiment and which may therefore contain all kinds of bias. In a testing context, we would like to leverage this advantage to test causal relationships in software without having to run costly experiments.
- There are many forms of CI technique with slightly different aims, but in this framework we focus on graphical CI techniques that use directed acyclic graphs (DAGs) to obtain causal estimates. These approaches use a causal DAG to explain the causal relationships that exist in data and, based on the structure of this graph, design statistical experiments capable of estimating the causal effect of a particular intervention or action, such as taking a drug or changing the value of an input variable. A sketch of specifying such a DAG follows this list.
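
As a minimal sketch of what loading such a DAG looks like in this framework, the snippet below reads a graph from a dot file. The file name and graph structure are illustrative assumptions, and the `CausalDAG` import should be checked against the specification documentation.

```{python}
# Illustrative only: assumes a file dag.dot containing a two-edge graph, e.g.
#   digraph G { x -> y; z -> y; }
from causal_testing.specification.causal_dag import CausalDAG

causal_dag = CausalDAG("dag.dot")  # Load the causal DAG from the dot file
```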

## Installation

To use the causal testing framework, clone the repository, `cd` into the root directory, and run `pip install -e .`. More detailed installation instructions can be found in the [online documentation](https://causal-testing-framework.readthedocs.io/en/latest/installation.html).

## Usage

There are currently two ways of using the Causal Testing Framework: through the [JSON Front End](https://causal-testing-framework.readthedocs.io/en/latest/json_front_end.html) or directly, as described below.

The causal testing framework is made up of three main components: Specification, Testing, and Data Collection. The first step is to specify the (part of the) system under test as a modelling `Scenario`. Modelling scenarios specify the observable variables and any constraints that exist between them. We currently support three types of variable:

- `Input` variables are input parameters to the system.
- `Output` variables are outputs from the system.
- `Meta` variables are not directly observable but are relevant to system testing, e.g. a model may take a `location` parameter and expand it into `average_age` and `household_size` variables "under the hood". These parameters can be made explicit by instantiating them as meta variables, as sketched after this list.
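
As flagged in the list above, here is a hedged sketch of making a meta variable explicit. The `populate` argument and the mapping used here are assumptions for illustration rather than confirmed API; consult the variable documentation before relying on them.

```{python}
# A sketch only: assumes Meta accepts a function that derives the meta variable's
# values from the observed data. The location-to-age mapping below is hypothetical.
from causal_testing.specification.variable import Meta

def populate_average_age(df):
    df["average_age"] = df["location"].map({"Sheffield": 41.2, "London": 36.5})
    return df

average_age = Meta("average_age", float, populate_average_age)
```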

To instantiate a scenario, simply provide a set of variables and an optional set of constraints, e.g.

```{python}
from causal_testing.specification.variable import Input, Output, Meta
from causal_testing.specification.scenario import Scenario

# The variable definitions are elided in this diff; the ones below are assumed for illustration.
x = Input("x", float)
z = Input("z", int)
y = Output("y", float)

modelling_scenario = Scenario({x, y, z}, {x > z, z < 3})  # Define a scenario with constraints x > z and z < 3
```

Note that scenario constraints are primarily intended to help specify the region of the input space under test, in a manner consistent with the Category Partition Method. They are not intended to serve as a test oracle. Use constraints sparingly and with caution to avoid introducing data selection bias. We use Z3 to handle constraints; for help with this, check out [their documentation](https://ericpony.github.io/z3py-tutorial/guide-examples.htm).

Having fully specified the modelling scenario, we are now ready to test. Causal tests are essentially [metamorphic tests](https://en.wikipedia.org/wiki/Metamorphic_testing) which are executed using statistical causal inference. A causal test expresses the change in a given output that we expect to see when we change a particular input in a particular way, e.g.

```{python}
from causal_testing.testing.causal_test_case import CausalTestCase
from causal_testing.testing.causal_test_outcome import Positive

# The constructor arguments are elided in this diff; those below are assumed for illustration.
causal_test_case = CausalTestCase(
    control_input_configuration={x: 0},
    treatment_input_configuration={x: 1},
    expected_causal_effect=Positive,
    outcome_variables={y},
)
```

Before we can run our test case, we first need data. There are two ways to acquire it: (1) run the model with the specific input configurations we are interested in, or (2) use data from previous model runs. For a small number of specific tests where accuracy is critical, the first approach will yield the best results. To do this, you need to instantiate the `ExperimentalDataCollector` class, as sketched below.
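
A minimal sketch of the experimental route follows. The subclassing pattern and the method name are assumptions about the `ExperimentalDataCollector` API rather than confirmed signatures, so check the data collection documentation before use.

```{python}
# A sketch only: the overridden method name and its signature are assumptions.
from causal_testing.data_collection.data_collector import ExperimentalDataCollector

class MySystemDataCollector(ExperimentalDataCollector):
    def run_system_with_input_configuration(self, input_configuration):
        # Run the system under test with the given inputs and return the
        # resulting executions as a pandas DataFrame.
        ...
```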

Where there are many test cases, using pre-existing data is likely to be faster. If the program's behaviour can be estimated statistically, the results should still be reliable as long as there is enough data for the estimator to work as intended; how much data is enough will vary with the program and the estimator. To use this method, simply instantiate the `ObservationalDataCollector` class with the modelling scenario and a path to the CSV file containing the runtime data, e.g.

```{python}
# The import is elided in this diff and assumed for illustration.
from causal_testing.data_collection.data_collector import ObservationalDataCollector

data_csv_path = 'results/data.csv'
data_collector = ObservationalDataCollector(modelling_scenario, data_csv_path)
```

The actual running of the tests is done using the `CausalTestEngine` class. This is still a work in progress and may change in future to improve ease of use, but it currently proceeds as follows.

```{python}
# The import is elided in this diff and assumed for illustration; causal_specification
# combines the scenario with the causal DAG, and its definition is also elided here.
from causal_testing.testing.causal_test_engine import CausalTestEngine

causal_test_engine = CausalTestEngine(causal_test_case, causal_specification, data_collector)  # Instantiate the causal test engine
minimal_adjustment_set = causal_test_engine.load_data(data_csv_path, index_col=0)  # Calculate the adjustment set
treatment_vars = list(causal_test_case.treatment_input_configuration)
minimal_adjustment_set = minimal_adjustment_set - set([v.name for v in treatment_vars])  # Remove the treatment variables from the adjustment set
```

Whether using fresh or pre-existing data, a key aspect of causal inference is estimation. To actually execute a test, we need an estimator. We currently support two estimators, linear regression and causal forest, which can simply be instantiated as per the [documentation](https://causal-testing-framework.readthedocs.io/en/latest/autoapi/causal_testing/testing/estimators/index.html).

```{python}
from causal_testing.testing.estimators import LinearRegressionEstimator

# Estimate the effect on y of changing the treatment x between the values 0 and 1,
# adjusting for the minimal adjustment set computed above.
estimation_model = LinearRegressionEstimator(("x",), 0, 1, minimal_adjustment_set, ("y",), causal_test_engine.scenario_execution_data_df)
```
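
For the causal forest estimator mentioned above, a hedged sketch follows; it assumes `CausalForestEstimator` shares the linear regression estimator's constructor shape, which should be verified against the estimators documentation.

```{python}
# A sketch only: assumes CausalForestEstimator mirrors LinearRegressionEstimator's constructor.
from causal_testing.testing.estimators import CausalForestEstimator

estimation_model = CausalForestEstimator(("x",), 0, 1, minimal_adjustment_set, ("y",), causal_test_engine.scenario_execution_data_df)
```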

We can now execute the test using the estimation model. This returns a causal test result, from which we can extract various information. Here, we simply assert that the observed result is (on average) what we expect to see.

```{python}
causal_test_result = causal_test_engine.execute_test(estimation_model)
test_passes = causal_test_case.expected_causal_effect.apply(causal_test_result)
assert test_passes, "Expected to see a positive change in y."
```
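
As a small illustration of extracting information from the result, printing the object is a safe baseline; the names of specific fields, such as the effect estimate, are not assumed here and should be looked up in the `CausalTestResult` documentation.

```{python}
# A sketch only: inspect the causal test result without assuming its field names.
print(causal_test_result)
```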

causal_testing/pyflakes.sh

Lines changed: 0 additions & 3 deletions
This file was deleted.

setup.py

Lines changed: 27 additions & 21 deletions

```diff
@@ -1,35 +1,41 @@
 from setuptools import setup, find_packages
 
 requirements = [
-    "numpy~=1.20",
-    "pandas~=1.3.4",
-    "setuptools~=58.5.3",
-    "networkx~=2.6.3",
-    "pygraphviz~=1.7",
-    "scikit-learn~=1.0.1",
-    "matplotlib~=3.5.0",
     "econml~=0.12.0",
-    "statsmodels~=0.13.1",
-    "z3-solver~=4.8.13.0",
-    "lhsmdu",
-    "tabulate",
-    "scipy~=1.7.2",
-    "fitter~=1.4"
+    "fitter~=1.4.0",
+    "lhsmdu~=1.1",
+    "networkx~=2.6.3",
+    "numpy~=1.22.4",
+    "pandas~=1.3.5",
+    "scikit_learn~=1.1.2",
+    "scipy~=1.7.3",
+    "statsmodels~=0.13.2",
+    "tabulate~=0.8.10",
+    "z3_solver~=4.8.13.0",
 ]
 
 # Additional dependencies for development
-dev_requirements = [
-    "autopep8",
-    "isort",
-    "pytest",
-    "pylint",
-    "black"
-]
+dev_requirements = ["autopep8", "isort", "pytest", "pylint", "black"]
+
+readme = open("README.md", encoding="UTF-8").read()
 
 setup(
     name="causal_testing_framework",
     version="0.0.1",
+    description="A framework for causal testing using causal directed acyclic graphs.",
+    long_description=readme,
+    long_description_content_type="text/markdown",
+    author="The CITCOM team",
+    url="https://github.com/CITCOM-project/CausalTestingFramework",
+    project_urls={
+        "Bug Tracker": "https://github.com/CITCOM-project/CausalTestingFramework/issues",
+        "Documentation": "https://causal-testing-framework.readthedocs.io/",
+        "Source": "https://github.com/CITCOM-project/CausalTestingFramework",
+    },
+    python_requires=">=3.9",
     install_requires=requirements,
     extras_require={"dev": dev_requirements},
-    packages=find_packages()
+    packages=find_packages(),
+    license="MIT",
+    keywords="causal inference, verification",
 )
```
