Joss paper #296
Merged
15 commits
1d3a1b2 paper draft (jmafoster1)
f726d6d added citations (jmafoster1)
71503ea added bob as an author (jmafoster1)
5c2dcc4 Merge branch 'main' into joss-paper (jmafoster1)
c07d175 Added Bob's ORCID (jmafoster1)
ba1778f Merge branch 'joss-paper' of github.com:CITCOM-project/CausalTestingF… (jmafoster1)
942aa90 added compilation workflow (jmafoster1)
772d5c9 Moved paper into a directory (jmafoster1)
85a1da3 Fixed citations (jmafoster1)
be44f5d Added related work and ongoing/future work sections (jmafoster1)
7c9e05f expanded tosem (jmafoster1)
25d6914 Integrated feedback from @f-alian and @AndrewC19 (jmafoster1)
97ae1bf Merge branch 'main' into joss-paper (jmafoster1)
4c1410b Nick and Neil's comments (jmafoster1)
8a8ba15 Merge branch 'main' into joss-paper (jmafoster1)
@@ -0,0 +1,29 @@
name: JOSS article compilation
on:
  push:
    paths:
      - paper/**
      - images/schematic.png
      - .github/workflows/draft-pdf.yml

jobs:
  paper:
    runs-on: ubuntu-latest
    name: Paper Draft
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Build draft PDF
        uses: openjournals/openjournals-draft-action@master
        with:
          journal: joss
          # This should be the path to the paper within your repo.
          paper-path: paper/paper.md
      - name: Upload
        uses: actions/upload-artifact@v4
        with:
          name: paper
          # This is the output path where Pandoc will write the compiled
          # PDF. Note, this should be the same directory as the input
          # paper.md
          path: paper/paper.pdf
@@ -0,0 +1,98 @@
@article{blobaum2024dowhy,
  author = {Patrick Bl{{\"o}}baum and Peter G{{\"o}}tz and Kailash Budhathoki and Atalanti A. Mastakouri and Dominik Janzing},
  journal = {Journal of Machine Learning Research},
  number = {147},
  pages = {1--7},
  title = {DoWhy-GCM: An Extension of DoWhy for Causal Inference in Graphical Causal Models},
  volume = {25},
  year = {2024}
}

@techreport{chen1998metamorphic,
  author = {Chen, Tsong Y. and Cheung, Shing C. and Yiu, Shiu Ming},
  institution = {The Hong Kong University of Science and Technology},
  number = {HKUST-CS98-01},
  title = {Metamorphic testing: A new approach for generating next test cases},
  year = {1998}
}

@inproceedings{clark2023metamorphic,
  author = {Clark, Andrew G. and Foster, Michael and Walkinshaw, Neil and Hierons, Robert M.},
  booktitle = {2023 IEEE Conference on Software Testing, Verification and Validation (ICST)},
  doi = {10.1109/ICST57152.2023.00023},
  keywords = {Software testing;Java;Graphical models;Computer bugs;Software;Test pattern generators;Usability;Metamorphic testing;Causality;DAGs},
  number = {},
  pages = {153--164},
  title = {Metamorphic Testing with Causal Graphs},
  volume = {},
  year = {2023}
}

@article{clark2023testing,
  address = {New York, NY, USA},
  articleno = {10},
  author = {Clark, Andrew G. and Foster, Michael and Prifling, Benedikt and Walkinshaw, Neil and Hierons, Robert M. and Schmidt, Volker and Turner, Robert D.},
  doi = {10.1145/3607184},
  issn = {1049-331X},
  issue_date = {January 2024},
  journal = {ACM Transactions on Software Engineering and Methodology},
  month = {nov},
  number = {1},
  numpages = {42},
  publisher = {Association for Computing Machinery},
  title = {Testing Causality in Scientific Modelling Software},
  volume = {33},
  year = {2023}
}

@inproceedings{foster2024adequacy,
  author = {Foster, Michael and Wild, Christopher and Hierons, Robert M. and Walkinshaw, Neil},
  booktitle = {2024 IEEE Conference on Software Testing, Verification and Validation (ICST)},
  doi = {10.1109/ICST60714.2024.00023},
  number = {},
  pages = {161--172},
  title = {Causal Test Adequacy},
  volume = {},
  year = {2024}
}

@inproceedings{guderlei2007smt,
  author = {Guderlei, Ralph and Mayer, Johannes},
  booktitle = {Seventh International Conference on Quality Software (QSIC 2007)},
  doi = {10.1109/QSIC.2007.4385527},
  number = {},
  pages = {404--409},
  title = {Statistical Metamorphic Testing Testing Programs with Random Output by Means of Statistical Hypothesis Tests and Metamorphic Testing},
  volume = {},
  year = {2007}
}

@misc{sharma2020dowhy,
  archiveprefix = {arXiv},
  author = {Amit Sharma and Emre Kiciman},
  eprint = {2011.04216},
  primaryclass = {stat.ME},
  title = {DoWhy: An End-to-End Library for Causal Inference},
  url = {https://arxiv.org/abs/2011.04216},
  year = {2020}
}

@article{somers2024configuration,
  doi = {10.2139/ssrn.4732706},
  author = {Somers, Richard and Walkinshaw, Neil and Hierons, Robert and Elliott, Jackie and Iqbal, Ahmed and Walkinshaw, Emma},
  publisher = {Elsevier BV},
  title = {Configuration Testing of an Artificial Pancreas System Using a Digital Twin},
  year = {2024}
}

@article{textor2017dagitty,
  doi = {10.1093/ije/dyw341},
  issn = {1464-3685},
  author = {Textor, Johannes and van der Zander, Benito and Gilthorpe, Mark S. and Liśkiewicz, Maciej and Ellison, George T.H.},
  journal = {International Journal of Epidemiology},
  month = {jan},
  pages = {dyw341},
  publisher = {Oxford University Press (OUP)},
  title = {Robust causal inference using directed acyclic graphs: the R package ‘dagitty’},
  year = {2017}
}
@@ -0,0 +1,103 @@
---
title: 'The Causal Testing Framework'
tags:
  - Python
  - causal testing
  - causal inference
  - causality
  - software testing
  - metamorphic testing
authors:
  - name: Michael Foster
    orcid: 0000-0001-8233-9873
    affiliation: 1
    corresponding: true
  - name: Andrew Clark
    orcid: 0000-0002-6830-0566
    affiliation: 1
  - name: Christopher Wild
    orcid: 0009-0009-1195-1497
    affiliation: 1
  - name: Farhad Allian
    orcid: 0000-0002-4569-0370
    affiliation: 1
  - name: Robert Turner
    orcid: 0000-0002-1353-1404
    affiliation: 1
  - name: Richard Somers
    orcid: 0009-0009-1195-1497
    affiliation: 1
  - name: Nicholas Lattimer
    orcid: 0000-0001-5304-5585
    affiliation: 1
  - name: Neil Walkinshaw
    orcid: 0000-0003-2134-6548
    affiliation: 1
  - name: Rob Hierons
    orcid: 0000-0003-2134-6548
    affiliation: 1
affiliations:
  - name: University of Sheffield, UK
    index: 1
date: 2 December 2024
bibliography: paper.bib
---

# Summary
Scientific models possess several properties that make them notoriously difficult to test, including a complex input space, long execution times, and non-determinism, rendering existing testing techniques impractical.
In fields such as epidemiology, where researchers seek answers to challenging causal questions, a statistical methodology known as Causal Inference has addressed similar problems, enabling the inference of causal conclusions from noisy, biased, and sparse observational data instead of costly randomised trials.
Causal Inference works by using domain knowledge to identify and mitigate biases in the data, enabling researchers to answer causal questions that concern the effect of changing some feature on the observed outcome.
The Causal Testing Framework is a software testing framework that uses Causal Inference techniques to establish causal effects between software variables from pre-existing runtime data rather than having to collect bespoke, highly curated datasets specifically for testing.

# Statement of need
Metamorphic Testing [@chen1998metamorphic] is a popular technique for testing computational models (and other traditionally "hard to test" software).
Test goals are expressed as _metamorphic relations_ that specify how changing an input in a particular way should affect the software output.
Nondeterministic software can be tested using Statistical Metamorphic Testing [@guderlei2007smt], which uses statistical tests over multiple executions of the software to determine whether the specified metamorphic relations hold.
However, this requires the software to be executed repeatedly for each set of parameters of interest, so it is computationally expensive, and it is constrained to testing properties over software inputs that can be directly and precisely controlled.
Statistical Metamorphic Testing cannot be used to test properties that relate internal variables or outputs to each other, since these cannot be controlled a priori.

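As a rough illustration of the cost described above, the following minimal sketch checks one metamorphic relation statistically. The `noisy_model` function, the relation itself, and the threshold on Welch's t statistic are all hypothetical choices made for this example; they are not part of the Causal Testing Framework or of any cited tool.

```python
import random
import statistics


def noisy_model(rate, rng):
    """Hypothetical nondeterministic system under test: output grows with `rate`."""
    return 2.0 * rate + rng.gauss(0.0, 1.0)


def welch_t(sample_a, sample_b):
    """Welch's t statistic for mean(sample_b) - mean(sample_a)."""
    var_a = statistics.variance(sample_a) / len(sample_a)
    var_b = statistics.variance(sample_b) / len(sample_b)
    return (statistics.mean(sample_b) - statistics.mean(sample_a)) / (var_a + var_b) ** 0.5


def relation_holds(source_rate, follow_up_rate, runs=200, threshold=3.0, seed=0):
    """Assumed relation: the follow-up input should give a larger mean output.

    The model is executed `runs` times for each of the two inputs, and the
    relation is accepted only if the t statistic exceeds `threshold`.
    """
    rng = random.Random(seed)
    source = [noisy_model(source_rate, rng) for _ in range(runs)]
    follow_up = [noisy_model(follow_up_rate, rng) for _ in range(runs)]
    return welch_t(source, follow_up) > threshold


print(relation_holds(1.0, 2.0))
```

Even this single check needs `2 * runs` executions of the model, and it only works because `rate` is an input that can be set directly.
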
By employing domain knowledge in the form of a causal graph --- a lightweight model specifying the expected relationships between key software variables --- the Causal Testing Framework circumvents both of these problems by enabling models to be tested using pre-existing runtime data.
The Causal Testing Framework is written in Python but is language agnostic in terms of the system under test.

All that is required is a set of properties to be validated, a causal model, and a set of software runtime data.

# Causal Testing
Causal Testing [@clark2023testing] has four main steps, outlined in \autoref{fig:schematic}.
Firstly, the user supplies a causal model, which takes the form of a directed acyclic graph (DAG) in which an edge $X \to Y$ represents variable $X$ having a direct causal effect on variable $Y$.
Secondly, the user supplies a set of causal properties to be tested.
Such properties can be generated from the causal DAG [@clark2023metamorphic]: for each $X \to Y$ edge, a test to validate the presence of a causal effect is generated, and for each missing edge, a test to validate independence is generated.
The user may also refine tests to validate the nature of a particular relationship.
Next, the user supplies a set of runtime data in the form of a table with each column representing a variable and rows containing the value of each variable for a particular run of the software.
Finally, the Causal Testing Framework automatically validates the supplied causal properties by using the supplied causal DAG and data to calculate a causal effect estimate, and validating this against the expected causal relationship.


![Causal Testing workflow.\label{fig:schematic}](../images/schematic.png){ width=80% }

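The final step above can be sketched in a few lines of plain Python. This is an illustrative example under stated assumptions, not the Causal Testing Framework's API: it assumes a hypothetical DAG with edges $Z \to X$, $Z \to Y$, and $X \to Y$, simulates runtime data in which the true effect of $X$ on $Y$ is 2, and uses linear regression adjustment for $Z$ as the estimator. The function names, the expected effect, and the tolerance are all invented for the example.

```python
import random
from statistics import fmean


def slope(xs, ys):
    """Ordinary least squares slope of ys regressed on xs."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var


def residuals(xs, ys):
    """Part of ys that a straight-line fit on xs does not explain."""
    b = slope(xs, ys)
    a = fmean(ys) - b * fmean(xs)
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def adjusted_effect(x, y, z):
    """Linear effect of x on y after adjusting for the confounder z."""
    return slope(residuals(z, x), residuals(z, y))


# Simulated runtime data for the assumed DAG: Z -> X, Z -> Y, X -> Y.
rng = random.Random(1)
z = [rng.gauss(0, 1) for _ in range(5000)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [2 * xi + 3 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

naive = slope(x, y)                  # ignores Z, so it is biased upwards
adjusted = adjusted_effect(x, y, z)  # controls for Z, as the DAG prescribes

# Hypothetical causal test: the effect of X on Y should be close to 2.
expected_effect, tolerance = 2.0, 0.1
test_passes = abs(adjusted - expected_effect) < tolerance
print(f"naive={naive:.2f} adjusted={adjusted:.2f} passes={test_passes}")
```

The unadjusted estimate absorbs the influence of $Z$, which is why the DAG is needed to decide which variables to control for before comparing the estimate with the expected relationship.
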
## Test Adequacy
Because the properties being tested are completely separate from the data used to validate them, traditional coverage-based metrics are not appropriate here.
The Causal Testing Framework instead evaluates the adequacy of a particular dataset by calculating a statistical metric [@foster2024adequacy] based on the stability of the causal effect estimate, with numbers closer to zero representing more adequate data.

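To give some intuition for what "stability of the causal effect estimate" can mean, the sketch below uses a simple stand-in: the standard deviation of the estimate across bootstrap resamples of the data. This is an assumption made purely for illustration and is not claimed to be the metric defined in [@foster2024adequacy]; the helper names and the simulated data are likewise hypothetical.

```python
import random
from statistics import fmean, stdev


def slope(xs, ys):
    """Ordinary least squares slope of ys regressed on xs."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var


def bootstrap_spread(x, y, resamples=200, seed=0):
    """Standard deviation of the effect estimate over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        estimates.append(slope([x[i] for i in idx], [y[i] for i in idx]))
    return stdev(estimates)


def simulate(n, seed):
    """Hypothetical runtime data in which X has an effect of 2 on Y."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [2 * xi + rng.gauss(0, 1) for xi in x]
    return x, y


small = bootstrap_spread(*simulate(20, seed=1))
large = bootstrap_spread(*simulate(2000, seed=1))
print(f"spread with 20 rows: {small:.3f}, with 2000 rows: {large:.3f}")
```

Under this stand-in, a value nearer zero indicates that the estimate barely changes when the data are perturbed, which matches the direction of the metric described above.
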
## Missing Variables
Causal Testing works by using the supplied causal DAG to identify those variables which need to be statistically controlled for to remove their biasing effect on the causal estimate.
This typically means we need to know their values.
However, the Causal Testing Framework can still sometimes estimate unbiased causal effects using Instrumental Variables, an advanced Causal Inference technique.


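The idea behind Instrumental Variables can be sketched with the simplest such estimator, the ratio of two covariances. The example below is a minimal illustration on simulated data and does not use the Causal Testing Framework's API. It assumes a hypothetical unobserved confounder `u` and an instrument `z` that affects `y` only through `x` and is independent of `u`; all names and coefficients are invented for the example.

```python
import random
from statistics import fmean


def covariance(xs, ys):
    """Sample covariance of two equally sized lists."""
    mx, my = fmean(xs), fmean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


rng = random.Random(2)
n = 20000
u = [rng.gauss(0, 1) for _ in range(n)]  # confounder, assumed missing from the data
z = [rng.gauss(0, 1) for _ in range(n)]  # instrument: affects X, not Y directly
x = [zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [2 * xi + 3 * ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]

# Regressing Y on X alone is biased because U cannot be controlled for.
naive = covariance(x, y) / covariance(x, x)

# Instrumental variable estimate: cov(Z, Y) / cov(Z, X), which never uses U.
iv = covariance(z, y) / covariance(z, x)

print(f"naive={naive:.2f} iv={iv:.2f} (true effect is 2)")
```

The estimate is only as trustworthy as the instrument assumptions, which is one reason the technique applies sometimes rather than always.
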
## Feedback Over Time
Many scientific models involve iterating several interacting processes over time.
These processes often feed into each other, and can create feedback cycles.
Traditional Causal Inference cannot handle this. However, the Causal Testing Framework uses another advanced Causal Inference technique, g-methods, to enable the estimation of causal effects even when there are feedback cycles between variables.


# Related Work
The DAGitty tool [@textor2017dagitty] is a browser-based environment for creating, editing, and analysing causal graphs.
There is an R package for local use, but the tool does not aim to facilitate causal inference.
For causal inference, DoWhy [@sharma2020dowhy; @blobaum2024dowhy] is a Python package that can be used to estimate causal effects from data.
However, the package is intended for general causal inference.
It does not explicitly support causal testing, nor does it support temporal feedback loops.

Review comment: I saw that another new end-to-end causal inference framework was launched the other day: https://cstructure.dev/ Might not be relevant here but thought it's worth flagging to the CITCoM team.


# Ongoing and Future Research
The Causal Testing Framework is the subject of several publications [@clark2023metamorphic; @clark2023testing; @foster2024adequacy; @somers2024configuration].
We are also in the process of preparing scientific publications concerning how the Causal Testing Framework handles missing variables and feedback over time.
Furthermore, we are working to develop a plug-in for the [DAFNI framework](https://www.dafni.ac.uk/) to enable national-scale infrastructure models to be easily tested.


# Acknowledgements
This work was supported by the EPSRC CITCoM grant EP/T030526/1.

# References