Removed all mention of data collection and json front end

jmafoster1 · jmafoster1 · commit b462aa1635bb · 2025-02-27T09:41:22.000Z
diff --git a/causal_testing/__init__.py b/causal_testing/__init__.py
@@ -1,11 +1,11 @@
 """
 This is the CausalTestingFramework Module
 It contains 5 subpackages:
-data_collection
-generation
-json_front
+estimation
 specification
+surrogate
 testing
+utils
 """
 
 import logging
diff --git a/causal_testing/surrogate/causal_surrogate_assisted.py b/causal_testing/surrogate/causal_surrogate_assisted.py
@@ -78,11 +78,11 @@ def execute(
     ):
         """For this specific test case, a search algorithm is used to find the most contradictory point in the input
         space which is, therefore, most likely to indicate incorrect behaviour. This cadidate test case is run against
-        the simulator, checked for faults and the result returned with collected data
+        the simulator, checked for faults and the result returned.
         :param df: An dataframe which contains data relevant to the specified scenario
         :param max_executions: Maximum number of simulator executions before exiting the search
         :param custom_data_aggregator:
-        :return: tuple containing SimulationResult or str, execution number and collected data"""
+        :return: tuple containing SimulationResult or str, execution number and dataframe"""
 
         for i in range(max_executions):
             surrogate_models = self.generate_surrogates(self.specification, df)
diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -119,14 +119,6 @@ system-under-test that is expected to cause a change to some output(s).
 
    /autoapi/index
 
-.. toctree::
-   :hidden:
-   :maxdepth: 1
-   :caption: Front Ends
-
-   frontends/json_front_end
-   frontends/test_suite
-
 .. toctree::
    :hidden:
    :maxdepth: 1
diff --git a/docs/source/modules/causal_tests.rst b/docs/source/modules/causal_tests.rst
@@ -2,7 +2,7 @@
 Causal Testing
 ==============
 
-This package contains the main components of the causal testing framework, causal tests and causal oracles, which utilise both the specification and data collection packages.
+This package contains the main components of the causal testing framework, causal tests and causal oracles, which utilise the specification package.
 
 - A causal test case is a triple ``(X, \Delta, Y)`` where ``X`` is an input configuration, ``\Delta`` is an intervention, and ``Y`` is the expected causal effect of applying ``\Delta`` to ``X``. Put simply, a causal test case states the expected change in an outcome that applying an intervention to X should cause. In this context, an intervention is simply a function which manipulates the input configuration of the scenario-under-test in a way that is expected to cause a change to some outcome.
 
@@ -44,12 +44,12 @@ We then define a number of causal test cases to apply to the scenario-under-test
 
 - To run these test cases experimentally, we need to execute both ``X`` and ``\Delta(X)`` - that is, with and without the interventions. Since the only difference between these test cases is the intervention, we can conclude that the observed difference in ``n_infected_t5`` was caused by the interventions. While this is the simplest approach, it can be extremely inefficient at scale, particularly when dealing with complex software such as computational models.
 
-- To run these test cases observationally, we need to collect *valid* observational data for the scenario-under-test. This means we can only use executions with between 20 and 30 people, a square environment of size betwen 20x20 and 40x40, and where a single person was initially infected. In addition, this data must contain executions both with and without the intervention. Next, we need to identify any sources of bias in this data and determine a procedure to counteract them. This is achieved automatically using graphical causal inference techniques that identify a set of variables that can be adjusted to obtain a causal estimate. Finally, for any categorical biasing variables, we need to make sure we have executions corresponding to each category otherwise we have a positivity violation (i.e. missing data). In the worst case, this at least guides the user to an area of the system-under-test that should be executed.
+- To run these test cases observationally, we need *valid* observational data for the scenario-under-test. This means we can only use executions with between 20 and 30 people, a square environment of size between 20x20 and 40x40, and where a single person was initially infected. In addition, this data must contain executions both with and without the intervention. Next, we need to identify any sources of bias in this data and determine a procedure to counteract them. This is achieved automatically using graphical causal inference techniques that identify a set of variables that can be adjusted to obtain a causal estimate. Finally, for any categorical biasing variables, we need to make sure we have executions corresponding to each category otherwise we have a positivity violation (i.e. missing data). In the worst case, this at least guides the user to an area of the system-under-test that should be executed.
 
 Causal Inference
 ----------------
 
-- After collecting either observational or experimental data, we now need to apply causal inference. First, as described above, we use our causal graph to identify a set of adjustment variables which mitigate all bias in the data. Next, we use statistical models to adjust for these variables (implementing the statistical procedure necessary to isolate the causal effect) and obtain the desired causal estimate. Depending on the statistical model used, we can also generate 95% confidence intervals (or confidence intervals at any confidence level for that matter).
+- After obtaining suitable test data, we now need to apply causal inference. First, as described above, we use our causal graph to identify a set of adjustment variables which mitigate all bias in the data. Next, we use statistical models to adjust for these variables (implementing the statistical procedure necessary to isolate the causal effect) and obtain the desired causal estimate. Depending on the statistical model used, we can also generate 95% confidence intervals (or confidence intervals at any confidence level for that matter).
 
 - In our example, the causal DAG tell us it is necessary to adjust for ``environment`` in order to obtain the causal effect of ``precaution`` on ``n_infected_t5``. Supposing the relationship is linear, we can employ a linear regression model of the form ``n_infected_t5 ~ p0*precaution + p1*environment`` to carry out this adjustment. If we use experimental data, only a single environment is used by design and therefore the adjustment has no impact. However, if we use observational data, the environment may vary and therefore this adjustment will look at the causal effect within different environments and then provide a weighted average, which turns out to be the partial coefficient ``p0``.
 
diff --git a/docs/source/usage.rst b/docs/source/usage.rst
@@ -2,11 +2,7 @@
 Usage
 -----
 
-There are currently 3 methods of using the Causal Testing Framework; 1) :doc:`JSON Front End </frontends/json_front_end>`\, 2)
-:doc:`Test Suites </frontends/test_suite>`\, or 3) directly as
-described below.
-
-The causal testing framework is made up of 3 main components: Specification, Testing, and Data Collection. The first
+The causal testing framework is made up of 2 main components: Specification and Testing. The first
 step is to specify the (part of the) system under test as a modelling ``Scenario``. Modelling scenarios specify the
 observable variables and any constraints which exist between them. We currently support 3 types of variable:
 
diff --git a/examples/covasim_/doubling_beta/README.md b/examples/covasim_/doubling_beta/README.md
@@ -1,6 +1,6 @@
 # Covasim Case Study: Doubling Beta (Infectiousness)
-In this case study, we demonstrate how to use the causal testing framework with observational
-data collected Covasim to conduct Statistical Metamorphic Testing (SMT) a posteriori. Here, we focus on a set of simple
+In this case study, we demonstrate how to use the causal testing framework with observational data from
+Covasim to conduct Statistical Metamorphic Testing (SMT) a posteriori. Here, we focus on a set of simple
 modelling scenarios that investigate how the infectiousness of the virus (encoded as the parameter beta) affects the
 cumulative number of infections over a fixed duration. We also run several causal tests that focus on increasingly
 specific causal questions pertaining to more refined metamorphic properties and enabling us to learn more about the
diff --git a/examples/covasim_/vaccinating_elderly/README.md b/examples/covasim_/vaccinating_elderly/README.md
@@ -15,8 +15,7 @@ Further details are provided in Section 5.3 (Prioritising the elderly for vaccin
 
 >[!NOTE]
 >This version of the CTF uses observational data to separate the software execution and testing.
-Older versions of this framework simulate the data using a custom experimental data collector and the `covasim`
-package (version 3.0.7) as outlined below.
+Older versions of this framework directly run the `covasim` package (version 3.0.7) as outlined below.
 
 ## How to run
 To run this case study:
diff --git a/examples/poisson-line-process/README.md b/examples/poisson-line-process/README.md
@@ -7,6 +7,5 @@ To run this case study:
    (instructions are provided in the project README).
 2. Change directory to `causal_testing/examples/poisson-line-process`.
 3. Run the command `python example_pure_python.py` to demonstrate causal testing using pure python.
-3. Run the command `python example_json_frontend.py` to demonstrate the same causal tests using JSON.
 
 This should print a series of causal test results and produce two CSV files. `intensity_num_shapes_results_random_1000.csv` corresponds to table 1, and `width_num_shapes_results_random_1000.csv` relates to our findings regarding the relationship of width and `P_u`.