paper/paper.md
Lines changed: 22 additions & 36 deletions
@@ -47,19 +47,19 @@ The Interfere package is designed to research *intervention response prediction*
47
47
Interfere offers the first steps towards this vision by combining (1) a general interface for simulating the effect of interventions on dynamic models, (2) a suite of predictive methods and cross-validated hyperparameter optimization tools, and (3) the first known [extensible benchmark data set](https://drive.google.com/file/d/19_Ha-D8Kb1fFJ_iECU62eawbeuCpeV_g/view?usp=sharing) of dynamic intervention response scenarios (see Figure \ref{fig:sixty_models}).
48
48
49
49
 for
56
-
intervention response prediction which is available online for download.
50
+
package. Simulated models are either differential equations or discrete time
51
+
difference equations. Trajectories in blue represent the natural behavior of
52
+
the system, while red depicts response to a specified intervention.
53
+
For models with more than three dimensions,
54
+
only the three dimensions with highest variance
55
+
are shown. These sixty scenarios, making up the [Interfere Benchmark 1.1.1](https://drive.google.com/file/d/19_Ha-D8Kb1fFJ_iECU62eawbeuCpeV_g/view?usp=sharing) for
56
+
intervention response prediction, are available for download.
![**Original System Trajectory (Left):** The natural, uninterrupted evolution of
62
-
the quaratic Belozyorov system [@belozyorov_exponential_2015] simulated using
62
+
the quadratic Belozyorov system [@belozyorov_exponential_2015] simulated using
63
63
the Interfere package with a small
64
64
amount of stochastic noise.
65
65
**System Trajectory After Intervention (Center):** The
@@ -77,62 +77,48 @@ intervention and makes an attempt to predict the intervention response (red
77
77
curve).
78
78
](../images/interfere_usage_combined.png)
79
79
80
-
Over the past twenty years, the scientific community has experienced the emergence
81
-
of multiple frameworks for identifying causal relationships in observational
80
+
Over the past twenty years, multiple frameworks have emerged for identifying causal relationships in observational
82
81
data [@imbens_causal_2015; @pearl_causality_2009; @wieczorek_information_2019].
83
82
The most influential frameworks are probabilistic and, while it is not a necessary
84
-
condition for identifying causality, historically a static, linear relationship has
83
+
condition for identifying causality, a static, linear relationship has
85
84
often been assumed. However, when attempting to anticipate the response of complex dynamic
86
85
systems in the medium and long term, a linear approximation of the dynamics can be
87
-
insufficient. Therefore, researchers have increasingly begun to employ
88
-
non-linear, dynamic techniques for causal discovery and forecasting [e.g. @runge_discovering_2022]. Still,
86
+
insufficient. Therefore,
87
+
non-linear, dynamic techniques have been employed for causal discovery and forecasting [e.g. @runge_discovering_2022]. Nevertheless,
89
88
there are relatively few techniques that are able to fit causal dynamic
90
-
nonlinear models to data. Because of this, we see an opportunity to bring
91
-
together the insights from recent advancements in causal inference with
92
-
historical work in dynamic modeling and simulation.
89
+
nonlinear models to data.
93
90
94
-
In order to facilitate this cross pollination, we focus on a key problem --- predicting how a complex system responds to a previously unobserved intervention --- and designed the Interfere package for benchmarking tools aimed at intervention response prediction. The dynamic models contained in Interfere present challenges for computational methods that can likely only be addressed with the incorporation of mechanistic assumptions alongside probabilistic frameworks for causality. The Interfere package is a toolbox that allows researcher to validate predictive dynamic methods against simulated intervention scenarios. As such, the Interfere package encourages an opportunity for crosspollination between the probabilistic causal inference community and the modeling and simulation community.
91
+
Leveraging recent advancements in causal inference and historical work in dynamic modeling and simulation, we focus on a key problem: predicting how a complex system responds to a previously unobserved intervention. We designed the Interfere package to benchmark tools aimed at intervention response prediction. The dynamic models contained in Interfere present challenges for prediction that likely require incorporating mechanistic assumptions alongside probabilistic frameworks for causality. Interfere allows researchers to validate predictive dynamic methods against simulated intervention scenarios, encouraging cross-pollination between probabilistic causal inference and modeling/simulation perspectives.
95
92
96
93
# Primary Contributions
97
94
98
-
The Interfere package provides three primary contributions. (1) Dynamically diverse counterfactuals at scale, (2) crossdisciplinary forecast methods, and (3) comprehensive and extensible benchmarking.
95
+
Interfere provides three primary contributions: (1) dynamically diverse counterfactuals, (2) cross-disciplinary forecast methods, and (3) comprehensive and extensible benchmarking.
99
96
100
97
![Example experimental setup possible with Interfere: Can stochasticity help reveal associations between variables? Interfere can be used to compare intervention response prediction for deterministic and stochastic versions of the same system.
## 1. Dynamically Diverse Counterfactuals at Scale
104
101
105
-
The "dynamics" submodule in the Interfere package contains over fifty dynamic models. It contains a mix of linear, nonlinear, chaotic, continuous time, discrete time, stochastic, and deterministic models. The models come from a variety of disciplines including finance, ecology, biology, neuroscience and public health. Each model inherits the from the Interfere BaseDynamics type and gains the ability to take exogenous control of any observed state and to add measurement noise. Most models also gain the ability to make any observed state stochastic where magnitude of stochasticity can be controlled by a simple scalar parameter or fine tuned with a covariance matrix.
106
-
107
-
Because of the difficulty of building models of complex systems, predictive methods for complex dynamics are typically benchmarked on less than ten dynamical systems [@challu_nhits_2023; @brunton_discovering_2016; @vlachas_backpropagation_2020; @pathak_model-free_2018; @prasse_predicting_2022]. As such, Interfere offers a clear improvement over current benchmarking methods for prediction in complex dynamics.
108
-
109
-
Most importantly, Interfere is built around interventions: the ability to take exogenous control of one or several state variables in a complex system and observe the response. Imbuing a suite of scientific models with general exogenous control is no small feat because models can be complex and are implemented in a variety of ways. Interfere offers the ability to produce complex dynamic intervention response and standard forecasting scenarios at scale. This unique feature enables large scale evaluation of dynamic causal prediction methods—tested against systems with properties of interest to scientists. For example, we can simulate the change in concentration of ammonia based on the nitrogen cycle and an exogenous fertilizing schedule.
102
+
Whereas predictive methods for complex dynamics are typically benchmarked on fewer than ten systems [@challu_nhits_2023; @brunton_discovering_2016; @vlachas_backpropagation_2020; @pathak_model-free_2018; @prasse_predicting_2022], Interfere's "dynamics" submodule contains over fifty dynamic models: a mix of linear, nonlinear, chaotic, continuous-time, discrete-time, stochastic, and deterministic models from a variety of disciplines including finance, ecology, biology, neuroscience and public health. Most importantly, Interfere is built for studying interventions: each model inherits from the Interfere BaseDynamics type, which allows exogenous control of any observed state and the addition of measurement noise. For most models, it also provides stochasticity whose magnitude is controlled by a scalar parameter or fine-tuned with a covariance matrix. Interfere thus offers a user-friendly framework for producing complex dynamic intervention response and standard forecasting scenarios at scale.
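
To make the notion of exogenous control concrete, the following is a minimal NumPy sketch of an intervention on a small discrete-time system. It does not use the Interfere API: `simulate`, `coupled_logistic`, and all parameter names are hypothetical and chosen only for illustration. At each step, the intervened state is overwritten by an externally supplied signal before the dynamics are applied, and optional Gaussian noise is added to the update.

```python
import numpy as np


def simulate(step, x0, n_steps, intervention=None, noise_std=0.0, seed=0):
    """Iterate x[t+1] = step(x[t]) + noise, optionally overriding states.

    `intervention` maps a state index to a function of the time step t that
    returns the exogenously imposed value for that state (hypothetical API).
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    traj = np.empty((n_steps, x.size))
    for t in range(n_steps):
        if intervention is not None:
            for idx, signal in intervention.items():
                x[idx] = signal(t)  # exogenous control of state idx
        traj[t] = x
        x = step(x) + noise_std * rng.standard_normal(x.size)
    return traj


def coupled_logistic(x, r=3.7, eps=0.3):
    """Two symmetrically coupled logistic maps (illustrative toy model)."""
    f = r * x * (1.0 - x)
    return (1.0 - eps) * f + eps * f[::-1]


# Natural behavior versus the response when state 0 is held at 0.9.
natural = simulate(coupled_logistic, [0.2, 0.6], 200)
response = simulate(coupled_logistic, [0.2, 0.6], 200,
                    intervention={0: lambda t: 0.9})
```

Comparing `natural[:, 1]` with `response[:, 1]` shows how the state that was not intervened on reacts to the imposed signal, which is the quantity an intervention response predictor would try to forecast.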
110
103
111
104
## 2. Cross Disciplinary Forecast Methods
112
105
113
-
A second contribution of Interfere is the integration of dynamic *forecasting* methodologies from deep learning (LSTM, NHITS), applied mathematics (SINDy, Reservoir Computers) and social science (VAR). The Interfere "ForecastingMethod" class is expressive enough to describe, fit and predict with multivariate dynamic models and apply interventions to the states of the models during prediction. This cross disciplinary mix of techniques has the potential to produce new insights into the problem of intervention response prediction among others. For example, experiments using this package have revealed that cross validation error does not correlate with well with prediction error when LSTM and NHITS attempt to predict intervention response.
106
+
Interfere integrates dynamic *forecasting* methodologies from deep learning (LSTM, NHITS), applied mathematics (SINDy, Reservoir Computers) and social science (VAR). The Interfere "ForecastingMethod" class is expressive enough to describe, fit and predict with multivariate dynamic models and apply interventions to the states of the models during prediction.
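
As a rough illustration of what fitting a multivariate model and applying an intervention during prediction can look like, here is a minimal sketch of a first-order vector autoregression fitted by ordinary least squares. It is not the Interfere "ForecastingMethod" class: the class name `VAR1Forecaster`, its methods, and the `intervention` argument are assumptions made for this example only.

```python
import numpy as np


class VAR1Forecaster:
    """Hypothetical stand-in for a forecaster: x[t+1] = A @ x[t] + b."""

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        # Regress X[1:] on [X[:-1], 1] by ordinary least squares.
        design = np.hstack([X[:-1], np.ones((len(X) - 1, 1))])
        coef, *_ = np.linalg.lstsq(design, X[1:], rcond=None)
        self.A, self.b = coef[:-1].T, coef[-1]
        return self

    def predict(self, x_last, n_steps, intervention=None):
        """Recursive forecast; `intervention` maps state index -> callable(t)."""
        x = np.array(x_last, dtype=float)
        out = np.empty((n_steps, x.size))
        for t in range(n_steps):
            x = self.A @ x + self.b
            if intervention is not None:
                for idx, signal in intervention.items():
                    x[idx] = signal(t)  # impose the exogenous value
            out[t] = x
        return out


# Synthetic training data from a known linear system with small noise.
rng = np.random.default_rng(0)
A_true = np.array([[0.9, 0.1], [0.0, 0.8]])
b_true = np.array([0.1, -0.2])
X = np.empty((200, 2))
X[0] = [1.0, 2.0]
for t in range(199):
    X[t + 1] = A_true @ X[t] + b_true + 0.05 * rng.standard_normal(2)

model = VAR1Forecaster().fit(X)
forecast = model.predict(X[-1], 20, intervention={0: lambda t: 1.5})
```

Because the prediction is recursive, the imposed value of state 0 feeds into every later step, so the forecast for the other state reflects the intervention and not the behavior seen during training.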
114
107
115
108
## 3. Comprehensive and Extensible Benchmarking
116
109
117
-
The third major contribution of Interfere is the collection of dynamic scenarios organized into the [Interfere Benchmark](https://drive.google.com/file/d/19_Ha-D8Kb1fFJ_iECU62eawbeuCpeV_g/view?usp=sharing). The Interfere Benchmark is a comprehensive and extensible set of dynamic scenarios that are conveniently available for testing methods that predict the effects of interventions. The benchmark set contains 60 intervention response scenarios for testing, each simulated with different levels of stochastic noise. Each scenario is housed in a JSON file, complete with full metadata annotation, documentation, versioning and commit hashes marking the commit of Interfere that was used to generate the data. The scenarios were reviewed by hand with some systems exposed to exogenous input to ensure that none of the key variables settle into a steady state. Additionally, all interventions were chosen in a manner such that the response of the target variable is a significant departure from its previous behavior.
118
-
119
-
The Interfere package enables researchers from various backgrounds to systematically study the problem of predicting intervention response on simulated data from a wide range of disciplines. It thereby facilitates future progress towards correctly anticipating how complex systems will respond in new, never before seen scenarios.
110
+
Interfere organizes a variety of dynamic scenarios into the [Interfere Benchmark](https://drive.google.com/file/d/19_Ha-D8Kb1fFJ_iECU62eawbeuCpeV_g/view?usp=sharing), a comprehensive and extensible set containing 60 intervention response scenarios for testing, each simulated with different levels of stochastic noise. Each scenario is housed in a JSON file, with metadata annotation, documentation, versioning and commit hashes marking the commit of Interfere that was used to generate the data. The scenarios were reviewed by hand, and some systems were exposed to exogenous input, to ensure that none of the key variables settles into a steady state. Additionally, all interventions were chosen so that the target variable response significantly departs from its prior behavior. We aim for this benchmark to facilitate future progress towards correctly anticipating how complex systems will respond in new, never-before-seen scenarios.
120
111
121
112
# Related Software and Mathematical Foundations
122
113
123
114
## Predictive Methods
124
115
125
-
The Interfere package draws from the Nixtla open source ecosystem for time series forecasting. We implemented intervention support for LSTM and NHITS from the NeuralForecast package, and for ARIMA from the StatsForecast package [@olivares2022library_neuralforecast; @garza2022statsforecast]. We followed Nixtla's example for cross validation and hyperparameter optimization approaches. We integrated predictive methods from the PySINDy [@kaptanoglu2022] and StatsModels [@seabold2010statsmodels] packages. We also include ResComp, a reservoir computing method for global forecasts from [@harding_global_2024]. Hyperparameter optimization is designed around the Optuna framework [@akiba2019optuna].
126
-
127
-
While other forecasting methods exist, integrating a method with Interfere requires that the method is capable of (1) multivariate endogenous dynamic forecasting, (2) support for exogenous variables, and (3) support for flexible length forecast windows or recursive predictions. Few forecasting methods meet these criteria, and it is our hope that this package can encourage the development of additional methods.
128
-
129
-
## Dynamic Models
130
-
131
-
The table below list the dynamic models that are currently implemented in the Interfere package, plus attributions. These dynamic models in were implemented directly from mathematical descriptions except for two, "Hodgkin Huxley Pyclustering" and "Stuart Landau Kuramoto" which adapt existing simulations from the PyClustering package [@novikov2019].
116
+
Interfere draws from the Nixtla open source ecosystem for time series forecasting. We implemented intervention support for LSTM and NHITS from the NeuralForecast package, and for ARIMA from the StatsForecast package [@olivares2022library_neuralforecast; @garza2022statsforecast]. We followed Nixtla's example for cross validation and hyperparameter optimization approaches. We integrated predictive methods from the PySINDy [@kaptanoglu2022] and StatsModels [@seabold2010statsmodels] packages. We also include ResComp, a reservoir computing method for global forecasts from [@harding_global_2024]. Hyperparameter optimization is designed around the Optuna framework [@akiba2019optuna].
132
117
118
+
While other forecasting methods exist, integrating a method with Interfere requires that the method support (1) multivariate endogenous dynamic forecasting, (2) exogenous variables, and (3) flexible-length forecast windows or recursive predictions. Few forecasting methods meet these criteria, and it is our hope that this package will encourage development of additional methods.
133
119
134
120
# Acknowledgements
135
121
136
-
The work described here was supported by an NSF Graduate Research Fellowship (DJP) and by award W911NF2510049 from the Army Research Office. The content is solely the responsibility of the authors and does not necessarily represent the official views of any agency supporting this research.
122
+
The work described here was supported by an NSF Graduate Research Fellowship (DJP) and by award W911NF2510049 from the Army Research Office. The content is solely the responsibility of the authors and does not necessarily represent the official views of any agency.