Commit fda4479

Merge pull request #103 from stefanradev93/Development
Development
2 parents 2bd8744 + 7214f78 commit fda4479

File tree

5 files changed: +121, -41 lines changed

CITATION.cff

Lines changed: 72 additions & 0 deletions

@@ -0,0 +1,72 @@
+cff-version: "1.2.0"
+authors:
+- family-names: Radev
+  given-names: Stefan T.
+  orcid: "https://orcid.org/0000-0002-6702-9559"
+- family-names: Schmitt
+  given-names: Marvin
+  orcid: "https://orcid.org/0000-0003-1293-820X"
+- family-names: Schumacher
+  given-names: Lukas
+  orcid: "https://orcid.org/0000-0003-1512-8288"
+- family-names: Elsemüller
+  given-names: Lasse
+  orcid: "https://orcid.org/0000-0003-0368-720X"
+- family-names: Pratz
+  given-names: Valentin
+  orcid: "https://orcid.org/0000-0001-8371-3417"
+- family-names: Schälte
+  given-names: Yannik
+  orcid: "https://orcid.org/0000-0003-1293-820X"
+- family-names: Köthe
+  given-names: Ullrich
+  orcid: "https://orcid.org/0000-0001-6036-1287"
+- family-names: Bürkner
+  given-names: Paul-Christian
+  orcid: "https://orcid.org/0000-0001-5765-8995"
+contact:
+- family-names: Radev
+  given-names: Stefan T.
+  orcid: "https://orcid.org/0000-0002-6702-9559"
+doi: 10.5281/zenodo.8346393
+message: If you use this software, please cite our article in the
+  Journal of Open Source Software.
+preferred-citation:
+  authors:
+  - family-names: Radev
+    given-names: Stefan T.
+    orcid: "https://orcid.org/0000-0002-6702-9559"
+  - family-names: Schmitt
+    given-names: Marvin
+    orcid: "https://orcid.org/0000-0003-1293-820X"
+  - family-names: Schumacher
+    given-names: Lukas
+    orcid: "https://orcid.org/0000-0003-1512-8288"
+  - family-names: Elsemüller
+    given-names: Lasse
+    orcid: "https://orcid.org/0000-0003-0368-720X"
+  - family-names: Pratz
+    given-names: Valentin
+    orcid: "https://orcid.org/0000-0001-8371-3417"
+  - family-names: Schälte
+    given-names: Yannik
+    orcid: "https://orcid.org/0000-0003-1293-820X"
+  - family-names: Köthe
+    given-names: Ullrich
+    orcid: "https://orcid.org/0000-0001-6036-1287"
+  - family-names: Bürkner
+    given-names: Paul-Christian
+    orcid: "https://orcid.org/0000-0001-5765-8995"
+  date-published: 2023-09-22
+  doi: 10.21105/joss.05702
+  issn: 2475-9066
+  issue: 89
+  journal: Journal of Open Source Software
+  publisher:
+    name: Open Journals
+  start: 5702
+  title: "BayesFlow: Amortized Bayesian Workflows With Neural Networks"
+  type: article
+  url: "https://joss.theoj.org/papers/10.21105/joss.05702"
+  volume: 8
+title: "BayesFlow: Amortized Bayesian Workflows With Neural Networks"
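Because CITATION.cff is machine-readable, the preferred citation can be assembled programmatically. A minimal sketch (not part of the commit), assuming PyYAML is installed:

```python
# Sketch: build the preferred citation string from the new CITATION.cff.
import yaml

with open("CITATION.cff") as f:
    cff = yaml.safe_load(f)

pref = cff["preferred-citation"]
authors = ", ".join(f"{a['given-names']} {a['family-names']}" for a in pref["authors"])
year = str(pref["date-published"])[:4]  # YAML parses the date; stringify before slicing
print(f"{authors} ({year}). {pref['title']}. {pref['journal']}, "
      f"{pref['volume']}({pref['issue']}), {pref['start']}. doi:{pref['doi']}")
```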

README.md

Lines changed: 21 additions & 19 deletions

@@ -2,6 +2,7 @@
 
 [![Actions Status](https://github.com/stefanradev93/bayesflow/workflows/Tests/badge.svg)](https://github.com/stefanradev93/bayesflow/actions)
 [![Licence](https://img.shields.io/github/license/stefanradev93/BayesFlow)](https://img.shields.io/github/license/stefanradev93/BayesFlow)
+[![DOI](https://joss.theoj.org/papers/10.21105/joss.05702/status.svg)](https://doi.org/10.21105/joss.05702)
 
 Welcome to our BayesFlow library for efficient simulation-based Bayesian workflows! Our library enables users to create specialized neural networks for *amortized Bayesian inference*, which repay users with rapid statistical inference after a potentially longer simulation-based training phase.
 
@@ -76,51 +77,52 @@ generative_model = bf.simulation.GenerativeModel(prior, simulator)
 Next, we create our BayesFlow setup consisting of a summary and an inference network:
 
 ```python
-summary_net = bf.networks.DeepSet()
+summary_net = bf.networks.SetTransformer(input_dim=2)
 inference_net = bf.networks.InvertibleNetwork(num_params=2)
 amortized_posterior = bf.amortizers.AmortizedPosterior(inference_net, summary_net)
 ```
 
 Finally, we connect the networks with the generative model via a `Trainer` instance:
 
 ```python
-trainer = bf.trainers.Trainer(amortizer=amortized_posterior, generative_model=generative_model, memory=True)
+trainer = bf.trainers.Trainer(amortizer=amortized_posterior, generative_model=generative_model)
 ```
 
 We are now ready to train an amortized posterior approximator. For instance,
 to run online training, we simply call:
 
 ```python
-losses = trainer.train_online(epochs=10, iterations_per_epoch=500, batch_size=32)
+losses = trainer.train_online(epochs=10, iterations_per_epoch=1000, batch_size=32)
 ```
 
-Before inference, we can use simulation-based calibration (SBC,
+Prior to inference, we can use simulation-based calibration (SBC,
 https://arxiv.org/abs/1804.06788) to check the computational faithfulness of
-the model-amortizer combination:
+the model-amortizer combination on unseen simulations:
 
 ```python
-fig = trainer.diagnose_sbc_histograms()
+# Generate 500 new simulated data sets
+new_sims = trainer.configurator(generative_model(500))
+
+# Obtain 100 posterior draws per data set instantly
+posterior_draws = amortized_posterior.sample(new_sims, n_samples=100)
+
+# Diagnose calibration
+fig = bf.diagnostics.plot_sbc_histograms(posterior_draws, new_sims['parameters'])
 ```
 
 <img src="https://github.com/stefanradev93/BayesFlow/blob/master/img/showcase_sbc.png?raw=true" width=65% height=65%>
 
 The histograms are roughly uniform and lie within the expected range for
 well-calibrated inference algorithms as indicated by the shaded gray areas.
-Accordingly, our amortizer seems to have converged to the intended target.
-
-Amortized inference on new (real or simulated) data is then easy and fast.
-For example, we can simulate 200 new data sets and generate 500 posterior draws
-per data set:
+Accordingly, our neural approximator seems to have converged to the intended target.
 
-```python
-new_sims = trainer.configurator(generative_model(200))
-posterior_draws = amortized_posterior.sample(new_sims, n_samples=500)
-```
+As you can see, amortized inference on new (real or simulated) data is easy and fast.
+We can obtain a further 5000 posterior draws per simulated data set and quickly inspect
+how well the model can recover its parameters across the entire *prior predictive distribution*.
 
-We can then quickly inspect the how well the model can recover its parameters
-across the simulated data sets.
 
 ```python
+posterior_draws = amortized_posterior.sample(new_sims, n_samples=5000)
 fig = bf.diagnostics.plot_recovery(posterior_draws, new_sims['parameters'])
 ```
 
@@ -162,7 +164,7 @@ A modified loss function optimizes the learned summary statistics towards a unit
 Gaussian and reliably detects model misspecification during inference time.
 
 
-<img src="https://github.com/stefanradev93/BayesFlow/blob/master/examples/img/model_misspecification_amortized_sbi.png" width=100% height=100%>
+<img src="https://github.com/stefanradev93/BayesFlow/blob/master/examples/img/model_misspecification_amortized_sbi.png?raw=true" width=100% height=100%>
 
 In order to use this method, you should only provide the `summary_loss_fun` argument
 to the `AmortizedPosterior` instance:
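The code block that the last context line introduces falls outside the diff; judging from the surrounding text, it presumably resembles the sketch below (treating the `"MMD"` shorthand for a maximum mean discrepancy summary loss as an assumption):

```python
# Sketch: enable the misspecification-detecting summary loss
amortized_posterior = bf.amortizers.AmortizedPosterior(
    inference_net, summary_net, summary_loss_fun="MMD"  # assumed shorthand
)
```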
@@ -206,7 +208,7 @@ meta_model = bf.simulation.MultiGenerativeModel([model_m1, model_m2])
 Next, we construct our neural network with a `PMPNetwork` for approximating posterior model probabilities:
 
 ```python
-summary_net = bf.networks.DeepSet()
+summary_net = bf.networks.SetTransformer(input_dim=2)
 probability_net = bf.networks.PMPNetwork(num_models=2)
 amortized_bmc = bf.amortizers.AmortizedModelComparison(probability_net, summary_net)
 ```
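A hedged sketch of how this model-comparison amortizer would be trained and queried, mirroring the posterior workflow above; the `posterior_probs` convenience call is an assumption, not shown in this diff:

```python
trainer = bf.trainers.Trainer(amortizer=amortized_bmc, generative_model=meta_model)
losses = trainer.train_online(epochs=10, iterations_per_epoch=1000, batch_size=32)

# Approximate posterior model probabilities for new simulations
new_sims = trainer.configurator(meta_model(500))
model_probs = amortized_bmc.posterior_probs(new_sims)  # assumed convenience method
```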

bayesflow/default_settings.py

Lines changed: 1 addition & 1 deletion

@@ -73,7 +73,7 @@ def __init__(self, meta_dict: dict, mandatory_fields: list = []):
 }
 
 
-DEFAULT_SETTING_DENSE_INVARIANT = {"units": 64, "activation": "relu", "kernel_initializer": "glorot_uniform"}
+DEFAULT_SETTING_DENSE_DEEP_SET = {"units": 64, "activation": "relu", "kernel_initializer": "glorot_uniform"}
 
 
 DEFAULT_SETTING_DENSE_RECT = {"units": 256, "activation": "swish", "kernel_initializer": "glorot_uniform"}
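Since the renamed dictionary remains only the default for `DeepSet`'s dense blocks, callers can still override it per instance via the `dense_s*_args` arguments seen in the summary-network diff below. A sketch with illustrative values:

```python
import bayesflow as bf

# Wider first-stage dense layers; the other stages keep the renamed defaults
summary_net = bf.networks.DeepSet(
    dense_s1_args={"units": 128, "activation": "relu", "kernel_initializer": "glorot_uniform"}
)
```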

bayesflow/diagnostics.py

Lines changed: 10 additions & 4 deletions

@@ -51,6 +51,8 @@ def plot_recovery(
     color="#8f2727",
     n_col=None,
     n_row=None,
+    xlabel="Ground truth",
+    ylabel="Estimated",
 ):
     """Creates and plots publication-ready recovery plot with true vs. point estimate + uncertainty.
     The point estimate can be controlled with the ``point_agg`` argument, and the uncertainty estimate
@@ -96,7 +98,11 @@ def plot_recovery(
         A flag for adding R^2 between true and estimates to the plot
     color : str, optional, default: '#8f2727'
         The color for the true vs. estimated scatter points and error bars
-
+    xlabel : str, optional, default: 'Ground truth'
+        The label on the x-axis of the plot
+    ylabel : str, optional, default: 'Estimated'
+        The label on the y-axis of the plot
+
     Returns
     -------
     f : plt.Figure - the figure instance for optional saving
@@ -198,15 +204,15 @@ def plot_recovery(
     # Only add x-labels to the bottom row
     bottom_row = axarr if n_row == 1 else axarr[0] if n_col == 1 else axarr[n_row - 1, :]
     for _ax in bottom_row:
-        _ax.set_xlabel("Ground truth", fontsize=label_fontsize)
+        _ax.set_xlabel(xlabel, fontsize=label_fontsize)
 
     # Only add y-labels to right left-most row
     if n_row == 1:  # if there is only one row, the ax array is 1D
-        axarr[0].set_ylabel("Estimated", fontsize=label_fontsize)
+        axarr[0].set_ylabel(ylabel, fontsize=label_fontsize)
     # If there is more than one row, the ax array is 2D
     else:
         for _ax in axarr[:, 0]:
-            _ax.set_ylabel("Estimated", fontsize=label_fontsize)
+            _ax.set_ylabel(ylabel, fontsize=label_fontsize)
 
     # Remove unused axes entirely
     for _ax in axarr_it[n_params:]:
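Usage sketch for the new keyword arguments (the label strings are illustrative; `posterior_draws` and `new_sims` as in the README example):

```python
fig = bf.diagnostics.plot_recovery(
    posterior_draws,
    new_sims["parameters"],
    xlabel="True parameter value",
    ylabel="Posterior point estimate",
)
```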

bayesflow/summary_networks.py

Lines changed: 17 additions & 17 deletions

@@ -198,8 +198,8 @@ def __init__(
         features from an input set using a set of seed vectors (typically one for a single summary) with ``summary_dim``
         output dimensions.
 
-        Recommnded: When using transformers as summary networks, you may want to use a smaller learning rate
-        during training, e.g., setting ``default_lr=1e-5`` in a ``Trainer`` instance.
+        Recommended: When using transformers as summary networks, you may want to use a smaller learning rate
+        during training, e.g., setting ``default_lr=1e-4`` in a ``Trainer`` instance.
 
         Parameters
         ----------
@@ -211,7 +211,7 @@ def __init__(
 
             ``attention_settings=dict(num_heads=4, key_dim=32)``
 
-            You may also want to include dropout regularization in small-to-medium data regimes:
+            You may also want to include stronger dropout regularization in small-to-medium data regimes:
 
             ``attention_settings=dict(num_heads=4, key_dim=32, dropout=0.1)``
 
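Putting the revised recommendation together, a sketch of a transformer-based setup with attention dropout and the smaller learning rate (the concrete values are illustrative; `default_lr` and `attention_settings` are named in the docstring above):

```python
import bayesflow as bf

summary_net = bf.networks.SetTransformer(
    input_dim=2,
    attention_settings=dict(num_heads=4, key_dim=32, dropout=0.1),
)
inference_net = bf.networks.InvertibleNetwork(num_params=2)
amortized_posterior = bf.amortizers.AmortizedPosterior(inference_net, summary_net)
trainer = bf.trainers.Trainer(
    amortizer=amortized_posterior,
    generative_model=generative_model,  # as defined in the README example
    default_lr=1e-4,  # the smaller learning rate recommended in the docstring
)
```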
@@ -235,7 +235,7 @@ def __init__(
         The number of self-attention blocks to use before pooling.
     num_inducing_points : int or None, optional, default: 32
         The number of inducing points. Should be lower than the smallest set size.
-        If ``None`` selected, a vanilla self-attenion block (SAB) will be used, otherwise
+        If ``None`` selected, a vanilla self-attention block (SAB) will be used, otherwise
         ISAB blocks will be used. For ``num_attention_blocks > 1``, we currently recommend
         always using some number of inducing points.
     num_seeds : int, optional, default: 1
@@ -355,9 +355,9 @@ def __init__(
             num_dense_s1=num_dense_s1,
             num_dense_s2=num_dense_s2,
             num_dense_s3=num_dense_s3,
-            dense_s1_args=defaults.DEFAULT_SETTING_DENSE_INVARIANT if dense_s1_args is None else dense_s1_args,
-            dense_s2_args=defaults.DEFAULT_SETTING_DENSE_INVARIANT if dense_s2_args is None else dense_s2_args,
-            dense_s3_args=defaults.DEFAULT_SETTING_DENSE_INVARIANT if dense_s3_args is None else dense_s3_args,
+            dense_s1_args=defaults.DEFAULT_SETTING_DENSE_DEEP_SET if dense_s1_args is None else dense_s1_args,
+            dense_s2_args=defaults.DEFAULT_SETTING_DENSE_DEEP_SET if dense_s2_args is None else dense_s2_args,
+            dense_s3_args=defaults.DEFAULT_SETTING_DENSE_DEEP_SET if dense_s3_args is None else dense_s3_args,
             pooling_fun=pooling_fun,
         )
 
@@ -369,7 +369,7 @@ def __init__(
         self.out_layer = Dense(summary_dim, activation="linear")
         self.summary_dim = summary_dim
 
-    def call(self, x):
+    def call(self, x, **kwargs):
         """Performs the forward pass of a learnable deep invariant transformation consisting of
         a sequence of equivariant transforms followed by an invariant transform.
 
@@ -385,10 +385,10 @@ def call(self, x):
         """
 
         # Pass through series of augmented equivariant transforms
-        out_equiv = self.equiv_layers(x)
+        out_equiv = self.equiv_layers(x, **kwargs)
 
         # Pass through final invariant layer
-        out = self.out_layer(self.inv(out_equiv))
+        out = self.out_layer(self.inv(out_equiv, **kwargs), **kwargs)
 
         return out
 
@@ -443,7 +443,7 @@ def __init__(
     conv_settings : dict or None, optional, default: None
         The arguments passed to the `MultiConv1D` internal networks. If `None`,
         defaults will be used from `default_settings`. If a dictionary is provided,
-        it should contain the followin keys:
+        it should contain the following keys:
         - layer_args (dict) : arguments for `tf.keras.layers.Conv1D` without kernel_size
         - min_kernel_size (int) : the minimum kernel size (>= 1)
         - max_kernel_size (int) : the maximum kernel size
@@ -508,8 +508,8 @@ class SplitNetwork(tf.keras.Model):
     of data to provide an individual network for each split of the data.
     """
 
-    def __init__(self, num_splits, split_data_configurator, network_type=InvariantNetwork, network_kwargs={}, **kwargs):
-        """Creates a composite network of `num_splits` sub-networks of type `network_type`, each with configuration
+    def __init__(self, num_splits, split_data_configurator, network_type=DeepSet, network_kwargs={}, **kwargs):
+        """Creates a composite network of `num_splits` subnetworks of type `network_type`, each with configuration
         specified by `meta`.
 
         Parameters
@@ -535,7 +535,7 @@ def __init__(self, num_splits, split_data_configurator, network_type=InvariantNe
            indicating which rows belong to the split `i`.
        network_type : callable, optional, default: `InvariantNetowk`
            Type of neural network to use.
-       meta : dict, optional, default: {}
+       network_kwargs : dict, optional, default: {}
            A dictionary containing the configuration for the networks.
        **kwargs
            Optional keyword arguments to be passed to the `tf.keras.Model` superclass.
@@ -547,7 +547,7 @@ def __init__(self, num_splits, split_data_configurator, network_type=InvariantNe
        self.split_data_configurator = split_data_configurator
        self.networks = [network_type(**network_kwargs) for _ in range(num_splits)]
 
-    def call(self, x):
+    def call(self, x, **kwargs):
        """Performs a forward pass through the subnetworks and concatenates their output.
 
        Parameters
@@ -561,7 +561,7 @@ def call(self, x):
            Output of shape (batch_size, out_dim)
        """
 
-       out = [self.networks[i](self.split_data_configurator(i, x)) for i in range(self.num_splits)]
+       out = [self.networks[i](self.split_data_configurator(i, x), **kwargs) for i in range(self.num_splits)]
        out = tf.concat(out, axis=-1)
        return out
 
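A hypothetical `SplitNetwork` sketch consistent with the new `DeepSet` default and the `split_data_configurator(i, x)` call signature used above (the configurator body is illustrative, not from the diff):

```python
import bayesflow as bf

def split_halves(i, x):
    # Route the first or second half of the feature axis to subnetwork i
    half = x.shape[-1] // 2
    return x[..., i * half:(i + 1) * half]

summary_net = bf.networks.SplitNetwork(
    num_splits=2,
    split_data_configurator=split_halves,
    network_type=bf.networks.DeepSet,  # the new default, stated explicitly
)
```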

@@ -602,7 +602,7 @@ def call(self, x, return_all=False, **kwargs):
 
         Parameters
         ----------
-        data : tf.Tensor of shape (batch_size, ..., data_dim)
+        x : tf.Tensor of shape (batch_size, ..., data_dim)
             Example, hierarchical data sets with two levels:
             (batch_size, D, L, x_dim) -> reduces to (batch_size, out_dim).
         return_all : boolean, optional, default: False
0 commit comments