Commit 1a845e3

Merge remote-tracking branch 'upstream/dev' into standardize_in_approx
2 parents 0869e3f + 01aadf1 commit 1a845e3


42 files changed: +1449 −281 lines

README.md
Lines changed: 39 additions & 35 deletions

@@ -49,39 +49,6 @@ neural networks for parameter estimation, model comparison, and model validation
 when working with intractable simulators whose behavior as a whole is too
 complex to be described analytically.
 
-## Getting Started
-
-Using the high-level interface is easy, as demonstrated by the minimal working example below:
-
-```python
-import bayesflow as bf
-
-workflow = bf.BasicWorkflow(
-    inference_network=bf.networks.CouplingFlow(),
-    summary_network=bf.networks.TimeSeriesNetwork(),
-    inference_variables=["parameters"],
-    summary_variables=["observables"],
-    simulator=bf.simulators.SIR()
-)
-
-history = workflow.fit_online(epochs=15, batch_size=32, num_batches_per_epoch=200)
-
-diagnostics = workflow.plot_default_diagnostics(test_data=300)
-```
-
-For an in-depth exposition, check out our walkthrough notebooks below.
-
-1. [Linear regression starter example](examples/Linear_Regression_Starter.ipynb)
-2. [From ABC to BayesFlow](examples/From_ABC_to_BayesFlow.ipynb)
-3. [Two moons starter example](examples/Two_Moons_Starter.ipynb)
-4. [Rapid iteration with point estimators](examples/Lotka_Volterra_Point_Estimation_and_Expert_Stats.ipynb)
-5. [SIR model with custom summary network](examples/SIR_Posterior_Estimation.ipynb)
-6. [Bayesian experimental design](examples/Bayesian_Experimental_Design.ipynb)
-7. [Simple model comparison example](examples/One_Sample_TTest.ipynb)
-8. [Moving from BayesFlow v1.1 to v2.0](examples/From_BayesFlow_1.1_to_2.0.ipynb)
-
-More tutorials are always welcome! Please consider making a pull request if you have a cool application that you want to contribute.
-
 ## Install
 
 You can install the latest stable version from PyPI using:
@@ -132,9 +99,46 @@ export KERAS_BACKEND=jax
 
 This way, you also don't have to manually set the backend every time you are starting Python to use BayesFlow.
 
-**Caution:** Some development environments (e.g., VSCode or PyCharm) can silently overwrite environment variables. If you have set your backend as an environment variable and you still get keras-related import errors when loading BayesFlow, these IDE shenanigans might be the culprit. Try setting the keras backend in your Python script via `import os; os.environ["KERAS_BACKEND"] = "<YOUR-BACKEND>"`.
+## Getting Started
+
+Using the high-level interface is easy, as demonstrated by the minimal working example below:
+
+```python
+import bayesflow as bf
+
+workflow = bf.BasicWorkflow(
+    inference_network=bf.networks.CouplingFlow(),
+    summary_network=bf.networks.TimeSeriesNetwork(),
+    inference_variables=["parameters"],
+    summary_variables=["observables"],
+    simulator=bf.simulators.SIR()
+)
+
+history = workflow.fit_online(epochs=15, batch_size=32, num_batches_per_epoch=200)
+
+diagnostics = workflow.plot_default_diagnostics(test_data=300)
+```
+
+For an in-depth exposition, check out our expanding list of resources below.
+
+### Books
+
+Many examples from [Bayesian Cognitive Modeling: A Practical Course](https://bayesmodels.com/) by Lee & Wagenmakers (2013) in [BayesFlow](https://kucharssim.github.io/bayesflow-cognitive-modeling-book/).
+
+### Tutorial notebooks
+
+1. [Linear regression starter example](examples/Linear_Regression_Starter.ipynb)
+2. [From ABC to BayesFlow](examples/From_ABC_to_BayesFlow.ipynb)
+3. [Two moons starter example](examples/Two_Moons_Starter.ipynb)
+4. [Rapid iteration with point estimators](examples/Lotka_Volterra_Point_Estimation_and_Expert_Stats.ipynb)
+5. [SIR model with custom summary network](examples/SIR_Posterior_Estimation.ipynb)
+6. [Bayesian experimental design](examples/Bayesian_Experimental_Design.ipynb)
+7. [Simple model comparison example](examples/One_Sample_TTest.ipynb)
+8. [Moving from BayesFlow v1.1 to v2.0](examples/From_BayesFlow_1.1_to_2.0.ipynb)
+
+More tutorials are always welcome! Please consider making a pull request if you have a cool application that you want to contribute.
 
-### From Source
+## Contributing
 
 If you want to contribute to BayesFlow, we recommend installing it from source, see [CONTRIBUTING.md](CONTRIBUTING.md) for more details.

bayesflow/adapters/adapter.py
Lines changed: 70 additions & 0 deletions

@@ -18,6 +18,7 @@
     Keep,
     Log,
     MapTransform,
+    NNPE,
     NumpyTransform,
     OneHot,
     Rename,
@@ -30,6 +31,7 @@
     Ungroup,
     RandomSubsample,
     Take,
+    NanToNum,
 )
 from .transforms.filter_transform import Predicate
 
@@ -699,6 +701,43 @@ def map_dtype(self, keys: str | Sequence[str], to_dtype: str):
         self.transforms.append(transform)
         return self
 
+    def nnpe(
+        self,
+        keys: str | Sequence[str],
+        *,
+        spike_scale: float | None = None,
+        slab_scale: float | None = None,
+        per_dimension: bool = True,
+        seed: int | None = None,
+    ):
+        """Append an :py:class:`~transforms.NNPE` transform to the adapter.
+
+        Parameters
+        ----------
+        keys : str or Sequence of str
+            The names of the variables to transform.
+        spike_scale : float or np.ndarray or None, default=None
+            The scale of the spike (Normal) distribution. Automatically determined if None.
+        slab_scale : float or np.ndarray or None, default=None
+            The scale of the slab (Cauchy) distribution. Automatically determined if None.
+        per_dimension : bool, default=True
+            If true, noise is applied per dimension of the last axis of the input data.
+            If false, noise is applied globally.
+        seed : int or None
+            The seed for the random number generator. If None, a random seed is used.
+        """
+        if isinstance(keys, str):
+            keys = [keys]
+
+        transform = MapTransform(
+            {
+                key: NNPE(spike_scale=spike_scale, slab_scale=slab_scale, per_dimension=per_dimension, seed=seed)
+                for key in keys
+            }
+        )
+        self.transforms.append(transform)
+        return self
+
     def one_hot(self, keys: str | Sequence[str], num_classes: int):
         """Append a :py:class:`~transforms.OneHot` transform to the adapter.
 
@@ -918,3 +957,34 @@ def to_dict(self):
         transform = ToDict()
         self.transforms.append(transform)
         return self
+
+    def nan_to_num(
+        self,
+        keys: str | Sequence[str],
+        default_value: float = 0.0,
+        return_mask: bool = False,
+        mask_prefix: str = "mask",
+    ):
+        """
+        Append a :py:class:`~bf.adapters.transforms.NanToNum` transform to the adapter.
+
+        Parameters
+        ----------
+        keys : str or sequence of str
+            The names of the variables to clean / mask.
+        default_value : float
+            Value to substitute wherever data is NaN. Defaults to 0.0.
+        return_mask : bool
+            If True, encode a binary missingness mask alongside the data. Defaults to False.
+        mask_prefix : str
+            Prefix for the mask key in the output dictionary. Defaults to 'mask_'. If the mask key already exists,
+            a ValueError is raised to avoid overwriting existing masks.
+        """
+        if isinstance(keys, str):
+            keys = [keys]
+
+        for key in keys:
+            self.transforms.append(
+                NanToNum(key=key, default_value=default_value, return_mask=return_mask, mask_prefix=mask_prefix)
+            )
+        return self
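The `nnpe` method above wraps the new `NNPE` transform, whose docstring describes noise drawn from a narrow Normal "spike" and a heavy-tailed Cauchy "slab". A rough standalone sketch of that spike-and-slab idea follows; the function name, the mixture weight `p_slab`, and the default scales are invented for illustration, since the committed transform determines its scales automatically:

```python
import numpy as np

def spike_and_slab_noise(x, spike_scale=0.01, slab_scale=0.25, p_slab=0.5, seed=None):
    """Add spike-and-slab noise to an array (illustrative sketch, not BayesFlow's NNPE).

    Per element, draw either from a narrow Normal (spike) or a
    heavy-tailed Cauchy (slab); p_slab is an assumed mixture weight.
    """
    rng = np.random.default_rng(seed)
    use_slab = rng.random(x.shape) < p_slab  # which elements get slab noise
    noise = np.where(
        use_slab,
        rng.standard_cauchy(x.shape) * slab_scale,  # heavy-tailed slab component
        rng.normal(0.0, spike_scale, x.shape),      # narrow spike component
    )
    return x + noise

y = spike_and_slab_noise(np.zeros((4, 3)), seed=1)
```

The Cauchy slab occasionally produces large perturbations, which is the point: it exposes the network to outlier-like noise during training.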

bayesflow/adapters/transforms/__init__.py
Lines changed: 2 additions & 0 deletions

@@ -12,6 +12,7 @@
 from .keep import Keep
 from .log import Log
 from .map_transform import MapTransform
+from .nnpe import NNPE
 from .numpy_transform import NumpyTransform
 from .one_hot import OneHot
 from .rename import Rename
@@ -28,6 +29,7 @@
 from .random_subsample import RandomSubsample
 from .take import Take
 from .ungroup import Ungroup
+from .nan_to_num import NanToNum
 
 from ...utils._docs import _add_imports_to_all
 
bayesflow/adapters/transforms/nan_to_num.py (new file)
Lines changed: 91 additions & 0 deletions

@@ -0,0 +1,91 @@
+import numpy as np
+
+from bayesflow.utils.serialization import serializable, serialize
+from .transform import Transform
+
+
+@serializable("bayesflow.adapters")
+class NanToNum(Transform):
+    """
+    Replace NaNs with a default value, and optionally encode a missing-data mask as a separate output key.
+
+    This is based on "Missing data in amortized simulation-based neural posterior estimation" by Wang et al. (2024).
+
+    Parameters
+    ----------
+    default_value : float
+        Value to substitute wherever data is NaN.
+    return_mask : bool, default=False
+        If True, a mask array will be returned under a new key.
+    mask_prefix : str, default='mask_'
+        Prefix for the mask key in the output dictionary.
+    """
+
+    def __init__(self, key: str, default_value: float = 0.0, return_mask: bool = False, mask_prefix: str = "mask"):
+        super().__init__()
+        self.key = key
+        self.default_value = default_value
+        self.return_mask = return_mask
+        self.mask_prefix = mask_prefix
+
+    def get_config(self) -> dict:
+        return serialize(
+            {
+                "key": self.key,
+                "default_value": self.default_value,
+                "return_mask": self.return_mask,
+                "mask_prefix": self.mask_prefix,
+            }
+        )
+
+    @property
+    def mask_key(self) -> str:
+        """
+        Key under which the mask will be stored in the output dictionary.
+        """
+        return f"{self.mask_prefix}_{self.key}"
+
+    def forward(self, data: dict[str, any], **kwargs) -> dict[str, any]:
+        """
+        Forward transform: fill NaNs and optionally output mask under 'mask_<key>'.
+        """
+        data = data.copy()
+
+        # Check if the mask key already exists in the data
+        if self.mask_key in data.keys():
+            raise ValueError(
+                f"Mask key '{self.mask_key}' already exists in the data. Please choose a different mask_prefix."
+            )
+
+        # Identify NaNs and fill with default value
+        mask = np.isnan(data[self.key])
+        data[self.key] = np.nan_to_num(data[self.key], copy=False, nan=self.default_value)
+
+        if not self.return_mask:
+            return data
+
+        # Prepare mask array (1 for valid, 0 for NaN)
+        mask_array = (~mask).astype(np.int8)
+
+        # Return both the filled data and the mask under separate keys
+        data[self.mask_key] = mask_array
+        return data
+
+    def inverse(self, data: dict[str, any], **kwargs) -> dict[str, any]:
+        """
+        Inverse transform: restore NaNs using the mask under 'mask_<key>'.
+        """
+        data = data.copy()
+
+        # Retrieve mask and values to reconstruct NaNs
+        values = data[self.key]
+
+        if not self.return_mask:
+            values[values == self.default_value] = np.nan  # we assume default_value is not in data
+        else:
+            mask_array = data[self.mask_key].astype(bool)
+            # Put NaNs where mask is 0
+            values[~mask_array] = np.nan
+
+        data[self.key] = values
+        return data
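The `forward`/`inverse` pair in the new file amounts to a simple round trip: record a validity mask, fill NaNs with the default, and later restore NaNs wherever the mask marks a value as missing. A minimal numpy sketch of the same masking logic, outside the Transform machinery:

```python
import numpy as np

# Round trip of the NanToNum logic (return_mask=True case), sketched standalone.
x = np.array([1.0, np.nan, 3.0])

# forward: record which entries are observed, then fill NaNs with the default
mask = (~np.isnan(x)).astype(np.int8)  # 1 = observed, 0 = missing
filled = np.nan_to_num(x, nan=0.0)     # [1.0, 0.0, 3.0]

# inverse: restore NaNs wherever the mask marks a value as missing
restored = filled.copy()
restored[mask == 0] = np.nan
```

Without the mask (`return_mask=False`), the inverse instead matches values equal to `default_value`, which only works when that value cannot occur in the real data; the in-code comment acknowledges this assumption.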
