Commit 2df0716

committed
replaced section with info box about deprecated data container, added to , fixed typo
1 parent dca848c commit 2df0716

File tree

2 files changed

+18
-22
lines changed

examples/fundamentals/data_container.ipynb

Lines changed: 9 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -64,7 +64,7 @@
6464
"\n",
6565
"After building the statistical model of your dreams, you're going to need to feed it some data. Data is typically introduced to a PyMC model in one of two ways. Some data is used as an exogenous input, called `X` in linear regression models, where `mu = X @ beta`. Other data are \"observed\" examples of the endogenous outputs of your model, called `y` in regression models, and are used as input to the likelihood function implied by your model. These data, either exogenous or endogenous, can be included in your model as a wide variety of datatypes, including numpy `ndarrays`, pandas `Series` and `DataFrame`, and even pytensor `TensorVariables`. \n",
6666
"\n",
67-
"Although you can pass these \"raw\" datatypes to your PyMC model, the best way to introduce data into your model is to use {func}`pymc.Data` containers. These containers make it extremely easy to work with data in a PyMC model. They offer a range of benefits, including:\n",
67+
"Although you can pass these \"raw\" datatypes to your PyMC model, the best way to introduce data into your model is to use {class}`pymc.Data` containers. These containers make it extremely easy to work with data in a PyMC model. They offer a range of benefits, including:\n",
6868
"\n",
6969
"1. Visualization of data as a component of your probabilistic graph\n",
7070
"2. Access to labeled dimensions for readability and accessibility\n",
@@ -78,11 +78,9 @@
7878
"cell_type": "markdown",
7979
"metadata": {},
8080
"source": [
81-
"## Current and past states of Data Containers\n",
82-
"\n",
83-
" PyMC data containers are always mutable. This allows you to change your data. When `X` is mutated, this enables out-of-sample inference tasks. When `y` is mutated, it allows you to reuse the same model on multiple datasets to perform parameter recovery studies or sensitivity analysis. These abilities do, however, come with a small performance cost.\n",
84-
" \n",
85-
" In past versions of PyMC, there were two types of data containers {func}`pymc.MutableData` or {func}`pymc.ConstantData` these are now deprecated."
81+
":::{important}\n",
82+
"In past versions of PyMC, there were two types of data containers, {func}`pymc.MutableData` and {func}`pymc.ConstantData`; these are now deprecated.\n",
83+
":::"
8684
]
8785
},
8886
{
@@ -717,7 +715,7 @@
717715
"cell_type": "markdown",
718716
"metadata": {},
719717
"source": [
720-
"In this next model, we create a `pm.Data` container to hold the observations, and pass this container to the `observed`. We also make a `pm.Data` container to hold the `x` data:"
718+
"In this next model, we create a {class}`pm.Data` container to hold the observations, and pass this container to the `observed`. We also make a {class}`pm.Data` container to hold the `x` data:"
721719
]
722720
},
723721
{
@@ -795,7 +793,7 @@
795793
"cell_type": "markdown",
796794
"metadata": {},
797795
"source": [
798-
"Because we used a `pm.Data` container, the data now appears on our probabilistic graph. It is downstream from `obs` (since the `obs` variable \"causes\" the data), shaded in gray (because it is observed), and has a special rounded square shape to emphasize that it is data. We also see that `x_data` has been added to the graph."
796+
"Because we used a {class}`pm.Data` container, the data now appears in our probabilistic graph. It is downstream from `obs` (since the `obs` variable \"causes\" the data), shaded in gray (because it is observed), and has a special rounded square shape to emphasize that it is data. We also see that `x_data` has been added to the graph."
799797
]
800798
},
801799
{
@@ -1511,7 +1509,7 @@
15111509
"cell_type": "markdown",
15121510
"metadata": {},
15131511
"source": [
1514-
"As noted above, `pm.Data` gives you the ability to give named labels to the dimensions of your data. This is done by passing a dictionary of `dimension: coordinate` key-value pairs to the `coords` argument of {class}`pymc.Model` when you create your model.\n",
1512+
"As noted above, {class}`pm.Data` gives you the ability to give named labels to the dimensions of your data. This is done by passing a dictionary of `dimension: coordinate` key-value pairs to the `coords` argument of {class}`pymc.Model` when you create your model.\n",
15151513
"\n",
15161514
"For more explanation about dimensions, coordinates and their big benefits, we encourage you to take a look at the {ref}`ArviZ documentation <arviz:xarray_for_arviz>`.\n",
15171515
"\n",
@@ -1828,7 +1826,7 @@
18281826
"cell_type": "markdown",
18291827
"metadata": {},
18301828
"source": [
1831-
"When we use `pm.Data`, the data are internally represented as a pytensor {class}`pytensor.tensor.sharedvar.TensorSharedVariable`."
1829+
"When we use {class}`pm.Data`, the data are internally represented as a pytensor {class}`pytensor.tensor.sharedvar.TensorSharedVariable`."
18321830
]
18331831
},
18341832
{
@@ -2529,7 +2527,7 @@
25292527
"\n",
25302528
"One small detail to pay attention to in this case is that the shapes of the input data (`x`) and output data (`obs`) must be the same. When we make out-of-sample predictions, we typically change only the input data, the shape of which may not be the same as the training observations. Naively changing only one will result in a shape error. There are two solutions:\n",
25312529
"\n",
2532-
"1. Use a `pm.Data` for the `x` data and the `y` data, and use `pm.set_data` to change `y` to something of the same shape as the test inputs. \n",
2530+
"1. Use a {class}`pm.Data` for the `x` data and the `y` data, and use `pm.set_data` to change `y` to something of the same shape as the test inputs. \n",
25332531
"2. Tell PyMC that the shape of the `obs` should always be the shape of the input data.\n",
25342532
"\n",
25352533
"In the next model, we use option 2. This way, we don't need to pass dummy data to `y` every time we want to change `x`."

examples/fundamentals/data_container.myst.md

Lines changed: 9 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -47,7 +47,7 @@ az.style.use("arviz-darkgrid")
4747

4848
After building the statistical model of your dreams, you're going to need to feed it some data. Data is typically introduced to a PyMC model in one of two ways. Some data is used as an exogenous input, called `X` in linear regression models, where `mu = X @ beta`. Other data are "observed" examples of the endogenous outputs of your model, called `y` in regression models, and are used as input to the likelihood function implied by your model. These data, either exogenous or endogenous, can be included in your model as a wide variety of datatypes, including numpy `ndarrays`, pandas `Series` and `DataFrame`, and even pytensor `TensorVariables`.
4949

50-
Although you can pass these "raw" datatypes to your PyMC model, the best way to introduce data into your model is to use {func}`pymc.Data` containers. These containers make it extremely easy to work with data in a PyMC model. They offer a range of benefits, including:
50+
Although you can pass these "raw" datatypes to your PyMC model, the best way to introduce data into your model is to use {class}`pymc.Data` containers. These containers make it extremely easy to work with data in a PyMC model. They offer a range of benefits, including:
5151

5252
1. Visualization of data as a component of your probabilistic graph
5353
2. Access to labeled dimensions for readability and accessibility
@@ -58,11 +58,9 @@ This notebook will illustrate each of these benefits in turn, and show you the b
5858

5959
+++
6060

61-
## Current and past states of Data Containers
62-
63-
PyMC data containers are always mutable. This allows you to change your data. When `X` is mutated, this enables out-of-sample inference tasks. When `y` is mutated, it allows you to reuse the same model on multiple datasets to perform parameter recovery studies or sensitivity analysis. These abilities do, however, come with a small performance cost.
64-
65-
In past versions of PyMC, there were two types of data containers {func}`pymc.MutableData` or {func}`pymc.ConstantData` these are now deprecated.
61+
:::{important}
62+
In past versions of PyMC, there were two types of data containers, {func}`pymc.MutableData` and {func}`pymc.ConstantData`; these are now deprecated.
63+
:::
6664

6765
+++
6866

@@ -97,7 +95,7 @@ Furthermore, inside `idata`, PyMC has automatically saved the observed (endogeno
9795
idata.observed_data
9896
```
9997

100-
In this next model, we create a `pm.Data` container to hold the observations, and pass this container to the `observed`. We also make a `pm.Data` container to hold the `x` data:
98+
In this next model, we create a {class}`pm.Data` container to hold the observations, and pass this container to the `observed`. We also make a {class}`pm.Data` container to hold the `x` data:
10199

102100
```{code-cell} ipython3
103101
with pm.Model() as no_data_model:
@@ -110,7 +108,7 @@ with pm.Model() as no_data_model:
110108
idata = pm.sample(random_seed=RANDOM_SEED)
111109
```
112110

113-
Because we used a `pm.Data` container, the data now appears on our probabilistic graph. It is downstream from `obs` (since the `obs` variable "causes" the data), shaded in gray (because it is observed), and has a special rounded square shape to emphasize that it is data. We also see that `x_data` has been added to the graph.
111+
Because we used a {class}`pm.Data` container, the data now appears in our probabilistic graph. It is downstream from `obs` (since the `obs` variable "causes" the data), shaded in gray (because it is observed), and has a special rounded square shape to emphasize that it is data. We also see that `x_data` has been added to the graph.
114112

115113
```{code-cell} ipython3
116114
pm.model_to_graphviz(no_data_model)
@@ -140,7 +138,7 @@ df_data.index.name = "date"
140138
df_data.head()
141139
```
142140

143-
As noted above, `pm.Data` gives you the ability to give named labels to the dimensions of your data. This is done by passing a dictionary of `dimension: coordinate` key-value pairs to the `coords` argument of {class}`pymc.Model` when you create your model.
141+
As noted above, {class}`pm.Data` gives you the ability to give named labels to the dimensions of your data. This is done by passing a dictionary of `dimension: coordinate` key-value pairs to the `coords` argument of {class}`pymc.Model` when you create your model.
144142

145143
For more explanation about dimensions, coordinates and their big benefits, we encourage you to take a look at the {ref}`ArviZ documentation <arviz:xarray_for_arviz>`.
146144

@@ -195,7 +193,7 @@ Coordinates are also used by `arviz` when making plots. Here we pass `legend=Tru
195193
axes = az.plot_trace(idata, var_names=["europe_mean_temp", "expected_city_temp"], legend=True);
196194
```
197195

198-
When we use `pm.Data`, the data are internally represented as a pytensor {class}`pytensor.tensor.sharedvar.TensorSharedVariable`.
196+
When we use {class}`pm.Data`, the data are internally represented as a pytensor {class}`pytensor.tensor.sharedvar.TensorSharedVariable`.
199197

200198
```{code-cell} ipython3
201199
type(data)
@@ -279,7 +277,7 @@ A common task in machine learning is to predict values for unseen data, and the
279277

280278
One small detail to pay attention to in this case is that the shapes of the input data (`x`) and output data (`obs`) must be the same. When we make out-of-sample predictions, we typically change only the input data, the shape of which may not be the same as the training observations. Naively changing only one will result in a shape error. There are two solutions:
281279

282-
1. Use a `pm.Data` for the `x` data and the `y` data, and use `pm.set_data` to change `y` to something of the same shape as the test inputs.
280+
1. Use a {class}`pm.Data` for the `x` data and the `y` data, and use `pm.set_data` to change `y` to something of the same shape as the test inputs.
283281
2. Tell PyMC that the shape of the `obs` should always be the shape of the input data.
284282

285283
In the next model, we use option 2. This way, we don't need to pass dummy data to `y` every time we want to change `x`.
