examples/fundamentals/data_container.ipynb (9 additions, 11 deletions)
@@ -64,7 +64,7 @@
 "\n",
 "After building the statistical model of your dreams, you're going to need to feed it some data. Data is typically introduced to a PyMC model in one of two ways. Some data is used as an exogenous input, called `X` in linear regression models, where `mu = X @ beta`. Other data are \"observed\" examples of the endogenous outputs of your model, called `y` in regression models, and is used as input to the likelihood function implied by your model. These data, either exogenous or endogenous, can be included in your model as wide variety of datatypes, including numpy `ndarrays`, pandas `Series` and `DataFrame`, and even pytensor `TensorVariables`. \n",
 "\n",
-"Although you can pass these \"raw\" datatypes to your PyMC model, the best way to introduce data into your model is to use {func}`pymc.Data` containers. These containers make it extremely easy to work with data in a PyMC model. They offer a range of benefits, including:\n",
+"Although you can pass these \"raw\" datatypes to your PyMC model, the best way to introduce data into your model is to use {class}`pymc.Data` containers. These containers make it extremely easy to work with data in a PyMC model. They offer a range of benefits, including:\n",
 "\n",
 "1. Visualization of data as a component of your probabilistic graph\n",
 "2. Access to labeled dimensions for readability and accessibility\n",
@@ -78,11 +78,9 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Current and past states of Data Containers\n",
-"\n",
-" PyMC data containers are always mutable. This allows you to change your data. When `X` is mutated, this enables out-of-sample inference tasks. When `y` is mutated, it allows you to reuse the same model on multiple datasets to perform parameter recovery studies or sensitivity analysis. These abilities do, however, come with a small performance cost.\n",
-" \n",
-" In past versions of PyMC, there were two types of data containers {func}`pymc.MutableData` or {func}`pymc.ConstantData` these are now deprecated."
+":::{important}\n",
+"In past versions of PyMC, there were two types of data containers, {func}`pymc.MutableData` and {func}`pymc.ConstantData`; these are now deprecated.\n",
+":::"
 ]
 },
 {
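
As a hedged illustration of what the removed paragraph and the new admonition describe, the sketch below uses {class}`pymc.Data` where older code would have chosen between `pm.MutableData` and `pm.ConstantData`, and relies on the container's mutability to rerun the same model on a second dataset; the datasets and priors are made up.

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(0)

with pm.Model() as reusable_model:
    # pm.Data replaces both pm.MutableData and pm.ConstantData
    y_data = pm.Data("y_data", rng.normal(loc=1.0, size=50))
    mu = pm.Normal("mu", 0, 10)
    sigma = pm.HalfNormal("sigma", 1)
    pm.Normal("obs", mu=mu, sigma=sigma, observed=y_data)
    idata_first = pm.sample(random_seed=0)

# Because the container is mutable, the same model can be rerun on a second
# dataset, e.g. for a parameter-recovery or sensitivity study.
with reusable_model:
    pm.set_data({"y_data": rng.normal(loc=5.0, size=50)})
    idata_second = pm.sample(random_seed=0)
```
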
@@ -717,7 +715,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"In this next model, we create a `pm.Data` container to hold the observations, and pass this container to the `observed`. We also make a `pm.Data` container to hold the `x` data:"
+"In this next model, we create a {class}`pm.Data` container to hold the observations, and pass this container to the `observed`. We also make a {class}`pm.Data` container to hold the `x` data:"
 ]
 },
 {
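
A sketch of the kind of model this cell describes (priors and the names `x_data`/`y_data` are placeholders rather than the notebook's exact code): both containers are created with {class}`pm.Data`, and the observation container is what gets passed to `observed`.

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=100)
y = 1.5 + 2.0 * x + rng.normal(scale=0.5, size=100)

with pm.Model() as linear_model:
    x_data = pm.Data("x_data", x)   # exogenous input
    y_data = pm.Data("y_data", y)   # endogenous observations
    intercept = pm.Normal("intercept", 0, 10)
    slope = pm.Normal("slope", 0, 10)
    sigma = pm.HalfNormal("sigma", 1)
    mu = intercept + slope * x_data
    # The data container itself is what gets passed to `observed`
    pm.Normal("obs", mu=mu, sigma=sigma, observed=y_data)
    idata = pm.sample(random_seed=1)
```
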
@@ -795,7 +793,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Because we used a `pm.Data` container, the data now appears on our probabilistic graph. It is downstream from `obs` (since the `obs` variable \"causes\" the data), shaded in gray (because it is observed), and has a special rounded square shape to emphasize that it is data. We also see that `x_data` has been added to the graph."
+"Because we used a {class}`pm.Data` container, the data now appears in our probabilistic graph. It is downstream from `obs` (since the `obs` variable \"causes\" the data), shaded in gray (because it is observed), and has a special rounded square shape to emphasize that it is data. We also see that `x_data` has been added to the graph."
 ]
 },
 {
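
The graph described here is rendered with `pm.model_to_graphviz`, the same call that appears in the `.myst.md` diff further down; a self-contained toy version with made-up variable names:

```python
import numpy as np
import pymc as pm

with pm.Model() as tiny_model:
    x_data = pm.Data("x_data", np.arange(10.0))
    y_data = pm.Data("y_data", np.zeros(10))
    mu = pm.Normal("mu", 0, 1)
    pm.Normal("obs", mu=mu + x_data, sigma=1.0, observed=y_data)

# Both containers show up as rounded, gray-shaded nodes,
# with y_data drawn downstream of obs.
pm.model_to_graphviz(tiny_model)
```
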
@@ -1511,7 +1509,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"As noted above, `pm.Data` gives you the ability to give named labels to the dimensions of your data. This is done by passing a dictionary of `dimension: coordinate` key-value pairs to the `coords` argument of {class}`pymc.Model` when you create your model.\n",
+"As noted above, {class}`pm.Data` gives you the ability to give named labels to the dimensions of your data. This is done by passing a dictionary of `dimension: coordinate` key-value pairs to the `coords` argument of {class}`pymc.Model` when you create your model.\n",
 "\n",
 "For more explanation about dimensions, coordinates and their big benefits, we encourage you to take a look at the {ref}`ArviZ documentation <arviz:xarray_for_arviz>`.\n",
 "\n",
@@ -1828,7 +1826,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"When we use `pm.Data`, the data are internally represented as a pytensor {class}`pytensor.tensor.sharedvar.TensorSharedVariable`."
+"When we use {class}`pm.Data`, the data are internally represented as a pytensor {class}`pytensor.tensor.sharedvar.TensorSharedVariable`."
 ]
 },
 {
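
A quick standalone check of the internal representation described here; the container name is arbitrary:

```python
import numpy as np
import pymc as pm
from pytensor.tensor.sharedvar import TensorSharedVariable

with pm.Model():
    y_data = pm.Data("y_data", np.arange(5.0))

print(type(y_data))                              # the pytensor shared-variable class
print(isinstance(y_data, TensorSharedVariable))  # expected: True
print(y_data.get_value())                        # the underlying numpy array
```
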
@@ -2529,7 +2527,7 @@
 "\n",
 "One small detail to pay attention to in this case is that the shapes of the input data (`x`) and output data (`obs`) must be the same. When we make out-of-sample predictions, we typically change only the input data, the shape of which may not be the same as the training observations. Naively changing only one will result in a shape error. There are two solutions:\n",
 "\n",
-"1. Use a `pm.Data` for the `x` data and the `y` data, and use `pm.set_data` to change `y` to something of the same shape as the test inputs. \n",
+"1. Use a {class}`pm.Data` for the `x` data and the `y` data, and use `pm.set_data` to change `y` to something of the same shape as the test inputs. \n",
 "2. Tell PyMC that the shape of the `obs` should always be the shape of the input data.\n",
 "\n",
 "In the next model, we use option 2. This way, we don't need to pass dummy data to `y` every time we want to change `x`."
examples/fundamentals/data_container.myst.md (9 additions, 11 deletions)
@@ -47,7 +47,7 @@ az.style.use("arviz-darkgrid")
 
 After building the statistical model of your dreams, you're going to need to feed it some data. Data is typically introduced to a PyMC model in one of two ways. Some data is used as an exogenous input, called `X` in linear regression models, where `mu = X @ beta`. Other data are "observed" examples of the endogenous outputs of your model, called `y` in regression models, and is used as input to the likelihood function implied by your model. These data, either exogenous or endogenous, can be included in your model as wide variety of datatypes, including numpy `ndarrays`, pandas `Series` and `DataFrame`, and even pytensor `TensorVariables`.
 
-Although you can pass these "raw" datatypes to your PyMC model, the best way to introduce data into your model is to use {func}`pymc.Data` containers. These containers make it extremely easy to work with data in a PyMC model. They offer a range of benefits, including:
+Although you can pass these "raw" datatypes to your PyMC model, the best way to introduce data into your model is to use {class}`pymc.Data` containers. These containers make it extremely easy to work with data in a PyMC model. They offer a range of benefits, including:
 
 1. Visualization of data as a component of your probabilistic graph
 2. Access to labeled dimensions for readability and accessibility
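
To make the two roles of data concrete, here is a minimal sketch with made-up names and priors (not a cell from the notebook): the exogenous `x` enters the model directly, while the endogenous `y` is handed to the likelihood through `observed`.

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(42)
x = rng.normal(size=100)                  # exogenous input: the "X" in mu = X @ beta
y = 1.0 + 2.0 * x + rng.normal(size=100)  # endogenous, "observed" output

# Raw numpy arrays (or pandas Series/DataFrames) can be passed straight in ...
with pm.Model() as raw_data_model:
    intercept = pm.Normal("intercept", 0, 10)
    beta = pm.Normal("beta", 0, 10)
    sigma = pm.HalfNormal("sigma", 1)
    mu = intercept + beta * x                         # x enters as an exogenous input
    pm.Normal("obs", mu=mu, sigma=sigma, observed=y)  # y feeds the likelihood

# ... but wrapping the same arrays in pm.Data (as the rest of the notebook does)
# adds them to the model graph and lets them be swapped later with pm.set_data.
```
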
@@ -58,11 +58,9 @@ This notebook will illustrate each of these benefits in turn, and show you the b
 
 +++
 
-## Current and past states of Data Containers
-
-PyMC data containers are always mutable. This allows you to change your data. When `X` is mutated, this enables out-of-sample inference tasks. When `y` is mutated, it allows you to reuse the same model on multiple datasets to perform parameter recovery studies or sensitivity analysis. These abilities do, however, come with a small performance cost.
-
-In past versions of PyMC, there were two types of data containers {func}`pymc.MutableData` or {func}`pymc.ConstantData` these are now deprecated.
+:::{important}
+In past versions of PyMC, there were two types of data containers, {func}`pymc.MutableData` and {func}`pymc.ConstantData`; these are now deprecated.
+:::
 
 +++
 
@@ -97,7 +95,7 @@ Furthermore, inside `idata`, PyMC has automatically saved the observed (endogeno
 idata.observed_data
 ```
 
-In this next model, we create a `pm.Data` container to hold the observations, and pass this container to the `observed`. We also make a `pm.Data` container to hold the `x` data:
+In this next model, we create a {class}`pm.Data` container to hold the observations, and pass this container to the `observed`. We also make a {class}`pm.Data` container to hold the `x` data:
 
 ```{code-cell} ipython3
 with pm.Model() as no_data_model:
@@ -110,7 +108,7 @@ with pm.Model() as no_data_model:
 idata = pm.sample(random_seed=RANDOM_SEED)
 ```
 
-Because we used a `pm.Data` container, the data now appears on our probabilistic graph. It is downstream from `obs` (since the `obs` variable "causes" the data), shaded in gray (because it is observed), and has a special rounded square shape to emphasize that it is data. We also see that `x_data` has been added to the graph.
+Because we used a {class}`pm.Data` container, the data now appears in our probabilistic graph. It is downstream from `obs` (since the `obs` variable "causes" the data), shaded in gray (because it is observed), and has a special rounded square shape to emphasize that it is data. We also see that `x_data` has been added to the graph.
 
 ```{code-cell} ipython3
 pm.model_to_graphviz(no_data_model)
@@ -140,7 +138,7 @@ df_data.index.name = "date"
 df_data.head()
 ```
 
-As noted above, `pm.Data` gives you the ability to give named labels to the dimensions of your data. This is done by passing a dictionary of `dimension: coordinate` key-value pairs to the `coords` argument of {class}`pymc.Model` when you create your model.
+As noted above, {class}`pm.Data` gives you the ability to give named labels to the dimensions of your data. This is done by passing a dictionary of `dimension: coordinate` key-value pairs to the `coords` argument of {class}`pymc.Model` when you create your model.
 
 For more explanation about dimensions, coordinates and their big benefits, we encourage you to take a look at the {ref}`ArviZ documentation <arviz:xarray_for_arviz>`.
 
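
A hedged sketch of the `coords` mechanism this paragraph describes, using a `date` dimension like the `df_data` index above; the toy data and priors are made up:

```python
import numpy as np
import pandas as pd
import pymc as pm

dates = pd.date_range("2024-01-01", periods=30, freq="D")
values = np.random.default_rng(2).normal(size=30)

coords = {"date": dates}  # dimension name -> coordinate labels

with pm.Model(coords=coords) as coords_model:
    y_data = pm.Data("y_data", values, dims="date")
    mu = pm.Normal("mu", 0, 1)
    sigma = pm.HalfNormal("sigma", 1)
    pm.Normal("obs", mu=mu, sigma=sigma, observed=y_data, dims="date")
```
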
@@ -195,7 +193,7 @@ Coordinates are also used by `arviz` when making plots. Here we pass `legend=Tru
-When we use `pm.Data`, the data are internally represented as a pytensor {class}`pytensor.tensor.sharedvar.TensorSharedVariable`.
+When we use {class}`pm.Data`, the data are internally represented as a pytensor {class}`pytensor.tensor.sharedvar.TensorSharedVariable`.
 
 ```{code-cell} ipython3
 type(data)
@@ -279,7 +277,7 @@ A common task in machine learning is to predict values for unseen data, and the
 
 One small detail to pay attention to in this case is that the shapes of the input data (`x`) and output data (`obs`) must be the same. When we make out-of-sample predictions, we typically change only the input data, the shape of which may not be the same as the training observations. Naively changing only one will result in a shape error. There are two solutions:
 
-1. Use a `pm.Data` for the `x` data and the `y` data, and use `pm.set_data` to change `y` to something of the same shape as the test inputs. 
+1. Use a {class}`pm.Data` for the `x` data and the `y` data, and use `pm.set_data` to change `y` to something of the same shape as the test inputs. 
 2. Tell PyMC that the shape of the `obs` should always be the shape of the input data.
 
 In the next model, we use option 2. This way, we don't need to pass dummy data to `y` every time we want to change `x`.
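
A hedged sketch of option 2, tying the shape of the likelihood to the input container so that only `x` needs to be swapped before prediction; names, priors, and data are illustrative:

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(3)
x_train = rng.normal(size=100)
y_train = 1.0 + 2.0 * x_train + rng.normal(scale=0.5, size=100)

with pm.Model() as prediction_model:
    x_data = pm.Data("x_data", x_train)
    intercept = pm.Normal("intercept", 0, 10)
    slope = pm.Normal("slope", 0, 10)
    sigma = pm.HalfNormal("sigma", 1)
    mu = intercept + slope * x_data
    # Option 2: obs follows the shape of x_data, so no dummy y is needed
    pm.Normal("obs", mu=mu, sigma=sigma, observed=y_train, shape=x_data.shape)
    idata = pm.sample(random_seed=3)

# Out-of-sample prediction: only the input container changes
with prediction_model:
    pm.set_data({"x_data": np.linspace(-3, 3, 25)})
    preds = pm.sample_posterior_predictive(idata, predictions=True, random_seed=3)
```
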