
Commit 0e45432

update BART example to last release (#457)
1 parent d6fcd3d commit 0e45432

File tree: 2 files changed, +64 -64 lines


examples/case_studies/BART_introduction.ipynb

Lines changed: 45 additions & 47 deletions
Large diffs are not rendered by default.

myst_nbs/case_studies/BART_introduction.myst.md

Lines changed: 19 additions & 17 deletions
@@ -6,7 +6,7 @@ jupytext:
    format_version: 0.13
    jupytext_version: 1.13.7
kernelspec:
-  display_name: Python 3.10.6 ('pymc_env')
+  display_name: Python 3 (ipykernel)
  language: python
  name: python3
---
@@ -33,8 +33,8 @@ print(f"Running on PyMC v{pm.__version__}")
```

```{code-cell} ipython3
-RANDOM_SEED = 8457
-rng = np.random.RandomState(RANDOM_SEED)
+RANDOM_SEED = 5781
+np.random.seed(RANDOM_SEED)
az.style.use("arviz-darkgrid")
```

@@ -112,16 +112,18 @@ In the previous plot the white line is the median over 4000 posterior draws, and

The following figure shows two samples from the posterior of $\mu$. We can see that these functions are not smooth. This is fine and is a direct consequence of using regression trees. Trees can be seen as a way to represent stepwise functions, and a sum of stepwise functions is just another stepwise function. Thus, when using BART we just need to know that we are assuming that a stepwise function is a good enough approximation for our problem. In practice this is often the case because we sum over many trees, usually values like 50, 100 or 200. Additionally, we often average over the posterior distribution. All this makes the "steps" smoother, even though we never really have a smooth function as, for example, with Gaussian processes (splines). A nice theoretical result tells us that in the limit of $m \to \infty$ the BART prior converges to a [nowhere-differentiable](https://en.wikipedia.org/wiki/Weierstrass_function) Gaussian process.

+The following figure shows two samples of $\mu$ from the posterior.
+
```{code-cell} ipython3
-plt.step(x_data, np.exp(pmb.predict(idata_coal, rng, x_data, size=2).squeeze().T));
+plt.step(x_data, idata_coal.posterior["μ"].sel(chain=0, draw=[3, 10]).T);
```

-To gain further intuition the next figures show 3 of the `m` trees. As we can see these are definitely not very good approximators by themselves. inspecting individuals trees is generally not necessary. We are just showing them here to generate intuition about BART.
+The next figure shows 3 trees. As we can see these are very simple functions and definitely not very good approximators by themselves. Inspecting individual trees is generally not necessary when working with BART; we are showing them just so we can gain further intuition about the inner workings of BART.

```{code-cell} ipython3
-bart_trees = idata_coal.sample_stats.bart_trees
+bart_trees = μ_.owner.op.all_trees
for i in [0, 1, 2]:
-    plt.step(x_data[:, 0], [bart_trees[0, 0, i].item().predict(x) for x in x_data])
+    plt.step(x_data[:, 0], [bart_trees[0][i].predict(x) for x in x_data])
```

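As a side illustration of the claim above that a sum of stepwise functions is itself a stepwise function, here is a minimal NumPy sketch (not part of the committed notebook); the split points and leaf values are made up for illustration only:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 500)

# Each "tree" is a stepwise-constant function: a few random split points
# and a leaf value for each resulting interval.
trees = []
for _ in range(3):
    splits = np.sort(rng.uniform(0, 1, size=4))
    leaf_values = rng.normal(size=5)
    trees.append(leaf_values[np.searchsorted(splits, x)])

# The sum of the three step functions is again a step function, only with
# more jump points, which is why summing many trees (and averaging over
# the posterior) produces "smoother-looking" steps.
for tree in trees:
    plt.step(x, tree, alpha=0.4)
plt.step(x, np.sum(trees, axis=0), lw=2)
plt.show()
```
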
## Biking with BART
@@ -142,27 +144,27 @@ Y = bikes["count"]

```{code-cell} ipython3
with pm.Model() as model_bikes:
-    σ = pm.HalfNormal("σ", Y.std())
-    μ = pmb.BART("μ", X, Y, m=50)
-    y = pm.Normal("y", μ, σ, observed=Y)
+    α = pm.Exponential("α", 1 / 10)
+    μ = pmb.BART("μ", X, Y)
+    y = pm.NegativeBinomial("y", mu=np.abs(μ), alpha=α, observed=Y)
    idata_bikes = pm.sample(random_seed=RANDOM_SEED)
```
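
Not part of this diff, but a natural follow-up to the new NegativeBinomial likelihood is a quick posterior predictive check. The sketch below assumes the notebook's usual imports (`pymc as pm`, `arviz as az`) and the `model_bikes` and `idata_bikes` objects from the cell above:

```python
# Sketch only: draw posterior predictive samples from the count likelihood
# and compare them against the observed number of rented bikes.
with model_bikes:
    pp = pm.sample_posterior_predictive(idata_bikes, random_seed=RANDOM_SEED)
idata_bikes.extend(pp)

# Overlay observed counts and posterior predictive draws.
az.plot_ppc(idata_bikes, num_pp_samples=100);
```
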

### Partial dependence plots

+++

-To help us interpret the results of our model we are going to use partial dependence plot. This is a type of plot that shows the marginal effect that one covariate has on the predicted variable. That is, what is the effect that a covariate $X_i$ has of $Y$ while we average over all the other covariates ($X_j, \forall j \not = i$). This type of plot are not exclusive of BART. But they are often used in the BART literature. PyMC provides an utility function to make this plot from the inference data.
+To help us interpret the results of our model we are going to use partial dependence plots. This is a type of plot that shows the marginal effect that one covariate has on the predicted variable. That is, the effect that a covariate $X_i$ has on $Y$ while we average over all the other covariates ($X_j, \forall j \neq i$). This type of plot is not exclusive to BART, but it is often used in the BART literature. PyMC-BART provides a utility function to make this plot from the inference data.

```{code-cell} ipython3
-pmb.plot_dependence(idata_bikes, X=X, Y=Y, grid=(2, 2), var_discrete=[3]);
+pmb.plot_dependence(μ, X=X, Y=Y, grid=(2, 2), var_discrete=[3]);
```

From this plot we can see the main effect of each covariate on the predicted value. This is very useful as we can recover complex relationships, beyond monotonically increasing or decreasing effects. For example, for the `hour` covariate we can see two peaks, around 8 and 17 hs, and a minimum at midnight.

-When interpreting partial dependence plots we should be careful about the assumptions in this plot. First we are assuming variables are independent. For example when computing the effect of `hour` we have to marginalize the effect of `temperature` and this means that to compute the partial dependence value at `hour=0` we are including all observed values of temperature, and this may include temperatures that are actually not observed at midnight, given that lower temperatures are more likely than higher ones. We are seeing only averages, so if for a covariate half the values are positively associated with predicted variable and the other half negatively associated. The partial dependence plot will be flat as their contributions will cancel each other out. This is a problem that can be solved by using instead individual conditional expectation plots `pm.bart.plot_dependence(idata_bikes, kind="ice")`. Notice that all this assumptions are assumptions of the partial dependence plot, not of our model! In fact BART can easily accommodate interaction of variables Although the prior in BART regularizes high order interactions). For more on interpreting Machine Learning model you could check the "Interpretable Machine Learning" book {cite:p}`molnar2019`.
+When interpreting partial dependence plots we should be careful about the assumptions behind them. First, we are assuming the variables are independent. For example, when computing the effect of `hour` we have to marginalize the effect of `temperature`, and this means that to compute the partial dependence value at `hour=0` we are including all observed values of temperature, which may include temperatures that are never actually observed at midnight, given that lower temperatures are more likely than higher ones. Second, we are seeing only averages, so if for a covariate half the values are positively associated with the predicted variable and the other half negatively associated, the partial dependence plot will be flat, as their contributions will cancel each other out. This is a problem that can be solved by using individual conditional expectation plots instead, `pmb.plot_dependence(..., kind="ice")`. Notice that all these assumptions are assumptions of the partial dependence plot, not of our model! In fact, BART can easily accommodate interactions between variables (although the prior in BART regularizes high-order interactions). For more on interpreting machine learning models you could check the "Interpretable Machine Learning" book {cite:p}`molnar2019`.
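
As a concrete sketch of the ICE alternative mentioned above (not part of the commit), the same plot could be drawn per observation rather than as an average, reusing the `μ`, `X`, and `Y` objects from the bike model:

```python
# Sketch: individual conditional expectation (ICE) curves, one per (subsampled)
# observation, instead of the averaged partial dependence curves shown above.
pmb.plot_dependence(μ, X=X, Y=Y, kind="ice", grid=(2, 2), var_discrete=[3]);
```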

-Finally like with other regression method we should be careful that the effects we are seeing on individual variables are conditional on the inclusion of the other variables. So for example, while `humidity` seems to be mostly flat, meaning that this covariate has an small effect of the number of used bikes. This could be the case because `humidity` and `temperature` are correlated to some extend and once we include `temperature` in our model `humidity` does not provide too much information. Try for example fitting the model again but this time with `humidity` as the single covariate and then fitting the model again with `hour` as a single covariate. You should see that the result for this single-variate models will very similar to the previous figure for the `hour` covariate, but less similar for the `humidity` covariate.
+Finally, as with other regression methods, we should be careful that the effects we are seeing on individual variables are conditional on the inclusion of the other variables. So, for example, `humidity` seems to be mostly flat, meaning that this covariate has a small effect on the number of used bikes. This could be the case because `humidity` and `temperature` are correlated to some extent, and once we include `temperature` in our model `humidity` does not provide much extra information. Try, for example, fitting the model again but this time with `humidity` as the single covariate, and then fitting the model again with `hour` as the single covariate. You should see that the result for these single-covariate models is very similar to the previous figure for the `hour` covariate, but less similar for the `humidity` covariate.
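
The single-covariate experiment suggested above is not spelled out in the notebook. A minimal sketch, assuming `bikes` still holds the original dataframe with an `hour` column (the column name is assumed to match the label used later) and reusing the priors from `model_bikes`, could look like this:

```python
# Sketch: refit with `hour` as the only covariate and redraw its partial
# dependence, to compare against the `hour` panel of the full model.
X_hour = bikes[["hour"]]  # column name assumed

with pm.Model() as model_hour:
    α = pm.Exponential("α", 1 / 10)
    μ_hour = pmb.BART("μ_hour", X_hour, Y)
    y = pm.NegativeBinomial("y", mu=np.abs(μ_hour), alpha=α, observed=Y)
    idata_hour = pm.sample(random_seed=RANDOM_SEED)

pmb.plot_dependence(μ_hour, X=X_hour, Y=Y);
```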

+++

@@ -172,17 +174,17 @@ As we saw in the previous section a partial dependence plot can visualize give u

The following plot shows the relative importance on a scale from 0 to 1 (less to more important), where the individual importances sum to 1. See that, at least in this case, the relative importance qualitatively agrees with the partial dependence plot.

-Additionally, we provide a novel method to assess the variable importance. You can see an example in the bottom panel. On the x-axis we have the number of components (variables) and on the y-axis the Pearson correlation between the predictions made between the full-model (all variables included) and the restricted-models, those with only a subset of the variables in the full-model. The components are included following the relative variable importance order, as show in the top panel. Thus, in this example 1 component means `hour`, two components means `hour` and `temperature`, 3 components `hour`, `temperature`and `humidity`. Finally, four components means `hour`, `temperature`, `humidity`, `workingday`, i.e., the full model. Hence, from the next figure we can see that even a model with a single component, `hour`, is very close to the full model. Even more, the model with two components `hour`, and `temperature` is on average indistinguishable from the full model. The error bars represent the 94 \% HDI from the posterior predictive distribution. It is important to notice that to compute these correlations we do not resample the models, instead the predictions of the restricted-models are approximated from the full-model.
+Additionally, PyMC-BART provides a novel method to assess variable importance. You can see an example in the bottom panel. On the x-axis we have the number of covariates and on the y-axis the square of the Pearson correlation coefficient between the predictions made for the full model (all variables included) and the restricted models, those with only a subset of the variables. The components are included following the relative variable importance order, as shown in the top panel. Thus, in this example 1 component means `hour`, two components means `hour` and `temperature`, and 3 components means `hour`, `temperature` and `humidity`. Finally, four components means `hour`, `temperature`, `humidity`, and `workingday`, i.e., the full model. Hence, from the next figure we can see that even a model with a single component, `hour`, is very close to the full model. Even more, the model with two components, `hour` and `temperature`, is on average indistinguishable from the full model. The error bars represent the 94% HDI from the posterior predictive distribution. It is important to notice that to compute these correlations we do not refit the models; instead, the predictions of the restricted models are approximated by *pruning* variables from the full model.

```{code-cell} ipython3
-labels = ["hour", "temperature", "humidity", "workingday"]
-pmb.plot_variable_importance(idata_bikes, X.values, labels, samples=100);
+pmb.plot_variable_importance(idata_bikes, μ, X, samples=100);
```
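
Not part of the diff, but the quantity on the y-axis of the bottom panel is just the squared Pearson correlation between two prediction vectors. A toy sketch with hypothetical prediction arrays:

```python
import numpy as np

# Hypothetical predicted means from the full model and from a restricted
# (pruned) model; the y-axis of the bottom panel is their squared correlation.
pred_full = np.array([120.0, 80.0, 45.0, 200.0, 160.0])
pred_restricted = np.array([118.0, 85.0, 50.0, 190.0, 150.0])

r = np.corrcoef(pred_full, pred_restricted)[0, 1]
print(f"R² between full-model and restricted-model predictions: {r**2:.3f}")
```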

## Authors
* Authored by Osvaldo Martin in Dec, 2021 ([pymc-examples#259](https://github.com/pymc-devs/pymc-examples/pull/259))
* Updated by Osvaldo Martin in May, 2022 ([pymc-examples#323](https://github.com/pymc-devs/pymc-examples/pull/323))
* Updated by Osvaldo Martin in Sep, 2022
+* Updated by Osvaldo Martin in Nov, 2022

+++
