Commit a514485

Author: Jash2606
Merge commit (2 parents): 647050a + 4515f49

File tree: 10 files changed (+427 / -133 lines)


.pre-commit-config.yaml
Lines changed: 3 additions & 3 deletions

@@ -3,11 +3,11 @@ ci:

 repos:
   - repo: https://github.com/psf/black
-    rev: 23.7.0
+    rev: 24.4.2
     hooks:
       - id: black-jupyter
   - repo: https://github.com/nbQA-dev/nbQA
-    rev: 1.7.0
+    rev: 1.8.5
     hooks:
       - id: nbqa-isort
         additional_dependencies: [isort==5.6.4]

@@ -99,7 +99,7 @@ repos:
         language: pygrep
         types_or: [markdown, rst, jupyter]
   - repo: https://github.com/mwouts/jupytext
-    rev: v1.15.1
+    rev: v1.16.3
     hooks:
       - id: jupytext
         files: ^examples/.+\.ipynb$

examples/ode_models/ODE_with_manual_gradients.ipynb
Lines changed: 3 additions & 3 deletions

@@ -189,9 +189,9 @@
 "        ret = np.zeros(\n",
 "            (self._n_states, self._n_odeparams + self._n_ivs)\n",
 "        )  # except the following entries\n",
-"        ret[\n",
-"            0, 0\n",
-"        ] = X  # \\frac{\\partial [\\alpha X - \\beta XY]}{\\partial \\alpha}, and so on...\n",
+"        ret[0, 0] = (\n",
+"            X  # \\frac{\\partial [\\alpha X - \\beta XY]}{\\partial \\alpha}, and so on...\n",
+"        )\n",
 "        ret[0, 1] = -X * Y\n",
 "        ret[1, 2] = -Y\n",
 "        ret[1, 3] = X * Y\n",

examples/ode_models/ODE_with_manual_gradients.myst.md
Lines changed: 3 additions & 3 deletions

@@ -157,9 +157,9 @@ class LotkaVolterraModel:
         ret = np.zeros(
             (self._n_states, self._n_odeparams + self._n_ivs)
         )  # except the following entries
-        ret[
-            0, 0
-        ] = X  # \frac{\partial [\alpha X - \beta XY]}{\partial \alpha}, and so on...
+        ret[0, 0] = (
+            X  # \frac{\partial [\alpha X - \beta XY]}{\partial \alpha}, and so on...
+        )
         ret[0, 1] = -X * Y
         ret[1, 2] = -Y
         ret[1, 3] = X * Y
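For readers skimming this hunk: the entries being reformatted here are the hand-coded sensitivities of the Lotka-Volterra right-hand side with respect to its parameters. A minimal standalone sketch of that Jacobian, with hypothetical names and assuming the parameter ordering (alpha, beta, gamma, delta) implied by the indices above, looks like this:

```python
import numpy as np


def lotka_volterra_param_jacobian(X, Y):
    """Sketch only: partial derivatives of the Lotka-Volterra right-hand side
    f0 = alpha*X - beta*X*Y, f1 = -gamma*Y + delta*X*Y
    with respect to the parameters (alpha, beta, gamma, delta).
    The notebook's version also tracks derivatives w.r.t. the initial values."""
    ret = np.zeros((2, 4))
    ret[0, 0] = X  # d f0 / d alpha
    ret[0, 1] = -X * Y  # d f0 / d beta
    ret[1, 2] = -Y  # d f1 / d gamma
    ret[1, 3] = X * Y  # d f1 / d delta
    return ret
```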

examples/samplers/MLDA_introduction.ipynb
Lines changed: 1 addition & 1 deletion

@@ -63,7 +63,7 @@
 "\n",
 "[Gravity surveying](./MLDA_gravity_surveying.ipynb): In this notebook, we use MLDA to solve a 2-dimensional gravity surveying inverse problem. Evaluating the likelihood requires solving a PDE, which we do using [scipy](https://www.scipy.org/). We also compare the performance of MLDA with other PyMC samplers (Metropolis, DEMetropolisZ).\n",
 "\n",
-"[Variance reduction 1](./MLDA_variance_reduction_linear_regression.ipynb) and [Variance reduction 2](https://github.com/alan-turing-institute/pymc/blob/mlda_all_notebooks/docs/source/notebooks/MLDA_variance_reduction_groundwater.ipynb) (external link): Those two notebooks demonstrate the variance reduction feature in a linear regression model and a groundwater flow model. This feature allows the user to define a quantity of interest that they need to estimate using the MCMC samples. It then collects those quantities of interest, as well as differences of these quantities between levels, during MLDA sampling. The collected quantities can then be used to produce an estimate which has lower variance than a standard estimate that uses samples from the fine chain only. The first notebook does not have external dependencies, while the second one requires FEniCS. Note that the second notebook is outside the core PyMC repository because FEniCS is not a PyMC dependency.\n",
+"[Variance reduction 1](./MLDA_variance_reduction_linear_regression.ipynb) and [Variance reduction 2](https://github.com/alan-turing-institute/pymc3/blob/mlda_all_notebooks/docs/source/notebooks/MLDA_variance_reduction_groundwater.ipynb) (external link): Those two notebooks demonstrate the variance reduction feature in a linear regression model and a groundwater flow model. This feature allows the user to define a quantity of interest that they need to estimate using the MCMC samples. It then collects those quantities of interest, as well as differences of these quantities between levels, during MLDA sampling. The collected quantities can then be used to produce an estimate which has lower variance than a standard estimate that uses samples from the fine chain only. The first notebook does not have external dependencies, while the second one requires FEniCS. Note that the second notebook is outside the core PyMC repository because FEniCS is not a PyMC dependency.\n",
 "\n",
 "[Adaptive error model](https://github.com/alan-turing-institute/pymc3/blob/mlda_all_notebooks/docs/source/notebooks/MLDA_adaptive_error_model.ipynb) (external link): In this notebook we use MLDA to tackle another inverse problem: groundwater flow modeling. The aim is to infer the posterior distribution of model parameters (hydraulic conductivity) given data (measurements of hydraulic head). In this example we make use of PyTensor Ops in order to define a \"black box\" likelihood, i.e. a likelihood that uses external code. Specifically, our likelihood uses the [FEniCS](https://fenicsproject.org/) library to solve a PDE. This is a common scenario, as PDEs of this type are slow to solve with scipy or other standard libraries. Note that this notebook is outside the core PyMC repository because FEniCS is not a PyMC dependency. We employ the adaptive error model (AEM) feature and compare the performance of basic MLDA with AEM-enhanced MLDA. The idea of Adaptive Error Model (AEM) is to estimate the mean and variance of the forward-model error between adjacent levels, i.e. estimate the bias of the coarse forward model compared to the fine forward model, and use those estimates to correct the coarse model. Using the technique should improve ESS/sec on the fine level.\n",
 "\n",

examples/samplers/MLDA_introduction.myst.md
Lines changed: 1 addition & 1 deletion

@@ -57,7 +57,7 @@ Please note that the MLDA sampler is new in PyMC. The user should be extra criti

 [Gravity surveying](./MLDA_gravity_surveying.ipynb): In this notebook, we use MLDA to solve a 2-dimensional gravity surveying inverse problem. Evaluating the likelihood requires solving a PDE, which we do using [scipy](https://www.scipy.org/). We also compare the performance of MLDA with other PyMC samplers (Metropolis, DEMetropolisZ).

-[Variance reduction 1](./MLDA_variance_reduction_linear_regression.ipynb) and [Variance reduction 2](https://github.com/alan-turing-institute/pymc/blob/mlda_all_notebooks/docs/source/notebooks/MLDA_variance_reduction_groundwater.ipynb) (external link): Those two notebooks demonstrate the variance reduction feature in a linear regression model and a groundwater flow model. This feature allows the user to define a quantity of interest that they need to estimate using the MCMC samples. It then collects those quantities of interest, as well as differences of these quantities between levels, during MLDA sampling. The collected quantities can then be used to produce an estimate which has lower variance than a standard estimate that uses samples from the fine chain only. The first notebook does not have external dependencies, while the second one requires FEniCS. Note that the second notebook is outside the core PyMC repository because FEniCS is not a PyMC dependency.
+[Variance reduction 1](./MLDA_variance_reduction_linear_regression.ipynb) and [Variance reduction 2](https://github.com/alan-turing-institute/pymc3/blob/mlda_all_notebooks/docs/source/notebooks/MLDA_variance_reduction_groundwater.ipynb) (external link): Those two notebooks demonstrate the variance reduction feature in a linear regression model and a groundwater flow model. This feature allows the user to define a quantity of interest that they need to estimate using the MCMC samples. It then collects those quantities of interest, as well as differences of these quantities between levels, during MLDA sampling. The collected quantities can then be used to produce an estimate which has lower variance than a standard estimate that uses samples from the fine chain only. The first notebook does not have external dependencies, while the second one requires FEniCS. Note that the second notebook is outside the core PyMC repository because FEniCS is not a PyMC dependency.

 [Adaptive error model](https://github.com/alan-turing-institute/pymc3/blob/mlda_all_notebooks/docs/source/notebooks/MLDA_adaptive_error_model.ipynb) (external link): In this notebook we use MLDA to tackle another inverse problem: groundwater flow modeling. The aim is to infer the posterior distribution of model parameters (hydraulic conductivity) given data (measurements of hydraulic head). In this example we make use of PyTensor Ops in order to define a "black box" likelihood, i.e. a likelihood that uses external code. Specifically, our likelihood uses the [FEniCS](https://fenicsproject.org/) library to solve a PDE. This is a common scenario, as PDEs of this type are slow to solve with scipy or other standard libraries. Note that this notebook is outside the core PyMC repository because FEniCS is not a PyMC dependency. We employ the adaptive error model (AEM) feature and compare the performance of basic MLDA with AEM-enhanced MLDA. The idea of Adaptive Error Model (AEM) is to estimate the mean and variance of the forward-model error between adjacent levels, i.e. estimate the bias of the coarse forward model compared to the fine forward model, and use those estimates to correct the coarse model. Using the technique should improve ESS/sec on the fine level.
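As background for the variance reduction paragraph edited above (an aside, not part of this commit): the feature follows the usual multilevel telescoping idea. A sketch, assuming $Q_l$ denotes the quantity of interest evaluated with the level-$l$ model and level $L$ is the finest:

```latex
% Sketch of the telescoping estimator assumed to underlie MLDA variance reduction.
% Each expectation is estimated from its own chain; the difference terms
% Q_l - Q_{l-1} have low variance when adjacent levels agree, which is what
% lowers the variance of the combined estimate relative to a fine-chain-only estimate.
\mathbb{E}[Q_L] = \mathbb{E}[Q_0] + \sum_{l=1}^{L} \mathbb{E}[Q_l - Q_{l-1}]
```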

examples/samplers/samplers_mvnormal.py
Lines changed: 0 additions & 1 deletion

@@ -6,7 +6,6 @@
 normalized effective sampling rates.
 """

-
 import time

 import arviz as az

examples/spatial/nyc_bym.ipynb

Lines changed: 372 additions & 100 deletions
Large diffs are not rendered by default.

examples/spatial/nyc_bym.myst.md
Lines changed: 42 additions & 21 deletions

@@ -329,7 +329,7 @@ with pm.Model(coords=coords) as BYM_model:
     theta = pm.Normal("theta", 0, 1, dims="area_idx")

     # spatially structured random effect
-    phi = pm.ICAR("phi", W=W_nyc)
+    phi = pm.ICAR("phi", W=W_nyc, dims="area_idx")

     # joint variance of random effects
     sigma = pm.HalfNormal("sigma", 1)

@@ -338,11 +338,15 @@
     rho = pm.Beta("rho", 0.5, 0.5)

     # the bym component - it mixes a spatial and a random effect
-    mixture = pt.sqrt(1 - rho) * theta + pt.sqrt(rho / scaling_factor) * phi
+    mixture = pm.Deterministic(
+        "mixture", pt.sqrt(1 - rho) * theta + pt.sqrt(rho / scaling_factor) * phi, dims="area_idx"
+    )

     # exponential link function to ensure
     # predictions are positive
-    mu = pt.exp(log_E + beta0 + beta1 * fragment_index + sigma * mixture)
+    mu = pm.Deterministic(
+        "mu", pt.exp(log_E + beta0 + beta1 * fragment_index + sigma * mixture), dims="area_idx"
+    )

     y_i = pm.Poisson("y_i", mu, observed=y)
 ```
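A side note on the hunk above (not part of the diff): registering `mixture` and `mu` through `pm.Deterministic(..., dims="area_idx")` stores them in the trace with a labeled area dimension, which is what lets the later cells resample and select them by name. A minimal sketch of how that is typically used, assuming the `idata` object and coordinate names from this notebook:

```python
# Sketch only: assumes `idata` came from pm.sample() on BYM_model above, with
# "mixture" and "mu" registered via pm.Deterministic(..., dims="area_idx").
import arviz as az

# Deterministics appear in the posterior group as ordinary variables carrying
# the "area_idx" dimension, so they can be reduced or selected by coordinate.
mu_mean = idata.posterior["mu"].mean(dim=("chain", "draw"))  # one value per area
print(mu_mean.sizes)  # e.g. {'area_idx': n_areas}

# They can also be passed by name to var_names in downstream calls.
print(az.summary(idata, var_names=["mu"], kind="stats").head())
```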
@@ -361,7 +365,7 @@ with BYM_model:
 We can evaluate the sampler in several ways. First, it looks like all our chains converged. All parameters have rhat values very close to one.

 ```{code-cell} ipython3
-rhat = az.summary(idata).r_hat.values
+rhat = az.summary(idata, kind="diagnostics").r_hat.values
 sum(rhat > 1.03)
 ```

@@ -380,29 +384,33 @@ Our trace plot also indicates there is a small effect of social fragmentation on

 The payoff of all this work is that we can now visualize what it means to decompose the variance into explanatory, spatial and unstructured parts. One way to make this vivid is to inspect each component of the model individually. We'll see what the model thinks NYC should look like if spatial effects were the only source of variance, then we'll turn to the explanatory effect and finally the random effect.

-We'll extract the means of several parameters to generate predictions. In the first case, we'll visualize only the predictions that come from the spatial component of the model. In other words, we are assuming $\rho = 1$ and we ignore $\theta$ and social fragmentation.
+In the first case, we'll visualize only the predictions that come from the spatial component of the model. In other words, we are assuming $\rho = 1$ and we ignore $\theta$ and social fragmentation.

-```{code-cell} ipython3
-phi_pred = idata.posterior.phi.mean(("chain", "draw")).values
-beta0_pred = idata.posterior.beta0.mean(("chain", "draw")).values
-sigma_pred = idata.posterior.sigma.mean(("chain", "draw")).values
-y_predict = np.exp(log_E + beta0_pred + sigma_pred * (1 / scaling_factor) * phi_pred)
-```
++++

 Then we'll overlay our predictions onto the same {ref}`adjacency map we built earlier <adjacency-map>`.

 ```{code-cell} ipython3
+# draw posterior predictive samples
+
+with pm.do(BYM_model, {"rho": 1.0, "beta1": 0}):
+    y_predict = pm.sample_posterior_predictive(
+        idata, var_names=["mu", "mixture"], predictions=True, extend_inferencedata=False
+    )
+
+y_spatial_pred = y_predict.predictions.mu.mean(dim=["chain", "draw"]).values
+
 plt.figure(figsize=(10, 8))
 nx.draw_networkx(
     G_nyc,
     pos=pos,
-    node_color=y_predict,
+    node_color=y_spatial_pred,
     cmap="plasma",
     vmax=30,
     width=0.5,
     alpha=0.6,
     with_labels=False,
-    node_size=20 + 3 * y_predict,
+    node_size=20 + 3 * y_spatial_pred,
 )
 ```
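A brief aside on the pattern introduced above (not part of the diff): `pm.do` returns a copy of the model with the named variables pinned to constants, so sampling the posterior predictive under that transformed model yields counterfactual predictions while reusing the existing trace. A self-contained sketch of the same pattern on a hypothetical toy model:

```python
import numpy as np
import pymc as pm

# Toy data and model; all names here are hypothetical, not from the notebook.
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 2.0 + 0.5 * x + rng.normal(scale=0.1, size=50)

with pm.Model() as toy_model:
    intercept = pm.Normal("intercept", 0, 1)
    slope = pm.Normal("slope", 0, 1)
    mu = pm.Deterministic("mu", intercept + slope * x)
    pm.Normal("obs", mu, 0.1, observed=y)
    idata = pm.sample(500, tune=500, chains=2, progressbar=False)

# Intervene: pin the slope at zero, then recompute the deterministic "mu"
# from the existing posterior draws of the remaining free variables.
with pm.do(toy_model, {"slope": 0.0}):
    counterfactual = pm.sample_posterior_predictive(
        idata, var_names=["mu"], predictions=True, extend_inferencedata=False
    )

mu_without_slope = counterfactual.predictions.mu.mean(dim=["chain", "draw"]).values
```

The notebook applies the same idea to BYM_model below, pinning rho, beta1 or sigma to isolate the spatial, explanatory and unstructured components.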

@@ -413,40 +421,53 @@ Spatial smoothing is especially useful for forecasting. Imagine there was a low-
 We can notice that there are three neighborhoods of risk, represented by large yellow clusters, that are well-captured. This suggests that a lot of the explanation for traffic accidents has to do with unidentified but spatially structured causes. By contrast, the social fragmentation index only explains a single neighborhood of risk in the bottom center of the map (with a few small pockets of success elsewhere).

 ```{code-cell} ipython3
-beta1_pred = idata.posterior.beta1.mean(("chain", "draw")).values
-y_predict = np.exp(log_E + beta0_pred + beta1_pred * fragment_index)
+with pm.do(
+    BYM_model,
+    {
+        "sigma": 0.0,
+    },
+):
+    y_predict = pm.sample_posterior_predictive(
+        idata, var_names=["mu", "mixture"], predictions=True, extend_inferencedata=False
+    )
+
+y_frag_pred = y_predict.predictions.mu.mean(dim=["chain", "draw"]).values

 plt.figure(figsize=(10, 8))
 nx.draw_networkx(
     G_nyc,
     pos=pos,
-    node_color=y_predict,
+    node_color=y_frag_pred,
     cmap="plasma",
     vmax=30,
     width=0.5,
     alpha=0.6,
     with_labels=False,
-    node_size=20 + 3 * y_predict,
+    node_size=20 + 3 * y_frag_pred,
 )
 ```

 Finally, we might look at the unstructured variance by assuming $\rho = 0$. If our model managed to partition variance successfully, there should not be too many spatial clusters left over in the unstructured variance. Instead, variance should be scattered all over the map.

 ```{code-cell} ipython3
-theta_pred = idata.posterior.theta.mean(("chain", "draw")).values
-y_predict = np.exp(log_E + beta0_pred + sigma_pred * theta_pred)
+with pm.do(BYM_model, {"rho": 0.0, "beta1": 0}):
+    y_predict = pm.sample_posterior_predictive(
+        idata, var_names=["mu", "mixture"], predictions=True, extend_inferencedata=False
+    )
+
+y_unspatial_pred = y_predict.predictions.mu.mean(dim=["chain", "draw"]).values

 plt.figure(figsize=(10, 8))
 nx.draw_networkx(
     G_nyc,
     pos=pos,
-    node_color=y_predict,
+    node_color=y_unspatial_pred,
     cmap="plasma",
     vmax=30,
     width=0.5,
     alpha=0.6,
     with_labels=False,
-    node_size=20 + 3 * y_predict,
+    node_size=20 + 3 * y_unspatial_pred,
 )
 ```

scripts/rerun.py
Lines changed: 1 addition & 0 deletions

@@ -14,6 +14,7 @@
 python scripts/rerun.py --fp_notebook=examples/case_studies/BEST.ipynb --commit_to=rerun-best --push_to=mine
 ```
 """
+
 import argparse
 import logging
 import pathlib

sphinxext/thumbnail_extractor.py
Lines changed: 1 addition & 0 deletions

@@ -3,6 +3,7 @@

 Modified from the seaborn project, which modified the mpld3 project.
 """
+
 import base64
 import json
 import os
