Skip to content

Commit 1802a25

Browse files
committed
[DOC/CLN] More cleaning of prob documentation
1 parent 60092df commit 1802a25

File tree

9 files changed

+283
-198
lines changed

9 files changed

+283
-198
lines changed
Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,2 +1,2 @@
1-
First example of inference
2-
==========================
1+
1 - First example of inference
2+
==============================

examples/2-basic_geology/1-prob_density_transformation/__init__.py

Whitespace-only changes.

examples/2-basic_geology/1-thickness_problem.py

Lines changed: 107 additions & 57 deletions
Original file line numberDiff line numberDiff line change
@@ -2,62 +2,74 @@
22
2.1 - Only Pyro
33
===============
44
5+
6+
Model definition
7+
----------------
8+
9+
Same problem as before, let’s assume the observations are layer
10+
thickness measurements taken on an outcrop. Now, in the previous example
11+
we chose a prior for the mean arbitrarily:
12+
:math:`𝜇∼Normal(mu=10.0, sigma=5.0)`–something that made sense for these
13+
specific data set. If we do not have any further information, keeping
14+
an uninformative prior and let the data to dictate the final value of
15+
the inference is the sensible way forward. However, this also enable to
16+
add information to the system by setting informative priors.
17+
18+
Imagine we get a borehole with the tops of the two interfaces of
19+
interest. Each of this data point will be a random variable itself since
20+
the accuracy of the exact 3D location will be always limited. Notice
21+
that this two data points refer to depth not to thickness–the unit of
22+
the rest of the observations. Therefore, the first step would be to
23+
perform a transformation of the parameters into the observations space.
24+
Naturally in this example a simple subtraction will suffice.
25+
26+
Now we can define the probabilistic models:
27+
528
"""
6-
import os
29+
# sphinx_gallery_thumbnail_number = -2
730

8-
import arviz as az
9-
# Importing auxiliary libraries
31+
# %%
32+
# Importing Necessary Libraries
33+
# -----------------------------
34+
35+
import os
1036
import matplotlib.pyplot as plt
1137
import pyro
1238
import pyro.distributions as dist
1339
import torch
1440
from pyro.infer import MCMC, NUTS, Predictive
1541
from pyro.infer.inspect import get_dependencies
16-
1742
from gempy_probability.plot_posterior import PlotPosterior, default_red, default_blue
43+
import arviz as az
1844

1945
# %%
20-
# Model definition
21-
# ----------------
22-
#
23-
# Same problem as before, let’s assume the observations are layer
24-
# thickness measurements taken on an outcrop. Now, in the previous example
25-
# we chose a prior for the mean arbitrarily:
26-
# :math:`𝜇∼Normal(mu=10.0, sigma=5.0)`–something that made sense for these
27-
# specific data set. If we do not have any further information, keeping
28-
# an uninformative prior and let the data to dictate the final value of
29-
# the inference is the sensible way forward. However, this also enable to
30-
# add information to the system by setting informative priors.
31-
#
32-
# Imagine we get a borehole with the tops of the two interfaces of
33-
# interest. Each of this data point will be a random variable itself since
34-
# the accuracy of the exact 3D location will be always limited. Notice
35-
# that this two data points refer to depth not to thickness–the unit of
36-
# the rest of the observations. Therefore, the first step would be to
37-
# perform a transformation of the parameters into the observations space.
38-
# Naturally in this example a simple subtraction will suffice.
39-
#
40-
# Now we can define the probabilistic models:
41-
#
46+
# Introduction to the Problem
47+
# ---------------------------
48+
# In this example, we are considering layer thickness measurements taken on an outcrop as our observations.
49+
# We use a probabilistic approach to model these observations, allowing for the inclusion of prior knowledge
50+
# and uncertainty quantification.
4251

43-
# %%
44-
# This is to make it work in sphinx gallery
52+
# Setting the working directory for sphinx gallery
4553
cwd = os.getcwd()
46-
if not 'examples' in cwd:
54+
if 'examples' not in cwd:
4755
path_dir = os.getcwd() + '/examples/tutorials/ch5_probabilistic_modeling'
4856
else:
4957
path_dir = cwd
5058

5159
# %%
60+
# Defining the Observations and Model
61+
# -----------------------------------
62+
# The observations are layer thickness measurements. We define a Pyro probabilistic model
63+
# that uses Normal and Gamma distributions to model the top and bottom interfaces of the layers
64+
# and their respective uncertainties.
5265

66+
# Defining observed data
5367
y_obs = torch.tensor([2.12])
54-
y_obs_list = torch.tensor([2.12, 2.06, 2.08, 2.05, 2.08, 2.09,
55-
2.19, 2.07, 2.16, 2.11, 2.13, 1.92])
68+
y_obs_list = torch.tensor([2.12, 2.06, 2.08, 2.05, 2.08, 2.09, 2.19, 2.07, 2.16, 2.11, 2.13, 1.92])
5669
pyro.set_rng_seed(4003)
5770

58-
71+
# Defining the probabilistic model
5972
def model(y_obs_list_):
60-
# Pyro models use the 'sample' function to define random variables
6173
mu_top = pyro.sample(r'$\mu_{top}$', dist.Normal(3.05, 0.2))
6274
sigma_top = pyro.sample(r"$\sigma_{top}$", dist.Gamma(0.3, 3.0))
6375
y_top = pyro.sample(r"y_{top}", dist.Normal(mu_top, sigma_top), obs=torch.tensor([3.02]))
@@ -66,90 +78,122 @@ def model(y_obs_list_):
6678
sigma_bottom = pyro.sample(r'$\sigma_{bottom}$', dist.Gamma(0.3, 3.0))
6779
y_bottom = pyro.sample(r'y_{bottom}', dist.Normal(mu_bottom, sigma_bottom), obs=torch.tensor([1.02]))
6880

69-
mu_thickness = pyro.deterministic(r'$\mu_{thickness}$', mu_top - mu_bottom) # Deterministic transformation
81+
mu_thickness = pyro.deterministic(r'$\mu_{thickness}$', mu_top - mu_bottom)
7082
sigma_thickness = pyro.sample(r'$\sigma_{thickness}$', dist.Gamma(0.3, 3.0))
7183
y_thickness = pyro.sample(r'y_{thickness}', dist.Normal(mu_thickness, sigma_thickness), obs=y_obs_list_)
7284

85+
# Exploring model dependencies
86+
dependencies = get_dependencies(model, model_args=y_obs_list[:1])
7387

74-
dependencies = get_dependencies(
75-
model,
76-
model_args=y_obs_list[:1]
77-
)
88+
# %%
89+
# Prior Sampling
90+
# --------------
91+
# Prior sampling is performed to understand the initial distribution of the model parameters
92+
# before considering the observed data.
7893

79-
# 1. Prior Sampling
94+
# Prior sampling
8095
prior = Predictive(model, num_samples=10)(y_obs_list)
8196

82-
# Now you can run MCMC using NUTS to sample from the posterior
97+
# %%
98+
# Running MCMC Sampling
99+
# ---------------------
100+
# Markov Chain Monte Carlo (MCMC) sampling is used to sample from the posterior distribution,
101+
# providing insights into the distribution of model parameters after considering the observed data.
102+
103+
# Running MCMC using the NUTS algorithm
83104
nuts_kernel = NUTS(model)
84105
mcmc = MCMC(nuts_kernel, num_samples=100, warmup_steps=20)
85106
mcmc.run(y_obs_list)
86107

87-
# 3. Sample from Posterior Predictive
108+
# %%
109+
# Posterior Predictive Sampling
110+
# -----------------------------
111+
# After obtaining the posterior samples, we perform posterior predictive sampling.
112+
# This step allows us to make predictions based on the posterior distribution.
113+
114+
# Sampling from the posterior predictive distribution
88115
posterior_samples = mcmc.get_samples()
89116
posterior_predictive = Predictive(model, posterior_samples)(y_obs_list)
90117

91118
# %%
119+
# Visualizing the Results
120+
# -----------------------
121+
# We use ArviZ, a library for exploratory analysis of Bayesian models, to visualize
122+
# the results of our probabilistic model.
123+
124+
# Creating a data object for ArviZ
92125
data = az.from_pyro(
93126
posterior=mcmc,
94127
prior=prior,
95128
posterior_predictive=posterior_predictive
96129
)
97130

98-
data
99-
100-
# %%
101-
131+
# Plotting trace of the sampled parameters
102132
az.plot_trace(data)
103133
plt.show()
104134

105-
# %%
106-
# sphinx_gallery_thumbnail_number = 3
135+
# %%
136+
# Density Plots of Posterior and Prior
137+
# ------------------------------------
138+
# Density plots provide a visual representation of the distribution of the sampled parameters.
139+
# Comparing the posterior and prior distributions allows us to assess the impact of the observed data.
140+
141+
# Plotting density of posterior and prior distributions
107142
az.plot_density(
108143
data=[data, data.prior],
109144
shade=.9,
110145
data_labels=["Posterior", "Prior"],
111146
colors=[default_red, default_blue],
112147
)
113-
114148
plt.show()
115149

116150
# %%
151+
# Density Plots of Posterior Predictive and Prior Predictive
152+
# ----------------------------------------------------------
153+
# These plots show the distribution of the posterior predictive and prior predictive checks.
154+
# They help in evaluating the performance and validity of the probabilistic model.
155+
156+
# Plotting density of posterior predictive and prior predictive
117157
az.plot_density(
118158
data=[data.posterior_predictive, data.prior_predictive],
119159
shade=.9,
120-
var_names=[
121-
r'$\mu_{thickness}$'
122-
],
160+
var_names=[r'$\mu_{thickness}$'],
123161
data_labels=["Posterior Predictive", "Prior Predictive"],
124162
colors=[default_red, default_blue],
125163
)
126-
127164
plt.show()
128165

129166
# %%
167+
# Marginal Distribution Plots
168+
# ---------------------------
169+
# Marginal distribution plots provide insights into the distribution of individual parameters.
170+
# These plots help in understanding the uncertainty and variability in the parameter estimates.
130171

172+
# Creating marginal distribution plots
131173
p = PlotPosterior(data)
132-
133174
p.create_figure(figsize=(9, 5), joyplot=False, marginal=True, likelihood=False)
134175
p.plot_marginal(
135176
var_names=['$\\mu_{top}$', '$\\mu_{bottom}$'],
136177
plot_trace=False,
137178
credible_interval=.70,
138179
kind='kde',
139-
marginal_kwargs={
140-
"bw": 1
141-
}
180+
marginal_kwargs={"bw": 1}
142181
)
143182
plt.show()
144183

145184
# %%
185+
# Posterior Distribution Visualization
186+
# ------------------------------------
187+
# This section provides a more detailed visualization of the posterior distributions
188+
# of the parameters, integrating different aspects of the probabilistic model.
189+
190+
# Visualizing the posterior distributions
146191
p = PlotPosterior(data)
147192
p.create_figure(figsize=(9, 6), joyplot=True)
148193
iteration = 99
149194
p.plot_posterior(
150195
prior_var=['$\\mu_{top}$', '$\\mu_{bottom}$'],
151196
like_var=['$\\mu_{top}$', '$\\mu_{bottom}$'],
152-
# like_var=['$\\mu_{thickness}$', r"y_{top}"],
153197
obs='y_{thickness}',
154198
iteration=iteration,
155199
marginal_kwargs={
@@ -161,5 +205,11 @@ def model(y_obs_list_):
161205
plt.show()
162206

163207
# %%
208+
# Pair Plot of Key Parameters
209+
# ---------------------------
210+
# Pair plots are useful to visualize the relationships and correlations between different parameters.
211+
# They help in understanding how parameters influence each other in the probabilistic model.
212+
213+
# Creating a pair plot for selected parameters
164214
az.plot_pair(data, divergences=False, var_names=['$\\mu_{top}$', '$\\mu_{bottom}$'])
165215
plt.show()
Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,17 @@
1+
Probabilistic modelling with GemPy
2+
``````````````````````````````````
3+
4+
In structural geology, we want to combine different types of data—i.e. geometrical measurements, geophysics, petrochemical data—usually using as a prevailing model a *common earth model*. For the sake of simplicity, in this example, we will combine different type of geometric information into one single probabilistic model. Let’s build on the previous idea in order to extend the conceptual case above back to geological modelling.
5+
6+
Lucky for us, after we perform the first inference on the thickness, :math:`\tilde{y}_{thickness}` of the model, we find out that a colleague has been gathering data at the exact same outcrop but in his case he was recording the location of the top :math:`\tilde{y}_{top}` and bottom :math:`\tilde{y}_{bottom}` interfaces of the layer. We can relate the three data sets with simple algebra:
7+
8+
.. math::
9+
\pi(\theta_{thickness}) = \pi(\theta_{top}) - \pi(\theta_{bottom})
10+
11+
or,
12+
13+
.. math::
14+
\pi(\theta_{bottom}) = \pi(\theta_{top}) - \pi(\theta_{thickness})
15+
16+
now the question is which probabilistic model design is more suitable. In the end this relates directly to the question the model is trying to answer---and possible limitations on the algorithms used---since joint probability follows the commutative and associative properties.
17+

examples/2-basic_geology/2-prob_density_gempy/__init__.py

Whitespace-only changes.

0 commit comments

Comments
 (0)