Commit 88d9858

Jonathan Dekermanjian authored and committed
replaced all pymc potential with pymc censored
1 parent fc03395 commit 88d9858

File tree

2 files changed: +367 -306 lines


examples/survival_analysis/weibull_aft.ipynb

Lines changed: 318 additions & 280 deletions
Large diffs are not rendered by default.

examples/survival_analysis/weibull_aft.myst.md

Lines changed: 49 additions & 26 deletions
@@ -8,6 +8,9 @@ kernelspec:
   display_name: pymc
   language: python
   name: python3
+myst:
+  substitutions:
+    extra_dependencies: statsmodels
 ---
 
 (weibull_aft)=
@@ -25,11 +28,18 @@ import arviz as az
 import numpy as np
 import pymc as pm
 import pytensor.tensor as pt
-import statsmodels.api as sm
 
 print(f"Running on PyMC v{pm.__version__}")
 ```
 
+:::{include} ../extra_installs.md
+:::
+
+```{code-cell} ipython3
+# These dependencies need to be installed separately from PyMC
+import statsmodels.api as sm
+```
+
 ```{code-cell} ipython3
 %config InlineBackend.figure_format = 'retina'
 RANDOM_SEED = 8927
@@ -71,7 +81,9 @@ censored[:5]
 
 We have a unique problem when modelling censored data. Strictly speaking, we don't have any _data_ for censored values: we only know the _number_ of values that were censored. How can we include this information in our model?
 
-One way to do this is by making use of `pm.Potential`. The [PyMC2 docs](https://pymc-devs.github.io/pymc/modelbuilding.html#the-potential-class) explain its usage very well. Essentially, declaring `pm.Potential('x', logp)` will add `logp` to the log-likelihood of the model.
+One way to do this is by making use of `pm.Potential`. The [PyMC2 docs](https://pymc-devs.github.io/pymc/modelbuilding.html#the-potential-class) explain its usage very well. Essentially, declaring `pm.Potential('x', logp)` will add `logp` to the log-likelihood of the model.
+
+However, `pm.Potential` only affects probability-based sampling, which excludes `pm.sample_prior_predictive` and `pm.sample_posterior_predictive`. We can overcome these limitations by using `pm.Censored` instead. We can model our right-censored data by defining the `upper` argument of `pm.Censored`.
 
 +++
 
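Reviewer note: the equivalence this commit relies on can be checked outside PyMC. Below is a minimal numpy/scipy sketch (hypothetical shape, scale, and censoring time, not values from the notebook; scipy is used only for cross-checking) showing that the log survival term `pm.Censored` contributes for a right-censored point is exactly the `weibull_lccdf` term the removed `pm.Potential` added by hand:

```python
import numpy as np
from scipy import stats

alpha, beta = 2.0, 10.0  # hypothetical Weibull shape/scale
t_cens = 8.0             # a hypothetical right-censored observation time

# Potential-style term from the old notebook code: log complementary CDF
def weibull_lccdf(x, alpha, beta):
    """Log complementary cdf of Weibull distribution."""
    return -((x / beta) ** alpha)

# What a censored point contributes under pm.Censored:
# log P(T >= t_cens), i.e. the log survival function
logsf = stats.weibull_min.logsf(t_cens, c=alpha, scale=beta)

print(weibull_lccdf(t_cens, alpha, beta), logsf)
assert np.isclose(weibull_lccdf(t_cens, alpha, beta), logsf)
```

Both expressions evaluate to `-(t/beta)**alpha`, so the swap changes the sampling API surface (prior/posterior predictive support) rather than the likelihood itself.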
@@ -80,36 +92,40 @@ One way to do this is by making use of `pm.Potential`. The [PyMC2 docs](https://pym
 This parameterization is an intuitive, straightforward parameterization of the Weibull survival function. This is probably the first parameterization to come to one's mind.
 
 ```{code-cell} ipython3
-def weibull_lccdf(x, alpha, beta):
-    """Log complementary cdf of Weibull distribution."""
-    return -((x / beta) ** alpha)
+# normalize the event time between 0 and 1
+y_norm = y / np.max(y)
+```
+
+```{code-cell} ipython3
+# If censored, use the observed event time as the upper bound; otherwise the maximum time
+right_censored = [x if x > 0 else np.max(y_norm) for x in y_norm * censored]
 ```
 
 ```{code-cell} ipython3
 with pm.Model() as model_1:
-    alpha_sd = 10.0
+    alpha_sd = 1.0
 
-    mu = pm.Normal("mu", mu=0, sigma=100)
+    mu = pm.Normal("mu", mu=0, sigma=1)
     alpha_raw = pm.Normal("a0", mu=0, sigma=0.1)
     alpha = pm.Deterministic("alpha", pt.exp(alpha_sd * alpha_raw))
     beta = pm.Deterministic("beta", pt.exp(mu / alpha))
+    beta_backtransformed = pm.Deterministic("beta_backtransformed", beta * np.max(y))
 
-    y_obs = pm.Weibull("y_obs", alpha=alpha, beta=beta, observed=y[~censored])
-    y_cens = pm.Potential("y_cens", weibull_lccdf(y[censored], alpha, beta))
+    latent = pm.Weibull.dist(alpha=alpha, beta=beta)
+    y_obs = pm.Censored("Censored_likelihood", latent, upper=right_censored, observed=y_norm)
 ```
 
 ```{code-cell} ipython3
 with model_1:
-    # Change init to avoid divergences
-    data_1 = pm.sample(target_accept=0.9, init="adapt_diag")
+    idata_param1 = pm.sample(nuts_sampler="numpyro")
 ```
 
 ```{code-cell} ipython3
-az.plot_trace(data_1, var_names=["alpha", "beta"])
+az.plot_trace(idata_param1, var_names=["alpha", "beta"])
 ```
 
 ```{code-cell} ipython3
-az.summary(data_1, var_names=["alpha", "beta"], round_to=2)
+az.summary(idata_param1, var_names=["alpha", "beta", "beta_backtransformed"], round_to=2)
 ```
 
 ## Parameterization 2
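The `right_censored` construction introduced in the hunk above is worth unpacking. A sketch with hypothetical toy data (not the notebook's dataset): censored rows get their own observed time as the `upper` bound, so `pm.Censored` treats `observed == upper` as censored mass, while uncensored rows get the maximum observed time as their bound.

```python
import numpy as np

# hypothetical event times and right-censoring flags
y = np.array([3.0, 7.0, 2.0, 10.0, 5.0])
censored = np.array([False, True, False, True, False])

# normalize event times to (0, 1], as in the notebook
y_norm = y / np.max(y)

# y_norm * censored zeroes out the uncensored entries, so the comprehension
# keeps the observed time for censored rows and falls back to the maximum
right_censored = [x if x > 0 else np.max(y_norm) for x in y_norm * censored]

print(right_censored)  # [1.0, 0.7, 1.0, 1.0, 1.0]
```

One subtlety of this scheme: an uncensored observation that happens to sit exactly at the maximum would also satisfy `observed == upper` and be scored as censored, so it relies on the largest time belonging to a censored row (or on ties being negligible).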
@@ -120,26 +136,27 @@ For more information, see [this Stan example model](https://github.com/stan-dev/
 
 ```{code-cell} ipython3
 with pm.Model() as model_2:
-    alpha = pm.Normal("alpha", mu=0, sigma=10)
-    r = pm.Gamma("r", alpha=1, beta=0.001, testval=0.25)
+    alpha = pm.Normal("alpha", mu=0, sigma=1)
+    r = pm.Gamma("r", alpha=2, beta=1)
     beta = pm.Deterministic("beta", pt.exp(-alpha / r))
+    beta_backtransformed = pm.Deterministic("beta_backtransformed", beta * np.max(y))
 
-    y_obs = pm.Weibull("y_obs", alpha=r, beta=beta, observed=y[~censored])
-    y_cens = pm.Potential("y_cens", weibull_lccdf(y[censored], r, beta))
+    latent = pm.Weibull.dist(alpha=r, beta=beta)
+    y_obs = pm.Censored("Censored_likelihood", latent, upper=right_censored, observed=y_norm)
 ```
 
 ```{code-cell} ipython3
 with model_2:
     # Increase target_accept to avoid divergences
-    data_2 = pm.sample(target_accept=0.9)
+    idata_param2 = pm.sample(nuts_sampler="numpyro")
 ```
 
 ```{code-cell} ipython3
-az.plot_trace(data_2, var_names=["r", "beta"])
+az.plot_trace(idata_param2, var_names=["r", "beta"])
 ```
 
 ```{code-cell} ipython3
-az.summary(data_2, var_names=["r", "beta"], round_to=2)
+az.summary(idata_param2, var_names=["r", "beta", "beta_backtransformed"], round_to=2)
 ```
 
 ## Parameterization 3
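Because the models now fit `y / np.max(y)`, both parameterizations report `beta` on the normalized time axis, and the new `beta_backtransformed = beta * np.max(y)` deterministic undoes the rescaling. A sketch with hypothetical values of why a plain multiplication suffices (the Weibull family is closed under rescaling of its scale parameter, so quantiles scale linearly with `beta`):

```python
import numpy as np

# hypothetical shape/scale on the normalized time axis
alpha = 1.5
beta_norm = 0.6   # scale as learned from y / max(y)
y_max = 250.0     # hypothetical maximum event time in original units

# If T / c ~ Weibull(alpha, beta), then T ~ Weibull(alpha, c * beta),
# hence the back-transformation is a single multiplication:
beta_backtransformed = beta_norm * y_max

# Sanity check via the closed-form median, beta * ln(2) ** (1 / alpha):
median_norm = beta_norm * np.log(2) ** (1 / alpha)
median_orig = beta_backtransformed * np.log(2) ** (1 / alpha)
assert np.isclose(median_orig, median_norm * y_max)
```

The shape `alpha` is dimensionless, so it needs no back-transformation; only quantities carrying time units (scale, quantiles, predicted survival times) are multiplied by `np.max(y)`.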
@@ -155,34 +172,40 @@ def gumbel_sf(y, mu, sigma):
     return 1.0 - pt.exp(-pt.exp(-(y - mu) / sigma))
 ```
 
+```{code-cell} ipython3
+# If censored, use the observed event time as the upper bound; otherwise the maximum time
+right_censored = [x if x > 0 else np.max(logtime) for x in logtime * censored]
+```
+
 ```{code-cell} ipython3
 with pm.Model() as model_3:
-    s = pm.HalfNormal("s", tau=5.0)
+    s = pm.HalfNormal("s", tau=3.0)
     gamma = pm.Normal("gamma", mu=0, sigma=5)
 
-    y_obs = pm.Gumbel("y_obs", mu=gamma, beta=s, observed=logtime[~censored])
-    y_cens = pm.Potential("y_cens", gumbel_sf(y=logtime[censored], mu=gamma, sigma=s))
+    latent = pm.Gumbel.dist(mu=gamma, beta=s)
+    y_obs = pm.Censored("Censored_likelihood", latent, upper=right_censored, observed=logtime)
 ```
 
 ```{code-cell} ipython3
 with model_3:
     # Change init to avoid divergences
-    data_3 = pm.sample(init="adapt_diag")
+    idata_param3 = pm.sample(tune=4000, draws=2000, nuts_sampler="numpyro")
 ```
 
 ```{code-cell} ipython3
-az.plot_trace(data_3)
+az.plot_trace(idata_param3)
 ```
 
 ```{code-cell} ipython3
-az.summary(data_3, round_to=2)
+az.summary(idata_param3, round_to=2)
 ```
 
 ## Authors
 
 - Originally collated by [Junpeng Lao](https://junpenglao.xyz/) on Apr 21, 2018. See original code [here](https://github.com/junpenglao/Planet_Sakaar_Data_Science/blob/65447fdb431c78b15fbeaef51b8c059f46c9e8d6/PyMC3QnA/discourse_1107.ipynb).
 - Authored and ported to Jupyter notebook by [George Ho](https://eigenfoo.xyz/) on Jul 15, 2018.
 - Updated for compatibility with PyMC v5 by Chris Fonnesbeck on Jan 16, 2023.
+- Updated to replace `pm.Potential` with `pm.Censored` by Jonathan Dekermanjian on Nov 25, 2024.
 
 ```{code-cell} ipython3
 %load_ext watermark
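A final reviewer sketch for Parameterization 3: the `gumbel_sf` helper that the commit drops from the likelihood is the survival function of the (max-)Gumbel distribution, which is the quantity `pm.Censored` evaluates for censored log-times. A standalone numpy check with hypothetical `mu` and `s` (using `np.exp` in place of `pt.exp`), comparing the closed form against Monte Carlo draws:

```python
import numpy as np

def gumbel_sf(y, mu, sigma):
    """Survival function of the (max-)Gumbel distribution."""
    return 1.0 - np.exp(-np.exp(-(y - mu) / sigma))

rng = np.random.default_rng(8927)
mu, s = 1.2, 0.5  # hypothetical location/scale
draws = rng.gumbel(loc=mu, scale=s, size=200_000)

y0 = 1.5
empirical = np.mean(draws > y0)   # Monte Carlo estimate of P(Y > y0)
analytic = gumbel_sf(y0, mu, s)   # closed-form survival probability

print(empirical, analytic)
assert abs(empirical - analytic) < 0.01
```

`numpy.random.Generator.gumbel` uses the same max-Gumbel convention as `pm.Gumbel`, which is why the two estimates agree; the Monte Carlo error at 200k draws is on the order of 1e-3.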

Comments (0)