
Commit 4a8148f

running pre-commit checks
Signed-off-by: Nathaniel <[email protected]>
1 parent dde7965 commit 4a8148f

File tree

2 files changed: +48 additions, -48 deletions

examples/generalized_linear_models/GLM-discrete-choice_models.ipynb

Lines changed: 8 additions & 16 deletions
@@ -4994,7 +4994,7 @@
 " u1 = alphas[1] + beta_ic * wide_heating_df[\"ic.er\"] + beta_oc * wide_heating_df[\"oc.er\"]\n",
 " u2 = alphas[2] + beta_ic * wide_heating_df[\"ic.gc\"] + beta_oc * wide_heating_df[\"oc.gc\"]\n",
 " u3 = alphas[3] + beta_ic * wide_heating_df[\"ic.gr\"] + beta_oc * wide_heating_df[\"oc.gr\"]\n",
-" u4 = 0 + beta_ic * wide_heating_df[\"ic.hp\"] + beta_oc * wide_heating_df[\"oc.hp\"]\n",
+" u4 = 0 + beta_ic * wide_heating_df[\"ic.hp\"] + beta_oc * wide_heating_df[\"oc.hp\"]\n",
 " s = pm.math.stack([u0, u1, u2, u3, u4]).T\n",
 "\n",
 " ## Apply Softmax Transform\n",
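The hunk above edits the utility-stacking step that feeds the softmax transform. As a rough numpy sketch of what that construction computes (all numbers below are invented stand-ins, not values from the heating dataset):

```python
import numpy as np

# Toy stand-in for the notebook's utility construction: three observations,
# five heating alternatives. The cost matrices and coefficients are made up.
rng = np.random.default_rng(0)
ic = rng.uniform(400, 900, size=(3, 5))   # hypothetical installation costs
oc = rng.uniform(100, 200, size=(3, 5))   # hypothetical operating costs
beta_ic, beta_oc = -0.002, -0.004
alphas = np.array([0.5, 0.3, 0.8, 0.1, 0.0])  # last intercept pivoted to 0

u = alphas + beta_ic * ic + beta_oc * oc        # utilities, shape (obs, alts)
s = u - u.max(axis=1, keepdims=True)            # stabilise before exponentiating
p = np.exp(s) / np.exp(s).sum(axis=1, keepdims=True)  # softmax over alternatives
```

Each row of `p` is then a proper probability vector over the five alternatives, which is what `pm.math.softmax(s, axis=1)` produces in the model.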
@@ -5634,17 +5634,11 @@
 "\n",
 " u0 = alphas[0] + beta_ic * ic_ec + beta_oc * oc_ec\n",
 " u1 = alphas[1] + beta_ic * ic_er + beta_oc * oc_er\n",
-" u2 = (\n",
-" alphas[2]\n",
-" + beta_ic * wide_heating_df[\"ic.gc\"]\n",
-" + beta_oc * wide_heating_df[\"oc.gc\"]\n",
-" )\n",
-" u3 = (\n",
-" alphas[3]\n",
-" + beta_ic * wide_heating_df[\"ic.gr\"]\n",
-" + beta_oc * wide_heating_df[\"oc.gr\"]\n",
-" )\n",
-" u4 = (alphas[4] + beta_ic * wide_heating_df[\"ic.hp\"] + beta_oc * wide_heating_df[\"oc.hp\"]) # pivot)\n",
+" u2 = alphas[2] + beta_ic * wide_heating_df[\"ic.gc\"] + beta_oc * wide_heating_df[\"oc.gc\"]\n",
+" u3 = alphas[3] + beta_ic * wide_heating_df[\"ic.gr\"] + beta_oc * wide_heating_df[\"oc.gr\"]\n",
+" u4 = (\n",
+" alphas[4] + beta_ic * wide_heating_df[\"ic.hp\"] + beta_oc * wide_heating_df[\"oc.hp\"]\n",
+" ) # pivot)\n",
 " s = pm.math.stack([u0, u1, u2, u3, u4]).T\n",
 "\n",
 " p_ = pm.Deterministic(\"p\", pm.math.softmax(s, axis=1), dims=(\"obs\", \"alts_probs\"))\n",
@@ -6243,9 +6237,7 @@
 }
 ],
 "source": [
-"az.summary(\n",
-" idata_m3, var_names=[\"beta_ic\", \"beta_oc\", \"alpha\", \"chol_corr\"], round_to=4\n",
-")"
+"az.summary(idata_m3, var_names=[\"beta_ic\", \"beta_oc\", \"alpha\", \"chol_corr\"], round_to=4)"
 ]
 },
 {
@@ -8146,7 +8138,7 @@
 ],
 "source": [
 "# Old Policy Expectations\n",
-"old = idata_m3['posterior']['p'].mean(dim=[\"chain\", \"draw\", \"obs\"])\n",
+"old = idata_m3[\"posterior\"][\"p\"].mean(dim=[\"chain\", \"draw\", \"obs\"])\n",
 "old"
 ]
 },

examples/generalized_linear_models/GLM-discrete-choice_models.myst.md

Lines changed: 40 additions & 32 deletions
@@ -98,7 +98,7 @@ This assumption proves to be mathematically convenient because the difference be

 $$ \text{softmax}(u)_{j} = \frac{\exp(u_{j})}{\sum_{q=1}^{J}\exp(u_{q})} $$

-The model then assumes that decision maker chooses the option that maximises their subjective utility. The individual utility functions can be richly parameterised. The model is identified just when the utility measures of the alternatives are benchmarked against the fixed utility of the "outside good." The last quantity is fixed at 0.
+The model then assumes that decision maker chooses the option that maximises their subjective utility. The individual utility functions can be richly parameterised. The model is identified just when the utility measures of the alternatives are benchmarked against the fixed utility of the "outside good." The last quantity is often fixed at 0 to aid parameter identification on alternative-specific parameters as we'll see below.

 $$\begin{pmatrix}
 u_{gc} \\
@@ -131,7 +131,7 @@ long_heating_df[long_heating_df["idcase"] == 1][columns]

 ## The Basic Model

-We will show here how to incorporate the utility specifications in PyMC. PyMC is a nice interface for this kind of modelling because it can express the model quite cleanly following the natural mathematical expression for this system of equations. You can see in this simple model how we go about constructing equations for the utility measure of each alternative separately, and then stacking them together to create the input matrix for our softmax transform.
+We will show here how to incorporate the utility specifications in PyMC. PyMC is a nice interface for this kind of modelling because it can express the model quite cleanly following the natural mathematical expression for this system of equations. You can see in this simple model how we go about constructing equations for the utility measure of each alternative seperately, and then stacking them together to create the input matrix for our softmax transform.

 ```{code-cell} ipython3
 N = wide_heating_df.shape[0]
@@ -150,7 +150,7 @@ with pm.Model(coords=coords) as model_1:
 u1 = beta_ic * wide_heating_df["ic.er"] + beta_oc * wide_heating_df["oc.er"]
 u2 = beta_ic * wide_heating_df["ic.gc"] + beta_oc * wide_heating_df["oc.gc"]
 u3 = beta_ic * wide_heating_df["ic.gr"] + beta_oc * wide_heating_df["oc.gr"]
-u4 = np.zeros(N) # Outside Good
+u4 = beta_ic * wide_heating_df["ic.hp"] + beta_oc * wide_heating_df["oc.hp"]
 s = pm.math.stack([u0, u1, u2, u3, u4]).T

 ## Apply Softmax Transform
@@ -210,7 +210,7 @@ ax.hist(
 ax.set_title("Uncertainty in Marginal Rate of Substitution \n Operating Costs / Installation Costs");
 ```

-which suggests that there is almost twice the value accorded to the a unit reduction in recurring operating costs over the one-off installation costs. Whether this is remotely plausible is almost beside the point since the model does not even closely capture the data generating process. But it's worth repeating that the native scale of utility is not straightforwardly meaningful, but the ratio of the coefficients in the utility equations can be directly interpreted.
+which suggests that there is around .7 of the value accorded to the a unit reduction in recurring operating costs over the one-off installation costs. Whether this is remotely plausible is almost beside the point since the model does not even closely capture the data generating process. But it's worth repeating that the native scale of utility is not straightforwardly meaningful, but the ratio of the coefficients in the utility equations can be directly interpreted.

 To assess overall model adequacy we can rely on the posterior predictive checks to see if the model can recover an approximation to the data generating process.

@@ -272,7 +272,7 @@ with pm.Model(coords=coords) as model_2:
 u1 = alphas[1] + beta_ic * wide_heating_df["ic.er"] + beta_oc * wide_heating_df["oc.er"]
 u2 = alphas[2] + beta_ic * wide_heating_df["ic.gc"] + beta_oc * wide_heating_df["oc.gc"]
 u3 = alphas[3] + beta_ic * wide_heating_df["ic.gr"] + beta_oc * wide_heating_df["oc.gr"]
-u4 = np.zeros(N) # Outside Good
+u4 = 0 + beta_ic * wide_heating_df["ic.hp"] + beta_oc * wide_heating_df["oc.hp"]
 s = pm.math.stack([u0, u1, u2, u3, u4]).T

 ## Apply Softmax Transform
@@ -291,8 +291,10 @@ with pm.Model(coords=coords) as model_2:
 pm.model_to_graphviz(model_2)
 ```

+We have deliberately 0'd out the intercept parameter for the `hp` alternative to ensure parameter identification is feasible.
+
 ```{code-cell} ipython3
-az.summary(idata_m2, var_names=["beta_ic", "beta_oc", "alpha"])
+az.summary(idata_m2, var_names=["beta_ic", "beta_oc", "alpha"], round_to=4)
 ```

 We can see now how this model performs much better in capturing aspects of the data generating process.
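The added note about zeroing out the `hp` intercept reflects a basic property of the softmax: adding the same constant to every utility leaves the choice probabilities unchanged, so only utility differences are identified and one intercept must be pinned. A quick numpy check of that invariance (the utilities here are arbitrary illustrative numbers):

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a vector of utilities
    e = np.exp(x - x.max())
    return e / e.sum()

u = np.array([1.0, 2.0, 0.5, 1.5, 0.0])  # arbitrary utilities, last pivoted to 0

# Shifting every utility by the same constant gives identical probabilities,
# which is why a full set of free intercepts cannot be identified.
shifted = softmax(u + 3.7)
```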
@@ -348,27 +350,18 @@ with pm.Model(coords=coords) as model_3:

 beta_ic = pm.Normal("beta_ic", 0, 1)
 beta_oc = pm.Normal("beta_oc", 0, 1)
-beta_income = pm.Normal("beta_income", 0, 1, dims="alts_intercepts")
 chol, corr, stds = pm.LKJCholeskyCov(
-    "chol", n=4, eta=2.0, sd_dist=pm.Exponential.dist(1.0, shape=4)
+    "chol", n=5, eta=2.0, sd_dist=pm.Exponential.dist(1.0, shape=5)
 )
-alphas = pm.MvNormal("alpha", mu=0, chol=chol, dims="alts_intercepts")
+alphas = pm.MvNormal("alpha", mu=0, chol=chol, dims="alts_probs")

-u0 = alphas[0] + beta_ic * ic_ec + beta_oc * oc_ec + beta_income[0] * wide_heating_df["income"]
-u1 = alphas[1] + beta_ic * ic_er + beta_oc * oc_er + beta_income[1] * wide_heating_df["income"]
-u2 = (
-    alphas[2]
-    + beta_ic * wide_heating_df["ic.gc"]
-    + beta_oc * wide_heating_df["oc.gc"]
-    + beta_income[2] * wide_heating_df["income"]
-)
-u3 = (
-    alphas[3]
-    + beta_ic * wide_heating_df["ic.gr"]
-    + beta_oc * wide_heating_df["oc.gr"]
-    + beta_income[3] * wide_heating_df["income"]
-)
-u4 = np.zeros(N) # pivot
+u0 = alphas[0] + beta_ic * ic_ec + beta_oc * oc_ec
+u1 = alphas[1] + beta_ic * ic_er + beta_oc * oc_er
+u2 = alphas[2] + beta_ic * wide_heating_df["ic.gc"] + beta_oc * wide_heating_df["oc.gc"]
+u3 = alphas[3] + beta_ic * wide_heating_df["ic.gr"] + beta_oc * wide_heating_df["oc.gr"]
+u4 = (
+    alphas[4] + beta_ic * wide_heating_df["ic.hp"] + beta_oc * wide_heating_df["oc.hp"]
+)  # pivot)
 s = pm.math.stack([u0, u1, u2, u3, u4]).T

 p_ = pm.Deterministic("p", pm.math.softmax(s, axis=1), dims=("obs", "alts_probs"))
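The bump from `n=4` to `n=5` in `pm.LKJCholeskyCov` matches the five alternative intercepts now drawn from the multivariate normal. What the Cholesky parameterisation buys is correlated intercepts across alternatives; a numpy sketch of the same mechanics, with an illustrative (not estimated) correlation matrix:

```python
import numpy as np

# Illustrative 5x5 correlation matrix and scales for the five intercepts;
# in the model these are learned via the LKJ prior, not fixed like this.
corr = np.array([
    [1.0, 0.4, 0.2, 0.0, 0.1],
    [0.4, 1.0, 0.3, 0.1, 0.0],
    [0.2, 0.3, 1.0, 0.2, 0.1],
    [0.0, 0.1, 0.2, 1.0, 0.3],
    [0.1, 0.0, 0.1, 0.3, 1.0],
])
stds = np.array([0.5, 0.6, 0.4, 0.7, 0.5])
cov = np.outer(stds, stds) * corr
chol = np.linalg.cholesky(cov)

# Multiplying standard-normal draws by the Cholesky factor induces the
# target covariance -- the same trick pm.MvNormal(..., chol=chol) uses.
rng = np.random.default_rng(2)
z = rng.standard_normal((1000, 5))
alphas = z @ chol.T  # 1000 correlated draws of the five intercepts
```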
@@ -417,9 +410,7 @@ ax.set_ylabel("Heating System");
 That extra complexity can be informative, and the degree of relationship amongst the alternative products will inform the substitution patterns under policy changes. Also, note how under this model specification the parameter for `beta_ic` has a expected value of 0. Suggestive perhaps of a resignation towards the reality of installation costs that doesn't change the utility metric one way or other after a decision to purchase.

 ```{code-cell} ipython3
-az.summary(
-    idata_m3, var_names=["beta_income", "beta_ic", "beta_oc", "alpha", "chol_corr"], round_to=4
-)
+az.summary(idata_m3, var_names=["beta_ic", "beta_oc", "alpha", "chol_corr"], round_to=4)
 ```

 In this model we see that the marginal rate of substitution shows that an increase of one dollar for the operating costs is almost 17 times more impactful on the utility calculus than a similar increase in installation costs. Which makes sense in so far as we can expect the installation costs to be a one-off expense we're pretty resigned to.
@@ -454,7 +445,19 @@ idata_new_policy
 ```

 ```{code-cell} ipython3
-idata_new_policy["predictions"]["p"].mean(dim=["chain", "draw", "obs"])
+# Old Policy Expectations
+old = idata_m3["posterior"]["p"].mean(dim=["chain", "draw", "obs"])
+old
+```
+
+```{code-cell} ipython3
+# New Policy Expectations
+new = idata_new_policy["predictions"]["p"].mean(dim=["chain", "draw", "obs"])
+new
+```
+
+```{code-cell} ipython3
+new - old
 ```
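The new cells compute expected market shares as a posterior mean of `p` over the `chain`, `draw` and `obs` dimensions, then difference the old and new policy expectations. In plain numpy terms, with a randomly generated stand-in for the posterior of `p`:

```python
import numpy as np

# Stand-in for the posterior of p with shape (chain, draw, obs, alts);
# the real array lives in idata_m3["posterior"]["p"].
rng = np.random.default_rng(3)
logits = rng.normal(size=(2, 500, 30, 5))
p = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

# Analogue of .mean(dim=["chain", "draw", "obs"]): expected share per alternative
old = p.mean(axis=(0, 1, 2))
```

Because each `(chain, draw, obs)` slice of `p` is a probability vector, the averaged shares still sum to one, so `new - old` reads directly as the expected shift in market share per alternative.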

```{code-cell} ipython3
@@ -497,15 +500,15 @@ Here we can, as expected, see that a rise in the operating costs of the electric

 ### Compare Models

-We'll now evaluate all three model fits on their predictive performance. Predictive performance on the original data is a good benchmark that the model has appropriately captured the data generating process. But it is not (as we've seen) the only feature of interest in these models. These models are sensitive to our theoretical beliefs about the agents making the decisions, the view of the decision process and the elements of the choice scenario.
+We'll now evaluate all three model fits on their predictive performance. Predictive performance on the original data is a good benchmark that the model has appropriately captured the data generating process. But it is not (as we've seen) the only feature of interest in these models. These models are sensetive to our theoretical beliefs about the agents making the decisions, the view of the decision process and the elements of the choice scenario.

 ```{code-cell} ipython3
 compare = az.compare({"m1": idata_m1, "m2": idata_m2, "m3": idata_m3})
 compare
 ```

 ```{code-cell} ipython3
-az.plot_compare(compare)
+az.plot_compare(compare);
 ```

 ## Choosing Crackers over Repeated Choices: Mixed Logit Model
@@ -573,7 +576,7 @@ observed = pd.Categorical(c_df["choice"]).codes
 person_indx, uniques = pd.factorize(c_df["personId"])

 coords = {
-    "alts_intercepts": ["sunshine", "keebler", "nabisco"],
+    "alts_intercepts": ["sunshine", "keebler", "nabisco", "private"],
     "alts_probs": ["sunshine", "keebler", "nabisco", "private"],
     "individuals": uniques,
     "obs": range(N),
@@ -604,7 +607,12 @@ with pm.Model(coords=coords) as model_4:
     + beta_feat * c_df["feat.nabisco"]
     + beta_price * c_df["price.nabisco"]
 )
-u3 = np.zeros(N) # Outside Good
+u3 = (
+    (alphas[3] + beta_individual[person_indx, 2])
+    + beta_disp * c_df["disp.private"]
+    + beta_feat * c_df["feat.private"]
+    + beta_price * c_df["price.private"]
+)
 s = pm.math.stack([u0, u1, u2, u3]).T
 # Reconstruct the total data
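The rewritten `u3` gives the private-label alternative its own intercept and individual-level offset rather than a zeroed-out utility. The mixed-logit ingredient here, person-specific intercept deviations entering the utilities, can be sketched in numpy (sizes, names and distributions are illustrative, and this simplified analogue gives every alternative a deviation rather than indexing a shared column as the model does):

```python
import numpy as np

# Hypothetical panel: 20 people, 4 cracker brands, 100 observed choices.
rng = np.random.default_rng(4)
n_people, n_alts = 20, 4
alphas = rng.normal(0, 1, size=n_alts)                         # shared brand intercepts
beta_individual = rng.normal(0, 0.5, size=(n_people, n_alts))  # per-person deviations

person_idx = rng.integers(0, n_people, size=100)  # person behind each observed choice
u = alphas + beta_individual[person_idx]          # per-observation brand utilities
e = np.exp(u - u.max(axis=1, keepdims=True))
p = e / e.sum(axis=1, keepdims=True)              # choice probabilities per observation
```

Indexing `beta_individual` by `person_idx` is the numpy analogue of `beta_individual[person_indx, ...]` in the model: repeated choices by the same person share one deviation, which is what lets the model pool information across an individual's choice history.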
