Commit 434ccfe: "incorprate reader's feedback"

1 parent 39b39c9 commit 434ccfe
File tree: 5 files changed (+161, -93 lines)


lectures/divergence_measures.md

Lines changed: 22 additions & 12 deletions

@@ -4,7 +4,7 @@ jupytext:
     extension: .md
     format_name: myst
     format_version: 0.13
-    jupytext_version: 1.17.1
+    jupytext_version: 1.16.6
 kernelspec:
   display_name: Python 3 (ipykernel)
   language: python
@@ -60,10 +60,6 @@ import pandas as pd
 from IPython.display import display, Math
 ```
 
-
-
-
-
 ## Primer on entropy, cross-entropy, KL divergence
 
 Before diving in, we'll introduce some useful concepts in a simple setting.
@@ -163,6 +159,8 @@ f(z; a, b) = \frac{\Gamma(a+b) z^{a-1} (1-z)^{b-1}}{\Gamma(a) \Gamma(b)}
 \Gamma(p) := \int_{0}^{\infty} x^{p-1} e^{-x} dx
 $$
 
+We introduce two Beta distributions $f(x)$ and $g(x)$, which we will use to illustrate the different divergence measures.
+
 Let's define parameters and density functions in Python
 
 ```{code-cell} ipython3
@@ -198,8 +196,6 @@ plt.legend()
 plt.show()
 ```
 
-
-
 (rel_entropy)=
 ## Kullback–Leibler divergence
 
@@ -457,15 +453,14 @@ plt.show()
 
 We now generate plots illustrating how overlap visually diminishes as divergence measures increase.
 
-
 ```{code-cell} ipython3
 param_grid = [
     ((1, 1), (1, 1)),
     ((1, 1), (1.5, 1.2)),
     ((1, 1), (2, 1.5)),
     ((1, 1), (3, 1.2)),
-    ((1, 1), (5, 1)),
-    ((1, 1), (0.3, 0.3))
+    ((1, 1), (0.3, 0.3)),
+    ((1, 1), (5, 1))
 ]
 ```
@@ -510,9 +505,24 @@ def plot_dist_diff(para_grid):
     return divergence_data
 
 divergence_data = plot_dist_diff(param_grid)
-```
-
 
+from pandas.plotting import parallel_coordinates
+kl_gf_values = [float(result['KL(g, f)']) for result in results]
+
+df_plot = pd.DataFrame({
+    "KL(f,g)": kl_fg_values,
+    "KL(g,f)": kl_gf_values,
+    "JS": js_values,
+    "Chernoff": chernoff_values
+})
+df_plot["pair"] = df_plot.index.astype(str)  # just to group lines
+
+plt.figure(figsize=(8,5))
+parallel_coordinates(df_plot, "pair", color="blue", alpha=0.3)
+plt.ylabel("Value")
+plt.title("Parallel comparison of divergence measures per pair")
+plt.show()
+```
 
 ## KL divergence and maximum-likelihood estimation
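As a companion to the reordered `param_grid` above, here is a minimal, self-contained sketch of computing KL and Jensen-Shannon divergence between Beta densities by quadrature. It assumes SciPy is available; `kl_div` and `js_div` are our illustrative helpers, not functions from the lecture itself.

```python
import numpy as np
from scipy.stats import beta
from scipy.integrate import quad

def kl_div(p_params, q_params):
    """KL(p, q) = integral of p(x) log(p(x)/q(x)) for Beta densities on (0, 1)."""
    p = lambda x: beta.pdf(x, *p_params)
    q = lambda x: beta.pdf(x, *q_params)
    # clip away the endpoints, where the densities can be unbounded
    val, _ = quad(lambda x: p(x) * np.log(p(x) / q(x)), 1e-8, 1 - 1e-8)
    return val

def js_div(p_params, q_params):
    """Jensen-Shannon divergence: 0.5 KL(p, m) + 0.5 KL(q, m), m = (p + q)/2."""
    p = lambda x: beta.pdf(x, *p_params)
    q = lambda x: beta.pdf(x, *q_params)
    m = lambda x: 0.5 * (p(x) + q(x))
    i1, _ = quad(lambda x: p(x) * np.log(p(x) / m(x)), 1e-8, 1 - 1e-8)
    i2, _ = quad(lambda x: q(x) * np.log(q(x) / m(x)), 1e-8, 1 - 1e-8)
    return 0.5 * (i1 + i2)

# moving down the first few grid pairs increases divergence from Beta(1, 1)
for pair in [(1, 1), (1.5, 1.2), (2, 1.5), (3, 1.2)]:
    print(pair, round(kl_div((1, 1), pair), 4), round(js_div((1, 1), pair), 4))
```

For identical parameters the integrand vanishes, so `kl_div((1, 1), (1, 1))` is exactly zero, which is why that pair anchors the grid.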

lectures/imp_sample.md

Lines changed: 15 additions & 5 deletions

@@ -233,7 +233,7 @@ For our importance sampling estimate, we set $q = h$.
 estimate(g_a, g_b, h_a, h_b, T=1, N=10000)
 ```
 
-Evidently, even at T=1, our importance sampling estimate is closer to $1$ than is the Monte Carlo estimate.
+Evidently, even at $T=1$, our importance sampling estimate is closer to $1$ than is the Monte Carlo estimate.
 
 Bigger differences arise when computing expectations over longer sequences, $E_0\left[L\left(\omega^t\right)\right]$.
 
@@ -248,6 +248,16 @@ estimate(g_a, g_b, g_a, g_b, T=10, N=10000)
 estimate(g_a, g_b, h_a, h_b, T=10, N=10000)
 ```
 
+The Monte Carlo method underestimates because the likelihood ratio $L(\omega^T) = \prod_{t=1}^T \frac{f(\omega_t)}{g(\omega_t)}$ has a highly skewed distribution under $g$.
+
+Most samples from $g$ produce small likelihood ratios, while the true mean requires occasional very large values that are rarely sampled.
+
+In our case, since $g(\omega) \to 0$ as $\omega \to 0$ while $f(\omega)$ remains bounded, the Monte Carlo procedure undersamples precisely where the likelihood ratio $\frac{f(\omega)}{g(\omega)}$ is largest.
+
+As $T$ increases, this problem worsens exponentially, making standard Monte Carlo increasingly unreliable.
+
+Importance sampling with $q = h$ fixes this by sampling more uniformly from regions important to both $f$ and $g$.
+
 ## Distribution of Sample Mean
 
 We next study the bias and efficiency of the Monte Carlo and importance sampling approaches.
@@ -364,10 +374,10 @@ b_list = [0.5, 1.2, 5.]
 ```{code-cell} ipython3
 w_range = np.linspace(1e-5, 1-1e-5, 1000)
 
-plt.plot(w_range, g(w_range), label=f'p=Beta({g_a}, {g_b})')
-plt.plot(w_range, p(w_range, a_list[0], b_list[0]), label=f'g=Beta({a_list[0]}, {b_list[0]})')
-plt.plot(w_range, p(w_range, a_list[1], b_list[1]), label=f'g=Beta({a_list[1]}, {b_list[1]})')
-plt.plot(w_range, p(w_range, a_list[2], b_list[2]), label=f'g=Beta({a_list[2]}, {b_list[2]})')
+plt.plot(w_range, g(w_range), label=f'g=Beta({g_a}, {g_b})')
+plt.plot(w_range, p(w_range, a_list[0], b_list[0]), label=f'$h_1$=Beta({a_list[0]},{b_list[0]})')
+plt.plot(w_range, p(w_range, a_list[1], b_list[1]), label=f'$h_2$=Beta({a_list[1]},{b_list[1]})')
+plt.plot(w_range, p(w_range, a_list[2], b_list[2]), label=f'$h_3$=Beta({a_list[2]},{b_list[2]})')
 plt.title('real data generating process $g$ and importance distribution $h$')
 plt.legend()
 plt.ylim([0., 3.])
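The skewness argument in the paragraphs added above can be checked with a short simulation. This is a sketch under illustrative parameter values (the Beta parameters here are our choices, not necessarily the lecture's `g_a, g_b, h_a, h_b`): we estimate $E_g[L] = 1$ both by naive Monte Carlo under $g$ and by importance sampling under a proposal $h$ that covers the tails.

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)

# illustrative densities: f uniform, g the data-generating density,
# h a U-shaped proposal that puts mass near both endpoints
f_a, f_b = 1.0, 1.0
g_a, g_b = 3.0, 1.2
h_a, h_b = 0.5, 0.5

f = lambda x: beta.pdf(x, f_a, f_b)
g = lambda x: beta.pdf(x, g_a, g_b)
h = lambda x: beta.pdf(x, h_a, h_b)

def mc_estimate(T=10, N=100_000):
    # naive Monte Carlo: draw from g, average L = prod_t f(w_t)/g(w_t); true value is 1
    w = rng.beta(g_a, g_b, size=(N, T))
    return np.prod(f(w) / g(w), axis=1).mean()

def is_estimate(T=10, N=100_000):
    # importance sampling: draw from h and reweight,
    # E_g[L] = E_h[L * prod_t g/h] = E_h[prod_t f/h]
    w = rng.beta(h_a, h_b, size=(N, T))
    return np.prod(f(w) / h(w), axis=1).mean()

print(mc_estimate(), is_estimate())  # IS stays near 1; naive MC typically falls short
```

The reweighted ratio $f/h$ is bounded here, so the importance sampling estimator has finite variance, while the naive estimator's second moment blows up near $\omega = 0$.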

lectures/likelihood_ratio_process.md

Lines changed: 69 additions & 67 deletions

@@ -993,33 +993,9 @@ def compute_protocol_1_errors(π_minus_1, T_max, N_simulations, f_func, g_func,
         'L_cumulative': L_cumulative,
         'true_models': true_models
     }
-
-def compute_protocol_2_errors(π_minus_1, T_max, N_simulations, f_func, g_func,
-                              F_params=(1, 1), G_params=(3, 1.2)):
-    """
-    Compute error probabilities for Protocol 2.
-    """
-    sequences, true_models = protocol_2(π_minus_1,
-                                        T_max, N_simulations, F_params, G_params)
-    l_ratios, _ = compute_likelihood_ratios(sequences, f_func, g_func)
-
-    T_range = np.arange(1, T_max + 1)
-
-    accuracy = np.empty(T_max)
-    for t in range(T_max):
-        predictions = (l_ratios[:, t] >= 1)
-        actual = true_models[:, t]
-        accuracy[t] = np.mean(predictions == actual)
-
-    return {
-        'T_range': T_range,
-        'accuracy': accuracy,
-        'l_ratios': l_ratios,
-        'true_models': true_models
-    }
 ```
 
-The following code visualizes the error probabilities for timing protocol 1 and 2
+The following code visualizes the error probabilities for timing protocol 1
 
 ```{code-cell} ipython3
 :tags: [hide-input]
@@ -1058,44 +1034,6 @@ def analyze_protocol_1(π_minus_1, T_max, N_simulations, f_func, g_func,
 
     return result
 
-def analyze_protocol_2(π_minus_1, T_max, N_simulations, f_func, g_func,
-                       theory_error=None, F_params=(1, 1), G_params=(3, 1.2)):
-    """Analyze Protocol 2."""
-    result = compute_protocol_2_errors(π_minus_1, T_max, N_simulations,
-                                       f_func, g_func, F_params, G_params)
-
-    # Plot results
-    plt.figure(figsize=(10, 6))
-    plt.plot(result['T_range'], result['accuracy'],
-             'b-', linewidth=2, label='empirical accuracy')
-
-    if theory_error is not None:
-        plt.axhline(1 - theory_error, color='r', linestyle='--',
-                    label=f'theoretical accuracy = {1 - theory_error:.4f}')
-
-    plt.xlabel('$t$')
-    plt.ylabel('accuracy')
-    plt.legend()
-    plt.ylim(0.5, 1.0)
-    plt.show()
-
-    return result
-
-def compare_protocols(result1, result2):
-    """Compare results from both protocols."""
-    plt.figure(figsize=(10, 6))
-
-    plt.plot(result1['T_range'], result1['error_prob'], linewidth=2,
-             label='Protocol 1 (Model Selection)')
-    plt.plot(result2['T_range'], 1 - result2['accuracy'],
-             linestyle='--', linewidth=2,
-             label='Protocol 2 (classification)')
-
-    plt.xlabel('$T$')
-    plt.ylabel('error probability')
-    plt.legend()
-    plt.show()
-
 # Analyze Protocol 1
 π_minus_1 = 0.5
 T_max = 30
@@ -1130,6 +1068,33 @@ $$ (eq:classerrorprob)
 
 where $\tilde \alpha_t = {\rm Prob}(l_t < 1 \mid f)$ and $\tilde \beta_t = {\rm Prob}(l_t \geq 1 \mid g)$.
 
+```{code-cell} ipython3
+def compute_protocol_2_errors(π_minus_1, T_max, N_simulations, f_func, g_func,
+                              F_params=(1, 1), G_params=(3, 1.2)):
+    """
+    Compute error probabilities for Protocol 2.
+    """
+    sequences, true_models = protocol_2(π_minus_1,
+                                        T_max, N_simulations, F_params, G_params)
+    l_ratios, _ = compute_likelihood_ratios(sequences, f_func, g_func)
+
+    T_range = np.arange(1, T_max + 1)
+
+    accuracy = np.empty(T_max)
+    for t in range(T_max):
+        predictions = (l_ratios[:, t] >= 1)
+        actual = true_models[:, t]
+        accuracy[t] = np.mean(predictions == actual)
+
+    return {
+        'T_range': T_range,
+        'accuracy': accuracy,
+        'l_ratios': l_ratios,
+        'true_models': true_models
+    }
+```
+
 Since for each $t$, the decision boundary is the same, the decision boundary can be computed as
 
 ```{code-cell} ipython3
@@ -1177,11 +1142,11 @@ plt.tight_layout()
 plt.show()
 ```
 
-To the left of the green vertical line $g < f$, so $l_t < 1$; therefore a $w_t$ that falls to the left of the green line is classified as a type $g$ individual.
+To the left of the green vertical line $g < f$, so $l_t > 1$; therefore a $w_t$ that falls to the left of the green line is classified as a type $f$ individual.
 
-* The shaded orange area equals $\beta$ -- the probability of classifying someone as a type $g$ individual when it is really a type $f$ individual.
+* The shaded red area equals $\beta$ -- the probability of classifying someone as a type $g$ individual when it is really a type $f$ individual.
 
-To the right of the green vertical line $g > f$, so $l_t >1 $; therefore a $w_t$ that falls to the right of the green line is classified as a type $f$ individual.
+To the right of the green vertical line $g > f$, so $l_t < 1$; therefore a $w_t$ that falls to the right of the green line is classified as a type $g$ individual.
 
 * The shaded blue area equals $\alpha$ -- the probability of classifying someone as a type $f$ when it is really a type $g$ individual.
 
@@ -1213,6 +1178,29 @@ Now we simulate timing protocol 2 and compute the classification error probabili
 
 In the next cell, we also compare the theoretical classification accuracy to the empirical classification accuracy
 
 ```{code-cell} ipython3
+def analyze_protocol_2(π_minus_1, T_max, N_simulations, f_func, g_func,
+                       theory_error=None, F_params=(1, 1), G_params=(3, 1.2)):
+    """Analyze Protocol 2."""
+    result = compute_protocol_2_errors(π_minus_1, T_max, N_simulations,
+                                       f_func, g_func, F_params, G_params)
+
+    # Plot results
+    plt.figure(figsize=(10, 6))
+    plt.plot(result['T_range'], result['accuracy'],
+             'b-', linewidth=2, label='empirical accuracy')
+
+    if theory_error is not None:
+        plt.axhline(1 - theory_error, color='r', linestyle='--',
+                    label=f'theoretical accuracy = {1 - theory_error:.4f}')
+
+    plt.xlabel('$t$')
+    plt.ylabel('accuracy')
+    plt.legend()
+    plt.ylim(0.5, 1.0)
+    plt.show()
+
+    return result
+
 # Analyze Protocol 2
 result_p2 = analyze_protocol_2(π_minus_1, T_max, N_simulations, f, g,
                                theory_error, (F_a, F_b), (G_a, G_b))
@@ -1221,7 +1209,21 @@ result_p2 = analyze_protocol_2(π_minus_1, T_max, N_simulations, f, g,
 Let's watch decisions made by the two timing protocols as more and more observations accrue.
 
 ```{code-cell} ipython3
-# Compare both protocols
+def compare_protocols(result1, result2):
+    """Compare results from both protocols."""
+    plt.figure(figsize=(10, 6))
+
+    plt.plot(result1['T_range'], result1['error_prob'], linewidth=2,
+             label='Protocol 1 (Model Selection)')
+    plt.plot(result2['T_range'], 1 - result2['accuracy'],
+             linestyle='--', linewidth=2,
+             label='Protocol 2 (classification)')
+
+    plt.xlabel('$T$')
+    plt.ylabel('error probability')
+    plt.legend()
+    plt.show()
+
 compare_protocols(result_p1, result_p2)
 ```
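To make the classification-error formula $\frac{1}{2}(\tilde\alpha_t + \tilde\beta_t)$ from this diff concrete, here is a hedged, self-contained sketch (our own helper code, not the lecture's, using the lectures' example densities $f = \text{Beta}(1,1)$ and $g = \text{Beta}(3,1.2)$) that compares the theoretical error to a protocol-2 simulation in which nature picks $f$ or $g$ with probability one half each period.

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(1)

F_a, F_b = 1, 1
G_a, G_b = 3, 1.2
f = lambda x: beta.pdf(x, F_a, F_b)
g = lambda x: beta.pdf(x, G_a, G_b)

# theoretical error 0.5*(alpha + beta) with alpha = P(l < 1 | f), beta = P(l >= 1 | g),
# computed by a simple midpoint Riemann sum on a fine grid
dx = 1e-5
x = np.arange(dx / 2, 1, dx)
alpha_t = np.sum(f(x) * (f(x) < g(x))) * dx   # misclassify an f-draw as type g
beta_t = np.sum(g(x) * (f(x) >= g(x))) * dx   # misclassify a g-draw as type f
theory_error = 0.5 * (alpha_t + beta_t)

# simulate protocol 2: classify each draw by the one-observation ratio l = f(w)/g(w)
N = 200_000
from_f = rng.random(N) < 0.5
w = np.where(from_f, rng.beta(F_a, F_b, N), rng.beta(G_a, G_b, N))
accuracy = np.mean((f(w) >= g(w)) == from_f)

print(theory_error, 1 - accuracy)  # the two error rates should be close
```

The indicator `f(x) >= g(x)` is exactly the $l_t \geq 1$ decision rule used by `compute_protocol_2_errors` in the diff above.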

lectures/likelihood_ratio_process_2.md

Lines changed: 32 additions & 8 deletions

@@ -260,7 +260,7 @@ To design a socially optimal allocation, the social planner wants to know what a
 As for the endowment sequences, agent $i$ believes that nature draws i.i.d. sequences from joint densities
 
 $$
-\pi_t^i(s^t) = \pi(s_t)^i \pi^i(s_{t-1}) \cdots \pi^i(s_0)
+\pi_t^i(s^t) = \pi^i(s_t) \pi^i(s_{t-1}) \cdots \pi^i(s_0)
 $$
 
 As for attitudes toward bearing risks, agent $i$ has a one-period utility function
@@ -269,7 +269,7 @@ $$
 u(c_t^i) = \ln (c_t^i)
 $$
 
-with marginal utility of consumption in period $i$
+with marginal utility of consumption in period $t$
 
 $$
 u'(c_t^i) = \frac{1}{c_t^i}
@@ -303,7 +303,36 @@ This means that the social planner knows and respects
 * each agent's one period utility function $u(\cdot) = \ln(\cdot)$
 * each agent $i$'s probability model $\{\pi_t^i(s^t)\}_{t=0}^\infty$
 
-Consequently, we anticipate that these objects will appear in the social planner's rule for allocating the aggregate endowment each period.
+Consequently, we anticipate that these objects will appear in the social planner's rule for allocating the aggregate endowment each period.
+
+The Lagrangian for the social planner's problem is
+
+$$
+L = \sum_{t=0}^{\infty}\sum_{s^t} \{ \lambda \delta^t u(c_t^1(s^t)) \pi_t^1(s^t) + (1-\lambda) \delta^t u(c_t^2(s^t)) \pi_t^2(s^t) + \theta_t(s^t)(1-c_t^1(s^t)-c_t^2(s^t)) \}
+$$
+
+where $\theta_t(s^t)$ are the shadow prices.
+
+The first order conditions for maximizing $L$ with respect to $c_t^i(s^t)$ are:
+
+$$
+\lambda \delta^t u'(c_t^1(s^t)) \pi_t^1(s^t) = \theta_t(s^t), \quad (1-\lambda) \delta^t u'(c_t^2(s^t)) \pi_t^2(s^t) = \theta_t(s^t)
+$$
+
+Substituting formula {eq}`eq:allocationrule1` for $c_t^1(s^t)$, we get
+
+$$
+\theta_t(s^t) = \delta^t [(1-\lambda)\pi_t^2(s^t) + \lambda \pi_t^1(s^t)]
+$$
+
+Now for the competitive equilibrium, notice that if we take $\mu_1 = \frac{1}{\lambda}$ and $\mu_2 = \frac{1}{1-\lambda}$, formula {eq}`eq:allocationce` agrees with formula {eq}`eq:allocationrule1`, and we get from {eq}`eq:priceequation1`
+
+$$
+p_t(s^t) = \delta^t \lambda \pi_t^1(s^t) \frac{1-\lambda + \lambda l_t(s^t)}{\lambda l_t(s^t)} = \delta^t \pi_t^2(s^t)[1-\lambda + \lambda l_t(s^t)] =
+\delta^t \bigl[(1 - \lambda) \pi_t^2(s^t) + \lambda \pi_t^1(s^t)\bigr]
+$$
+
+Thus, "shadow" prices $\theta_t(s^t)$ in the planning problem equal the competitive equilibrium prices $p_t(s^t)$.
 
 
 First-order necessary conditions for maximizing welfare criterion {eq}`eq:welfareW` subject to the feasibility constraint {eq}`eq:feasibility` are
@@ -833,11 +862,6 @@ Likelihood ratio processes appear again in {doc}`advanced:additive_functionals`.
 
 
 
-{doc}`ge_arrow`
-
-
-
-
 ## Exercises
 
 ```{exercise}
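The algebra in the passage added to this file can be spot-checked numerically. The sketch below uses illustrative values for $\delta$ and $\lambda$ and the likelihood ratio $l_t(s^t) = \pi_t^1(s^t)/\pi_t^2(s^t)$; the function name `prices` and the specific numbers are ours, not the lecture's.

```python
import numpy as np

delta, lam = 0.96, 0.4    # illustrative discount factor and Pareto weight

def prices(pi1, pi2, t):
    """Competitive price p_t(s^t) as in the price equation, and the planner's
    shadow price theta_t(s^t) from the first-order conditions."""
    l = pi1 / pi2                                                 # likelihood ratio
    p = delta**t * lam * pi1 * (1 - lam + lam * l) / (lam * l)
    theta = delta**t * ((1 - lam) * pi2 + lam * pi1)
    return p, theta

# spot-check p_t(s^t) = theta_t(s^t) on arbitrary positive history probabilities
for pi1, pi2, t in [(0.05, 0.03, 1), (0.008, 0.02, 4), (0.5, 0.5, 0)]:
    p, theta = prices(pi1, pi2, t)
    assert np.isclose(p, theta)

    # planner's FOC for agent 1: lam * delta^t * u'(c1) * pi1 = theta, with u'(c) = 1/c
    c1 = lam * pi1 / ((1 - lam) * pi2 + lam * pi1)
    assert np.isclose(lam * delta**t * (1 / c1) * pi1, theta)

print("shadow prices match competitive prices on all test points")
```

Cancelling $\lambda \pi_t^1 / l_t = \lambda \pi_t^2$ by hand reproduces the same chain of equalities as the displayed derivation.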
