Commit 434ccfe: "incorprate reader's feedback"

1 parent 39b39c9 commit 434ccfe
File tree: 5 files changed (+161, -93 lines)


lectures/divergence_measures.md

Lines changed: 22 additions & 12 deletions

@@ -4,7 +4,7 @@ jupytext:
     extension: .md
     format_name: myst
     format_version: 0.13
-    jupytext_version: 1.17.1
+    jupytext_version: 1.16.6
 kernelspec:
   display_name: Python 3 (ipykernel)
   language: python
@@ -60,10 +60,6 @@ import pandas as pd
 from IPython.display import display, Math
 ```
 
-
-
-
-
 ## Primer on entropy, cross-entropy, KL divergence
 
 Before diving in, we'll introduce some useful concepts in a simple setting.
@@ -163,6 +159,8 @@ f(z; a, b) = \frac{\Gamma(a+b) z^{a-1} (1-z)^{b-1}}{\Gamma(a) \Gamma(b)}
 \Gamma(p) := \int_{0}^{\infty} x^{p-1} e^{-x} dx
 $$
 
+We introduce two Beta distributions $f(x)$ and $g(x)$, which we will use to illustrate the different divergence measures.
+
 Let's define parameters and density functions in Python
 
 ```{code-cell} ipython3
@@ -198,8 +196,6 @@ plt.legend()
 plt.show()
 ```
 
-
-
 (rel_entropy)=
 ## Kullback–Leibler divergence
 
@@ -457,15 +453,14 @@ plt.show()
 
 We now generate plots illustrating how overlap visually diminishes as divergence measures increase.
 
-
 ```{code-cell} ipython3
 param_grid = [
     ((1, 1), (1, 1)),
     ((1, 1), (1.5, 1.2)),
     ((1, 1), (2, 1.5)),
     ((1, 1), (3, 1.2)),
-    ((1, 1), (5, 1)),
-    ((1, 1), (0.3, 0.3))
+    ((1, 1), (0.3, 0.3)),
+    ((1, 1), (5, 1))
 ]
 ```
@@ -510,9 +505,24 @@ def plot_dist_diff(para_grid):
     return divergence_data
 
 divergence_data = plot_dist_diff(param_grid)
-```
-
 
+from pandas.plotting import parallel_coordinates
+kl_gf_values = [float(result['KL(g, f)']) for result in results]
+
+df_plot = pd.DataFrame({
+    "KL(f,g)": kl_fg_values,
+    "KL(g,f)": kl_gf_values,
+    "JS": js_values,
+    "Chernoff": chernoff_values
+})
+df_plot["pair"] = df_plot.index.astype(str)  # just to group lines
+
+plt.figure(figsize=(8,5))
+parallel_coordinates(df_plot, "pair", color="blue", alpha=0.3)
+plt.ylabel("Value")
+plt.title("Parallel comparison of divergence measures per pair")
+plt.show()
+```
 
 ## KL divergence and maximum-likelihood estimation
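As a companion to the reordered `param_grid` above, here is a minimal, self-contained sketch of computing KL and Jensen-Shannon divergence between Beta densities by quadrature. It assumes SciPy is available; `kl_div` and `js_div` are our illustrative helpers, not functions from the lecture itself.

```python
import numpy as np
from scipy.stats import beta
from scipy.integrate import quad

def kl_div(p_params, q_params):
    """KL(p, q) = integral of p(x) log(p(x)/q(x)) for Beta densities on (0, 1)."""
    p = lambda x: beta.pdf(x, *p_params)
    q = lambda x: beta.pdf(x, *q_params)
    # clip away the endpoints, where the densities can be unbounded
    val, _ = quad(lambda x: p(x) * np.log(p(x) / q(x)), 1e-8, 1 - 1e-8)
    return val

def js_div(p_params, q_params):
    """Jensen-Shannon divergence: 0.5 KL(p, m) + 0.5 KL(q, m), m = (p + q)/2."""
    p = lambda x: beta.pdf(x, *p_params)
    q = lambda x: beta.pdf(x, *q_params)
    m = lambda x: 0.5 * (p(x) + q(x))
    i1, _ = quad(lambda x: p(x) * np.log(p(x) / m(x)), 1e-8, 1 - 1e-8)
    i2, _ = quad(lambda x: q(x) * np.log(q(x) / m(x)), 1e-8, 1 - 1e-8)
    return 0.5 * (i1 + i2)

# moving down the first few grid pairs increases divergence from Beta(1, 1)
for pair in [(1, 1), (1.5, 1.2), (2, 1.5), (3, 1.2)]:
    print(pair, round(kl_div((1, 1), pair), 4), round(js_div((1, 1), pair), 4))
```

For identical parameters the integrand vanishes, so `kl_div((1, 1), (1, 1))` is exactly zero, which is why that pair anchors the grid.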

lectures/imp_sample.md

Lines changed: 15 additions & 5 deletions

@@ -233,7 +233,7 @@ For our importance sampling estimate, we set $q = h$.
 estimate(g_a, g_b, h_a, h_b, T=1, N=10000)
 ```
 
-Evidently, even at T=1, our importance sampling estimate is closer to $1$ than is the Monte Carlo estimate.
+Evidently, even at $T=1$, our importance sampling estimate is closer to $1$ than is the Monte Carlo estimate.
 
 Bigger differences arise when computing expectations over longer sequences, $E_0\left[L\left(\omega^t\right)\right]$.
 
@@ -248,6 +248,16 @@ estimate(g_a, g_b, g_a, g_b, T=10, N=10000)
 estimate(g_a, g_b, h_a, h_b, T=10, N=10000)
 ```
 
+The Monte Carlo method underestimates because the likelihood ratio $L(\omega^T) = \prod_{t=1}^T \frac{f(\omega_t)}{g(\omega_t)}$ has a highly skewed distribution under $g$.
+
+Most samples from $g$ produce small likelihood ratios, while the true mean requires occasional very large values that are rarely sampled.
+
+In our case, since $g(\omega) \to 0$ as $\omega \to 0$ while $f(\omega)$ remains bounded, the Monte Carlo procedure undersamples precisely where the likelihood ratio $\frac{f(\omega)}{g(\omega)}$ is largest.
+
+As $T$ increases, this problem worsens exponentially, making standard Monte Carlo increasingly unreliable.
+
+Importance sampling with $q = h$ fixes this by sampling more uniformly from regions important to both $f$ and $g$.
+
 ## Distribution of Sample Mean
 
 We next study the bias and efficiency of the Monte Carlo and importance sampling approaches.
@@ -364,10 +374,10 @@ b_list = [0.5, 1.2, 5.]
 ```{code-cell} ipython3
 w_range = np.linspace(1e-5, 1-1e-5, 1000)
 
-plt.plot(w_range, g(w_range), label=f'p=Beta({g_a}, {g_b})')
-plt.plot(w_range, p(w_range, a_list[0], b_list[0]), label=f'g=Beta({a_list[0]}, {b_list[0]})')
-plt.plot(w_range, p(w_range, a_list[1], b_list[1]), label=f'g=Beta({a_list[1]}, {b_list[1]})')
-plt.plot(w_range, p(w_range, a_list[2], b_list[2]), label=f'g=Beta({a_list[2]}, {b_list[2]})')
+plt.plot(w_range, g(w_range), label=f'g=Beta({g_a}, {g_b})')
+plt.plot(w_range, p(w_range, a_list[0], b_list[0]), label=f'$h_1$=Beta({a_list[0]},{b_list[0]})')
+plt.plot(w_range, p(w_range, a_list[1], b_list[1]), label=f'$h_2$=Beta({a_list[1]},{b_list[1]})')
+plt.plot(w_range, p(w_range, a_list[2], b_list[2]), label=f'$h_3$=Beta({a_list[2]},{b_list[2]})')
 plt.title('real data generating process $g$ and importance distribution $h$')
 plt.legend()
 plt.ylim([0., 3.])
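The skewness argument in the paragraphs added above can be checked with a short simulation. This is a sketch under illustrative parameter values (the Beta parameters here are our choices, not necessarily the lecture's `g_a, g_b, h_a, h_b`): we estimate $E_g[L] = 1$ both by naive Monte Carlo under $g$ and by importance sampling under a proposal $h$ that covers the tails.

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)

# illustrative densities: f uniform, g the data-generating density,
# h a U-shaped proposal that puts mass near both endpoints
f_a, f_b = 1.0, 1.0
g_a, g_b = 3.0, 1.2
h_a, h_b = 0.5, 0.5

f = lambda x: beta.pdf(x, f_a, f_b)
g = lambda x: beta.pdf(x, g_a, g_b)
h = lambda x: beta.pdf(x, h_a, h_b)

def mc_estimate(T=10, N=100_000):
    # naive Monte Carlo: draw from g, average L = prod_t f(w_t)/g(w_t); true value is 1
    w = rng.beta(g_a, g_b, size=(N, T))
    return np.prod(f(w) / g(w), axis=1).mean()

def is_estimate(T=10, N=100_000):
    # importance sampling: draw from h and reweight,
    # E_g[L] = E_h[L * prod_t g/h] = E_h[prod_t f/h]
    w = rng.beta(h_a, h_b, size=(N, T))
    return np.prod(f(w) / h(w), axis=1).mean()

print(mc_estimate(), is_estimate())  # IS stays near 1; naive MC typically falls short
```

The reweighted ratio $f/h$ is bounded here, so the importance sampling estimator has finite variance, while the naive estimator's second moment blows up near $\omega = 0$.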

lectures/likelihood_ratio_process.md

Lines changed: 69 additions & 67 deletions

@@ -993,33 +993,9 @@ def compute_protocol_1_errors(π_minus_1, T_max, N_simulations, f_func, g_func,
         'L_cumulative': L_cumulative,
         'true_models': true_models
     }
-
-def compute_protocol_2_errors(π_minus_1, T_max, N_simulations, f_func, g_func,
-                              F_params=(1, 1), G_params=(3, 1.2)):
-    """
-    Compute error probabilities for Protocol 2.
-    """
-    sequences, true_models = protocol_2(π_minus_1,
-                                        T_max, N_simulations, F_params, G_params)
-    l_ratios, _ = compute_likelihood_ratios(sequences, f_func, g_func)
-
-    T_range = np.arange(1, T_max + 1)
-
-    accuracy = np.empty(T_max)
-    for t in range(T_max):
-        predictions = (l_ratios[:, t] >= 1)
-        actual = true_models[:, t]
-        accuracy[t] = np.mean(predictions == actual)
-
-    return {
-        'T_range': T_range,
-        'accuracy': accuracy,
-        'l_ratios': l_ratios,
-        'true_models': true_models
-    }
 ```
 
-The following code visualizes the error probabilities for timing protocol 1 and 2
+The following code visualizes the error probabilities for timing protocol 1
 
 ```{code-cell} ipython3
 :tags: [hide-input]
@@ -1058,44 +1034,6 @@ def analyze_protocol_1(π_minus_1, T_max, N_simulations, f_func, g_func,
 
     return result
 
-def analyze_protocol_2(π_minus_1, T_max, N_simulations, f_func, g_func,
-                       theory_error=None, F_params=(1, 1), G_params=(3, 1.2)):
-    """Analyze Protocol 2."""
-    result = compute_protocol_2_errors(π_minus_1, T_max, N_simulations,
-                                       f_func, g_func, F_params, G_params)
-
-    # Plot results
-    plt.figure(figsize=(10, 6))
-    plt.plot(result['T_range'], result['accuracy'],
-             'b-', linewidth=2, label='empirical accuracy')
-
-    if theory_error is not None:
-        plt.axhline(1 - theory_error, color='r', linestyle='--',
-                    label=f'theoretical accuracy = {1 - theory_error:.4f}')
-
-    plt.xlabel('$t$')
-    plt.ylabel('accuracy')
-    plt.legend()
-    plt.ylim(0.5, 1.0)
-    plt.show()
-
-    return result
-
-def compare_protocols(result1, result2):
-    """Compare results from both protocols."""
-    plt.figure(figsize=(10, 6))
-
-    plt.plot(result1['T_range'], result1['error_prob'], linewidth=2,
-             label='Protocol 1 (Model Selection)')
-    plt.plot(result2['T_range'], 1 - result2['accuracy'],
-             linestyle='--', linewidth=2,
-             label='Protocol 2 (classification)')
-
-    plt.xlabel('$T$')
-    plt.ylabel('error probability')
-    plt.legend()
-    plt.show()
-
 # Analyze Protocol 1
 π_minus_1 = 0.5
 T_max = 30
@@ -1130,6 +1068,33 @@ $$ (eq:classerrorprob)
 
 where $\tilde \alpha_t = {\rm Prob}(l_t < 1 \mid f)$ and $\tilde \beta_t = {\rm Prob}(l_t \geq 1 \mid g)$.
 
+```{code-cell} ipython3
+def compute_protocol_2_errors(π_minus_1, T_max, N_simulations, f_func, g_func,
+                              F_params=(1, 1), G_params=(3, 1.2)):
+    """
+    Compute error probabilities for Protocol 2.
+    """
+    sequences, true_models = protocol_2(π_minus_1,
+                                        T_max, N_simulations, F_params, G_params)
+    l_ratios, _ = compute_likelihood_ratios(sequences, f_func, g_func)
+
+    T_range = np.arange(1, T_max + 1)
+
+    accuracy = np.empty(T_max)
+    for t in range(T_max):
+        predictions = (l_ratios[:, t] >= 1)
+        actual = true_models[:, t]
+        accuracy[t] = np.mean(predictions == actual)
+
+    return {
+        'T_range': T_range,
+        'accuracy': accuracy,
+        'l_ratios': l_ratios,
+        'true_models': true_models
+    }
+```
+
 Since for each $t$, the decision boundary is the same, the decision boundary can be computed as
 
 ```{code-cell} ipython3
@@ -1177,11 +1142,11 @@ plt.tight_layout()
 plt.show()
 ```
 
-To the left of the green vertical line $g < f$, so $l_t < 1$; therefore a $w_t$ that falls to the left of the green line is classified as a type $g$ individual.
+To the left of the green vertical line $g < f$, so $l_t > 1$; therefore a $w_t$ that falls to the left of the green line is classified as a type $f$ individual.
 
-* The shaded orange area equals $\beta$ -- the probability of classifying someone as a type $g$ individual when it is really a type $f$ individual.
+* The shaded red area equals $\beta$ -- the probability of classifying someone as a type $g$ individual when it is really a type $f$ individual.
 
-To the right of the green vertical line $g > f$, so $l_t >1 $; therefore a $w_t$ that falls to the right of the green line is classified as a type $f$ individual.
+To the right of the green vertical line $g > f$, so $l_t < 1$; therefore a $w_t$ that falls to the right of the green line is classified as a type $g$ individual.
 
 * The shaded blue area equals $\alpha$ -- the probability of classifying someone as a type $f$ when it is really a type $g$ individual.
 
@@ -1213,6 +1178,29 @@ Now we simulate timing protocol 2 and compute the classification error probabili
 
 In the next cell, we also compare the theoretical classification accuracy to the empirical classification accuracy
 
 ```{code-cell} ipython3
+def analyze_protocol_2(π_minus_1, T_max, N_simulations, f_func, g_func,
+                       theory_error=None, F_params=(1, 1), G_params=(3, 1.2)):
+    """Analyze Protocol 2."""
+    result = compute_protocol_2_errors(π_minus_1, T_max, N_simulations,
+                                       f_func, g_func, F_params, G_params)
+
+    # Plot results
+    plt.figure(figsize=(10, 6))
+    plt.plot(result['T_range'], result['accuracy'],
+             'b-', linewidth=2, label='empirical accuracy')
+
+    if theory_error is not None:
+        plt.axhline(1 - theory_error, color='r', linestyle='--',
+                    label=f'theoretical accuracy = {1 - theory_error:.4f}')
+
+    plt.xlabel('$t$')
+    plt.ylabel('accuracy')
+    plt.legend()
+    plt.ylim(0.5, 1.0)
+    plt.show()
+
+    return result
+
 # Analyze Protocol 2
 result_p2 = analyze_protocol_2(π_minus_1, T_max, N_simulations, f, g,
                                theory_error, (F_a, F_b), (G_a, G_b))
@@ -1221,7 +1209,21 @@ result_p2 = analyze_protocol_2(π_minus_1, T_max, N_simulations, f, g,
 Let's watch decisions made by the two timing protocols as more and more observations accrue.
 
 ```{code-cell} ipython3
-# Compare both protocols
+def compare_protocols(result1, result2):
+    """Compare results from both protocols."""
+    plt.figure(figsize=(10, 6))
+
+    plt.plot(result1['T_range'], result1['error_prob'], linewidth=2,
+             label='Protocol 1 (Model Selection)')
+    plt.plot(result2['T_range'], 1 - result2['accuracy'],
+             linestyle='--', linewidth=2,
+             label='Protocol 2 (classification)')
+
+    plt.xlabel('$T$')
+    plt.ylabel('error probability')
+    plt.legend()
+    plt.show()
+
 compare_protocols(result_p1, result_p2)
 ```
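To make the classification-error formula $\frac{1}{2}(\tilde\alpha_t + \tilde\beta_t)$ from this diff concrete, here is a hedged, self-contained sketch (our own helper code, not the lecture's, using the lectures' example densities $f = \text{Beta}(1,1)$ and $g = \text{Beta}(3,1.2)$) that compares the theoretical error to a protocol-2 simulation in which nature picks $f$ or $g$ with probability one half each period.

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(1)

F_a, F_b = 1, 1
G_a, G_b = 3, 1.2
f = lambda x: beta.pdf(x, F_a, F_b)
g = lambda x: beta.pdf(x, G_a, G_b)

# theoretical error 0.5*(alpha + beta) with alpha = P(l < 1 | f), beta = P(l >= 1 | g),
# computed by a simple midpoint Riemann sum on a fine grid
dx = 1e-5
x = np.arange(dx / 2, 1, dx)
alpha_t = np.sum(f(x) * (f(x) < g(x))) * dx   # misclassify an f-draw as type g
beta_t = np.sum(g(x) * (f(x) >= g(x))) * dx   # misclassify a g-draw as type f
theory_error = 0.5 * (alpha_t + beta_t)

# simulate protocol 2: classify each draw by the one-observation ratio l = f(w)/g(w)
N = 200_000
from_f = rng.random(N) < 0.5
w = np.where(from_f, rng.beta(F_a, F_b, N), rng.beta(G_a, G_b, N))
accuracy = np.mean((f(w) >= g(w)) == from_f)

print(theory_error, 1 - accuracy)  # the two error rates should be close
```

The indicator `f(x) >= g(x)` is exactly the $l_t \geq 1$ decision rule used by `compute_protocol_2_errors` in the diff above.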

lectures/likelihood_ratio_process_2.md

Lines changed: 32 additions & 8 deletions

@@ -260,7 +260,7 @@ To design a socially optimal allocation, the social planner wants to know what a
 As for the endowment sequences, agent $i$ believes that nature draws i.i.d. sequences from joint densities
 
 $$
-\pi_t^i(s^t) = \pi(s_t)^i \pi^i(s_{t-1}) \cdots \pi^i(s_0)
+\pi_t^i(s^t) = \pi^i(s_t) \pi^i(s_{t-1}) \cdots \pi^i(s_0)
 $$
 
 As for attitudes toward bearing risks, agent $i$ has a one-period utility function
@@ -269,7 +269,7 @@ $$
 u(c_t^i) = \ln (c_t^i)
 $$
 
-with marginal utility of consumption in period $i$
+with marginal utility of consumption in period $t$
 
 $$
 u'(c_t^i) = \frac{1}{c_t^i}
@@ -303,7 +303,36 @@ This means that the social planner knows and respects
 * each agent's one period utility function $u(\cdot) = \ln(\cdot)$
 * each agent $i$'s probability model $\{\pi_t^i(s^t)\}_{t=0}^\infty$
 
-Consequently, we anticipate that these objects will appear in the social planner's rule for allocating the aggregate endowment each period.
+Consequently, we anticipate that these objects will appear in the social planner's rule for allocating the aggregate endowment each period.
+
+The Lagrangian for the social planner's problem is
+
+$$
+L = \sum_{t=0}^{\infty}\sum_{s^t} \{ \lambda \delta^t u(c_t^1(s^t)) \pi_t^1(s^t) + (1-\lambda) \delta^t u(c_t^2(s^t)) \pi_t^2(s^t) + \theta_t(s^t)(1-c_t^1(s^t)-c_t^2(s^t)) \}
+$$
+
+where $\theta_t(s^t)$ are the shadow prices.
+
+The first order conditions for maximizing $L$ with respect to $c_t^i(s^t)$ are:
+
+$$
+\lambda \delta^t u'(c_t^1(s^t)) \pi_t^1(s^t) = \theta_t(s^t), \quad (1-\lambda) \delta^t u'(c_t^2(s^t)) \pi_t^2(s^t) = \theta_t(s^t)
+$$
+
+Substituting formula {eq}`eq:allocationrule1` for $c_t^1(s^t)$, we get
+
+$$
+\theta_t(s^t) = \delta^t [(1-\lambda)\pi_t^2(s^t) + \lambda \pi_t^1(s^t)]
+$$
+
+Now for the competitive equilibrium, notice that if we take $\mu_1 = \frac{1}{\lambda}$ and $\mu_2 = \frac{1}{1-\lambda}$, formula {eq}`eq:allocationce` agrees with formula {eq}`eq:allocationrule1`, and we get from {eq}`eq:priceequation1`
+
+$$
+p_t(s^t) = \delta^t \lambda \pi_t^1(s^t) \frac{1-\lambda + \lambda l_t(s^t)}{\lambda l_t(s^t)} = \delta^t \pi_t^2(s^t)[1-\lambda + \lambda l_t(s^t)] =
+\delta^t \bigl[(1 - \lambda) \pi_t^2(s^t) + \lambda \pi_t^1(s^t)\bigr]
+$$
+
+Thus, "shadow" prices $\theta_t(s^t)$ in the planning problem equal the competitive equilibrium prices $p_t(s^t)$.
 
 
 First-order necessary conditions for maximizing welfare criterion {eq}`eq:welfareW` subject to the feasibility constraint {eq}`eq:feasibility` are
@@ -833,11 +862,6 @@ Likelihood ratio processes appear again in {doc}`advanced:additive_functionals`.
 
 
 
-{doc}`ge_arrow`
-
-
-
-
 ## Exercises
 
 ```{exercise}
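The algebra in the passage added to this file can be spot-checked numerically. The sketch below uses illustrative values for $\delta$ and $\lambda$ and the likelihood ratio $l_t(s^t) = \pi_t^1(s^t)/\pi_t^2(s^t)$; the function name `prices` and the specific numbers are ours, not the lecture's.

```python
import numpy as np

delta, lam = 0.96, 0.4    # illustrative discount factor and Pareto weight

def prices(pi1, pi2, t):
    """Competitive price p_t(s^t) as in the price equation, and the planner's
    shadow price theta_t(s^t) from the first-order conditions."""
    l = pi1 / pi2                                                 # likelihood ratio
    p = delta**t * lam * pi1 * (1 - lam + lam * l) / (lam * l)
    theta = delta**t * ((1 - lam) * pi2 + lam * pi1)
    return p, theta

# spot-check p_t(s^t) = theta_t(s^t) on arbitrary positive history probabilities
for pi1, pi2, t in [(0.05, 0.03, 1), (0.008, 0.02, 4), (0.5, 0.5, 0)]:
    p, theta = prices(pi1, pi2, t)
    assert np.isclose(p, theta)

    # planner's FOC for agent 1: lam * delta^t * u'(c1) * pi1 = theta, with u'(c) = 1/c
    c1 = lam * pi1 / ((1 - lam) * pi2 + lam * pi1)
    assert np.isclose(lam * delta**t * (1 / c1) * pi1, theta)

print("shadow prices match competitive prices on all test points")
```

Cancelling $\lambda \pi_t^1 / l_t = \lambda \pi_t^2$ by hand reproduces the same chain of equalities as the displayed derivation.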
