Commit 91a35c2: updates

1 parent 434ccfe
File tree: 6 files changed, +507 -85 lines changed

lectures/divergence_measures.md

Lines changed: 0 additions & 17 deletions
@@ -505,23 +505,6 @@ def plot_dist_diff(para_grid):
     return divergence_data

 divergence_data = plot_dist_diff(param_grid)
-
-from pandas.plotting import parallel_coordinates
-kl_gf_values = [float(result['KL(g, f)']) for result in results]
-
-df_plot = pd.DataFrame({
-    "KL(f,g)": kl_fg_values,
-    "KL(g,f)": kl_gf_values,
-    "JS": js_values,
-    "Chernoff": chernoff_values
-})
-df_plot["pair"] = df_plot.index.astype(str)  # just to group lines
-
-plt.figure(figsize=(8,5))
-parallel_coordinates(df_plot, "pair", color="blue", alpha=0.3)
-plt.ylabel("Value")
-plt.title("Parallel comparison of divergence measures per pair")
-plt.show()
 ```

 ## KL divergence and maximum-likelihood estimation

lectures/imp_sample.md

Lines changed: 9 additions & 12 deletions
@@ -36,7 +36,7 @@ import matplotlib.pyplot as plt
 from math import gamma
 ```

-## Mathematical Expectation of Likelihood Ratio
+## Mathematical expectation of likelihood ratio

 In {doc}`this lecture <likelihood_ratio_process>`, we studied a likelihood ratio $\ell \left(\omega_t\right)$

@@ -57,11 +57,10 @@
 Our goal is to approximate the mathematical expectation $E \left[ L\left(\omega^t\right) \right]$ well.

 In {doc}`this lecture <likelihood_ratio_process>`, we showed that $E \left[ L\left(\omega^t\right) \right]$ equals $1$ for all $t$.
+
 We want to check how well this holds if we replace $E$ with sample averages from simulations.

-This turns out to be easier said than done because for
-Beta distributions assumed above, $L\left(\omega^t\right)$ has
-a very skewed distribution with a very long tail as $t \rightarrow \infty$.
+This turns out to be easier said than done because, for the Beta distributions assumed above, $L\left(\omega^t\right)$ has a very skewed distribution with a very long tail as $t \rightarrow \infty$.

 This property makes it difficult to estimate the mean efficiently and accurately by standard Monte Carlo simulation methods.

@@ -156,7 +155,7 @@
 E^g\left[\ell\left(\omega\right)\right] = \int_\Omega \ell(\omega) g(\omega) d\omega = \int_\Omega \ell(\omega) \frac{g(\omega)}{h(\omega)} h(\omega) d\omega = E^h\left[\ell\left(\omega\right) \frac{g(\omega)}{h(\omega)}\right]
 $$

-## Selecting a Sampling Distribution
+## Selecting a sampling distribution

 Since we must use an $h$ that has larger mass in parts of the distribution to which $g$ puts low mass, we use $h=Beta(0.5, 0.5)$ as our importance distribution.

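As an aside, the identity above is easy to check numerically. Here is a minimal sketch (ours, not part of the commit), assuming the $f=Beta(1,1)$, $g=Beta(3,1.2)$ pair used in these lectures together with $h=Beta(0.5,0.5)$:

```python
# Check E^g[l] = E^h[l g/h] by simulation; the Beta parameters below are
# assumptions matching the related lectures, not part of this commit.
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)
f_dist = beta(1, 1)      # numerator density f
g_dist = beta(3, 1.2)    # true data-generating density g
h_dist = beta(0.5, 0.5)  # importance distribution h

def ell(w):
    """Likelihood ratio l(w) = f(w) / g(w)."""
    return f_dist.pdf(w) / g_dist.pdf(w)

N = 100_000

# direct Monte Carlo: draw w from g and average l(w)
w_g = g_dist.rvs(N, random_state=rng)
print(np.mean(ell(w_g)))                # near E^g[l] = 1, but noisy

# importance sampling: draw w from h and average l(w) g(w)/h(w)
w_h = h_dist.rvs(N, random_state=rng)
print(np.mean(ell(w_h) * g_dist.pdf(w_h) / h_dist.pdf(w_h)))  # also near 1
```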
@@ -178,7 +177,7 @@ plt.ylim([0., 3.])
 plt.show()
 ```

-## Approximating a Cumulative Likelihood Ratio
+## Approximating a cumulative likelihood ratio

 We now study how to use importance sampling to approximate
 ${E} \left[L(\omega^T)\right] = E\left[\prod_{i=1}^T \ell \left(\omega_i\right)\right]$.
@@ -252,13 +251,13 @@ The Monte Carlo method underestimates because the likelihood ratio $L(\omega^T)

 Most samples from $g$ produce small likelihood ratios, while the true mean requires occasional very large values that are rarely sampled.

-In our case, since $g(\omega) \to 0$ as $\omega \to 0$ while $f(\omega)$ remains bounded, the Monte Carlo procedure undersamples precisely where the likelihood ratio $\frac{f(\omega)}{g(\omega)}$ is largest.
+In our case, since $g(\omega) \to 0$ as $\omega \to 0$ while $f(\omega)$ remains constant, the Monte Carlo procedure undersamples precisely where the likelihood ratio $\frac{f(\omega)}{g(\omega)}$ is largest.

 As $T$ increases, this problem worsens exponentially, making standard Monte Carlo increasingly unreliable.

 Importance sampling with $q = h$ fixes this by sampling more uniformly from regions important to both $f$ and $g$.

-## Distribution of Sample Mean
+## Distribution of sample mean

 We next study the bias and efficiency of the Monte Carlo and importance sampling approaches.

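To preview what that study can show, the following sketch (ours, not the lecture's code, with the same assumed Beta parameters) draws many independent sample means of $L(\omega^T)$ under each method and compares their dispersion:

```python
# Sampling distribution of the two estimators of E[L] = 1:
# plain Monte Carlo draws from g; importance sampling draws from
# h = Beta(0.5, 0.5), for which the weighted ratio is prod (f/g)(g/h) = prod f/h.
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(1)
f_dist, g_dist, h_dist = beta(1, 1), beta(3, 1.2), beta(0.5, 0.5)

def sample_means(T=10, N=1_000, reps=200):
    mc, imp = [], []
    for _ in range(reps):
        w = g_dist.rvs((N, T), random_state=rng)   # w drawn from g
        mc.append(np.prod(f_dist.pdf(w) / g_dist.pdf(w), axis=1).mean())
        w = h_dist.rvs((N, T), random_state=rng)   # w drawn from h
        imp.append(np.prod(f_dist.pdf(w) / h_dist.pdf(w), axis=1).mean())
    return np.array(mc), np.array(imp)

mc, imp = sample_means()
print(mc.mean(), mc.std())    # tends to sit below 1, widely dispersed
print(imp.mean(), imp.std())  # tends to be much closer to 1
```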
@@ -333,17 +332,15 @@ The simulation exercises above show that the importance sampling estimates are u

 Evidently, the bias increases with $T$.

-## Choosing a Sampling Distribution
+## Choosing a sampling distribution

 +++

 Above, we arbitrarily chose $h = Beta(0.5,0.5)$ as the importance distribution.

 Is there an optimal importance distribution?

-In our particular case, since we know in advance that $E_0 \left[ L\left(\omega^t\right) \right] = 1$.
-
-We can use that knowledge to our advantage.
+In our particular case, since we know in advance that $E_0 \left[ L\left(\omega^t\right) \right] = 1$, we can use that knowledge to our advantage.

 Thus, suppose that we simply use $h = f$.

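To see why that is a promising choice, note that with $h=f$ each importance-sampling draw of the weighted cumulative likelihood ratio equals $\prod_{i=1}^T \frac{f}{g}\frac{g}{f} = 1$ identically, so the sample mean is exact. A small sketch (Beta parameters again assumed from the lecture):

```python
# With h = f the importance-sampling weights collapse to 1 draw by draw,
# so the estimator of E[L] = 1 has essentially zero variance.
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(2)
f_dist, g_dist = beta(1, 1), beta(3, 1.2)

T, N = 20, 1_000
w = f_dist.rvs((N, T), random_state=rng)              # sample from h = f
weights = np.prod((f_dist.pdf(w) / g_dist.pdf(w))     # the l terms
                  * (g_dist.pdf(w) / f_dist.pdf(w)),  # the g/h correction
                  axis=1)
print(weights.min(), weights.max())  # all 1 up to floating-point rounding
```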
lectures/likelihood_ratio_process.md

Lines changed: 8 additions & 6 deletions
@@ -194,7 +194,7 @@ def plot_likelihood_paths(l_seq, title="Likelihood ratio paths",
 ```

 (nature_likeli)=
-## Nature Permanently Draws from Density g
+## Nature permanently draws from density g

 We first simulate the likelihood ratio process when nature permanently
 draws from $g$.
@@ -264,7 +264,7 @@ Mathematical induction implies
 $E\left[L\left(w^{t}\right)\bigm|q=g\right]=1$ for all
 $t \geq 1$.

-## Peculiar Property
+## Peculiar property

 How can $E\left[L\left(w^{t}\right)\bigm|q=g\right]=1$ possibly be true when most probability mass of the likelihood
 ratio process is piling up near $0$ as
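
A quick simulation sketch, assuming the $Beta(1,1)$ and $Beta(3,1.2)$ densities this lecture uses for $f$ and $g$, shows both facts at once: sample means of $L(w^t)$ hover around $1$ while medians collapse toward $0$:

```python
# Mean versus median of the likelihood ratio process under q = g.
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(3)
f_dist, g_dist = beta(1, 1), beta(3, 1.2)

T, N = 50, 50_000
w = g_dist.rvs((N, T), random_state=rng)              # nature draws from g
L = np.cumprod(f_dist.pdf(w) / g_dist.pdf(w), axis=1)

for t in (1, 10, 50):
    # sample means stay near 1 (noisily for large t); medians head to 0
    print(t, L[:, t - 1].mean(), np.median(L[:, t - 1]))
```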
@@ -300,7 +300,7 @@ We explain the problem in more detail in {doc}`this lecture <imp_sample>`.
 There we describe an alternative way to compute the mean of a likelihood ratio by computing the mean of a _different_ random variable by sampling from a _different_ probability distribution.


-## Nature Permanently Draws from Density f
+## Nature permanently draws from density f

 Now suppose that before time $0$ nature permanently decided to draw repeatedly from density $f$.

@@ -348,7 +348,7 @@ plt.plot(range(T), np.sum(l_seq_f > 10000, axis=0) / N)
 plt.show()
 ```

-## Likelihood Ratio Test
+## Likelihood ratio test

 We now describe how to employ the machinery
 of Neyman and Pearson {cite}`Neyman_Pearson` to test the hypothesis that history $w^t$ is generated by repeated
@@ -959,7 +959,7 @@
 p(\textrm{wrong decision}) = {1 \over 2} (\alpha_T + \beta_T) .
 $$ (eq:detectionerrorprob)

-Now let's simulate timing protocol 1 and 2 and compute the error probabilities
+Now let's simulate timing protocol 1 and compute the error probabilities

 ```{code-cell} ipython3
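
For reference, here is a minimal sketch of such a simulation (ours, not the lecture's code cell; it assumes the threshold-$1$ decision rule and the $Beta(1,1)$, $Beta(3,1.2)$ pair for $f$ and $g$):

```python
# Estimate p(wrong decision) = (alpha_T + beta_T) / 2 by simulation.
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(4)
f_dist, g_dist = beta(1, 1), beta(3, 1.2)

def detection_error_prob(T=20, N=10_000):
    w_f = f_dist.rvs((N, T), random_state=rng)   # histories generated by f
    w_g = g_dist.rvs((N, T), random_state=rng)   # histories generated by g
    L_f = np.prod(f_dist.pdf(w_f) / g_dist.pdf(w_f), axis=1)
    L_g = np.prod(f_dist.pdf(w_g) / g_dist.pdf(w_g), axis=1)
    alpha_T = np.mean(L_f < 1)    # reject f although f is true
    beta_T = np.mean(L_g >= 1)    # accept f although g is true
    return 0.5 * (alpha_T + beta_T)

print(detection_error_prob())     # shrinks toward 0 as T grows
```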
@@ -1068,6 +1068,8 @@ $$ (eq:classerrorprob)

 where $\tilde \alpha_t = {\rm Prob}(l_t < 1 \mid f)$ and $\tilde \beta_t = {\rm Prob}(l_t \geq 1 \mid g)$.

+Now let's write some code to simulate it
+
 ```{code-cell} ipython3
 def compute_protocol_2_errors(π_minus_1, T_max, N_simulations, f_func, g_func,
                               F_params=(1, 1), G_params=(3, 1.2)):
@@ -1712,7 +1714,7 @@ P_g = np.array([[0.5, 0.3, 0.2],
 markov_results = analyze_markov_chains(P_f, P_g)
 ```

-## Related Lectures
+## Related lectures

 Likelihood processes play an important role in Bayesian learning, as described in {doc}`likelihood_bayes`
 and as applied in {doc}`odu`.

lectures/likelihood_ratio_process_2.md

Lines changed: 11 additions & 41 deletions
@@ -65,7 +65,7 @@ from math import gamma
 from scipy.integrate import quad
 ```

-## Review: Likelihood Ratio Processes
+## Review: likelihood ratio processes

 We'll begin by reminding ourselves of the definitions and properties of likelihood ratio processes.

@@ -166,7 +166,7 @@ def simulate(a, b, T=50, N=500):
     return l_arr
 ```

-## Blume and Easley's Setting
+## Blume and Easley's setting

 Let the random variable $s_t \in (0,1)$ at time $t = 0, 1, 2, \ldots$ be distributed according to the same Beta distribution with parameters
 $\theta = \{\theta_1, \theta_2\}$.
@@ -195,7 +195,7 @@ $$c^1(s_t) = y_t^1 = s_t. $$

 But in our model, agent 1 is not alone.

-## Nature and Agents' Beliefs
+## Nature and agents' beliefs

 Nature draws i.i.d. sequences $\{s_t\}_{t=0}^\infty$ from $\pi_t(s^t)$.

@@ -239,7 +239,7 @@
 c_t^1 + c_t^2 = 1 .
 $$

-## A Socialist Risk-Sharing Arrangement
+## A socialist risk-sharing arrangement

 In order to share risks, a benevolent social planner dictates a history-dependent consumption allocation that takes the form of a sequence of functions

@@ -284,7 +284,7 @@ $$ (eq:objectiveagenti)
 where $\delta \in (0,1)$ is an intertemporal discount factor, and $u(\cdot)$ is a strictly increasing, concave one-period utility function.

-## Social Planner's Allocation Problem
+## Social planner's allocation problem

 The benevolent dictator has all the information it requires to choose a consumption allocation that maximizes the social welfare criterion
@@ -305,45 +305,12 @@ This means that the social planner knows and respects

 Consequently, we anticipate that these objects will appear in the social planner's rule for allocating the aggregate endowment each period.

-The Lagrangian for the social planner's problem is
-
-$$
-L = \sum_{t=0}^{\infty}\sum_{s^t} \{ \lambda \delta^t u(c_t^1(s^t)) \pi_t^1(s^t) + (1-\lambda) \delta^t u(c_t^2(s^t)) \pi_t^2(s^t) + \theta_t(s^t)(1-c_t^1(s^t)-c_t^2(s^t)) \}
-$$
-
-where $\theta_t(s^t)$ are the shadow prices.
-
-The first order conditions for maximizing $L$ with respect to $c_t^i(s^t)$ are:
-
-$$
-\lambda \delta^t u'(c_t^1(s^t)) \pi_t^1(s^t) = \theta_t(s^t), \quad (1-\lambda) \delta^t u'(c_t^2(s^t)) \pi_t^2(s^t) = \theta_t(s^t)
-$$
-
-Substituting formula {eq}`eq:allocationrule1` for $c_t^1(s^t)$, we get
-
-$$
-\theta_t(s^t) = \delta^t [(1-\lambda)\pi_t^2(s^t) + \lambda \pi_t^1(s^t)]
-$$
-
-Now for the competitive equilibrium, notice that if we take $\mu_1 = \frac{1}{\lambda}$ and $\mu_2 = \frac{1}{1-\lambda}$, formula {eq}`eq:allocationce` agrees with formula {eq}`eq:allocationrule1`, and we get from {eq}`eq:priceequation1`
-
-$$
-p_t(s^t) = \delta^t \lambda \pi_t^1(s^t) \frac{1-\lambda + \lambda l_t(s^t)}{\lambda l_t(s^t)} = \delta^t \pi_t^2(s^t)[1-\lambda + \lambda l_t(s^t)] =
-\delta^t \bigl[(1 - \lambda) \pi_t^2(s^t) + \lambda \pi_t^1(s^t)\bigr]
-$$
-
-Thus, "shadow" prices $\theta_t(s^t)$ in the planning problem equal the competitive equilibrium prices $p_t(s^t)$.
-
-
 First-order necessary conditions for maximizing welfare criterion {eq}`eq:welfareW` subject to the feasibility constraint {eq}`eq:feasibility` are

 $$\frac{\pi_t^2(s^t)}{\pi_t^1(s^t)} \frac{(1/c_t^2(s^t))}{(1/c_t^1(s^t))} = \frac{\lambda}{1-\lambda}$$

 which can be rearranged to become

-
-
 $$
 \frac{c_t^1(s^t)}{c_t^2(s^t)} = \frac{\lambda}{1-\lambda} l_t(s^t)
 $$ (eq:allocationrule0)
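
Combined with feasibility $c_t^1(s^t) + c_t^2(s^t) = 1$, this rule pins down agent 1's share as $c_t^1(s^t) = \frac{\lambda l_t(s^t)}{1 - \lambda + \lambda l_t(s^t)}$. A tiny sketch with illustrative values of $\lambda$ and $l_t$:

```python
# Agent 1's consumption share implied by the allocation rule above;
# the lambda and l_t values here are illustrative only.
def consumption_share_1(l_t, lam=0.5):
    """c1 = lam * l_t / (1 - lam + lam * l_t), from the FOC plus feasibility."""
    return lam * l_t / (1 - lam + lam * l_t)

for l_t in (0.01, 1.0, 100.0):
    print(l_t, consumption_share_1(l_t))
# l_t near 0 drives agent 1's share to 0; large l_t drives it to 1
```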
@@ -390,7 +357,7 @@


-## If You're So Smart, $\ldots$
+## If you're so smart, $\ldots$

 Let's compute some values of limiting allocations {eq}`eq:allocationrule1` for some interesting possible limiting
@@ -429,7 +396,7 @@ Doing this will allow us to connect our analysis with an argument of {cite}`alch


-## Competitive Equilibrium Prices
+## Competitive equilibrium prices

 Two fundamental welfare theorems for general equilibrium models lead us to anticipate that there is a connection between the allocation that solves the social planning problem we have been studying and the allocation in a **competitive equilibrium** with complete markets in history-contingent commodities.

@@ -555,6 +522,9 @@ According to formula {eq}`eq:pformulafinal`, we have the following possible limi
 * when $l_\infty = \infty$, $c_\infty^1 = 1 $ and tails of competitive equilibrium prices reflect agent $1$'s probability model $\pi_t^1(s^t)$ according to $p_t(s^t) \propto \delta^t \pi_t^1(s^t) $
 * for small $t$'s, competitive equilibrium prices reflect both agents' probability models.

+We leave the verification of the shadow prices to the reader since it follows from
+the same reasoning.
+
 ## Simulations

 Now let's implement some simulations when agent $1$ believes marginal density
@@ -850,7 +820,7 @@ This ties in nicely with {eq}`eq:kl_likelihood_link`.


-## Related Lectures
+## Related lectures

 Complete markets models with homogeneous beliefs, a kind often used in macroeconomics and finance, are studied in this quantecon lecture {doc}`ge_arrow`.

lectures/likelihood_var.md

Lines changed: 6 additions & 9 deletions
@@ -20,7 +20,7 @@ kernelspec:
 </div>
 ```

-# Likelihood Processes for VAR Models
+# Likelihood Processes For VAR Models

 ```{contents} Contents
 :depth: 2
@@ -156,11 +156,11 @@ Given the Gaussian structure, the conditional distribution $f(x_{t+1} | x_t)$ is
 - Mean: $A x_t$
 - Covariance: $CC'$

-The log conditional density is:
+The log conditional density is

 $$
 \log f(x_{t+1} | x_t) = -\frac{n}{2} \log(2\pi) - \frac{1}{2} \log \det(CC') - \frac{1}{2} (x_{t+1} - A x_t)' (CC')^{-1} (x_{t+1} - A x_t)
-$$
+$$ (eq:cond_den)

 ```{code-cell} ipython3
 def log_likelihood_transition(x_next, x_curr, model):
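
As a standalone illustration of {eq}`eq:cond_den` (the lecture's `log_likelihood_transition` wraps the same formula around its own `model` object, so the helper and arguments below are our illustrative assumptions):

```python
# Direct evaluation of the log conditional density above.
import numpy as np

def log_cond_density(x_next, x_curr, A, C):
    """log f(x_{t+1} | x_t) for x_{t+1} = A x_t + C w_{t+1}, w ~ N(0, I)."""
    n = A.shape[0]
    Sigma = C @ C.T                  # innovation covariance CC'
    resid = x_next - A @ x_curr      # forecast error x_{t+1} - A x_t
    return (-0.5 * n * np.log(2 * np.pi)
            - 0.5 * np.log(np.linalg.det(Sigma))
            - 0.5 * resid @ np.linalg.solve(Sigma, resid))

A, C = np.array([[0.8]]), np.array([[0.3]])
print(log_cond_density(np.array([0.1]), np.array([0.0]), A, C))
```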
@@ -250,11 +250,8 @@
 L_t = \sum_{s=1}^{t} \ell_s = \sum_{s=1}^{t} \log \frac{p_f(x_s | x_{s-1})}{p_g(x_s | x_{s-1})}
 $$

-For the VAR model $x_{t+1} = A x_t + C w_{t+1}$ where $w_{t+1} \sim \mathcal{N}(0, I)$, the conditional density is
+where $p_f(x_t | x_{t-1})$ and $p_g(x_t | x_{t-1})$ are given by their respective conditional densities defined in {eq}`eq:cond_den`.

-$$
-p(x_{t+1} | x_t) = (2\pi)^{-n/2} |CC^T|^{-1/2} \exp\left(-\frac{1}{2}(x_{t+1} - Ax_t)^T (CC^T)^{-1} (x_{t+1} - Ax_t)\right)
-$$

 Let's write those equations in Python
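
Before turning to the lecture's implementation, here is a sketch of accumulating $L_t$ along a simulated path, reusing the `log_cond_density` helper from the sketch above with the AR(1) parameters of Example 1:

```python
# Cumulative log likelihood ratio L_t when the data come from model f.
import numpy as np

A_f, C_f = np.array([[0.8]]), np.array([[0.3]])
A_g, C_g = np.array([[0.5]]), np.array([[0.4]])

rng = np.random.default_rng(5)
T = 200
x = np.zeros((T + 1, 1))
for t in range(T):                   # simulate x_{t+1} = A_f x_t + C_f w
    x[t + 1] = A_f @ x[t] + C_f @ rng.standard_normal(1)

L_T = sum(log_cond_density(x[t + 1], x[t], A_f, C_f)
          - log_cond_density(x[t + 1], x[t], A_g, C_g)
          for t in range(T))
print(L_T)    # tends to drift upward when f generates the data
```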
@@ -293,7 +290,7 @@ def compute_likelihood_ratio_var(paths, model_f, model_g):
     return log_L_ratios if N_paths > 1 else log_L_ratios[0]
 ```

-## Example 1: Two AR(1) processes
+## Example 1: two AR(1) processes

 Let's start with a simple example comparing two univariate AR(1) processes with $A_f = 0.8$, $A_g = 0.5$, and $C_f = 0.3$, $C_g = 0.4$

@@ -336,7 +333,7 @@ plt.show()

 As we expected, the likelihood ratio process goes to $+\infty$ as $T$ increases, indicating that model $f$ is chosen correctly by our algorithm.

-## Example 2: Bivariate VAR models
+## Example 2: bivariate VAR models

 Now let's consider an example with bivariate VAR models with
