diff --git a/lectures/_static/quant-econ.bib b/lectures/_static/quant-econ.bib
index 1b6e4dbd2..ac295e8e3 100644
--- a/lectures/_static/quant-econ.bib
+++ b/lectures/_static/quant-econ.bib
@@ -2,6 +2,35 @@ QuantEcon Bibliography File used in conjuction with sphinxcontrib-bibtex package Note: Extended Information (like abstracts, doi, url's etc.) can be found in quant-econ-extendedinfo.bib file in _static/ ###
+@book{friedman1953essays,
+  title={Essays in positive economics},
+  author={Friedman, Milton},
+  year={1953},
+  publisher={University of Chicago Press}
+}
+
+@article{alchian1950uncertainty,
+  title={Uncertainty, evolution, and economic theory},
+  author={Alchian, Armen A},
+  journal={Journal of Political Economy},
+  volume={58},
+  number={3},
+  pages={211--221},
+  year={1950},
+  publisher={The University of Chicago Press}
+}
+
+@article{blume2006if,
+  title={If you're so smart, why aren't you rich? Belief selection in complete and incomplete markets},
+  author={Blume, Lawrence and Easley, David},
+  journal={Econometrica},
+  volume={74},
+  number={4},
+  pages={929--966},
+  year={2006},
+  publisher={Wiley Online Library}
+}
 @article{mendoza1998international, title={The international ramifications of tax reforms: supply-side economics in a global economy},
diff --git a/lectures/likelihood_ratio_process.md b/lectures/likelihood_ratio_process.md
index 25fb252e5..2daa27157 100644
--- a/lectures/likelihood_ratio_process.md
+++ b/lectures/likelihood_ratio_process.md
@@ -37,8 +37,10 @@ Among things that we'll learn are * How a likelihood ratio process is a key ingredient in frequentist hypothesis testing * How a **receiver operator characteristic curve** summarizes information about a false alarm probability and power in frequentist hypothesis testing
-* How a statistician can combine frequentist probabilities of type I and type II errors to form posterior probabilities of erroneous model selection or missclassification of individuals
-* How during World War II the
United States Navy devised a decision rule that Captain Garret L. Schyler challenged, a topic to be studied in {doc}`this lecture ` +* How a statistician can combine frequentist probabilities of type I and type II errors to form posterior probabilities of mistakes in a model selection or a classification problem +* How likelihood ratios helped Lawrence Blume and David Easley formulate an answer to the question ''If you're so smart, why aren't you rich?'' {cite}`blume2006if` +* How to use a Kullback-Leibler divergence to quantify the difference between two probability distributions with the same support +* How during World War II the United States Navy devised a decision rule for doing quality control on lots of ammunition, a topic that sets the stage for {doc}`this lecture ` * A peculiar property of likelihood ratio processes @@ -70,7 +72,7 @@ that $q$ is either $f$ or $g$, permanently. Nature knows which density it permanently draws from, but we the observers do not. -We know both $f$ and $g$ but we don’t know which density nature +We know both $f$ and $g$ but we don't know which density nature chose. But we want to know. @@ -119,12 +121,12 @@ inferences using a classic frequentist approach due to Neyman and Pearson {cite}`Neyman_Pearson`. To help us appreciate how things work, the following Python code evaluates $f$ and $g$ as two different -beta distributions, then computes and simulates an associated likelihood +Beta distributions, then computes and simulates an associated likelihood ratio process by generating a sequence $w^t$ from one of the two probability distributions, for example, a sequence of IID draws from $g$. ```{code-cell} ipython3 -# Parameters in the two beta distributions. +# Parameters in the two Beta distributions. 
F_a, F_b = 1, 1
G_a, G_b = 3, 1.2
@@ -323,7 +325,7 @@ plt.show() We now describe how to employ the machinery of Neyman and Pearson {cite}`Neyman_Pearson` to test the hypothesis that history $w^t$ is generated by repeated
-IID draws from density $g$.
+IID draws from density $f$.
 Denote $q$ as the data generating process, so that $q=f \text{ or } g$.
@@ -419,7 +421,7 @@ Below we plot some informative figures that illustrate this. We also present a classical frequentist method for choosing a sample size $t$.
-Let’s start with a case in which we fix the threshold $c$ at
+Let's start with a case in which we fix the threshold $c$ at
 $1$.
```{code-cell} ipython3
@@ -438,7 +440,7 @@ errors both decrease, which is good. This is because most of the probability mass of log$(L(w^t))$ moves toward $-\infty$ when $g$ is the data generating
-process, ; while log$(L(w^t))$ goes to
+process, while log$(L(w^t))$ goes to
 $\infty$ when data are generated by $f$. That disparate behavior of log$(L(w^t))$ under $f$ and $q$
@@ -506,7 +508,7 @@ If for a fixed $t$ we now free up and move $c$, we will sweep out the probabilit of detection as a function of the probability of false alarm. This produces a [receiver operating characteristic
-curve](https://en.wikipedia.org/wiki/Receiver_operating_characteristic).
+curve (ROC curve)](https://en.wikipedia.org/wiki/Receiver_operating_characteristic).
 Below, we plot receiver operating characteristic curves for different sample sizes $t$.
@@ -536,6 +538,13 @@ Notice that as $t$ increases, we are assured a larger probability of detection and a smaller probability of false alarm associated with a given discrimination threshold $c$.
+For a given sample size $t$, both $\alpha$ and $\beta$ change as we vary $c$.
+
+As we increase $c$
+
+* $\alpha \equiv \Pr\left\{ L\left(w^{t}\right) < c \mid q=f \right\}$ increases
+* $\beta \equiv \Pr\left\{ L\left(w^{t}\right) > c \mid q=g \right\}$ decreases
+
 As $t \rightarrow + \infty$, we approach the perfect detection curve that is indicated by a right angle hinging on the blue dot.
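To see this tradeoff concretely, here is a minimal standalone sketch (it re-creates the two Beta densities with `scipy.stats` rather than reusing the lecture's jitted helpers, and the choices of `t`, `N`, and the thresholds are illustrative) that estimates $\alpha$ and $\beta$ by Monte Carlo for a few values of $c$:

```{code-cell} ipython3
import numpy as np
from scipy.stats import beta as beta_dist

rng = np.random.default_rng(0)
t, N = 10, 50_000
f_dist = beta_dist(1, 1)     # same Beta(1, 1) as F_a, F_b above
g_dist = beta_dist(3, 1.2)   # same Beta(3, 1.2) as G_a, G_b above

def log_L(w):
    # log likelihood ratio log(f/g), summed over each length-t sample path
    return (f_dist.logpdf(w) - g_dist.logpdf(w)).sum(axis=1)

logL_f = log_L(f_dist.rvs((N, t), random_state=rng))  # paths drawn from f
logL_g = log_L(g_dist.rvs((N, t), random_state=rng))  # paths drawn from g

alphas, betas = [], []
for c in [0.5, 1.0, 2.0]:
    alphas.append(np.mean(logL_f < np.log(c)))  # false alarm: L < c given q = f
    betas.append(np.mean(logL_g > np.log(c)))   # miss: L > c given q = g

print("alpha:", np.round(alphas, 3))
print("beta: ", np.round(betas, 3))
```

Raising $c$ raises $\alpha$ and lowers $\beta$; sweeping $c$ over a fine grid of values traces out one ROC curve for this sample size $t$.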
@@ -584,12 +593,12 @@ presented to Milton Friedman, as we describe in {doc}`this lecture K_f$, $f$ is closer to $h$ than $g$ -is. - -- In that case we’ll find that - $L\left(w^t\right) \rightarrow + \infty$ - -We’ll now experiment with an $h$ is also a beta distribution - -We’ll start by setting parameters $G_a$ and $G_b$ so that -$h$ is closer to $g$ - -```{code-cell} ipython3 -H_a, H_b = 3.5, 1.8 - -h = jit(lambda x: p(x, H_a, H_b)) -``` - -```{code-cell} ipython3 -x_range = np.linspace(0, 1, 100) -plt.plot(x_range, f(x_range), label='f') -plt.plot(x_range, g(x_range), label='g') -plt.plot(x_range, h(x_range), label='h') - -plt.legend() -plt.show() -``` ++++ -Let’s compute the Kullback–Leibler discrepancies by quadrature +Let's compute the Kullback–Leibler discrepancies by quadrature integration. ```{code-cell} ipython3 def KL_integrand(w, q, h): - m = q(w) / h(w) + m = h(w) / q(w) - return np.log(m) * q(w) + return np.log(m) * h(w) ``` ```{code-cell} ipython3 def compute_KL(h, f, g): + """ + Compute KL divergence with reference distribution h + """ Kf, _ = quad(KL_integrand, 0, 1, args=(f, h)) Kg, _ = quad(KL_integrand, 0, 1, args=(g, h)) @@ -672,58 +654,120 @@ def compute_KL(h, f, g): return Kf, Kg ``` -```{code-cell} ipython3 -Kf, Kg = compute_KL(h, f, g) -Kf, Kg -``` +### A helpful formula -We have $K_g < K_f$. +There is a mathematical relationship between likelihood ratios and KL divergence. -Next, we can verify our conjecture about $L\left(w^t\right)$ by -simulation. +When data is generated by distribution $h$, the expected log likelihood ratio is: -```{code-cell} ipython3 -l_arr_h = simulate(H_a, H_b) -l_seq_h = np.cumprod(l_arr_h, axis=1) -``` +$$ +\frac{1}{t} E_{h}\!\bigl[\log L_t\bigr] = KL(h, g) - KL(h, f) = K_g - K_f +$$ (eq:kl_likelihood_link) -The figure below plots over time the fraction of paths -$L\left(w^t\right)$ that fall in the interval $[0,0.01]$. +where $L_t=\prod_{j=1}^{t}\frac{f(w_j)}{g(w_j)}$ is the likelihood ratio process. 
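A quick numerical spot-check of {eq}`eq:kl_likelihood_link` is possible without the lecture's helpers. The following standalone sketch uses `scipy` directly; taking $h$ to be Beta(2, 1.35) is just an illustrative choice. It compares a Monte Carlo estimate of $\frac{1}{t} E_h[\log L_t]$ with $KL(h,g) - KL(h,f)$:

```{code-cell} ipython3
import numpy as np
from scipy.stats import beta as beta_dist
from scipy.integrate import quad

f_d = beta_dist(1, 1)
g_d = beta_dist(3, 1.2)
h_d = beta_dist(2, 1.35)   # illustrative choice of h

def kl(p, q):
    # KL(p, q) = ∫ p(x) log(p(x)/q(x)) dx over (0, 1)
    val, _ = quad(lambda x: p.pdf(x) * (p.logpdf(x) - q.logpdf(x)),
                  1e-6, 1 - 1e-6)
    return val

rng = np.random.default_rng(0)
t = 50
w = h_d.rvs((20_000, t), random_state=rng)   # data generated by h
mean_log_L = (f_d.logpdf(w) - g_d.logpdf(w)).sum(axis=1).mean() / t

print(f"(1/t) E_h[log L_t] (Monte Carlo): {mean_log_L:.4f}")
print(f"KL(h,g) - KL(h,f) (quadrature):  {kl(h_d, g_d) - kl(h_d, f_d):.4f}")
```

The two numbers should agree up to Monte Carlo error.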
-Notice that, as expected, it converges to 1 when $g$ is closer to -$h$ than $f$ is. +(For the proof, see [this note](https://nowak.ece.wisc.edu/ece830/ece830_fall11_lecture7.pdf).) -```{code-cell} ipython3 -N, T = l_arr_h.shape -plt.plot(range(T), np.sum(l_seq_h <= 0.01, axis=0) / N) -``` +{eq}`eq:kl_likelihood_link` tells us that: +- When $K_g < K_f$ (i.e., $g$ is closer to $h$ than $f$ is), the expected log likelihood ratio is negative, so $L\left(w^t\right) \rightarrow 0$. +- When $K_g > K_f$ (i.e., $f$ is closer to $h$ than $g$ is), the expected log likelihood ratio is positive, so $L\left(w^t\right) \rightarrow + \infty$. -We can also try an $h$ that is closer to $f$ than is -$g$ so that now $K_g$ is larger than $K_f$. +Let's verify this using simulation. -```{code-cell} ipython3 -H_a, H_b = 1.2, 1.2 -h = jit(lambda x: p(x, H_a, H_b)) -``` +In the simulation, we generate multiple paths using Beta distributions $f$, $g$, and $h$, and compute the paths of $\log(L(w^t))$. -```{code-cell} ipython3 -Kf, Kg = compute_KL(h, f, g) -Kf, Kg -``` +We consider three cases: (1) $h$ is closer to $f$, (2) $f$ and $g$ are approximately equidistant from $h$, and (3) $h$ is closer to $g$. ```{code-cell} ipython3 -l_arr_h = simulate(H_a, H_b) -l_seq_h = np.cumprod(l_arr_h, axis=1) -``` +:tags: [hide-input] -Now probability mass of $L\left(w^t\right)$ falling above -$10000$ diverges to $+\infty$. 
+# Define test scenarios +scenarios = [ + { + "name": "KL(h,g) > KL(h,f)", + "h_params": (1.2, 1.1), + "expected": r"$L_t \to \infty$" + }, + { + "name": "KL(h,g) ≈ KL(h,f)", + "h_params": (2, 1.35), + "expected": "$L_t$ fluctuates" + }, + { + "name": "KL(h,g) < KL(h,f)", + "h_params": (3.5, 1.5), + "expected": r"$L_t \to 0$" + } +] -```{code-cell} ipython3 -N, T = l_arr_h.shape -plt.plot(range(T), np.sum(l_seq_h > 10000, axis=0) / N) +fig, axes = plt.subplots(2, 3, figsize=(15, 12)) + +for i, scenario in enumerate(scenarios): + # Define h + h = lambda x: p(x, scenario["h_params"][0], + scenario["h_params"][1]) + + # Compute KL divergences + Kf, Kg = compute_KL(h, f, g) + kl_diff = Kg - Kf + + # Simulate paths + N_paths = 100 + T = 150 + + # Generate data from h + h_data = np.random.beta(scenario["h_params"][0], + scenario["h_params"][1], (N_paths, T)) + l_ratios = f(h_data) / g(h_data) + l_cumulative = np.cumprod(l_ratios, axis=1) + log_l_cumulative = np.log(l_cumulative) + + # Plot distributions + ax = axes[0, i] + x_range = np.linspace(0.001, 0.999, 200) + ax.plot(x_range, [f(x) for x in x_range], + 'b-', label='f', linewidth=2) + ax.plot(x_range, [g(x) for x in x_range], + 'r-', label='g', linewidth=2) + ax.plot(x_range, [h(x) for x in x_range], + 'g--', label='h (data)', linewidth=2) + ax.set_xlabel('w') + ax.set_ylabel('density') + ax.set_title(scenario["name"], fontsize=16) + ax.legend() + + # Plot log likelihood ratio paths + ax = axes[1, i] + for j in range(min(20, N_paths)): + ax.plot(log_l_cumulative[j, :], alpha=0.3, color='purple') + + # Plot theoretical expectation + theory_line = kl_diff * np.arange(1, T+1) + ax.plot(theory_line, 'k--', linewidth=2, label=r'$t \times (K_g - K_f)$') + + ax.set_xlabel('t') + ax.set_ylabel('$log L_t$') + ax.set_title(f'KL(h,f)={Kf:.3f}, KL(h,g)={Kg:.3f}\n{scenario["expected"]}', + fontsize=16) + ax.legend(fontsize=16) + +plt.tight_layout() +plt.show() ``` +Note that + +- In the first figure, $\log L(w^t)$ diverges to 
$\infty$ because $K_g > K_f$.
+- In the second figure, we still have $K_g > K_f$, but the difference is smaller, so $L(w^t)$ diverges to infinity at a slower pace.
+- In the last figure, $\log L(w^t)$ diverges to $-\infty$ because $K_g < K_f$.
+- The black dotted line, $t \left(KL(h,g) - KL(h, f)\right)$, closely fits the paths, verifying {eq}`eq:kl_likelihood_link`.
+
+These observations align with the theory.
+
+In a [later section](hetero_agent), we will see an application of these ideas.
+
++++
+
 ## Hypothesis Testing and Classification We now describe how a statistician can combine frequentist probabilities of type I and type II errors in order to @@ -746,8 +790,7 @@ We assume that $f$ and $g$ both put positive probabilities on the same intervals
-In the simulations below, we specify that $f$ is a $\text{Beta}(1, 1)$ distribution and that $g$ is $\text{Beta}(3, 1.2)$ distribution,
-just as we did often earlier in this lecture.
+In the simulations below, we specify that $f$ is a $\text{Beta}(1, 1)$ distribution and that $g$ is a $\text{Beta}(3, 1.2)$ distribution.
 We consider two alternative timing protocols.
@@ -993,7 +1036,7 @@ plt.tight_layout() plt.show() ```
-To the left of the green vertical line $f < g $, so $l_t < 1$; therefore a $w_t$ that falls to the left of the green line is classified as a type $g$ individual.
+To the left of the green vertical line $g < f$, so $l_t > 1$; therefore a $w_t$ that falls to the left of the green line is classified as a type $f$ individual.
 * The shaded orange area equals $\beta$ -- the probability of classifying someone as a type $g$ individual when it is really a type $f$ individual.
@@ -1182,6 +1225,8 @@ where $m = \frac{1}{2}(f+g)$ is a mixture of $f$ and $g$. ```{note} We studied KL divergence in the [section above](rel_entropy) with respect to a reference distribution $h$.
+Recall that KL divergence $KL(f, g)$ measures the expected excess surprisal from using misspecified model $g$ instead of $f$ when $f$ is the true model.
+ Because in general $KL(f, g) \neq KL(g, f)$, KL divergence is not symmetric, but Jensen-Shannon divergence is symmetric. (In fact, the square root of the Jensen-Shannon divergence is a metric referred to as the Jensen-Shannon distance.) @@ -1189,7 +1234,7 @@ Because in general $KL(f, g) \neq KL(g, f)$, KL divergence is not symmetric, but As {eq}`eq:js_divergence` shows, the Jensen-Shannon divergence computes average of the KL divergence of $f$ and $g$ with respect to a particular reference distribution $m$ defined below the equation. ``` -Now let's create a comparison table showing KL divergence, Jensen-Shannon divergence, and Chernoff entropy +Now let's create a comparison table showing KL divergence, Jensen-Shannon divergence, and Chernoff entropy for a set of pairs of Beta distributions. ```{code-cell} ipython3 def js_divergence(f, g): @@ -1221,6 +1266,12 @@ def kl_divergence(f, g): distribution_pairs = [ # (f_params, g_params) + ((1, 1), (0.1, 0.2)), + ((1, 1), (0.3, 0.3)), + ((1, 1), (0.3, 0.4)), + ((1, 1), (0.5, 0.5)), + ((1, 1), (0.7, 0.6)), + ((1, 1), (0.9, 0.8)), ((1, 1), (1.1, 1.05)), ((1, 1), (1.2, 1.1)), ((1, 1), (1.5, 1.2)), @@ -1228,10 +1279,7 @@ distribution_pairs = [ ((1, 1), (2.5, 1.8)), ((1, 1), (3, 1.2)), ((1, 1), (4, 1)), - ((1, 1), (5, 1)), - ((2, 2), (4, 2)), - ((2, 2), (6, 2)), - ((2, 2), (8, 2)), + ((1, 1), (5, 1)) ] # Create comparison table @@ -1304,8 +1352,8 @@ def plot_dist_diff(): ((1, 1), (1.5, 1.2)), ((1, 1), (2, 1.5)), ((1, 1), (3, 1.2)), - ((1, 1), (5, 1)), - ((2, 2), (8, 2)), + ((1, 1), (5, 1)), + ((1, 1), (0.3, 0.3)) ] fig, axes = plt.subplots(3, 2, figsize=(15, 12)) @@ -1372,9 +1420,15 @@ In the simulation below, nature draws $N / 2$ sequences from $g$ and $N/2$ seque ```{note} Nature does this rather than flipping a fair coin to decide whether to draw from $g$ or $f$ once and for all before each simulation of length $T$. 
-``` +``` ```{code-cell} ipython3 +def compute_likelihood_ratio_stats(sequences, f, g): + """Compute likelihood ratios and cumulative products.""" + l_ratios = f(sequences) / g(sequences) + L_cumulative = np.cumprod(l_ratios, axis=1) + return l_ratios, L_cumulative + def error_divergence_cor(): """ compute correlation between error probabilities and divergence measures. @@ -1409,13 +1463,15 @@ def error_divergence_cor(): sequences_f = np.random.beta(f_a, f_b, (N_half, T_large)) sequences_g = np.random.beta(g_a, g_b, (N_half, T_large)) - # Compute likelihood ratios - l_ratios_f = f(sequences_f) / g(sequences_f) - l_ratios_g = f(sequences_g) / g(sequences_g) - # Compute cumulative products - L_cumulative_f = np.cumprod(l_ratios_f, axis=1)[:, -1] - L_cumulative_g = np.cumprod(l_ratios_g, axis=1)[:, -1] + + # Compute likelihood ratios and cumulative products + _, L_cumulative_f = compute_likelihood_ratio_stats(sequences_f, f, g) + _, L_cumulative_g = compute_likelihood_ratio_stats(sequences_g, f, g) + + # Get final values + L_cumulative_f = L_cumulative_f[:, -1] + L_cumulative_g = L_cumulative_g[:, -1] # Calculate error probabilities error_probs[i] = 0.5 * (np.mean(L_cumulative_f < 1) + @@ -1453,7 +1509,7 @@ def plot_error_divergence(data): # Create figure and axes fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5)) - # Helper function for plotting correlation + # function for plotting correlation def plot_correlation(ax, x_vals, x_label, color): ax.scatter(x_vals, log_error, alpha=0.7, s=60, color=color) ax.set_xlabel(x_label) @@ -1461,7 +1517,7 @@ def plot_error_divergence(data): # Calculate correlation and trend line corr = np.corrcoef(x_vals, log_error)[0, 1] - z = np.polyfit(x_vals, log_error, 1) + z = np.polyfit(x_vals, log_error, 2) x_trend = np.linspace(x_vals.min(), x_vals.max(), 100) ax.plot(x_trend, np.poly1d(z)(x_trend), "r--", alpha=0.8, linewidth=2) @@ -1484,6 +1540,594 @@ We'll encounter related ideas in {doc}`wald_friedman`. 
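As an aside, the `compute_likelihood_ratio_stats` helper introduced above is easy to exercise on its own. Here is a standalone sketch (the helper is re-defined with plain `scipy`/NumPy densities so that the snippet runs by itself):

```{code-cell} ipython3
import numpy as np
from scipy.stats import beta as beta_dist

def compute_likelihood_ratio_stats(sequences, f, g):
    """Compute likelihood ratios and cumulative products."""
    l_ratios = f(sequences) / g(sequences)
    L_cumulative = np.cumprod(l_ratios, axis=1)
    return l_ratios, L_cumulative

f_pdf = beta_dist(1, 1).pdf
g_pdf = beta_dist(3, 1.2).pdf

rng = np.random.default_rng(0)
draws = beta_dist(1, 1).rvs((5_000, 200), random_state=rng)  # nature uses f

_, L = compute_likelihood_ratio_stats(draws, f_pdf, g_pdf)
# With data generated by f, L(w^t) should almost always end up large
print("fraction of paths with L_T > 1:", np.mean(L[:, -1] > 1))
```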
+++
+(hetero_agent)=
+## Consumption and Heterogeneous Beliefs
+
+A likelihood ratio process lies behind Lawrence Blume and David Easley's answer to their question
+''If you're so smart, why aren't you rich?'' {cite}`blume2006if`.
+
+Here we'll provide an example that illustrates basic components of their analysis.
+
+Let the random variable $s_t \in (0,1)$ at time $t = 0, 1, 2, \ldots$ be distributed according to the same Beta distribution with parameters
+$\theta = [\theta_1, \theta_2]$.
+
+We'll denote this probability density as
+
+$$
+\pi(s_t|\theta)
+$$
+
+Below, we'll often just write $\pi(s_t)$ instead of $\pi(s_t|\theta)$ to save space.
+
+Let $s_t \equiv y_t^1$ be the endowment of a nonstorable consumption good that a person we'll call "agent 1" receives at time $t$.
+
+Let a history $s^t = [s_t, s_{t-1}, \ldots, s_0]$ be a sequence of i.i.d. random variables with joint distribution
+
+$$
+\pi_t(s^t) = \pi(s_t) \pi(s_{t-1}) \cdots \pi(s_0)
+$$
+
+So in our example, the history $s^t$ is a comprehensive record of agent $1$'s endowments of the consumption good from time $0$ up to time $t$.
+
+If agent $1$ were to live on an island by himself, agent $1$'s consumption $c^1(s_t)$ at time $t$ would be
+
+$$c^1(s_t) = y_t^1 = s_t. $$
+
+But in our model, agent 1 is not alone.
+
+### Nature and beliefs
+
+Nature draws i.i.d. sequences $\{s_t\}_{t=0}^\infty$ from $\pi_t(s^t)$.
+
+* so $\pi$ without a superscript is nature's model
+* but in addition to nature, there are other entities inside our model -- artificial people that we call "agents"
+* each agent has a sequence of probability distributions over $s^t$ for $t=0, \ldots$
+* agent $i$ thinks that nature draws i.i.d. sequences $\{s_t\}_{t=0}^\infty$ from $\pi_t^i(s^t)$
+  * agent $i$ is mistaken unless $\pi_t^i(s^t) = \pi_t(s^t)$
+
+```{note}
+A **rational expectations** model would set $\pi_t^i(s^t) = \pi_t(s^t)$ for all agents $i$.
+```
+
+There are two agents named $i=1$ and $i=2$.
+
+At time $t$, agent $1$ receives an endowment
+
+$$
+y_t^1 = s_t
+$$
+
+of a nonstorable consumption good, while agent $2$ receives an endowment of
+
+$$
+y_t^2 = 1 - s_t
+$$
+
+The aggregate endowment of the consumption good is
+
+$$
+y_t^1 + y_t^2 = 1
+$$
+
+at each date $t \geq 0$.
+
+At date $t$ agent $i$ consumes $c_t^i(s^t)$ of the good.
+
+A (non wasteful) feasible allocation of the aggregate endowment of $1$ each period satisfies
+
+$$
+c_t^1 + c_t^2 = 1 .
+$$
+
+### A social risk-sharing arrangement
+
+In order to share risks, a benevolent social planner will dictate a history-dependent consumption allocation in the form of a sequence of functions
+
+$$
+c_t^i = c_t^i(s^t)
+$$
+
+that satisfy
+
+$$
+c_t^1(s^t) + c_t^2(s^t) = 1
+$$ (eq:feasibility)
+
+for all $s^t$ for all $t \geq 0$.
+
+To design a socially optimal allocation, the social planner wants to know what each agent believes about the endowment sequence and how each agent feels about bearing risks.
+
+As for the endowment sequences, agent $i$ believes that nature draws i.i.d. sequences from joint densities
+
+$$
+\pi_t^i(s^t) = \pi^i(s_t) \pi^i(s_{t-1}) \cdots \pi^i(s_0)
+$$
+
+As for attitudes toward bearing risks, agent $i$ has a one-period utility function
+
+$$
+u(c_t^i) = \ln (c_t^i)
+$$
+
+with marginal utility of consumption in period $t$
+
+$$
+u'(c_t^i) = \frac{1}{c_t^i}
+$$
+
+Putting its beliefs about its random endowment sequence and its attitudes toward bearing risks together, agent $i$ has intertemporal utility function
+
+$$
+V^i = \sum_{t=0}^{\infty} \sum_{s^t} \delta^t u(c_t^i(s^t)) \pi_t^i(s^t) ,
+$$ (eq:objectiveagenti)
+
+where $\delta \in (0,1)$ is an intertemporal discount factor, and $u(\cdot)$ is a strictly increasing, concave one-period utility function.
+
+### The social planner's allocation problem
+
+The benevolent dictator has all the information it requires to choose a consumption allocation that maximizes the social welfare criterion
+
+$$
+W = \lambda V^1 + (1-\lambda) V^2
+$$ (eq:welfareW)
+
+where $\lambda \in [0,1]$ is a Pareto weight that tells how much the planner likes agent $1$ and $1 - \lambda$ is a Pareto weight that tells how much the social planner likes agent $2$.
+
+Setting $\lambda = .5$ expresses ''egalitarian'' social preferences.
+
+Notice how social welfare criterion {eq}`eq:welfareW` takes into account both agents' preferences as represented by formula {eq}`eq:objectiveagenti`.
+
+This means that the social planner knows and respects
+
+* the one period utility function $u(\cdot) = \ln(\cdot)$
+* each agent $i$'s probability model $\{\pi_t^i(s^t)\}_{t=0}^\infty$
+
+Consequently, we anticipate that these objects will appear in the social planner's rule for allocating the aggregate endowment each period.
+
+First-order necessary conditions for maximizing welfare criterion {eq}`eq:welfareW` subject to the feasibility constraint {eq}`eq:feasibility` are
+
+$$\frac{\pi_t^2(s^t)}{\pi_t^1(s^t)} \frac{(1/c_t^2(s^t))}{(1/c_t^1(s^t))} = \frac{\lambda}{1 -\lambda}$$
+
+which can be rearranged to become
+
+$$
+\frac{c_t^1(s^t)}{c_t^2(s^t)} = \frac{\lambda}{1- \lambda} l_t(s^t)
+$$ (eq:allocationrule0)
+
+where
+
+$$ l_t(s^t) = \frac{\pi_t^1(s^t)}{\pi_t^2(s^t)} $$
+
+is the likelihood ratio of agent 1's joint density to agent 2's joint density.
+ +Using + +$$c_t^1(s^t) + c_t^2(s^t) = 1$$ + +we can rewrite allocation rule {eq}`eq:allocationrule0` as + + + +$$\frac{c_t^1(s^t)}{1 - c_t^1(s^t)} = \frac{\lambda}{1-\lambda} l_t(s^t)$$ + +or + +$$c_t^1(s^t) = \frac{\lambda}{1-\lambda} l_t(s^t)(1 - c_t^1(s^t))$$ + +which implies that the social planner's allocation rule is + +$$ +c_t^1(s^t) = \frac{\lambda l_t(s^t)}{1-\lambda + \lambda l_t(s^t)} +$$ (eq:allocationrule1) + +If we define a temporary or **continuation Pareto weight** process as + +$$ +\lambda_t(s^t) = \frac{\lambda l_t(s^t)}{1-\lambda + \lambda l_t(s^t)}, +$$ + +then we can represent the social planner's allocation rule as + +$$ +c_t^1(s^t) = \lambda_t(s^t) . +$$ + + + + +### If you're so smart, $\ldots$ + + +Let's compute some values of limiting allocations {eq}`eq:allocationrule1` for some interesting possible limiting +values of the likelihood ratio process $l_t(s^t)$: + + $$l_\infty (s^\infty)= 1; \quad c_\infty^1 = \lambda$$ + + * In the above case, both agents are equally smart (or equally not smart) and the consumption allocation stays put at a $\lambda, 1 - \lambda $ split between the two agents. + +$$l_\infty (s^\infty) = 0; \quad c_\infty^1 = 0$$ + +* In the above case, agent 2 is smarter than agent 1, and agent 1's share of the aggregate endowment converges to zero. + + + +$$l_\infty (s^\infty)= \infty; \quad c_\infty^1 = 1$$ + +* In the above case, agent 1 is smarter than agent 2, and agent 1's share of the aggregate endowment converges to 1. + + +Soon we'll do some simulations that will shed further light on possible outcomes. + +But before we do that, let's take a detour and study some "shadow prices" for the social planning problem that can readily be +converted to "equilibrium prices" for a competitive equilibrium. 
+
+Doing this will allow us to connect our analysis with an argument of {cite}`alchian1950uncertainty` and {cite}`friedman1953essays` that competitive market processes can make prices of risky assets better reflect realistic probability assessments.
+
+### Competitive Equilibrium Prices
+
+The two fundamental welfare theorems for general equilibrium models lead us to anticipate that there is a connection between the allocation that solves the social planning problem we have been studying and the allocation in a **competitive equilibrium** with complete markets in history-contingent commodities.
+
+```{note}
+For the two welfare theorems and their history, see .
+```
+
+Such a connection prevails for our model.
+
+We'll sketch it now.
+
+In a competitive equilibrium, there is no social planner that dictatorially collects everybody's endowments and then reallocates them.
+
+Instead, there is a comprehensive centralized market that meets at one point in time.
+
+There are **prices** at which price-taking agents can buy or sell whatever goods that they want.
+
+Trade is multilateral in the sense that there is a "Walrasian auctioneer" who lives outside the model and whose job is to verify that
+each agent's budget constraint is satisfied.
+
+That budget constraint involves the total value of the agent's endowment stream and the total value of its consumption stream.
+
+Suppose that at time $-1$, before time $0$ starts, agent $i$ can purchase one unit $c_t(s^t)$ of consumption at time $t$ after history
+$s^t$ at price $p_t(s^t)$.
+
+Notice that there is a (very long) **vector** of prices.
+
+We want to study how agents' diverse beliefs influence equilibrium prices.
+
+Agent $i$ faces a **single** intertemporal budget constraint
+
+$$
+\sum_{t=0}^\infty \sum_{s^t} p_t(s^t) c_t^i(s^t) \leq \sum_{t=0}^\infty \sum_{s^t} p_t(s^t) y_t^i(s^t)
+$$ (eq:budgetI)
+
+Agent $i$ puts a Lagrange multiplier $\mu^i$ on {eq}`eq:budgetI` and once-and-for-all chooses a consumption plan $\{c^i_t(s^t)\}_{t=0}^\infty$
+to maximize criterion {eq}`eq:objectiveagenti` subject to budget constraint {eq}`eq:budgetI`.
+
+```{note}
+For convenience, let's remind ourselves of criterion {eq}`eq:objectiveagenti`:
+$
+V^i = \sum_{t=0}^{\infty} \sum_{s^t} \delta^t u(c_t^i(s^t)) \pi_t^i(s^t)$
+```
+
+First-order conditions for maximizing with respect to $c_t^i(s^t)$ are
+
+$$
+\delta^t u'(c^i_t(s^t)) \pi_t^i(s^t) = \mu^i p_t(s^t) ,
+$$
+
+which we can rearrange to obtain
+
+$$
+p_t(s^t) = \frac{ \delta^t \pi_t^i(s^t)}{\mu^i c^i_t(s^t)}
+$$ (eq:priceequation1)
+
+for $i=1,2$.
+
+If we divide equation {eq}`eq:priceequation1` for agent $1$ by the appropriate version of equation {eq}`eq:priceequation1` for agent 2, use
+$c^2_t(s^t) = 1 - c^1_t(s^t)$, and do some algebra, we'll obtain
+
+$$
+c_t^1(s^t) = \frac{\mu^2 l_t(s^t)}{\mu^1 + \mu^2 l_t(s^t)} .
+$$ (eq:allocationce)
+
+We now engage in an extended "guess-and-verify" exercise that involves matching objects in our competitive equilibrium with objects in
+our social planning problem.
+
+* we'll match consumption allocations in the planning problem with equilibrium consumption allocations in the competitive equilibrium
+* we'll match "shadow" prices in the planning problem with competitive equilibrium prices.
+
+Notice that if we set $\mu^1 = 1 - \lambda$ and $\mu^2 = \lambda$, then formula {eq}`eq:allocationce` agrees with formula
+{eq}`eq:allocationrule1`.
+
+ * doing this amounts to choosing a **numeraire** or normalization for the price system $\{p_t(s^t)\}_{t=0}^\infty$
+
+```{note}
+For information about how a numeraire must be chosen to pin down the absolute price level in a model like ours that determines only
+relative prices, see .
+```
+
+If we substitute formula {eq}`eq:allocationce` for $c_t^1(s^t)$ into formula {eq}`eq:priceequation1`, rearrange, and normalize away a constant scale factor by an appropriate choice of numeraire, we obtain
+
+$$
+p_t(s^t) = \delta^t \pi_t^2(s^t) \left(1 - \lambda + \lambda l_t(s^t)\right) = \delta^t \left[ \lambda \pi_t^1(s^t) + (1-\lambda) \pi_t^2(s^t) \right]
+$$ (eq:pformulafinal)
+
+According to formula {eq}`eq:pformulafinal`, we have the following possible limiting cases:
+
+* when $l_\infty = 0$, $c_\infty^1 = 0 $ and tails of competitive equilibrium prices reflect agent $2$'s probability model $\pi_t^2(s^t)$
+* when $l_\infty = \infty$, $c_\infty^1 = 1 $ and tails of competitive equilibrium prices reflect agent $1$'s probability model $\pi_t^1(s^t)$
+* for small $t$'s, competitive equilibrium prices reflect both agents' probability models.
+
+### Simulations
+
+Now let's implement some simulations when agent $1$ believes marginal density
+
+$$\pi^1(s_t) = f(s_t) $$
+
+and agent $2$ believes marginal density
+
+$$ \pi^2(s_t) = g(s_t) $$
+
+where $f$ and $g$ are Beta distributions like ones that we used in earlier sections of this lecture.
+
+Meanwhile, we'll assume that nature draws i.i.d. sequences from a marginal density
+
+$$
+\pi(s_t) = h(s_t)
+$$
+
+where $h(s_t)$ is perhaps a mixture of $f$ and $g$.
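Before simulating full paths, the limiting cases of allocation rule {eq}`eq:allocationrule1` can be sanity-checked directly. A minimal standalone sketch (the helper `c1_share` is hypothetical, not part of the lecture's code):

```{code-cell} ipython3
def c1_share(l, lam=0.5):
    """Agent 1's share under the planner's rule c1 = λ l / (1 - λ + λ l)."""
    return lam * l / (1 - lam + lam * l)

for l in [0.0, 1e-3, 1.0, 1e3]:
    print(f"l = {l:g}: agent 1's share = {c1_share(l):.4f}")
```

As $l \rightarrow 0$ the share goes to $0$, at $l = 1$ it equals $\lambda$, and as $l \rightarrow \infty$ it approaches $1$.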
+
+Let's write a Python function that computes agent 1's consumption share
+
+```{code-cell} ipython3
+def simulate_blume_easley(sequences, f_belief=f, g_belief=g, λ=0.5):
+    """Simulate Blume-Easley model consumption shares."""
+    l_ratios, l_cumulative = compute_likelihood_ratio_stats(sequences, f_belief, g_belief)
+    c1_share = λ * l_cumulative / (1 - λ + λ * l_cumulative)
+    return l_cumulative, c1_share
+```
+
+Now let's use this function to generate sequences in which
+
+* nature draws from $f$ each period, or
+* nature draws from $g$ each period, or
+* nature flips a fair coin each period to decide whether to draw from $f$ or $g$
+
+```{code-cell} ipython3
+λ = 0.5
+T = 100
+N = 10000
+
+# Nature follows f, g, or mixture
+s_seq_f = np.random.beta(F_a, F_b, (N, T))
+s_seq_g = np.random.beta(G_a, G_b, (N, T))
+
+h = jit(lambda x: 0.5 * f(x) + 0.5 * g(x))
+model_choices = np.random.rand(N, T) < 0.5
+s_seq_h = np.empty((N, T))
+s_seq_h[model_choices] = np.random.beta(F_a, F_b, size=model_choices.sum())
+s_seq_h[~model_choices] = np.random.beta(G_a, G_b, size=(~model_choices).sum())
+
+l_cum_f, c1_f = simulate_blume_easley(s_seq_f)
+l_cum_g, c1_g = simulate_blume_easley(s_seq_g)
+l_cum_h, c1_h = simulate_blume_easley(s_seq_h)
+```
+
+Before looking at the figure below, have some fun by guessing whether agent 1 or agent 2 will have a larger and larger consumption share as time passes in our three cases.
+
+To make better guesses, let's visualize instances of the likelihood ratio processes in the three cases.
+ +```{code-cell} ipython3 +fig, axes = plt.subplots(2, 3, figsize=(15, 10)) + +titles = ["Nature = f", "Nature = g", "Nature = mixture"] +data_pairs = [(l_cum_f, c1_f), (l_cum_g, c1_g), (l_cum_h, c1_h)] + +for i, ((l_cum, c1), title) in enumerate(zip(data_pairs, titles)): + # Likelihood ratios + ax = axes[0, i] + for j in range(min(50, l_cum.shape[0])): + ax.plot(l_cum[j, :], alpha=0.3, color='blue') + ax.set_yscale('log') + ax.set_xlabel('time') + ax.set_ylabel('Likelihood ratio $l_t$') + ax.set_title(title) + ax.axhline(y=1, color='red', linestyle='--', alpha=0.5) + + # Consumption shares + ax = axes[1, i] + for j in range(min(50, c1.shape[0])): + ax.plot(c1[j, :], alpha=0.3, color='green') + ax.set_xlabel('time') + ax.set_ylabel("Agent 1's consumption share") + ax.set_ylim([0, 1]) + ax.axhline(y=λ, color='red', linestyle='--', alpha=0.5) + +plt.tight_layout() +plt.show() +``` + +In the left panel, nature chooses $f$. Agent 1's consumption reaches $1$ very quickly. + +In the middle panel, nature chooses $g$. Agent 1's consumption ratio tends to move towards $0$ but not as fast as in the first case. + +In the right panel, nature flips coins each period. We see a very similar pattern to the processes in the left panel. + +The figures in the top panel remind us of the discussion in [this section](rel_entropy). + +We invite readers to revisit [that section](rel_entropy) and try to infer the relationships among $KL(f, g)$, $KL(g, f)$, $KL(h, f)$, and $KL(h,g)$. + + +Let's compute values of KL divergence + +```{code-cell} ipython3 +shares = [np.mean(c1_f[:, -1]), np.mean(c1_g[:, -1]), np.mean(c1_h[:, -1])] +Kf_g, Kg_f = kl_divergence(f, g), kl_divergence(g, f) +Kf_h, Kg_h = compute_KL(h, f, g) + +print(f"Final shares: f={shares[0]:.3f}, g={shares[1]:.3f}, mix={shares[2]:.3f}") +print(f"KL divergences: \nKL(f,g)={Kf_g:.3f}, KL(g,f)={Kg_f:.3f}") +print(f"KL(h,f)={Kf_h:.3f}, KL(h,g)={Kg_h:.3f}") +``` + +We find that $KL(f,g) > KL(g,f)$ and $KL(h,g) > KL(h,f)$. 
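These inequalities can also be reproduced independently of the lecture's jitted helpers. A standalone sketch that recomputes the first pair of divergences by quadrature with `scipy`:

```{code-cell} ipython3
from scipy.stats import beta as beta_dist
from scipy.integrate import quad

f_d = beta_dist(1, 1)
g_d = beta_dist(3, 1.2)

def kl(p, q):
    # KL(p, q) = ∫ p(x) log(p(x)/q(x)) dx over (0, 1)
    val, _ = quad(lambda x: p.pdf(x) * (p.logpdf(x) - q.logpdf(x)),
                  1e-6, 1 - 1e-6)
    return val

print(f"KL(f,g) = {kl(f_d, g_d):.3f}")
print(f"KL(g,f) = {kl(g_d, f_d):.3f}")
```

Both numbers are positive and unequal, illustrating the asymmetry of KL divergence for this pair of Beta distributions.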
+ +The first inequality tells us that the average "surprise" or "inefficiency" of using belief $g$ when nature chooses $f$ is greater than the "surprise" of using belief $f$ when nature chooses $g$. + +The second inequality tells us that agent 1's belief distribution $f$ is closer to nature's pick than agent 2's belief $g$. + ++++ + +To make this idea more concrete, let's compare two cases: + +- agent 1's belief distribution $f$ is close to agent 2's belief distribution $g$; +- agent 1's belief distribution $f$ is far from agent 2's belief distribution $g$. + + +We use the two distributions visualized below + +```{code-cell} ipython3 +def plot_distribution_overlap(ax, x_range, f_vals, g_vals, + f_label='f', g_label='g', + f_color='blue', g_color='red'): + """Plot two distributions with their overlap region.""" + ax.plot(x_range, f_vals, color=f_color, linewidth=2, label=f_label) + ax.plot(x_range, g_vals, color=g_color, linewidth=2, label=g_label) + + overlap = np.minimum(f_vals, g_vals) + ax.fill_between(x_range, 0, overlap, alpha=0.3, color='purple', label='Overlap') + ax.set_xlabel('x') + ax.set_ylabel('Density') + ax.legend() + +# Define close and far belief distributions +f_close = jit(lambda x: p(x, 1, 1)) +g_close = jit(lambda x: p(x, 1.1, 1.05)) + +f_far = jit(lambda x: p(x, 1, 1)) +g_far = jit(lambda x: p(x, 3, 1.2)) + +js_close = js_divergence(f_close, g_close) +js_far = js_divergence(f_far, g_far) + +# Visualize the belief distributions +fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5)) + +x_range = np.linspace(0.001, 0.999, 200) + +# Close beliefs +f_close_vals = [f_close(x) for x in x_range] +g_close_vals = [g_close(x) for x in x_range] +plot_distribution_overlap(ax1, x_range, f_close_vals, g_close_vals, + f_label='f (Beta(1, 1))', g_label='g (Beta(1.1, 1.05))') +ax1.set_title(f'Close Beliefs\nJS divergence={js_close:.4f}') + +# Far beliefs +f_far_vals = [f_far(x) for x in x_range] +g_far_vals = [g_far(x) for x in x_range] 
+plot_distribution_overlap(ax2, x_range, f_far_vals, g_far_vals,
+                          f_label='f (Beta(1, 1))', g_label='g (Beta(3, 1.2))')
+ax2.set_title(f'Far Beliefs\nJS divergence={js_far:.4f}')
+
+plt.tight_layout()
+plt.show()
+```
+
+Let's draw the same consumption share plots as above for agent 1.
+
+To make the figure cleaner, we replace the individual simulation paths with the median and the 10th-90th percentile band.
+
+Staring at the figure below, can we infer the relation between $KL(f,g)$ and $KL(g,f)$?
+
+From the right panel, can we infer the relation between $KL(h,g)$ and $KL(h,f)$?
+
+```{code-cell} ipython3
+fig, axes = plt.subplots(2, 3, figsize=(15, 10))
+nature_params = {'close': [(1, 1), (1.1, 1.05), (2, 1.5)],
+                 'far': [(1, 1), (3, 1.2), (2, 1.5)]}
+nature_labels = ["Nature = f", "Nature = g", "Nature = h"]
+colors = {'close': 'blue', 'far': 'red'}
+
+threshold = 1e-5  # "close to zero" cutoff
+
+for row, (f_belief, g_belief, label) in enumerate([
+        (f_close, g_close, 'close'),
+        (f_far, g_far, 'far')]):
+
+    for col, nature_label in enumerate(nature_labels):
+        params = nature_params[label][col]
+        s_seq = np.random.beta(params[0], params[1], (1000, 200))
+        _, c1 = simulate_blume_easley(s_seq, f_belief, g_belief, λ)
+
+        median_c1 = np.median(c1, axis=0)
+        p10, p90 = np.percentile(c1, [10, 90], axis=0)
+
+        ax = axes[row, col]
+        color = colors[label]
+        ax.plot(median_c1, color=color, linewidth=2, label='Median')
+        ax.fill_between(range(len(median_c1)), p10, p90,
+                        alpha=0.3, color=color, label='10–90%')
+        ax.set_xlabel('time')
+        ax.set_ylabel("Agent 1's share")
+        ax.set_ylim([0, 1])
+        ax.set_title(nature_label)
+        ax.axhline(y=λ, color='gray', linestyle='--', alpha=0.5)
+
+        # Mark the first date at which the median share is (nearly) 0 or 1
+        below = np.where(median_c1 < threshold)[0]
+        above = np.where(median_c1 > 1 - threshold)[0]
+        if below.size > 0:
+            first_zero = (below[0], True)
+        elif above.size > 0:
+            first_zero = (above[0], False)
+        else:
+            first_zero = None
+        if first_zero is not None:
+            ax.axvline(x=first_zero[0], color='black', linestyle='--',
+                       alpha=0.7,
+                       label=fr'Median $\leq$ {threshold}' if first_zero[1]
+                       else fr'Median $\geq$ 1-{threshold}')
+        ax.legend()
+
+plt.tight_layout()
+plt.show()
+```
+
+Keeping our guesses in mind, let's calculate the four values
+
+```{code-cell} ipython3
+# Close case
+Kf_g, Kg_f = kl_divergence(f_close, g_close), kl_divergence(g_close, f_close)
+Kf_h, Kg_h = compute_KL(h, f_close, g_close)
+
+print(f"KL divergences (close): \nKL(f,g)={Kf_g:.3f}, KL(g,f)={Kg_f:.3f}")
+print(f"KL(h,f)={Kf_h:.3f}, KL(h,g)={Kg_h:.3f}")
+
+# Far case
+Kf_g, Kg_f = kl_divergence(f_far, g_far), kl_divergence(g_far, f_far)
+Kf_h, Kg_h = compute_KL(h, f_far, g_far)
+
+print(f"KL divergences (far): \nKL(f,g)={Kf_g:.3f}, KL(g,f)={Kg_f:.3f}")
+print(f"KL(h,f)={Kf_h:.3f}, KL(h,g)={Kg_h:.3f}")
+```
+
+We find that in the first case $KL(f,g) \approx KL(g,f)$ and both are relatively small, so although either agent 1 or agent 2 will eventually consume everything, the convergence shown in the first two panels of the top row is quite slow.
+
+In the first two panels at the bottom, convergence occurs faster (as indicated by the black dashed lines) because the divergences $KL(f, g)$ and $KL(g, f)$ are larger.
+
+Since $KL(f,g) > KL(g,f)$, convergence is faster in the bottom row's first panel, where nature chooses $f$, than in its second panel, where nature chooses $g$.
+
+This ties in nicely with {eq}`eq:kl_likelihood_link`.
+
+
++++
+
+
 ## Related Lectures
 
 Likelihood processes play an important role in Bayesian learning, as described in {doc}`likelihood_bayes`
diff --git a/lectures/mccall_model.md b/lectures/mccall_model.md
index 37ede1e4a..3298eec65 100644
--- a/lectures/mccall_model.md
+++ b/lectures/mccall_model.md
@@ -369,17 +369,29 @@ We are going to use Numba to accelerate our code.
 
 * See, in particular, the discussion of `@jitclass` in [our lecture on Numba](https://python-programming.quantecon.org/numba.html).
-The following helps Numba by providing some type +The following helps Numba by providing some type specifications. ```{code-cell} python3 mccall_data = [ ('c', float64), # unemployment compensation ('β', float64), # discount factor - ('w', float64[:]), # array of wage values, w[i] = wage at state i - ('q', float64[:]) # array of probabilities + ('w', float64[::1]), # array of wage values, w[i] = wage at state i + ('q', float64[::1]) # array of probabilities ] ``` +```{note} +Note the use of `[::1]` in the array type declarations above. + +This notation specifies that the arrays should be C-contiguous. + +This is important for performance, especially when using the `@` operator for matrix multiplication (e.g., `v @ q`). + +Without this specification, Numba might need to handle non-contiguous arrays, which can significantly slow down these operations. + +Try to replace `[::1]` with `[:]` and see what happens. +``` + Here's a class that stores the data and computes the values of state-action pairs, i.e. the value in the maximum bracket on the right hand side of the Bellman equation {eq}`odu_pv2p`, given the current state and an arbitrary feasible action. diff --git a/lectures/mccall_q.md b/lectures/mccall_q.md index 1be450e6e..4d7b23d5d 100644 --- a/lectures/mccall_q.md +++ b/lectures/mccall_q.md @@ -149,8 +149,8 @@ Then we'll plot various iterates on the Bellman operator. mccall_data = [ ('c', float64), # unemployment compensation ('β', float64), # discount factor - ('w', float64[:]), # array of wage values, w[i] = wage at state i - ('q', float64[:]), # array of probabilities + ('w', float64[::1]), # array of wage values, w[i] = wage at state i + ('q', float64[::1]) # array of probabilities ]
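The contiguity distinction in the note above can be checked directly in NumPy: a freshly allocated array is C-contiguous, while a strided view of it is not, so the view matches `float64[:]` but not `float64[::1]`. A small sketch:

```python
import numpy as np

w = np.linspace(10, 60, 50)   # freshly allocated, hence C-contiguous
view = w[::2]                 # strided view: same data, not contiguous

print(w.flags['C_CONTIGUOUS'])      # True
print(view.flags['C_CONTIGUOUS'])   # False

# np.ascontiguousarray makes a contiguous copy when one is needed
fixed = np.ascontiguousarray(view)
print(fixed.flags['C_CONTIGUOUS'])  # True
```

If a function compiled with a `float64[::1]` signature is handed a non-contiguous array, `np.ascontiguousarray` is the standard way to produce a compatible argument.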