Fix heading capitalization in intermediate lectures according to QuantEcon style guide #513
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 2 commits
````diff
@@ -24,7 +24,7 @@ kernelspec:
 ```{code-cell} ipython3
 import jax
-## to check that gpu is activated in environment
+## To check that gpu is activated in environment
 
 print(f"JAX backend: {jax.devices()[0].platform}")
 ```
````
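For context, the cell touched by this hunk only reports which backend JAX found. As an aside (illustrative only, not part of this PR), one could warn explicitly when the notebook falls back to the CPU:

```python
# Illustrative sketch only, not part of this PR: warn explicitly when
# JAX falls back to the CPU instead of a GPU.
import jax

backend = jax.devices()[0].platform   # "gpu", "tpu", or "cpu"
if backend == "cpu":
    print("Warning: no accelerator found; the lecture expects a GPU backend.")
else:
    print(f"JAX backend: {backend}")
```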
````diff
@@ -64,7 +64,7 @@ We'll describe the following concepts that are brick and mortar for neural networks:
 * back-propagation and its relationship to the chain rule of differential calculus
 
-## A Deep (but not Wide) Artificial Neural Network
+## A deep (but not wide) artificial neural network
 
 We describe a "deep" neural network of "width" one.
 
````
````diff
@@ -145,7 +145,7 @@ starting from $x_1 = \tilde x$.
 The value of $x_{N+1}$ that emerges from this iterative scheme
 equals $\hat f(\tilde x)$.
 
-## Calibrating Parameters
+## Calibrating parameters
 
 We now consider a neural network like the one describe above with width 1, depth $N$, and activation functions $h_{i}$ for $1\leqslant i\leqslant N$ that map $\mathbb{R}$ into itself.
````
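The hunk above sits in the section that builds the forward pass of the width-one network, where iterating from $x_1 = \tilde x$ produces $x_{N+1} = \hat f(\tilde x)$. A minimal sketch of that scheme, assuming each layer is an affine map followed by an activation $h_i$ (the lecture's exact parameterization may differ):

```python
# Illustrative sketch of the width-one, depth-N forward pass described in
# the surrounding text.  Assumes each layer applies an affine map followed
# by an activation h_i; the lecture's exact parameterization may differ.
import jax.numpy as jnp

def forward(params, activations, x_tilde):
    x = x_tilde                          # x_1 = the input point
    for (w, b), h in zip(params, activations):
        x = h(w * x + b)                 # x_{i+1} = h_i(w_i x_i + b_i)
    return x                             # x_{N+1}, i.e. the approximation at x_tilde

# Hypothetical parameter values: depth N = 3, linear output layer
params = [(1.0, 0.0), (2.0, -1.0), (0.5, 0.1)]
activations = [jnp.tanh, jnp.tanh, lambda z: z]
print(forward(params, activations, 0.3))
```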
````diff
@@ -203,7 +203,7 @@ To implement one step of this parameter update rule, we want the vector of derivatives
 In the neural network literature, this step is accomplished by what is known as **back propagation**.
 
-## Back Propagation and the Chain Rule
+## Back propagation and the chain rule
 
 Thanks to properties of
 
````
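Since the retitled section is about back propagation and the chain rule, here is a hedged sketch of the idea using JAX's reverse-mode autodiff; the function names and parameter values are illustrative, not the lecture's own:

```python
# Illustrative only: jax.grad performs reverse-mode differentiation, which
# applies the chain rule layer by layer -- the same idea as back propagation.
import jax
import jax.numpy as jnp

def forward(params, x):
    for w, b in params:
        x = jnp.tanh(w * x + b)           # one width-one layer
    return x

def loss(params, x, y):
    return (forward(params, x) - y) ** 2  # squared error at a single point

params = [(1.0, 0.0), (0.5, -0.2)]        # hypothetical (w_i, b_i) pairs
grads = jax.grad(loss)(params, 0.3, 0.7)  # derivatives w.r.t. every w_i and b_i
print(grads)
```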
````diff
@@ -304,7 +304,7 @@ We can then solve the above problem by applying our update for $p$ multiple times.
 
 
-## Training Set
+## Training set
 
 Choosing a training set amounts to a choice of measure $\mu$ in the above formulation of our function approximation problem as a minimization problem.
````
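The training-set section frames the choice of training data as a choice of measure $\mu$. A toy example of that idea, assuming a uniform $\mu$ on $[0, 1]$ and a made-up target function (not the lecture's own setup):

```python
# Illustrative only: choosing a training set amounts to choosing a measure mu.
# Here mu is uniform on [0, 1] and the target function f is made up.
import jax.numpy as jnp
from jax import random

f = lambda x: jnp.sin(4 * x)              # hypothetical function to approximate
key = random.PRNGKey(0)
x_train = random.uniform(key, (50,))      # 50 draws from mu = U[0, 1]
y_train = f(x_train)                      # labels at the sampled points
print(x_train[:5], y_train[:5])
```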
````diff
@@ -530,7 +530,7 @@ Image(fig.to_image(format="png"))
 # notebook locally
 ```
 
-## How Deep?
+## How deep?
 
 It is fun to think about how deepening the neural net for the above example affects the quality of approximation
````
@jstac is this a special case given `Wecker` is a name? Should this be