diff --git a/lectures/_toc.yml b/lectures/_toc.yml
index 4681e02d6..a8dc9e4ee 100644
--- a/lectures/_toc.yml
+++ b/lectures/_toc.yml
@@ -15,6 +15,7 @@ parts:
    numbered: true
    chapters:
    - file: prob_matrix
+   - file: stats_examples
    - file: lln_clt
    - file: prob_meaning
    - file: multi_hyper
diff --git a/lectures/prob_matrix.md b/lectures/prob_matrix.md
index 783a9a262..b53c7788f 100644
--- a/lectures/prob_matrix.md
+++ b/lectures/prob_matrix.md
@@ -11,11 +11,13 @@ kernelspec:
   name: python3
---

+
+
# Elementary Probability with Matrices

This lecture uses matrix algebra to illustrate some basic ideas about probability theory.

-After providing somewhat informal definitions of the underlying objects, we'll use matrices and vectors to describe probability distributions.
+After introducing the underlying objects, we'll use matrices and vectors to describe probability distributions.

-Among concepts that we'll be studying include
+Among the concepts that we'll study are

@@ -31,7 +33,9 @@ Among concepts that we'll be studying include
- parameters that define a probability distribution
- sufficient statistics as data summaries

-We'll use a matrix to represent a bivariate probability distribution and a vector to represent a univariate probability distribution
+We'll use a matrix to represent a bivariate or multivariate probability distribution and a vector to represent a univariate probability distribution.
+
+This {doc}`companion lecture <stats_examples>` describes some popular probability distributions and uses Python to sample from them.

In addition to what's in Anaconda, this lecture will need the following libraries:

@@ -59,7 +63,14 @@ set_matplotlib_formats('retina')

We'll briefly define what we mean by a **probability space**, a **probability measure**, and a **random variable**.

-For most of this lecture, we sweep these objects into the background, but they are there underlying the other objects that we'll mainly focus on.
+For most of this lecture, we sweep these objects into the background.
+
+```{note}
+Nevertheless, they'll be lurking beneath the **induced distributions** of random variables that we'll focus on here. These deeper objects are essential for defining and analysing the concepts of stationarity and ergodicity that underlie laws of large numbers. For a relatively
+nontechnical presentation of some of these results, see this chapter from Lars Peter Hansen and Thomas J. Sargent's online monograph titled "Risk, Uncertainty, and Values".
+```
+
+

-Let $\Omega$ be a set of possible underlying outcomes and let $\omega \in \Omega$ be a particular underlying outcomes.
+Let $\Omega$ be a set of possible underlying outcomes and let $\omega \in \Omega$ be a particular underlying outcome.

@@ -87,6 +98,11 @@ where ${\mathcal G}$ is the subset of $\Omega$ for which $X(\omega) \in A$.

We call this the induced probability distribution of random variable $X$.

+Instead of working explicitly with an underlying probability space $(\Omega,\mathcal{F})$ and probability measure $\mu$,
+applied statisticians often proceed simply by specifying a form for an induced distribution of a random variable $X$.
+
+That is how we'll proceed in this lecture and in many subsequent lectures.
+

## What Does Probability Mean?

@@ -133,7 +149,7 @@ as a short-hand way of saying that the random variable $X$ is described by the p

-Consider drawing a sample $x_0, x_1, \dots , x_{N-1}$ of $N$ independent and identically distributoed draws of $X$.
+Consider drawing a sample $x_0, x_1, \dots , x_{N-1}$ of $N$ independent and identically distributed draws of $X$.

-What do the "identical" and "independent" mean in IID or iid ("identically and independently distributed)?
+What do "identical" and "independent" mean in IID or iid ("identically and independently distributed")?
- "identical" means that each draw is from the same distribution.
-- "independent" means that joint distribution equal products of marginal distributions, i.e.,
+- "independent" means that the joint distribution equals the product of the marginal distributions, i.e.,

@@ -145,7 +161,7 @@ $$
\end{aligned}
$$

-We define an e **empirical distribution** as follows.
+We define an **empirical distribution** as follows.

For each $i = 0,\dots,I-1$, let

@@ -158,7 +174,7 @@ N & = \sum^{I-1}_{i=0} N_i \quad \text{total number of draws},\\
$$

-Key ideas that justify connecting probability theory with statistics are laws of large numbers and central limit theorems
+Key concepts that connect probability theory with statistics are laws of large numbers and central limit theorems.

**LLN:**

@@ -173,7 +189,9 @@ Key ideas that justify connecting probability theory with statistics are laws o
- For "frequentist" statisticians, **anticipated relative frequency** is **all** that a probability distribution means.

-- But for a Bayesian it means something more or different.
+- But for a Bayesian it means something else -- something partly subjective and personal.
+
+  * we say "partly" because a Bayesian also pays attention to relative frequencies

## Representing Probability Distributions

@@ -195,7 +213,7 @@ $$
F(x) = \int_{-\infty}^{x}f(t)dt
$$

-Here $B$ is a set of possible $X$'s whose probability we want to compute.
+Here $B$ is a set of possible $X$'s whose probability of occurring we want to compute.

When a probability density exists, a probability distribution can be characterized either by its CDF or by its density.

@@ -265,10 +283,12 @@ where $\theta $ is a vector of parameters that is of much smaller dimension than

**Remarks:**

+- A **statistical model** is a joint probability distribution characterized by a list of **parameters**.
- The concept of **parameter** is intimately related to the notion of **sufficient statistic**.
-- Sufficient statistics are nonlinear functions of a data set.
-- Sufficient statistics are designed to summarize all **information** about parameters that is contained in a data set.
-- They are important tools that AI uses to summarize a **big data** set
+- A **statistic** is a function, often nonlinear, of a data set.
+- **Sufficient statistics** summarize all **information** that a data set contains about the parameters of a statistical model.
+  * Note that a sufficient statistic corresponds to a particular statistical model.
+  * Sufficient statistics are key tools that AI uses to summarize or compress a **big data** set.
- R. A.
Fisher provided a rigorous definition of **information** -- see @@ -406,562 +426,151 @@ $$ =\frac{ \sum_{i}f_{ij} }{ \sum_{i}f_{ij}}=1 $$ -**Remark:** The mathematics of conditional probability implies **Bayes' Law**: +**Remark:** The mathematics of conditional probability implies: $$ \textrm{Prob}\{X=i|Y=j\} =\frac{\textrm{Prob}\{X=i,Y=j\}}{\textrm{Prob}\{Y=j\}}=\frac{\textrm{Prob}\{Y=j|X=i\}\textrm{Prob}\{X=i\}}{\textrm{Prob}\{Y=j\}} $$ -For the joint distribution {eq}`eq:example101discrete` - -$$ -\textrm{Prob}\{X=0|Y=1\} =\frac{ .1}{.1+.5}=\frac{.1}{.6} -$$ - -## Statistical Independence - -Random variables X and Y are statistically **independent** if - -$$ -\textrm{Prob}\{X=i,Y=j\}={f_ig_j} -$$ - -where - -$$ -\begin{aligned} -\textrm{Prob}\{X=i\} &=f_i\ge0, \sum{f_i}=1 \cr -\textrm{Prob}\{Y=j\} & =g_j\ge0, \sum{g_j}=1 -\end{aligned} -$$ - -Conditional distributions are - -$$ -\begin{aligned} -\textrm{Prob}\{X=i|Y=j\} & =\frac{f_ig_j}{\sum_{i}f_ig_j}=\frac{f_ig_j}{g_j}=f_i \\ -\textrm{Prob}\{Y=j|X=i\} & =\frac{f_ig_j}{\sum_{j}f_ig_j}=\frac{f_ig_j}{f_i}=g_j -\end{aligned} -$$ - - -## Means and Variances - -The mean and variance of a discrete random variable $X$ are - -$$ -\begin{aligned} -\mu_{X} & \equiv\mathbb{E}\left[X\right] -=\sum_{k}k \textrm{Prob}\{X=k\} \\ -\sigma_{X}^{2} & \equiv\mathbb{D}\left[X\right]=\sum_{k}\left(k-\mathbb{E}\left[X\right]\right)^{2}\textrm{Prob}\{X=k\} -\end{aligned} -$$ - -A continuous random variable having density $f_{X}(x)$) has mean and variance - -$$ -\begin{aligned} -\mu_{X} & \equiv\mathbb{E}\left[X\right]=\int_{-\infty}^{\infty}xf_{X}(x)dx \\ -\sigma_{X}^{2}\equiv\mathbb{D}\left[X\right] & =\mathrm{E}\left[\left(X-\mu_{X}\right)^{2}\right]=\int_{-\infty}^{\infty}\left(x-\mu_{X}\right)^{2}f_{X}(x)dx -\end{aligned} -$$ - - -## Generating Random Numbers - -Suppose we have at our disposal a pseudo random number that draws a uniform random variable, i.e., one with probability distribution - -$$ -\textrm{Prob}\{\tilde{X}=i\}=\frac{1}{I},\quad i=0,\ldots,I-1 -$$ - -How can we transform $\tilde{X}$ to get a random variable $X$ for which $\textrm{Prob}\{X=i\}=f_i,\quad i=0,\ldots,I-1$, - where $f_i$ is an arbitary discrete probability distribution on $i=0,1,\dots,I-1$? - -The key tool is the inverse of a cumulative distribution function (CDF). - -Observe that the CDF of a distribution is monotone and non-decreasing, taking values between $0$ and $1$. - -We can draw a sample of a random variable $X$ with a known CDF as follows: - -- draw a random variable $u$ from a uniform distribution on $[0,1]$ -- pass the sample value of $u$ into the **"inverse"** target CDF for $X$ -- $X$ has the target CDF - - -Thus, knowing the **"inverse"** CDF of a distribution is enough to simulate from this distribution. - ```{note} -The "inverse" CDF needs to exist for this method to work. +This can be interpreted as a version of what a Bayesian calls **Bayes' Law**. ``` -The inverse CDF is - -$$ -F^{-1}(u)\equiv\inf \{x\in \mathbb{R}: F(x) \geq u\} \quad(00$. - -Its density function is +For the joint distribution {eq}`eq:example101discrete` $$ -\quad f(x)=\lambda e^{-\lambda x} +\textrm{Prob}\{X=0|Y=1\} =\frac{ .1}{.1+.5}=\frac{.1}{.6} $$ -Its CDF is - -$$ -F(x)=\int_{0}^{\infty}\lambda e^{-\lambda x}=1-e^{-\lambda x} -$$ -Let $U$ follow a uniform distribution on $[0,1]$. +## Transition Probability Matrix -$X$ is a random variable such that $U=F(X)$. +Consider the following joint probability distribution of two random variables. 
-The distribution $X$ can be deduced from

+Let $X,Y$ be discrete random variables with joint distribution

$$
-\begin{aligned}
-U& =F(X)=1-e^{-\lambda X}\qquad\\
-\implies & \quad -U=e^{-\lambda X}\\
-\implies& \quad \log(1-U)=-\lambda X\\
-\implies & \quad X=\frac{(1-U)}{-\lambda}
-\end{aligned}
+\textrm{Prob}\{X=i,Y=j\} = \rho_{ij}
$$

-Let's draw $u$ from $U[0,1]$ and calculate $x=\frac{log(1-U)}{-\lambda}$.
-
-
-We'll check whether $X$ seems to follow a **continuous geometric** (exponential) distribution.
-
-Let's check with `numpy`.

+where $i = 0,\dots,I-1; j = 0,\dots,J-1$ and

-```{code-cell} ipython3
-n, λ = 1_000_000, 0.3
-
-# draw uniform numbers
-u = np.random.rand(n)
-
-# transform
-x = -np.log(1-u)/λ
-
-# draw geometric distributions
-x_g = np.random.exponential(1 / λ, n)
-
-# plot and compare
-plt.hist(x, bins=100, density=True)
-plt.show()
-```
-
-```{code-cell} ipython3
-plt.hist(x_g, bins=100, density=True, alpha=0.6)
-plt.show()
-```
-
-**Geometric distribution**
-
-Let $X$ distributed geometrically, that is

$$
-\begin{aligned}
-\textrm{Prob}(X=i) & =(1-\lambda)\lambda^i,\quad\lambda\in(0,1), \quad i=0,1,\dots \\
- & \sum_{i=0}^{\infty}\textrm{Prob}(X=i)=1\longleftrightarrow(1- \lambda)\sum_{i=0}^{\infty}\lambda^i=\frac{1-\lambda}{1-\lambda}=1
-\end{aligned}
+\sum_i\sum_j \rho_{ij} = 1, \quad \rho_{ij} \geqslant 0.
$$

-Its CDF is given by

+An associated conditional distribution is

$$
-\begin{aligned}
-\textrm{Prob}(X\le i)& =(1-\lambda)\sum_{j=0}^{i}\lambda^i\\
-& =(1-\lambda)[\frac{1-\lambda^{i+1}}{1-\lambda}]\\
-& =1-\lambda^{i+1}\\
-& =F(X)=F_i \quad
-\end{aligned}
+\textrm{Prob}\{X=i\vert Y=j\} = \frac{\rho_{ij}}{ \sum_{i}\rho_{ij}}
+= \frac{\textrm{Prob}\{X=i, Y=j\}}{\textrm{Prob}\{Y=j\}}
$$

-Again, let $\tilde{U}$ follow a uniform distribution and we want to find $X$ such that $F(X)=\tilde{U}$.
-
-Let's deduce the distribution of $X$ from

+We can define a transition probability matrix

$$
-\begin{aligned}
-\tilde{U} & =F(X)=1-\lambda^{x+1}\\
-1-\tilde{U} & =\lambda^{x+1}\\
-\log(1-\tilde{U})& =(x+1)\log\lambda\\
-\frac{\log(1-\tilde{U})}{\log\lambda}& =x+1\\
-\frac{\log(1-\tilde{U})}{\log\lambda}-1 &=x
-\end{aligned}
+p_{ij}=\textrm{Prob}\{Y=j|X=i\}= \frac{\rho_{ij}}{ \sum_{j}\rho_{ij}}
$$

-However, $\tilde{U}=F^{-1}(X)$ may not be an integer for any $x\geq0$.
-
-So let

+whose entries, in the case $I=J=2$, form the matrix

$$
-x=\lceil\frac{\log(1-\tilde{U})}{\log\lambda}-1\rceil
+\left[
+  \begin{matrix}
+    p_{00} & p_{01}\\
+    p_{10} & p_{11}
+  \end{matrix}
+\right]
$$

-where $\lceil . \rceil$ is the ceiling function.
-
-Thus $x$ is the smallest integer such that the discrete geometric CDF is greater than or equal to $\tilde{U}$.
-
-We can verify that $x$ is indeed geometrically distributed by the following `numpy` program.
-
-```{note}
-The exponential distribution is the continuous analog of geometric distribution.
-```

-```{code-cell} ipython3
-n, λ = 1_000_000, 0.8
-
-# draw uniform numbers
-u = np.random.rand(n)
-
-# transform
-x = np.ceil(np.log(1-u)/np.log(λ) - 1)
-
-# draw geometric distributions
-x_g = np.random.geometric(1-λ, n)
-
-# plot and compare
-plt.hist(x, bins=150, density=True)
-plt.show()
-```
-
-```{code-cell} ipython3
-np.random.geometric(1-λ, n).max()
-```
-
-```{code-cell} ipython3
-np.log(0.4)/np.log(0.3)
-```

+The first row is the probability that $Y=j, j=0,1$ conditional on $X=0$.

-```{code-cell} ipython3
-plt.hist(x_g, bins=150, density=True, alpha=0.6)
-plt.show()
-```

+The second row is the probability that $Y=j, j=0,1$ conditional on $X=1$.
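+
+To make this concrete, here is a minimal sketch that builds a transition matrix from a joint distribution; the numbers in `ρ` are made up purely for illustration.
+
+```{code-cell} ipython3
+# a hypothetical joint distribution ρ_ij, with rows indexing X and columns indexing Y
+ρ = np.array([[0.3, 0.2],
+              [0.1, 0.4]])
+
+# marginal distribution of X: sum over j within each row
+marginal_X = ρ.sum(axis=1)
+
+# transition matrix with entries p_ij = ρ_ij / Σ_j ρ_ij
+P = ρ / marginal_X[:, np.newaxis]
+
+print(P)
+print(P.sum(axis=1))  # each row of P sums to 1
+```
+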
-## Some Discrete Probability Distributions

+Note that
+- $\sum_{j}p_{ij}= \frac{ \sum_{j}\rho_{ij}}{ \sum_{j}\rho_{ij}}=1$, so each row of the transition matrix is a probability distribution (not so for each column).

-Let's write some Python code to compute means and variances of some univariate random variables.
-
-We'll use our code to

+## Application: Forecasting a Time Series

-- compute population means and variances from the probability distribution
-- generate a sample of $N$ independently and identically distributed draws and compute sample means and variances
-- compare population and sample means and variances

+Suppose that there are two time periods.

-## Geometric distribution

+- $t=0$ "today"
+- $t=1$ "tomorrow"

-$$
-\textrm{Prob}(X=k)=(1-p)^{k-1}p,k=1,2, \ldots
-$$

+Let $X(0)$ be a random variable to be realized at $t=0$, $X(1)$ be a random variable to be realized at $t=1$.

-$\implies$

+Suppose that

$$
\begin{aligned}
-\mathbb{E}(X) & =\frac{1}{p}\\\mathbb{D}(X) & =\frac{1-p}{p^2}
+\text{Prob} \{X(0)=i,X(1)=j\} &=f_{ij}\geq 0, \quad i=0,\cdots,I-1; \; j=0,\cdots,J-1\\
+\sum_{i}\sum_{j}f_{ij}&=1
\end{aligned}
$$

-We draw observations from the distribution and compare the sample mean and variance with the theoretical results.

+$f_{ij}$ is a joint distribution over $[X(0), X(1)]$.

-```{code-cell} ipython3
-# specify parameters
-p, n = 0.3, 1_000_000
-
-# draw observations from the distribution
-x = np.random.geometric(p, n)
-
-# compute sample mean and variance
-μ_hat = np.mean(x)
-σ2_hat = np.var(x)
-
-print("The sample mean is: ", μ_hat, "\nThe sample variance is: ", σ2_hat)
-
-# compare with theoretical results
-print("\nThe population mean is: ", 1/p)
-print("The population variance is: ", (1-p)/(p**2))
-```

+A conditional distribution is

-### Newcomb–Benford distribution

+$$\text{Prob} \{X(1)=j|X(0)=i\}= \frac{f_{ij}}{ \sum_{j}f_{ij}}$$

-The **Newcomb–Benford law** fits many data sets, e.g., reports of incomes to tax authorities, in which
-the leading digit is more likely to be small than large.

+**Remark:**
+- This formula is a workhorse for applied economic forecasters.

-See

+## Statistical Independence

-A Benford probability distribution is

+Random variables $X$ and $Y$ are statistically **independent** if

$$
-\textrm{Prob}\{X=d\}=\log _{10}(d+1)-\log _{10}(d)=\log _{10}\left(1+\frac{1}{d}\right)
+\textrm{Prob}\{X=i,Y=j\}={f_ig_j}
$$

-where $d\in\{1,2,\cdots,9\}$ can be thought of as a **first digit** in a sequence of digits.

+where

-This is a well defined discrete distribution since we can verify that probabilities are nonnegative and sum to $1$.

$$
\begin{aligned}
+\textrm{Prob}\{X=i\} &=f_i\ge0, \quad \sum_i{f_i}=1 \cr
+\textrm{Prob}\{Y=j\} & =g_j\ge0, \quad \sum_j{g_j}=1
\end{aligned}
$$

-$$
-\log_{10}\left(1+\frac{1}{d}\right)\geq0,\quad\sum_{d=1}^{9}\log_{10}\left(1+\frac{1}{d}\right)=1
-$$

+Conditional distributions are

-The mean and variance of a Benford distribution are

$$
\begin{aligned}
-\mathbb{E}\left[X\right] &=\sum_{d=1}^{9}d\log_{10}\left(1+\frac{1}{d}\right)\simeq3.4402 \\
-\mathbb{V}\left[X\right] & =\sum_{d=1}^{9}\left(d-\mathbb{E}\left[X\right]\right)^{2}\log_{10}\left(1+\frac{1}{d}\right)\simeq6.0565
+\textrm{Prob}\{X=i|Y=j\} & =\frac{f_ig_j}{\sum_{i}f_ig_j}=\frac{f_ig_j}{g_j}=f_i \\
+\textrm{Prob}\{Y=j|X=i\} & =\frac{f_ig_j}{\sum_{j}f_ig_j}=\frac{f_ig_j}{f_i}=g_j
\end{aligned}
$$

-We verify the above and compute the mean and variance using `numpy`.
- -```{code-cell} ipython3 -Benford_pmf = np.array([np.log10(1+1/d) for d in range(1,10)]) -k = np.array(range(1,10)) - -# mean -mean = np.sum(Benford_pmf * k) - -# variance -var = np.sum([(k-mean)**2 * Benford_pmf]) - -# verify sum to 1 -print(np.sum(Benford_pmf)) -print(mean) -print(var) -``` - -```{code-cell} ipython3 -# plot distribution -plt.plot(range(1,10), Benford_pmf, 'o') -plt.title('Benford\'s distribution') -plt.show() -``` - -### Pascal (negative binomial) distribution - -Consider a sequence of independent Bernoulli trials. - -Let $p$ be the probability of success. - -Let $X$ be a random variable that represents the number of failures before we get $r$ success. - -Its distribution is +where $$ \begin{aligned} -X & \sim NB(r,p) \\ -\textrm{Prob}(X=k;r,p) & = \begin{pmatrix}k+r-1 \\ r-1 \end{pmatrix}p^r(1-p)^{k} +\textrm{Prob}\{X=i\} &=f_i\ge0, \sum{f_i}=1 \cr +\textrm{Prob}\{Y=j\} & =g_j\ge0, \sum{g_j}=1 \end{aligned} $$ -Here, we choose from among $k+r-1$ possible outcomes because the last draw is by definition a success. - -We compute the mean and variance to be - +Conditional distributions are $$ \begin{aligned} -\mathbb{E}(X) & = \frac{k(1-p)}{p} \\ -\mathbb{V}(X) & = \frac{k(1-p)}{p^2} +\textrm{Prob}\{X=i|Y=j\} & =\frac{f_ig_j}{\sum_{i}f_ig_j}=\frac{f_ig_j}{g_j}=f_i \\ +\textrm{Prob}\{Y=j|X=i\} & =\frac{f_ig_j}{\sum_{j}f_ig_j}=\frac{f_ig_j}{f_i}=g_j \end{aligned} $$ -```{code-cell} ipython3 -# specify parameters -r, p, n = 10, 0.3, 1_000_000 - -# draw observations from the distribution -x = np.random.negative_binomial(r, p, n) - -# compute sample mean and variance -μ_hat = np.mean(x) -σ2_hat = np.var(x) - -print("The sample mean is: ", μ_hat, "\nThe sample variance is: ", σ2_hat) -print("\nThe population mean is: ", r*(1-p)/p) -print("The population variance is: ", r*(1-p)/p**2) -``` - -## Continuous Random Variables - -### Univariate Gaussian distribution -We write - -$$ -X \sim N(\mu,\sigma^2) -$$ - -to indicate the probability distribution - -$$f(x|u,\sigma^2)=\frac{1}{\sqrt{2\pi \sigma^2}}e^{[-\frac{1}{2\sigma^2}(x-u)^2]} $$ - -In the below example, we set $\mu = 0, \sigma = 0.1$. 
- -```{code-cell} ipython3 -# specify parameters -μ, σ = 0, 0.1 - -# specify number of draws -n = 1_000_000 - -# draw observations from the distribution -x = np.random.normal(μ, σ, n) - -# compute sample mean and variance -μ_hat = np.mean(x) -σ_hat = np.std(x) - -print("The sample mean is: ", μ_hat) -print("The sample standard deviation is: ", σ_hat) -``` - -```{code-cell} ipython3 -# compare -print(μ-μ_hat < 1e-3) -print(σ-σ_hat < 1e-3) -``` - -### Uniform Distribution - -$$ -\begin{aligned} -X & \sim U[a,b] \\ -f(x)& = \begin{cases} \frac{1}{b-a}, & a \leq x \leq b \\ \quad0, & \text{otherwise} \end{cases} -\end{aligned} -$$ +## Means and Variances -The population mean and variance are +The mean and variance of a discrete random variable $X$ are $$ \begin{aligned} -\mathbb{E}(X) & = \frac{a+b}{2} \\ -\mathbb{V}(X) & = \frac{(b-a)^2}{12} +\mu_{X} & \equiv\mathbb{E}\left[X\right] +=\sum_{k}k \textrm{Prob}\{X=k\} \\ +\sigma_{X}^{2} & \equiv\mathbb{D}\left[X\right]=\sum_{k}\left(k-\mathbb{E}\left[X\right]\right)^{2}\textrm{Prob}\{X=k\} \end{aligned} $$ -```{code-cell} ipython3 -# specify parameters -a, b = 10, 20 - -# specify number of draws -n = 1_000_000 - -# draw observations from the distribution -x = a + (b-a)*np.random.rand(n) - -# compute sample mean and variance -μ_hat = np.mean(x) -σ2_hat = np.var(x) - -print("The sample mean is: ", μ_hat, "\nThe sample variance is: ", σ2_hat) -print("\nThe population mean is: ", (a+b)/2) -print("The population variance is: ", (b-a)**2/12) -``` - -## A Mixed Discrete-Continuous Distribution - -We'll motivate this example with a little story. - - -Suppose that to apply for a job you take an interview and either pass or fail it. - -You have $5\%$ chance to pass an interview and you know your salary will uniformly distributed in the interval 300~400 a day only if you pass. - -We can describe your daily salary as a discrete-continuous variable with the following probabilities: - -$$ -P(X=0)=0.95 -$$ - -$$ -P(300\le X \le 400)=\int_{300}^{400} f(x)\, dx=0.05 -$$ - -$$ -f(x) = 0.0005 -$$ - -Let's start by generating a random sample and computing sample moments. 
- -```{code-cell} ipython3 -x = np.random.rand(1_000_000) -# x[x > 0.95] = 100*x[x > 0.95]+300 -x[x > 0.95] = 100*np.random.rand(len(x[x > 0.95]))+300 -x[x <= 0.95] = 0 - -μ_hat = np.mean(x) -σ2_hat = np.var(x) - -print("The sample mean is: ", μ_hat, "\nThe sample variance is: ", σ2_hat) -``` - -The analytical mean and variance can be computed: - -$$ -\begin{aligned} -\mu &= \int_{300}^{400}xf(x)dx \\ -&= 0.0005\int_{300}^{400}xdx \\ -&= 0.0005 \times \frac{1}{2}x^2\bigg|_{300}^{400} -\end{aligned} -$$ +A continuous random variable having density $f_{X}(x)$) has mean and variance $$ \begin{aligned} -\sigma^2 &= 0.95\times(0-17.5)^2+\int_{300}^{400}(x-17.5)^2f(x)dx \\ -&= 0.95\times17.5^2+0.0005\int_{300}^{400}(x-17.5)^2dx \\ -&= 0.95\times17.5^2+0.0005 \times \frac{1}{3}(x-17.5)^3 \bigg|_{300}^{400} +\mu_{X} & \equiv\mathbb{E}\left[X\right]=\int_{-\infty}^{\infty}xf_{X}(x)dx \\ +\sigma_{X}^{2}\equiv\mathbb{D}\left[X\right] & =\mathrm{E}\left[\left(X-\mu_{X}\right)^{2}\right]=\int_{-\infty}^{\infty}\left(x-\mu_{X}\right)^{2}f_{X}(x)dx \end{aligned} $$ -```{code-cell} ipython3 -mean = 0.0005*0.5*(400**2 - 300**2) -var = 0.95*17.5**2+0.0005/3*((400-17.5)**3-(300-17.5)**3) -print("mean: ", mean) -print("variance: ", var) -``` - -## Matrix Representation of Some Bivariate Distributions +## Matrix Representations of Some Bivariate Distributions Let's use matrices to represent a joint distribution, conditional distribution, marginal distribution, and the mean and variance of a bivariate random variable. @@ -1497,53 +1106,6 @@ $$ where $ f_{X}*g_{Y} $ denotes the **convolution** of the $f_X$ and $g_Y$ functions. -## Transition Probability Matrix - -Consider the following joint probability distribution of two random variables. - -Let $X,Y$ be discrete random variables with joint distribution - -$$ -\textrm{Prob}\{X=i,Y=j\} = \rho_{ij} -$$ - -where $i = 0,\dots,I-1; j = 0,\dots,J-1$ and - -$$ -\sum_i\sum_j \rho_{ij} = 1, \quad \rho_{ij} \geqslant 0. -$$ - -An associated conditional distribution is - -$$ -\textrm{Prob}\{Y=i\vert X=j\} = \frac{\rho_{ij}}{ \sum_{i}\rho_{ij}} -= \frac{\textrm{Prob}\{Y=j, X=i\}}{\textrm{Prob}\{ X=i\}} -$$ - -We can define a transition probability matrix - -$$ -p_{ij}=\textrm{Prob}\{Y=j|X=i\}= \frac{\rho_{ij}}{ \sum_{j}\rho_{ij}} -$$ - -where - -$$ -\left[ - \begin{matrix} - p_{11} & p_{12}\\ - p_{21} & p_{22} - \end{matrix} -\right] -$$ - -The first row is the probability of $Y=j, j=0,1$ conditional on $X=0$. - -The second row is the probability of $Y=j, j=0,1$ conditional on $X=1$. - -Note that -- $\sum_{j}\rho_{ij}= \frac{ \sum_{j}\rho_{ij}}{ \sum_{j}\rho_{ij}}=1$, so each row of $\rho$ is a probability distribution (not so for each column. - ## Coupling Start with a joint distribution @@ -1912,30 +1474,3 @@ print(c2_ymtb) We have verified that both joint distributions, $c_1$ and $c_2$, have identical marginal distributions of $X$ and $Y$, respectively. So they are both couplings of $X$ and $Y$. - -## Time Series - -Suppose that there are two time periods. - -- $t=0$ "today" -- $t=1$ "tomorrow" - -Let $X(0)$ be a random variable to be realized at $t=0$, $X(1)$ be a random variable to be realized at $t=1$. - -Suppose that - -$$ -\begin{aligned} -\text{Prob} \{X(0)=i,X(1)=j\} &=f_{ij}≥0,i=0,\cdots,I-1\\ -\sum_{i}\sum_{j}f_{ij}&=1 -\end{aligned} -$$ - -$f_{ij} $ is a joint distribution over $[X(0), X(1)]$. 
- -A conditional distribution is - -$$\text{Prob} \{X(1)=j|X(0)=i\}= \frac{f_{ij}}{ \sum_{j}f_{ij}}$$ - -**Remark:** -- This is a key formula for a theory of optimally predicting a time series. diff --git a/lectures/stats_examples.md b/lectures/stats_examples.md new file mode 100644 index 000000000..922538628 --- /dev/null +++ b/lectures/stats_examples.md @@ -0,0 +1,540 @@ +--- +jupytext: + text_representation: + extension: .myst + format_name: myst + format_version: 0.13 + jupytext_version: 1.13.8 +kernelspec: + display_name: Python 3 (ipykernel) + language: python + name: python3 +--- + +# Some Probability Distributions + +This lecture is a supplement to {doc}`this lecture on statistics with matrices `. + +It describes some popular distributions and uses Python to sample from them. + +It also describes a way to sample from an arbitrary probability distribution that you make up by +transforming a sample from a uniform probability distribution. + + +In addition to what's in Anaconda, this lecture will need the following libraries: + +```{code-cell} ipython3 +--- +tags: [hide-output] +--- +!pip install prettytable +``` + +As usual, we'll start with some imports + +```{code-cell} ipython3 +import numpy as np +import matplotlib.pyplot as plt +import prettytable as pt +from mpl_toolkits.mplot3d import Axes3D +from matplotlib_inline.backend_inline import set_matplotlib_formats +set_matplotlib_formats('retina') +``` + + +## Some Discrete Probability Distributions + + +Let's write some Python code to compute means and variances of some univariate random variables. + +We'll use our code to + +- compute population means and variances from the probability distribution +- generate a sample of $N$ independently and identically distributed draws and compute sample means and variances +- compare population and sample means and variances + +## Geometric distribution + +$$ +\textrm{Prob}(X=k)=(1-p)^{k-1}p,k=1,2, \ldots +$$ + +$\implies$ + +$$ +\begin{aligned} +\mathbb{E}(X) & =\frac{1}{p}\\\mathbb{D}(X) & =\frac{1-p}{p^2} +\end{aligned} +$$ + +We draw observations from the distribution and compare the sample mean and variance with the theoretical results. + +```{code-cell} ipython3 +# specify parameters +p, n = 0.3, 1_000_000 + +# draw observations from the distribution +x = np.random.geometric(p, n) + +# compute sample mean and variance +μ_hat = np.mean(x) +σ2_hat = np.var(x) + +print("The sample mean is: ", μ_hat, "\nThe sample variance is: ", σ2_hat) + +# compare with theoretical results +print("\nThe population mean is: ", 1/p) +print("The population variance is: ", (1-p)/(p**2)) +``` + + +## Pascal (negative binomial) distribution + +Consider a sequence of independent Bernoulli trials. + +Let $p$ be the probability of success. + +Let $X$ be a random variable that represents the number of failures before we get $r$ success. + +Its distribution is + +$$ +\begin{aligned} +X & \sim NB(r,p) \\ +\textrm{Prob}(X=k;r,p) & = \begin{pmatrix}k+r-1 \\ r-1 \end{pmatrix}p^r(1-p)^{k} +\end{aligned} +$$ + +Here, we choose from among $k+r-1$ possible outcomes because the last draw is by definition a success. 
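+
+As a quick sanity check on this formula, the sketch below compares it with `scipy.stats.nbinom` (SciPy ships with Anaconda), which uses the same "number of failures before the $r$-th success" convention; the particular values of $r$, $p$, and $k$ here are made up for illustration.
+
+```{code-cell} ipython3
+from math import comb
+from scipy.stats import nbinom
+
+r, p = 10, 0.3   # hypothetical parameter values
+
+for k in range(5):
+    # pmf from the formula above: C(k+r-1, r-1) * p^r * (1-p)^k
+    pmf_formula = comb(k + r - 1, r - 1) * p**r * (1 - p)**k
+    print(k, pmf_formula, nbinom.pmf(k, r, p))
+```
+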
+We compute the mean and variance to be
+
+$$
+\begin{aligned}
+\mathbb{E}(X) & = \frac{r(1-p)}{p} \\
+\mathbb{V}(X) & = \frac{r(1-p)}{p^2}
+\end{aligned}
+$$
+
+```{code-cell} ipython3
+# specify parameters
+r, p, n = 10, 0.3, 1_000_000
+
+# draw observations from the distribution
+x = np.random.negative_binomial(r, p, n)
+
+# compute sample mean and variance
+μ_hat = np.mean(x)
+σ2_hat = np.var(x)
+
+print("The sample mean is: ", μ_hat, "\nThe sample variance is: ", σ2_hat)
+print("\nThe population mean is: ", r*(1-p)/p)
+print("The population variance is: ", r*(1-p)/p**2)
+```
+
+## Newcomb–Benford distribution
+
+The **Newcomb–Benford law** fits many data sets, e.g., reports of incomes to tax authorities, in which
+the leading digit is more likely to be small than large.
+
+See <https://en.wikipedia.org/wiki/Benford%27s_law>
+
+A Benford probability distribution is
+
+$$
+\textrm{Prob}\{X=d\}=\log _{10}(d+1)-\log _{10}(d)=\log _{10}\left(1+\frac{1}{d}\right)
+$$
+
+where $d\in\{1,2,\cdots,9\}$ can be thought of as a **first digit** in a sequence of digits.
+
+This is a well-defined discrete distribution since we can verify that probabilities are nonnegative and sum to $1$.
+
+$$
+\log_{10}\left(1+\frac{1}{d}\right)\geq0,\quad\sum_{d=1}^{9}\log_{10}\left(1+\frac{1}{d}\right)=1
+$$
+
+The mean and variance of a Benford distribution are
+
+$$
+\begin{aligned}
+\mathbb{E}\left[X\right] &=\sum_{d=1}^{9}d\log_{10}\left(1+\frac{1}{d}\right)\simeq3.4402 \\
+\mathbb{V}\left[X\right] & =\sum_{d=1}^{9}\left(d-\mathbb{E}\left[X\right]\right)^{2}\log_{10}\left(1+\frac{1}{d}\right)\simeq6.0565
+\end{aligned}
+$$
+
+We verify the above and compute the mean and variance using `numpy`.
+
+```{code-cell} ipython3
+Benford_pmf = np.array([np.log10(1+1/d) for d in range(1,10)])
+k = np.array(range(1,10))
+
+# mean
+mean = np.sum(Benford_pmf * k)
+
+# variance
+var = np.sum([(k-mean)**2 * Benford_pmf])
+
+# verify that the probabilities sum to 1
+print(np.sum(Benford_pmf))
+print(mean)
+print(var)
+```
+
+```{code-cell} ipython3
+# plot the distribution
+plt.plot(range(1,10), Benford_pmf, 'o')
+plt.title('Benford\'s distribution')
+plt.show()
+```
+
+Now let's turn to some continuous random variables.
+
+## Univariate Gaussian distribution
+
+We write
+
+$$
+X \sim N(\mu,\sigma^2)
+$$
+
+to indicate the probability distribution
+
+$$f(x|\mu,\sigma^2)=\frac{1}{\sqrt{2\pi \sigma^2}}e^{-\frac{1}{2\sigma^2}(x-\mu)^2} $$
+
+In the example below, we set $\mu = 0, \sigma = 0.1$.
+ +```{code-cell} ipython3 +# specify parameters +μ, σ = 0, 0.1 + +# specify number of draws +n = 1_000_000 + +# draw observations from the distribution +x = np.random.normal(μ, σ, n) + +# compute sample mean and variance +μ_hat = np.mean(x) +σ_hat = np.std(x) + +print("The sample mean is: ", μ_hat) +print("The sample standard deviation is: ", σ_hat) +``` + +```{code-cell} ipython3 +# compare +print(μ-μ_hat < 1e-3) +print(σ-σ_hat < 1e-3) +``` + +## Uniform Distribution + +$$ +\begin{aligned} +X & \sim U[a,b] \\ +f(x)& = \begin{cases} \frac{1}{b-a}, & a \leq x \leq b \\ \quad0, & \text{otherwise} \end{cases} +\end{aligned} +$$ + +The population mean and variance are + +$$ +\begin{aligned} +\mathbb{E}(X) & = \frac{a+b}{2} \\ +\mathbb{V}(X) & = \frac{(b-a)^2}{12} +\end{aligned} +$$ + +```{code-cell} ipython3 +# specify parameters +a, b = 10, 20 + +# specify number of draws +n = 1_000_000 + +# draw observations from the distribution +x = a + (b-a)*np.random.rand(n) + +# compute sample mean and variance +μ_hat = np.mean(x) +σ2_hat = np.var(x) + +print("The sample mean is: ", μ_hat, "\nThe sample variance is: ", σ2_hat) +print("\nThe population mean is: ", (a+b)/2) +print("The population variance is: ", (b-a)**2/12) +``` + +## A Mixed Discrete-Continuous Distribution + +We'll motivate this example with a little story. + + +Suppose that to apply for a job you take an interview and either pass or fail it. + +You have $5\%$ chance to pass an interview and you know your salary will uniformly distributed in the interval 300~400 a day only if you pass. + +We can describe your daily salary as a discrete-continuous variable with the following probabilities: + +$$ +P(X=0)=0.95 +$$ + +$$ +P(300\le X \le 400)=\int_{300}^{400} f(x)\, dx=0.05 +$$ + +$$ +f(x) = 0.0005 +$$ + +Let's start by generating a random sample and computing sample moments. + +```{code-cell} ipython3 +x = np.random.rand(1_000_000) +# x[x > 0.95] = 100*x[x > 0.95]+300 +x[x > 0.95] = 100*np.random.rand(len(x[x > 0.95]))+300 +x[x <= 0.95] = 0 + +μ_hat = np.mean(x) +σ2_hat = np.var(x) + +print("The sample mean is: ", μ_hat, "\nThe sample variance is: ", σ2_hat) +``` + +The analytical mean and variance can be computed: + +$$ +\begin{aligned} +\mu &= \int_{300}^{400}xf(x)dx \\ +&= 0.0005\int_{300}^{400}xdx \\ +&= 0.0005 \times \frac{1}{2}x^2\bigg|_{300}^{400} +\end{aligned} +$$ + +$$ +\begin{aligned} +\sigma^2 &= 0.95\times(0-17.5)^2+\int_{300}^{400}(x-17.5)^2f(x)dx \\ +&= 0.95\times17.5^2+0.0005\int_{300}^{400}(x-17.5)^2dx \\ +&= 0.95\times17.5^2+0.0005 \times \frac{1}{3}(x-17.5)^3 \bigg|_{300}^{400} +\end{aligned} +$$ + +```{code-cell} ipython3 +mean = 0.0005*0.5*(400**2 - 300**2) +var = 0.95*17.5**2+0.0005/3*((400-17.5)**3-(300-17.5)**3) +print("mean: ", mean) +print("variance: ", var) +``` + + +## Drawing a Random Number from a Particular Distribution + +Suppose we have at our disposal a pseudo random number that draws a uniform random variable, i.e., one with probability distribution + +$$ +\textrm{Prob}\{\tilde{X}=i\}=\frac{1}{I},\quad i=0,\ldots,I-1 +$$ + +How can we transform $\tilde{X}$ to get a random variable $X$ for which $\textrm{Prob}\{X=i\}=f_i,\quad i=0,\ldots,I-1$, + where $f_i$ is an arbitary discrete probability distribution on $i=0,1,\dots,I-1$? + +The key tool is the inverse of a cumulative distribution function (CDF). + +Observe that the CDF of a distribution is monotone and non-decreasing, taking values between $0$ and $1$. 
+We can draw a sample of a random variable $X$ with a known CDF as follows:
+
+- draw a random variable $u$ from a uniform distribution on $[0,1]$
+- pass the sample value of $u$ into the **"inverse"** target CDF for $X$
+- $X$ has the target CDF
+
+Thus, knowing the **"inverse"** CDF of a distribution is enough to simulate from this distribution.
+
+```{note}
+The "inverse" CDF needs to exist for this method to work.
+```
+
+The inverse CDF is
+
+$$
+F^{-1}(u)\equiv\inf \{x\in \mathbb{R}: F(x) \geq u\} \quad (0<u<1)
+$$
+
+**Exponential distribution**
+
+Let $X$ follow an exponential distribution with parameter $\lambda > 0$.
+
+Its density function is
+
+$$
+f(x)=\lambda e^{-\lambda x}
+$$
+
+Its CDF is
+
+$$
+F(x)=\int_{0}^{x}\lambda e^{-\lambda t}dt=1-e^{-\lambda x}
+$$
+
+Let $U$ follow a uniform distribution on $[0,1]$.
+
+$X$ is a random variable such that $U=F(X)$.
+
+The distribution of $X$ can be deduced from
+
+$$
+\begin{aligned}
+U& =F(X)=1-e^{-\lambda X}\\
+\implies & \quad 1-U=e^{-\lambda X}\\
+\implies& \quad \log(1-U)=-\lambda X\\
+\implies & \quad X=\frac{\log(1-U)}{-\lambda}
+\end{aligned}
+$$
+
+Let's draw $u$ from $U[0,1]$ and calculate $x=\frac{\log(1-u)}{-\lambda}$.
+
+We'll check whether $X$ seems to follow a **continuous geometric** (exponential) distribution.
+
+Let's check with `numpy`.
+
+```{code-cell} ipython3
+n, λ = 1_000_000, 0.3
+
+# draw uniform numbers
+u = np.random.rand(n)
+
+# transform
+x = -np.log(1-u)/λ
+
+# draw exponential samples directly, for comparison
+x_g = np.random.exponential(1 / λ, n)
+
+# plot and compare
+plt.hist(x, bins=100, density=True)
+plt.show()
+```
+
+```{code-cell} ipython3
+plt.hist(x_g, bins=100, density=True, alpha=0.6)
+plt.show()
+```
+
+**Geometric distribution**
+
+Let $X$ be distributed geometrically, that is
+
+$$
+\begin{aligned}
+\textrm{Prob}(X=i) & =(1-\lambda)\lambda^i,\quad\lambda\in(0,1), \quad i=0,1,\dots \\
+ & \sum_{i=0}^{\infty}\textrm{Prob}(X=i)=1\longleftrightarrow(1- \lambda)\sum_{i=0}^{\infty}\lambda^i=\frac{1-\lambda}{1-\lambda}=1
+\end{aligned}
+$$
+
+Its CDF is given by
+
+$$
+\begin{aligned}
+\textrm{Prob}(X\le i)& =(1-\lambda)\sum_{j=0}^{i}\lambda^j\\
+& =(1-\lambda)\left[\frac{1-\lambda^{i+1}}{1-\lambda}\right]\\
+& =1-\lambda^{i+1}\\
+& =F(i)\equiv F_i
+\end{aligned}
+$$
+
+Again, let $\tilde{U}$ follow a uniform distribution and we want to find $x$ such that $F(x)=\tilde{U}$.
+
+Let's deduce the distribution of $X$ from
+
+$$
+\begin{aligned}
+\tilde{U} & =F(X)=1-\lambda^{x+1}\\
+1-\tilde{U} & =\lambda^{x+1}\\
+\log(1-\tilde{U})& =(x+1)\log\lambda\\
+\frac{\log(1-\tilde{U})}{\log\lambda}& =x+1\\
+\frac{\log(1-\tilde{U})}{\log\lambda}-1 &=x
+\end{aligned}
+$$
+
+However, the $x$ that solves $F(x)=\tilde{U}$ may not be an integer.
+
+So we let
+
+$$
+x=\left\lceil\frac{\log(1-\tilde{U})}{\log\lambda}-1\right\rceil
+$$
+
+where $\lceil . \rceil$ is the ceiling function.
+
+Thus $x$ is the smallest integer such that the discrete geometric CDF is greater than or equal to $\tilde{U}$.
+
+We can verify that $x$ is indeed geometrically distributed by the following `numpy` program.
+
+```{note}
+The exponential distribution is the continuous analog of the geometric distribution.
+```
+
+```{code-cell} ipython3
+n, λ = 1_000_000, 0.8
+
+# draw uniform numbers
+u = np.random.rand(n)
+
+# transform
+x = np.ceil(np.log(1-u)/np.log(λ) - 1)
+
+# draw geometric samples directly, for comparison
+x_g = np.random.geometric(1-λ, n)
+
+# plot and compare
+plt.hist(x, bins=150, density=True)
+plt.show()
+```
+
+```{code-cell} ipython3
+plt.hist(x_g, bins=150, density=True, alpha=0.6)
+plt.show()
+```
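+
+Finally, let's return to the question posed at the start of this section: how to sample from an **arbitrary** discrete distribution $f_0, \dots, f_{I-1}$ that has no convenient closed-form inverse CDF.
+
+The minimal sketch below (the distribution `f` is made up for illustration) applies the inverse-CDF recipe numerically: draw $u$ uniform on $[0,1]$ and return the smallest $i$ whose cumulative probability $F_i$ is at least $u$.
+
+```{code-cell} ipython3
+f = np.array([0.1, 0.2, 0.4, 0.3])   # a hypothetical discrete distribution
+
+# cumulative distribution function F_i = f_0 + ... + f_i
+F = np.cumsum(f)
+
+# inverse-CDF sampling: smallest i with F_i >= u
+u = np.random.rand(1_000_000)
+x = np.searchsorted(F, u)
+
+# compare empirical frequencies with f
+print(np.bincount(x) / len(x))
+```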