Commit 13e5313

fixed math formatting in MCMC submodule
1 parent be0fbb5 commit 13e5313

File tree

6 files changed (+1886, -26 lines)


.DS_Store (0 bytes): binary file not shown.

06-the-learnable-universe/module-1-statistical-inference/01-bayesian-statistics-inference/03-mod5-part3-MCMC-HMC.md renamed to 06-the-learnable-universe/module-1-statistical-inference/01-bayesian-statistics-inference/03-mod5-part3-MCMC.md

Lines changed: 29 additions & 26 deletions
@@ -580,14 +580,14 @@ $$
 :class: tip
 **Before we derive the acceptance probability, take a moment to think:**
 
-Given that we want detailed balance π(θ)T(θ'|θ) = π(θ')T(θ|θ'), and we've decided T = Q × α, what constraints must α satisfy?
+Given that we want detailed balance $π(θ)T(θ'|θ) = π(θ')T(θ|θ')$, and we've decided $T = Q × α$, what constraints must $α$ satisfy?
 
 Write down your answer before reading on:
 
-- Should α depend on both θ and θ', or just one?
-- If θ' has higher posterior probability than θ, should we always accept?
-- If θ' has lower posterior probability, should we always reject?
-- What role does the proposal distribution Q play?
+- Should $α$ depend on both $θ$ and $θ'$, or just one?
+- If $θ'$ has higher posterior probability than $θ$, should we always accept?
+- If $θ'$ has lower posterior probability, should we always reject?
+- What role does the proposal distribution $Q$ play?
 
 Think about it for 30 seconds, then continue. The derivation will be more meaningful if you've wrestled with the problem first.
 :::
@@ -618,15 +618,17 @@ $$
 r = \frac{\pi(\theta') Q(\theta | \theta')}{\pi(\theta) Q(\theta' | \theta)}
 $$
 
-**When \(\pi\) is the posterior:** write \(\pi(\theta) \propto p(D\mid\theta)p(\theta)\). Then
+**When $\pi$ is the posterior:** write
+$$\pi(\theta) \propto p(D\mid\theta)p(\theta).$$
 
-\[
+Then
+
+$$
 r \;=\; \frac{\pi(\theta')\,Q(\theta\mid\theta')}{\pi(\theta)\,Q(\theta'\mid\theta)}
 \;=\; \frac{p(D\mid\theta')\,p(\theta')}{p(D\mid\theta)\,p(\theta)} \cdot \frac{Q(\theta\mid\theta')}{Q(\theta'\mid\theta)},
-\]
-
-so the evidence \(p(D)\) cancels because it is constant in \(\theta\).
+$$
 
+so the evidence $p(D)$ cancels because it is constant in $\theta$.
 
 We need $α(θ'|θ)$ and $α(θ|θ')$ to have ratio $r$. A simple choice that works:
 
@@ -664,7 +666,7 @@ This is an example of the "Barker acceptance" vs. "Metropolis acceptance" distin
 
 ### Proposal Distributions: Symmetric vs. Asymmetric
 
-The acceptance probability depends on the proposal distribution Q. Two important cases:
+The acceptance probability depends on the proposal distribution $Q$. Two important cases:
 
 **Symmetric proposals**: $Q(θ'|θ) = Q(θ|θ')$
 
@@ -680,12 +682,12 @@
 \theta' \sim \mathcal{N}(\theta;\,\Sigma)\quad\Longleftrightarrow\quad \theta'=\theta+\varepsilon,\;\varepsilon\sim\mathcal{N}(0,\Sigma).
 $$
 
-For symmetric \(Q\), the \(Q\)-terms cancel and
+For symmetric $Q$, the $Q$-terms cancel and
 $$
 \alpha=\min\!\left(1,\frac{\pi(\theta')}{\pi(\theta)}\right).
 $$
 
-Here you can evaluate $π$ using only log-likelihood + log-prior; any constant normalizer cancels. This is the **Metropolis algorithm** (the original 1953 version). You only need to evaluate the ratio of posterior probabilities!
+Here you can evaluate $π$ using only log-likelihood + log-prior; any constant normalizer cancels. This is the **Metropolis algorithm** (the original 1953 version). **You only need to evaluate the ratio of posterior probabilities!**
 
 **Asymmetric proposals**: $Q(θ'|θ) ≠ Q(θ|θ')$
 
@@ -718,7 +720,7 @@ The proposal distribution Q determines how the chain explores. A crucial paramet
 :class: dropdown
 For high-dimensional problems with Gaussian targets and Gaussian proposals, there's a beautiful theory (Roberts & Rosenthal 2001):
 
-**Optimal acceptance rate**: ~23.4% as dimension d → ∞
+**Optimal acceptance rate**: ~23.4% as dimension $d → ∞$
 
 This balances:
 
@@ -733,15 +735,15 @@ In practice, aim for:
 
 If your acceptance rate is outside these ranges, adjust your proposal scale σ:
 
-- Too high acceptance (>60%)? Increase σ
-- Too low acceptance (<15%)? Decrease σ
+- Too high acceptance (>60%)? Increase $σ$
+- Too low acceptance (<15%)? Decrease $σ$
 :::
 
 **Adaptive tuning**: In practice, you might run the chain for a **burn-in** period, monitor the acceptance rate, and adjust $\sigma$. Common strategy:
 
 - Run 1000 steps (burn-in)
-- If acceptance rate > 50%, multiply σ by 1.2
-- If acceptance rate < 20%, divide σ by 1.2
+- If acceptance rate > 50%, multiply $σ$ by 1.2
+- If acceptance rate < 20%, divide $σ$ by 1.2
 - Repeat until acceptance rate is in target range
 
 Once tuned, **fix** the proposal and run your production chain. (Don't keep adapting during production — this violates the Markov property and detailed balance.)
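The burn-in tuning loop described in this hunk can be sketched as follows. This is illustrative only; `pilot_acceptance` is a hypothetical helper (a short Metropolis run that reports its acceptance rate), not code from the course repository:

```python
import math
import random

def pilot_acceptance(log_target, theta0, sigma, n_steps, rng):
    """Short pilot chain; returns the fraction of accepted proposals."""
    theta, accepted = theta0, 0
    for _ in range(n_steps):
        theta_new = theta + rng.gauss(0.0, sigma)
        log_r = log_target(theta_new) - log_target(theta)
        if log_r >= 0 or rng.random() < math.exp(log_r):
            theta, accepted = theta_new, accepted + 1
    return accepted / n_steps

def tune_sigma(log_target, sigma=1.0, max_rounds=50, seed=0):
    """Burn-in tuning: nudge sigma until acceptance lands in [20%, 50%].
    After tuning, sigma must be FIXED for the production run."""
    rng = random.Random(seed)
    rate = pilot_acceptance(log_target, 0.0, sigma, 1000, rng)
    for _ in range(max_rounds):
        if rate > 0.5:
            sigma *= 1.2   # accepting too often: take bigger steps
        elif rate < 0.2:
            sigma /= 1.2   # rejecting too often: take smaller steps
        else:
            break
        rate = pilot_acceptance(log_target, 0.0, sigma, 1000, rng)
    return sigma, rate

# Hypothetical target: unnormalized standard-normal log-density,
# starting from a deliberately tiny proposal scale
sigma, rate = tune_sigma(lambda t: -0.5 * t * t, sigma=0.01)
```

The returned `sigma` is then frozen before the production chain, matching the warning above about not adapting indefinitely.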
@@ -836,7 +838,7 @@ def gaussian_proposal(theta, sigma=1.0):
 
 ### What About the Normalization Constant?
 
-**Recall:** we only need \(\pi(\theta)\) up to proportionality; this section shows explicitly how \(p(D)\) cancels from the MH acceptance ratio.
+**Recall:** we only need $\pi(\theta)$ up to proportionality; this section shows explicitly how $p(D)$ cancels from the MH acceptance ratio.
 
 Remember Bayes' theorem:
 
@@ -881,7 +883,7 @@ This is the **burn-in problem**: Early samples are not from π. You need to disc
 
 The first and most important diagnostic: **Plot your samples over time.**
 
-A trace plot shows θ_t vs. t. What to look for:
+A trace plot shows $θ_t$ vs. $t$. What to look for:
 
 **Good signs**:
 
@@ -1053,14 +1055,14 @@ If $τ = 10$, then your $N=10000$ samples are only as informative as $N_\text{ef
 :class: tip
 **Target**: τ < 100 for most problems.
 
-- τ = 1: Every sample is independent (ideal, rarely achieved)
-- τ = 10-50: Good mixing, typical for well-tuned samplers
-- τ = 100-500: Acceptable, but you'll need a long chain
-- τ > 500: Poor mixing. Fix your proposal or increase step size.
+- $τ = 1$: Every sample is independent (ideal, rarely achieved)
+- $τ = 10-50$: Good mixing, typical for well-tuned samplers
+- $τ = 100-500$: Acceptable, but you'll need a long chain
+- $τ > 500$: Poor mixing. Fix your proposal or increase step size.
 
-If τ is very large, you have two options:
+If $τ$ is very large, you have two options:
 
-1. Run the chain much longer (N = 100τ or more for reliable statistics)
+1. Run the chain much longer ($N = 100τ$ or more for reliable statistics)
 2. Improve your sampler (tune proposals, try HMC, etc.)
 :::
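The integrated autocorrelation time discussed in this hunk can be estimated directly from the samples. A sketch using one common truncation convention (sum ρ(k) until it first goes non-positive); the AR(1) chain below is a synthetic stand-in for a sticky sampler, not data from the course:

```python
import random

def autocorr_time(samples, max_lag=250):
    """Integrated autocorrelation time tau = 1 + 2*sum_k rho(k),
    truncated at the first non-positive rho(k) (or at max_lag)."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    tau = 1.0
    for k in range(1, min(n // 2, max_lag)):
        rho = sum((samples[i] - mean) * (samples[i + k] - mean)
                  for i in range(n - k)) / ((n - k) * var)
        if rho <= 0:
            break
        tau += 2.0 * rho
    return tau

rng = random.Random(0)

# Independent draws: tau should come out close to 1
iid = [rng.gauss(0.0, 1.0) for _ in range(2000)]
tau_iid = autocorr_time(iid)
n_eff = len(iid) / tau_iid

# Synthetic sticky chain (AR(1), phi = 0.9): true tau = (1+phi)/(1-phi) = 19
ar = [0.0]
for _ in range(5000):
    ar.append(0.9 * ar[-1] + rng.gauss(0.0, 1.0))
tau_ar = autocorr_time(ar)
```

Dividing $N$ by the estimated $τ$ gives the effective sample size used in the surrounding discussion.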

@@ -1835,6 +1837,7 @@ Before moving to Project 4, let's address some frequent misunderstandings that t
 **Misconception 7: "MCMC always works"**
 
 **Reality**: MCMC can fail in many ways:
+
 - Multimodal posteriors where chains get trapped in one mode
 - Extremely high-dimensional problems where mixing is glacially slow
 - Posteriors with complex geometry (funnel shapes, banana-shaped ridges)
