
Commit bcf5079

drannarosen and claude committed
Fix figure rendering in GP theory doc with MyST directives
Converted all figure references from markdown syntax to proper MyST {figure} directives for correct rendering on course website.

**Changes:**

- Added Distill GP visualization article reference at document start
- Converted 5 figures to MyST {figure} directive syntax:
  - Figure 2.1: GP Prior Samples (lengthscale effects)
  - Figure 2.2: Kernel Gallery (comprehensive comparison)
  - Figure 2.3: Matérn Smoothness (differentiability comparison)
  - Figure 3.2: GP Uncertainty (interpolation vs extrapolation)
  - Figure 4.3: ARD Effect (parameter importance discovery)
- All figures now include proper labels, alt text, alignment, and captions
- Figure 1.1 (Mermaid diagram) kept as-is - native MyST support

Fixes figure display issues on MyST-based course website.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
1 parent 274aae7 commit bcf5079

File tree

1 file changed (+38 −18 lines changed)


06-the-learnable-universe/module-3-machine-learning/02a-gp-theory.md

Lines changed: 38 additions & 18 deletions
@@ -23,6 +23,14 @@ By the end of Part II, you will be able to:
 
 ---
 
+```{admonition} Recommended Reading: Visual Exploration of Gaussian Processes
+:class: tip
+
+For an interactive visual introduction to GPs, see [**A Visual Exploration of Gaussian Processes**](https://distill.pub/2019/visual-exploration-gaussian-processes/) (Görtler et al., 2019, *Distill*). This outstanding article provides interactive visualizations of kernel functions, prior/posterior distributions, and hyperparameter effects. It complements the mathematical treatment below with visual intuition—highly recommended for building geometric understanding before diving into equations!
+```
+
+---
+
 ## The Big Picture: The Computational Crisis in Modern Astrophysics
 
 ### The Problem We're Solving
@@ -357,11 +365,13 @@
 - A GP says: "I don't know the exact function, but I have beliefs about what it looks like"
 - Those beliefs are encoded in the kernel $k(\mathbf{x}, \mathbf{x}')$: "how similar should $f(\mathbf{x})$ and $f(\mathbf{x}')$ be?"
 
-**[FIGURE 2.1: GP Prior Samples - How Lengthscale Controls Smoothness]**
-
-![GP Prior Samples](figures/fig_2_1_gp_prior_samples.png)
+```{figure} figures/fig_2_1_gp_prior_samples.png
+:label: fig-gp-prior-samples
+:alt: GP prior samples showing lengthscale effects on function smoothness
+:align: center
 
-**Figure 2.1**: Random function samples from GP(0, k_SE) with different lengthscales demonstrate how ℓ controls function smoothness. **Top row**: Individual samples show that small ℓ = 0.1 produces highly wiggly (high-frequency) functions, while large ℓ = 1.0 produces smooth (low-frequency) functions. **Bottom row**: Prior confidence bands (±2σ) with correlation length visualization. The red arrows show the lengthscale ℓ—the distance over which function values remain correlated (correlation drops to ~60% at distance ℓ). **Key Insight**: Small lengthscales require dense training data to capture rapid variations; large lengthscales allow sparse sampling since the function varies slowly.
+**Figure 2.1: GP Prior Samples - How Lengthscale Controls Smoothness**. Random function samples from GP(0, k_SE) with different lengthscales demonstrate how ℓ controls function smoothness. **Top row**: Individual samples show that small ℓ = 0.1 produces highly wiggly (high-frequency) functions, while large ℓ = 1.0 produces smooth (low-frequency) functions. **Bottom row**: Prior confidence bands (±2σ) with correlation length visualization. The red arrows show the lengthscale ℓ—the distance over which function values remain correlated (correlation drops to ~60% at distance ℓ). **Key Insight**: Small lengthscales require dense training data to capture rapid variations; large lengthscales allow sparse sampling since the function varies slowly.
+```
 
 :::{admonition} Why This Matters for Emulation
 :class: tip
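The lengthscale claims in the new Figure 2.1 caption (wiggly samples at ℓ = 0.1, smooth samples at ℓ = 1.0, correlation ≈ 60% at separation ℓ) are easy to sanity-check with a minimal NumPy sketch. This is illustrative and not part of the commit; `se_kernel` and all variable names are made up here.

```python
import numpy as np

def se_kernel(x1, x2, lengthscale):
    """Squared-exponential kernel: k(x, x') = exp(-(x - x')^2 / (2 l^2))."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
for ell in (0.1, 1.0):
    # Diagonal jitter keeps the covariance numerically positive definite.
    K = se_kernel(x, x, ell) + 1e-8 * np.eye(x.size)
    # Three draws from the zero-mean GP prior GP(0, k_SE).
    samples = rng.multivariate_normal(np.zeros(x.size), K, size=3)
    # ell = 0.1 gives rapidly varying draws; ell = 1.0 gives slowly varying ones.
```

Note that at separation exactly ℓ the SE correlation is exp(−1/2) ≈ 0.61, which matches the "~60% at distance ℓ" statement in the caption.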
@@ -807,11 +817,14 @@ Now predict at $Q_* = 0.80$ (outside training range):
 - ⚠️ Use uncertain predictions with caution (check physics plausibility)
 - ❌ Avoid relying on extrapolation predictions for publication without validation
 
-**[FIGURE 3.2: GP Uncertainty - Interpolation vs Extrapolation]**
+```{figure} figures/fig_3_2_gp_uncertainty.png
+:label: fig-gp-uncertainty
+:alt: GP uncertainty showing confident interpolation and uncertain extrapolation
+:align: center
 
-![GP Uncertainty](figures/fig_3_2_gp_uncertainty.png)
+**Figure 3.2: GP Uncertainty - Interpolation vs Extrapolation**. GP posterior with training data at x ∈ {1, 3, 5} demonstrates automatic uncertainty quantification. **Blue mean line**: Predictive mean μ(x) interpolates smoothly between training points (black dots with white edges). **Shaded regions**: Inner blue band shows ±2σ epistemic (function) uncertainty; outer coral band shows ±2σ total (epistemic + noise) uncertainty. **Green arrows** (interpolation regions): Narrow uncertainty between training points where GP is confident. **Red arrows** (extrapolation regions): Wide uncertainty outside training range where GP warns "I don't know—don't trust me here!" **Key Insight**: GP uncertainty σ(x) automatically grows far from data, providing a principled warning system for when predictions become unreliable. This is the epistemic uncertainty that shrinks with more training data.
+```
 
-**Figure 3.2**: GP posterior with training data at x ∈ {1, 3, 5} demonstrates automatic uncertainty quantification. **Blue mean line**: Predictive mean μ(x) interpolates smoothly between training points (black dots with white edges). **Shaded regions**: Inner blue band shows ±2σ epistemic (function) uncertainty; outer coral band shows ±2σ total (epistemic + noise) uncertainty. **Green arrows** (interpolation regions): Narrow uncertainty between training points where GP is confident. **Red arrows** (extrapolation regions): Wide uncertainty outside training range where GP warns "I don't know—don't trust me here!" **Key Insight**: GP uncertainty σ(x) automatically grows far from data, providing a principled warning system for when predictions become unreliable. This is the epistemic uncertainty that shrinks with more training data.
 :::
 
 :::{admonition} Conceptual Checkpoint #3
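The interpolation-vs-extrapolation behaviour described in the Figure 3.2 caption can be reproduced with a short sketch, assuming scikit-learn is available; the kernel, noise level, and test points below are illustrative choices, not taken from the course code.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Training inputs match the caption: x in {1, 3, 5}.
X_train = np.array([[1.0], [3.0], [5.0]])
y_train = np.sin(X_train).ravel()

# Fix the lengthscale (optimizer=None) so the behaviour is easy to predict.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0),
                              alpha=1e-6, optimizer=None)
gp.fit(X_train, y_train)

X_test = np.array([[3.0], [10.0]])  # a training point vs far extrapolation
_, std = gp.predict(X_test, return_std=True)
# std is near zero at the training point and approaches the prior scale (~1)
# far outside the training range, exactly the warning system the caption describes.
```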
@@ -928,11 +941,14 @@ Now each dimension has its own lengthscale $\ell_d$.
 - ARD learns this automatically from data!
 - **Bonus**: Tells you which parameters matter most (scientific discovery!)
 
-**[FIGURE 4.3: ARD Effect - Automatic Parameter Importance Discovery]**
+```{figure} figures/fig_4_3_ard_effect.png
+:label: fig-ard-effect
+:alt: ARD automatic parameter importance discovery for N-body simulations
+:align: center
 
-![ARD Effect](figures/fig_4_3_ard_effect.png)
+**Figure 4.3: ARD Effect - Automatic Parameter Importance Discovery**. ARD automatically discovers which parameters matter for N-body cluster evolution. **Left panel**: GP prediction with ARD lengthscales ℓ_Q = 0.3 (small) and ℓ_N = 2.0 (large). The **vertical contours** reveal that bound fraction is highly sensitive to virial ratio Q but weakly sensitive to particle number N. **Right panel**: True underlying function confirms ARD learned correctly. **Yellow box annotation**: Lengthscale ratio ℓ_N/ℓ_Q = 6.7× means the GP is ~7× more sensitive to Q than N—the GP automatically discovered from just 25 training points (red dots) that Q is the dominant physics parameter! **Key Insight**: ARD performs automatic feature selection by learning which input dimensions actually affect the output. Small ℓ_d → parameter d matters; large ℓ_d → parameter d is relatively unimportant. This is scientific discovery from data—no physics intuition required (though validating against physics is essential!).
+```
 
-**Figure 4.3**: ARD automatically discovers which parameters matter for N-body cluster evolution. **Left panel**: GP prediction with ARD lengthscales ℓ_Q = 0.3 (small) and ℓ_N = 2.0 (large). The **vertical contours** reveal that bound fraction is highly sensitive to virial ratio Q but weakly sensitive to particle number N. **Right panel**: True underlying function confirms ARD learned correctly. **Yellow box annotation**: Lengthscale ratio ℓ_N/ℓ_Q = 6.7× means the GP is ~7× more sensitive to Q than N—the GP automatically discovered from just 25 training points (red dots) that Q is the dominant physics parameter! **Key Insight**: ARD performs automatic feature selection by learning which input dimensions actually affect the output. Small ℓ_d → parameter d matters; large ℓ_d → parameter d is relatively unimportant. This is scientific discovery from data—no physics intuition required (though validating against physics is essential!).
 :::
 
 ### The Matérn Family: More Realistic Smoothness
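The ARD effect in the Figure 4.3 caption amounts to an anisotropic kernel: one lengthscale per input dimension. A minimal sketch (scikit-learn assumed; the lengthscales ℓ_Q = 0.3 and ℓ_N = 2.0 come from the caption, everything else is illustrative) showing why the small-lengthscale dimension dominates:

```python
import numpy as np
from sklearn.gaussian_process.kernels import RBF

# Anisotropic RBF = ARD: one lengthscale per input dimension [Q, N].
ard = RBF(length_scale=[0.3, 2.0])

base = np.array([[0.0, 0.0]])
step_Q = np.array([[0.5, 0.0]])  # move 0.5 along the virial-ratio axis
step_N = np.array([[0.0, 0.5]])  # move 0.5 along the particle-number axis

k_Q = ard(base, step_Q)[0, 0]  # correlation collapses: small lengthscale
k_N = ard(base, step_N)[0, 0]  # correlation barely moves: large lengthscale
# The output is far more sensitive to Q than to N, exactly what the
# "vertical contours" in the figure encode.
```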
@@ -1010,11 +1026,13 @@ Consider emulating different cluster properties:
 **The key**: Match kernel smoothness to your physical intuition. When uncertain, Matérn-5/2 is a good default.
 :::
 
-**[FIGURE 2.3: Matérn Smoothness Comparison]**
-
-![Matérn Smoothness](figures/fig_2_3_matern_smoothness.png)
+```{figure} figures/fig_2_3_matern_smoothness.png
+:label: fig-matern-smoothness
+:alt: Matérn family smoothness comparison showing differentiability controlled by nu
+:align: center
 
-**Figure 2.3**: Matérn family smoothness comparison showing how ν controls differentiability. **Top row**: Function samples f(x) for each smoothness parameter. **Middle row**: Numerical derivatives f'(x) reveal roughness—Matérn-1/2 (ν=0.5) shows visible kinks and is NOT differentiable (rough, discontinuous slopes); Matérn-3/2 (ν=1.5) has smooth first derivatives but rough second derivatives (once differentiable); Matérn-5/2 (ν=2.5) is very smooth (twice differentiable). **Bottom row**: Kernel correlation k(r) vs distance shows how quickly correlations decay. **Practical Recommendation**: Use **Matérn-5/2 as default** for physics emulation—smooth enough for realistic systems but more flexible than infinitely-smooth SE kernel. Only use SE when you KNOW the function is truly infinitely smooth (rare in real physics). Use Matérn-3/2 if validation shows underfitting or if you expect rougher behavior.
+**Figure 2.3: Matérn Smoothness Comparison**. Matérn family smoothness comparison showing how ν controls differentiability. **Top row**: Function samples f(x) for each smoothness parameter. **Middle row**: Numerical derivatives f'(x) reveal roughness—Matérn-1/2 (ν=0.5) shows visible kinks and is NOT differentiable (rough, discontinuous slopes); Matérn-3/2 (ν=1.5) has smooth first derivatives but rough second derivatives (once differentiable); Matérn-5/2 (ν=2.5) is very smooth (twice differentiable). **Bottom row**: Kernel correlation k(r) vs distance shows how quickly correlations decay. **Practical Recommendation**: Use **Matérn-5/2 as default** for physics emulation—smooth enough for realistic systems but more flexible than infinitely-smooth SE kernel. Only use SE when you KNOW the function is truly infinitely smooth (rare in real physics). Use Matérn-3/2 if validation shows underfitting or if you expect rougher behavior.
+```
 
 ### Periodic Kernels: Exploiting Symmetries
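The correlation-decay panel described in the Figure 2.3 caption can be checked numerically: at the same lengthscale, smaller ν decorrelates faster. A sketch assuming scikit-learn (the lengthscale 0.3 mirrors the figure; the evaluation points are illustrative):

```python
import numpy as np
from sklearn.gaussian_process.kernels import Matern

x0 = np.array([[0.0]])
x1 = np.array([[0.3]])  # exactly one lengthscale away

# Correlation at separation r = lengthscale for each smoothness parameter nu.
ks = {nu: Matern(length_scale=0.3, nu=nu)(x0, x1)[0, 0]
      for nu in (0.5, 1.5, 2.5)}
# Rougher kernels (smaller nu) retain less correlation at the same distance,
# which is the "how quickly correlations decay" comparison in the bottom row.
```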
10201038

@@ -1038,11 +1056,13 @@ where:
 
 **N-body example**: If you're emulating cluster properties in a rotating frame, periodic kernel might capture rotational symmetry. (Rare, but possible!)
 
-**[FIGURE 2.2: Kernel Gallery]**
-
-![Kernel Gallery](figures/fig_2_2_kernel_gallery.png)
+```{figure} figures/fig_2_2_kernel_gallery.png
+:label: fig-kernel-gallery
+:alt: Comprehensive kernel gallery comparing common GP kernels
+:align: center
 
-**Figure 2.2**: Comprehensive kernel gallery comparing five common GP kernels. **Top row**: Kernel correlation k(r) vs distance r—shows how correlation decays with separation. SE (RBF) has smooth Gaussian decay; Matérn-1/2 has exponential decay (roughest); Matérn-3/2 and 5/2 are intermediate; Periodic shows repeating pattern. **Middle row**: Random function samples demonstrate smoothness—SE is infinitely smooth (no kinks ever), Matérn-1/2 can have kinks (rough), Matérn-5/2 is very smooth but more realistic than SE, Periodic captures repeating patterns. **Bottom row**: Prior ±2σ confidence bands show expected function variability. **Key Comparisons**: SE (blue) is too smooth for most physics; Matérn-1/2 (purple) is too rough (kinks visible); Matérn-5/2 (red) balances smoothness with realism (**recommended default**); Periodic (green) for phenomena with known periodicity. All kernels share lengthscale ℓ=0.3 for fair comparison.
+**Figure 2.2: Kernel Gallery**. Comprehensive kernel gallery comparing five common GP kernels. **Top row**: Kernel correlation k(r) vs distance r—shows how correlation decays with separation. SE (RBF) has smooth Gaussian decay; Matérn-1/2 has exponential decay (roughest); Matérn-3/2 and 5/2 are intermediate; Periodic shows repeating pattern. **Middle row**: Random function samples demonstrate smoothness—SE is infinitely smooth (no kinks ever), Matérn-1/2 can have kinks (rough), Matérn-5/2 is very smooth but more realistic than SE, Periodic captures repeating patterns. **Bottom row**: Prior ±2σ confidence bands show expected function variability. **Key Comparisons**: SE (blue) is too smooth for most physics; Matérn-1/2 (purple) is too rough (kinks visible); Matérn-5/2 (red) balances smoothness with realism (**recommended default**); Periodic (green) for phenomena with known periodicity. All kernels share lengthscale ℓ=0.3 for fair comparison.
+```
 
 ### Compositional Kernels: Building Complexity
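The periodic kernel's defining property (the "repeating pattern" panel in the Figure 2.2 caption) is that points a full period apart are perfectly correlated. A sketch assuming scikit-learn, with an illustrative period of 2:

```python
import numpy as np
from sklearn.gaussian_process.kernels import ExpSineSquared

# Periodic (exp-sine-squared) kernel with period P = 2.
periodic = ExpSineSquared(length_scale=1.0, periodicity=2.0)

x = np.array([[0.0]])
k_period = periodic(x, np.array([[2.0]]))[0, 0]  # one full period apart
k_half = periodic(x, np.array([[1.0]]))[0, 0]    # half a period apart
# A full period apart gives correlation 1 (exact repetition); half a period
# apart gives the minimum correlation exp(-2 / length_scale^2).
```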
10481068
