06-the-learnable-universe/module-3-machine-learning/02a-gp-theory.md
---

```{admonition} Recommended Reading: Visual Exploration of Gaussian Processes
:class: tip

For an interactive visual introduction to GPs, see [**A Visual Exploration of Gaussian Processes**](https://distill.pub/2019/visual-exploration-gaussian-processes/) (Görtler et al., 2019, *Distill*). This article provides interactive visualizations of kernel functions, prior/posterior distributions, and hyperparameter effects. It complements the mathematical treatment below with visual intuition, and is highly recommended for building geometric understanding before diving into the equations.
```

---

## The Big Picture: The Computational Crisis in Modern Astrophysics

### The Problem We're Solving
- A GP says: "I don't know the exact function, but I have beliefs about what it looks like"
- Those beliefs are encoded in the kernel $k(\mathbf{x}, \mathbf{x}')$: "how similar should $f(\mathbf{x})$ and $f(\mathbf{x}')$ be?"

:alt: GP prior samples showing lengthscale effects on function smoothness
:align: center

**Figure 2.1: GP Prior Samples - How Lengthscale Controls Smoothness**. Random function samples from GP(0, k_SE) with different lengthscales demonstrate how ℓ controls function smoothness. **Top row**: Individual samples show that small ℓ = 0.1 produces highly wiggly (high-frequency) functions, while large ℓ = 1.0 produces smooth (low-frequency) functions. **Bottom row**: Prior confidence bands (±2σ) with correlation length visualization. The red arrows show the lengthscale ℓ—the distance over which function values remain correlated (correlation drops to ~60% at distance ℓ). **Key Insight**: Small lengthscales require dense training data to capture rapid variations; large lengthscales allow sparse sampling since the function varies slowly.
```
:::{admonition} Why This Matters for Emulation
:class: tip
- ⚠️ Use uncertain predictions with caution (check physics plausibility)
- ❌ Avoid relying on extrapolation predictions for publication without validation

```{figure} figures/fig_3_2_gp_uncertainty.png
:label: fig-gp-uncertainty
:alt: GP uncertainty showing confident interpolation and uncertain extrapolation

**Figure 3.2: GP Uncertainty - Interpolation vs Extrapolation**. GP posterior with training data at x ∈ {1, 3, 5} demonstrates automatic uncertainty quantification. **Blue mean line**: Predictive mean μ(x) interpolates smoothly between training points (black dots with white edges). **Shaded regions**: Inner blue band shows ±2σ epistemic (function) uncertainty; outer coral band shows ±2σ total (epistemic + noise) uncertainty. **Green arrows** (interpolation regions): Narrow uncertainty between training points where GP is confident. **Red arrows** (extrapolation regions): Wide uncertainty outside training range where GP warns "I don't know—don't trust me here!" **Key Insight**: GP uncertainty σ(x) automatically grows far from data, providing a principled warning system for when predictions become unreliable. This is the epistemic uncertainty that shrinks with more training data.
```

:::
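The uncertainty behavior in Figure 3.2 follows directly from the standard GP predictive equations. Here is a minimal NumPy sketch (toy targets and noise level are illustrative assumptions) with training inputs at x ∈ {1, 3, 5}: the epistemic standard deviation stays small between training points and reverts to the prior scale far outside them:

```python
import numpy as np

def k_se(a, b, ell=1.0):
    """Unit-variance SE kernel between two 1-D point sets."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

X_train = np.array([1.0, 3.0, 5.0])   # training inputs from the figure
y_train = np.sin(X_train)             # toy training targets (assumption)
sigma_n = 0.1                         # observation noise std (assumption)

K = k_se(X_train, X_train) + sigma_n**2 * np.eye(len(X_train))
K_inv = np.linalg.inv(K)

def predict(x_star):
    """GP predictive mean and epistemic std (excludes observation noise)."""
    Ks = k_se(x_star, X_train)
    mu = Ks @ K_inv @ y_train
    # Prior variance is 1, reduced by what the data explains: k** - ks K^-1 ks^T.
    var = 1.0 - np.einsum("ij,jk,ik->i", Ks, K_inv, Ks)
    return mu, np.sqrt(np.maximum(var, 0.0))

_, std_interp = predict(np.array([2.0]))   # between training points
_, std_extrap = predict(np.array([9.0]))   # far outside training range
print(f"sigma at x=2 (interpolation): {std_interp[0]:.3f}")
print(f"sigma at x=9 (extrapolation): {std_extrap[0]:.3f}")
```

The extrapolation std approaches the prior std of 1.0, which is the GP's way of saying "I don't know" outside the data.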
:::{admonition} Conceptual Checkpoint #3
- ARD learns this automatically from data!

:alt: ARD automatic parameter importance discovery for N-body simulations
:align: center

**Figure 4.3: ARD Effect - Automatic Parameter Importance Discovery**. ARD automatically discovers which parameters matter for N-body cluster evolution. **Left panel**: GP prediction with ARD lengthscales ℓ_Q = 0.3 (small) and ℓ_N = 2.0 (large). The **vertical contours** reveal that bound fraction is highly sensitive to virial ratio Q but weakly sensitive to particle number N. **Right panel**: True underlying function confirms ARD learned correctly. **Yellow box annotation**: Lengthscale ratio ℓ_N/ℓ_Q = 6.7× means the GP is ~7× more sensitive to Q than N—the GP automatically discovered from just 25 training points (red dots) that Q is the dominant physics parameter! **Key Insight**: ARD performs automatic feature selection by learning which input dimensions actually affect the output. Small ℓ_d → parameter d matters; large ℓ_d → parameter d is relatively unimportant. This is scientific discovery from data—no physics intuition required (though validating against physics is essential!).
```

:::
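ARD's automatic feature selection can be demonstrated in a few lines. This sketch uses toy data and a crude grid search over per-dimension lengthscales (a real emulator would optimize the marginal likelihood with gradients, and the data here is invented, not the chapter's N-body setup): the output depends only on the first input dimension, and the fitted lengthscales should come out with ℓ₂ ≫ ℓ₁, flagging dimension 2 as unimportant:

```python
import numpy as np

def k_ard(A, B, ells):
    """SE kernel with one lengthscale per input dimension (ARD)."""
    d2 = sum(((A[:, None, d] - B[None, :, d]) / ells[d]) ** 2
             for d in range(A.shape[1]))
    return np.exp(-0.5 * d2)

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(25, 2))   # 25 training points in 2-D
y = np.sin(6.0 * X[:, 0])                  # output depends on dim 0 only

def log_marginal_likelihood(ells, noise_var=1e-3):
    """GP log marginal likelihood (up to an additive constant)."""
    K = k_ard(X, X, ells) + noise_var * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum()

# Grid-search the two lengthscales jointly and keep the best pair.
grid = [0.1, 0.3, 1.0, 3.0]
best = max(((l1, l2) for l1 in grid for l2 in grid),
           key=lambda e: log_marginal_likelihood(np.array(e)))
print(f"best lengthscales (l_1, l_2) = {best}")
```

A large fitted ℓ₂ means the GP treats the second input as nearly irrelevant, which is exactly the "automatic relevance" in ARD.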
### The Matérn Family: More Realistic Smoothness
**The key**: Match kernel smoothness to your physical intuition. When uncertain, Matérn-5/2 is a good default.

:alt: Matérn family smoothness comparison showing differentiability controlled by nu
:align: center

**Figure 2.3: Matérn Smoothness Comparison**. Matérn family smoothness comparison showing how ν controls differentiability. **Top row**: Function samples f(x) for each smoothness parameter. **Middle row**: Numerical derivatives f'(x) reveal roughness—Matérn-1/2 (ν=0.5) shows visible kinks and is NOT differentiable (rough, discontinuous slopes); Matérn-3/2 (ν=1.5) has smooth first derivatives but rough second derivatives (once differentiable); Matérn-5/2 (ν=2.5) is very smooth (twice differentiable). **Bottom row**: Kernel correlation k(r) vs distance shows how quickly correlations decay. **Practical Recommendation**: Use **Matérn-5/2 as default** for physics emulation—smooth enough for realistic systems but more flexible than infinitely-smooth SE kernel. Only use SE when you KNOW the function is truly infinitely smooth (rare in real physics). Use Matérn-3/2 if validation shows underfitting or if you expect rougher behavior.
```
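The Matérn correlation functions compared in Figure 2.3 have simple closed forms for ν = 1/2, 3/2, and 5/2 (written below with unit variance and a shared lengthscale). Evaluating them at a single separation shows the smoothness ordering: near r = 0, rougher kernels lose correlation fastest, with the infinitely smooth SE kernel as the ν → ∞ limit:

```python
import numpy as np

def matern12(r, ell=1.0):
    """Matern nu=1/2 (exponential): samples not differentiable."""
    return np.exp(-r / ell)

def matern32(r, ell=1.0):
    """Matern nu=3/2: samples once differentiable."""
    a = np.sqrt(3.0) * r / ell
    return (1.0 + a) * np.exp(-a)

def matern52(r, ell=1.0):
    """Matern nu=5/2: samples twice differentiable (recommended default)."""
    a = np.sqrt(5.0) * r / ell
    return (1.0 + a + a**2 / 3.0) * np.exp(-a)

def se(r, ell=1.0):
    """Squared exponential: samples infinitely smooth."""
    return np.exp(-0.5 * (r / ell) ** 2)

# At a fixed short separation, correlation increases with smoothness.
r = 0.3
for name, k in [("Matern-1/2", matern12), ("Matern-3/2", matern32),
                ("Matern-5/2", matern52), ("SE", se)]:
    print(f"{name:>10}: k({r}) = {k(r):.3f}")
```

This ordering (Matérn-1/2 lowest, SE highest at short range) is the quantitative version of the rough-to-smooth progression in the figure's top row.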
### Periodic Kernels: Exploiting Symmetries
**N-body example**: If you're emulating cluster properties in a rotating frame, a periodic kernel might capture rotational symmetry. (Rare, but possible!)

:alt: Comprehensive kernel gallery comparing common GP kernels
:align: center

**Figure 2.2: Kernel Gallery**. Comprehensive kernel gallery comparing five common GP kernels. **Top row**: Kernel correlation k(r) vs distance r—shows how correlation decays with separation. SE (RBF) has smooth Gaussian decay; Matérn-1/2 has exponential decay (roughest); Matérn-3/2 and 5/2 are intermediate; Periodic shows repeating pattern. **Middle row**: Random function samples demonstrate smoothness—SE is infinitely smooth (no kinks ever), Matérn-1/2 can have kinks (rough), Matérn-5/2 is very smooth but more realistic than SE, Periodic captures repeating patterns. **Bottom row**: Prior ±2σ confidence bands show expected function variability. **Key Comparisons**: SE (blue) is too smooth for most physics; Matérn-1/2 (purple) is too rough (kinks visible); Matérn-5/2 (red) balances smoothness with realism (**recommended default**); Periodic (green) for phenomena with known periodicity. All kernels share lengthscale ℓ=0.3 for fair comparison.
```
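The periodic kernel in the gallery has the standard exp-sine-squared form k(r) = exp(-2 sin²(πr/p) / ℓ²). A quick numerical check (the period and lengthscale values below are illustrative) confirms its defining symmetry, k(r) = k(r + p), which is what lets a GP encode "f looks the same one period away" a priori:

```python
import numpy as np

def k_periodic(r, period=1.0, ell=0.5):
    """Exp-sine-squared (periodic) kernel: correlation repeats every period."""
    return np.exp(-2.0 * np.sin(np.pi * r / period) ** 2 / ell**2)

for r in (0.1, 0.5, 1.1):
    # Shifting the separation by one full period leaves the correlation unchanged.
    print(f"k({r}) = {k_periodic(r):.3f},  k({r} + period) = {k_periodic(r + 1.0):.3f}")
```

Points half a period apart are maximally decorrelated, while points a whole period apart are perfectly correlated again.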