Commit 2d7a51b

update scripts; remove results
1 parent f62ff91 commit 2d7a51b

45 files changed

+369
-1034
lines changed
Lines changed: 106 additions & 68 deletions
@@ -1,68 +1,106 @@
1-
# Streamlined Auto-Marginalization Experiment Plan
2-
3-
## Core Validation: Essential Evidence
4-
**Goal**: Prove correctness and demonstrate core algorithmic advantages
5-
6-
1) **Correctness Validation**
7-
- HMM: K ∈ {2, 4}, T ∈ {50, 200} vs forward-backward reference
8-
- GMM: K ∈ {2, 4}, N ∈ {100, 1000} vs analytical collapsed likelihood
9-
- Gradient verification via finite differences
10-
- *Status: Partially implemented, needs expansion*
11-
12-
2) **Scaling Demonstration**
13-
- HMM temporal ordering: measure O(T·K²) scaling
14-
- Peak frontier width profiling during evaluation
15-
- *Status: Basic benchmarks exist, needs theoretical overlay*
16-
17-
## Critical Impact Demo: Order Matters
18-
**Goal**: Show dramatic practical importance of variable ordering
19-
20-
3) **Factorial HMM Order Comparison**
21-
- C ∈ {2, 3, 4} chains, T = 100
22-
- Compare policies: interleaved (time‑first) vs min‑fill vs min‑degree (weighted by log K), with randomized tie‑breaks and a few restarts; report best of R restarts per heuristic.
23-
- Replace “worst‑case” (grouped/random) with practical heuristics; avoid pathological explosions.
24-
- Metrics:
25-
- Frontier stats: max/mean/sum width over evaluation order
26-
- Predicted DP cost proxy: Σ_t K^{w_t} (or Σ_t exp(Σ_i log K_i) for heterogeneous K)
27-
- Timing: always time interleaved; time heuristic orders only if predicted cost < threshold (frontier‑only mode otherwise). Verify equal logp on small T.
28-
- Order construction: build discrete primal graph; generate elimination order via heuristic; lift to evaluation order by placing emissions as soon as all discrete parents are placed; topo‑repair; recompute minimal keys.
29-
- *Status: Heuristic plan defined; utils in place; implement min‑fill/min‑degree + frontier‑only reporting*
30-
31-
## Theoretical Generalization: Beyond Chains
32-
**Goal**: Demonstrate algorithmic generality
33-
34-
4) **Tree Structure Validation**
35-
- Binary HMT: DFS vs BFS vs random orders
36-
- Show near-optimal frontier management
37-
- *Status: Not implemented*
38-
39-
## Nonparametric Extension: Exact Finite Truncation
40-
**Goal**: Show method works for nonparametric models with finite support
41-
42-
5) **HDP-HMM with Truncation**
43-
- Stick-breaking with K_max ∈ {5, 10, 20}
44-
- Marginalize assignments exactly under truncation
45-
- Compare against forward-backward with same truncation
46-
- Demonstrate exact gradients w.r.t. hyperparameters
47-
- *Status: Not implemented*
48-
49-
## Implementation Priorities
50-
51-
**Core Validation**
52-
1. Extend existing HMM/GMM correctness tests with more configurations
53-
2. Add theoretical complexity curve overlays to existing benchmarks
54-
3. Implement gradient verification via finite differences
55-
56-
**Order Impact**
57-
4. Implement FHMM with heuristic orders (interleaved, min‑fill, min‑degree)
58-
5. Add frontier‑only mode with predicted cost proxy and timing threshold; time interleaved by default
59-
6. Add randomized restarts for heuristics with weighted scores (log K) and select the best
60-
61-
**Generalization**
62-
7. Add basic tree model (HMT) with heuristic orders (DFS‑like, min‑fill/min‑degree on tree moralization)
63-
8. Implement HDP-HMM with truncation and exact marginalization
64-
65-
**Infrastructure**
66-
- Extend existing experiment harness for structured logging
67-
- Add frontier profiling and complexity proxy generation (Σ K^{w_t})
68-
- Add skip/timeout guards based on predicted cost; record “skipped” in logs
1+
# Experiment Plan: Auto-Marginalization
2+
3+
Experiments validating automatic marginalization of discrete latent variables in JuliaBUGS.
4+
5+
## 1. Correctness
6+
7+
Validates marginalized log-probability against analytical references.
8+
9+
```bash
# HMM
AM_SWEEP_SEEDS=1,2,3 AM_SWEEP_K=2,4,8,16 AM_SWEEP_T=50,100,200,400 \
  julia --project=JuliaBUGS/experiments scripts/hmm_correctness_sweep.jl

# GMM
AG_SWEEP_SEEDS=1,2,3 AG_SWEEP_K=2,4,8 AG_SWEEP_N=100,500,1000,5000 \
  julia --project=JuliaBUGS/experiments scripts/gmm_correctness_sweep.jl

# HDP-HMM (sticky, κ=0)
AHDPC_SEEDS=1,2 AHDPC_K=5,10,20 AHDPC_T=50,100,200,400 AHDPC_KAPPA=0.0 \
  julia --project=JuliaBUGS/experiments scripts/hdphmm_correctness.jl

# HDP-HMM (sticky, κ=5)
AHDPC_SEEDS=1,2 AHDPC_K=5,10,20 AHDPC_T=50,100,200,400 AHDPC_KAPPA=5.0 \
  julia --project=JuliaBUGS/experiments scripts/hdphmm_correctness.jl
```
26+
27+
## 2. Gradients
28+
29+
Validates automatic differentiation against finite differences.
30+
31+
```bash
# HMM
AGC_SWEEP_SEEDS=1,2,3 AGC_SWEEP_K=2,4,8 AGC_SWEEP_T=50,100,200 \
  julia --project=JuliaBUGS/experiments scripts/hmm_gradient_check.jl

# GMM
AGG_SWEEP_SEEDS=1,2,3 AGG_SWEEP_K=2,4,8 AGG_SWEEP_N=200,500,1000 \
  julia --project=JuliaBUGS/experiments scripts/gmm_gradient_check.jl

# HDP-HMM (sticky, κ=0)
AHDPG_SWEEP_SEEDS=1,2 AHDPG_SWEEP_K=5,10,20 AHDPG_SWEEP_T=100,200 AHDPG_KAPPA=0.0 \
  julia --project=JuliaBUGS/experiments scripts/hdphmm_gradient_check.jl

# HDP-HMM (sticky, κ=5)
AHDPG_SWEEP_SEEDS=1,2 AHDPG_SWEEP_K=5,10,20 AHDPG_SWEEP_T=100,200 AHDPG_KAPPA=5.0 \
  julia --project=JuliaBUGS/experiments scripts/hdphmm_gradient_check.jl
```
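
For reference, a finite-difference check of this kind is commonly written as a central difference on the marginalized log-probability, compared component-wise against the AD gradient. The plan does not say which difference scheme, step size, or tolerance the scripts use, so the symbols $h$ (step size), $\varepsilon$ (tolerance), and the relative-error form below are assumptions for illustration only:

$$
g_i^{\mathrm{FD}} = \frac{\log p(\theta + h\,e_i) - \log p(\theta - h\,e_i)}{2h},
\qquad
\left| g_i^{\mathrm{AD}} - g_i^{\mathrm{FD}} \right| \le \varepsilon \left( 1 + \left| g_i^{\mathrm{FD}} \right| \right)
$$

Here $e_i$ is the $i$-th unit vector and $g_i^{\mathrm{AD}}$ is the $i$-th component of the gradient from automatic differentiation. The central difference has truncation error $O(h^2)$, which is why it is usually preferred over a one-sided difference for this purpose.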
48+
49+
## 3. Scaling
50+
51+
Benchmarks runtime vs problem size.
52+
53+
```bash
# HMM
AS_SWEEP_K=8,16,32,64,128,256,512 AS_SWEEP_T=50,100,200,400,800 \
  julia --project=JuliaBUGS/experiments scripts/hmm_scaling_bench.jl
```
58+
59+
## 4. Variable Ordering: FHMM
60+
61+
Compares elimination orders (interleaved, states_then_y, min_fill, min_degree).
62+
63+
```bash
# Small configs with timing
AFH_C=2 AFH_K=2 AFH_T=5 AFH_MODE=timed AFH_ORDERS=interleaved,states_then_y \
  julia --project=JuliaBUGS/experiments scripts/fhmm_order_comparison.jl
AFH_C=2 AFH_K=4 AFH_T=10 AFH_MODE=timed AFH_ORDERS=interleaved,states_then_y \
  julia --project=JuliaBUGS/experiments scripts/fhmm_order_comparison.jl

# Larger configs (frontier only)
AFH_C=2 AFH_K=4 AFH_T=50 AFH_MODE=frontier AFH_ORDERS=interleaved,states_then_y,min_fill,min_degree \
  julia --project=JuliaBUGS/experiments scripts/fhmm_order_comparison.jl
AFH_C=3 AFH_K=4 AFH_T=50 AFH_MODE=frontier AFH_ORDERS=interleaved,states_then_y,min_fill,min_degree \
  julia --project=JuliaBUGS/experiments scripts/fhmm_order_comparison.jl
AFH_C=4 AFH_K=4 AFH_T=50 AFH_MODE=frontier AFH_ORDERS=interleaved,states_then_y,min_fill,min_degree \
  julia --project=JuliaBUGS/experiments scripts/fhmm_order_comparison.jl
```
78+
79+
## 5. Variable Ordering: HMT
80+
81+
Compares tree traversal orders (dfs, bfs, random_dfs, min_fill, min_degree).
82+
83+
```bash
# Varying depth
AHMT_B=2 AHMT_K=4 AHMT_DEPTH=4 AHMT_MODE=frontier \
  julia --project=JuliaBUGS/experiments scripts/hmt_order_comparison.jl
AHMT_B=2 AHMT_K=4 AHMT_DEPTH=6 AHMT_MODE=frontier \
  julia --project=JuliaBUGS/experiments scripts/hmt_order_comparison.jl
AHMT_B=2 AHMT_K=4 AHMT_DEPTH=8 AHMT_MODE=frontier \
  julia --project=JuliaBUGS/experiments scripts/hmt_order_comparison.jl
AHMT_B=2 AHMT_K=4 AHMT_DEPTH=10 AHMT_MODE=frontier \
  julia --project=JuliaBUGS/experiments scripts/hmt_order_comparison.jl

# Varying branching and states
AHMT_B=2 AHMT_K=2 AHMT_DEPTH=6 AHMT_MODE=frontier \
  julia --project=JuliaBUGS/experiments scripts/hmt_order_comparison.jl
AHMT_B=3 AHMT_K=2 AHMT_DEPTH=6 AHMT_MODE=frontier \
  julia --project=JuliaBUGS/experiments scripts/hmt_order_comparison.jl
```
100+
101+
## Notes
102+
103+
- **Ordering matters**: Good elimination orders (e.g., interleaved for HMMs) keep the frontier width at O(1) variables, giving O(T·K²) cost. Bad orders (e.g., states-first) let the frontier grow with T, so the cost explodes to O(K^T).
104+
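- **Heuristics**: Min-fill and min-degree with randomized tie-breaking (3 restarts) typically find good orders on general graphical models. Finding an optimal elimination order is NP-hard, so neither heuristic guarantees optimality.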
105+
- **HDP-HMM**: Both correctness and gradient scripts use the sticky HDP-HMM formulation with kappa (κ) parameter. Set AHDPC_KAPPA/AHDPG_KAPPA to control sticky self-transition bias. κ=0 is standard HDP-HMM, κ>0 adds self-transition preference.
106+
- **Output**: All scripts write CSV to stdout. Redirect as needed: `> results/output.csv`
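
To make the ordering note concrete, here is a minimal sketch of the cost proxy Σ_t K^{w_t}, where w_t is the frontier width at evaluation step t. The helper name `cost_proxy` and both width sequences are illustrative assumptions, not output from the scripts above: the first sequence mimics an order whose frontier stays at most 2 wide, the second an order whose frontier grows by one variable per step.

```bash
# Hypothetical helper (not part of the experiment scripts):
# prints the sum of K^w over a list of frontier widths.
# Usage: cost_proxy K w_1 w_2 ... w_T
# Note: uses plain global variables so it also runs under POSIX sh.
cost_proxy() {
  k=$1; shift
  total=0
  for w in "$@"; do
    # Integer power k^w by repeated multiplication.
    term=1
    i=0
    while [ "$i" -lt "$w" ]; do
      term=$(( term * k ))
      i=$(( i + 1 ))
    done
    total=$(( total + term ))
  done
  echo "$total"
}

# Assumed widths for T=5 steps with K=4 states (illustrative, not measured).
cost_proxy 4 1 2 2 2 2   # bounded frontier: prints 68
cost_proxy 4 1 2 3 4 5   # frontier grows with T: prints 1364
```

With a bounded frontier the proxy grows linearly in the number of steps, while a frontier that widens at every step makes it grow geometrically. That gap is what the frontier-only runs are meant to expose without paying for a full evaluation.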
