Commit 1842fff

stage 5 done
1 parent 59b78a9 commit 1842fff

2 files changed

+57
-0
lines changed

.DS_Store

0 Bytes
Binary file not shown.

reports/TIER1_FREEZE.md

Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
# Milestone 5 — Smoke Benchmark Freeze ✅

**Goal:**
Establish a minimal working benchmark for the DeFi domain with local verification and a shim fallback. This milestone locks in a reproducible baseline before expanding to integration benchmarks.

---

## Acceptance Criteria
- Local domain verification (`domains/defi/`) is preferred; the shim fallback is available if `ngeodesic` is not present.
- `rails_shim` returns stable reasons (`local:verified`, `shim:accept:stage-N`, or `ltv`).
- Smoke benchmark **passes with ok accuracy (`ok_acc`) ≥ 0.66**.
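
The "stable reasons" criterion can be checked mechanically. The sketch below is one possible check, not part of `rails_shim` itself: it assumes that `N` in `shim:accept:stage-N` is an integer and that each result record carries a `reason` field. Both the field name and the helper names are hypothetical.

```python
import re

# The three reason forms listed in the acceptance criteria.
# Assumption: the N in "shim:accept:stage-N" is a non-negative integer.
STABLE_REASON = re.compile(r"(local:verified|shim:accept:stage-\d+|ltv)")


def is_stable_reason(reason: str) -> bool:
    """Return True if the whole string is one of the stable reason forms."""
    return STABLE_REASON.fullmatch(reason) is not None


def unstable_reasons(records):
    """Collect reasons outside the stable set.

    Assumption: each record is a dict with a "reason" key (hypothetical name).
    A missing reason is reported as None.
    """
    return [
        r.get("reason")
        for r in records
        if not is_stable_reason(str(r.get("reason")))
    ]
```

For example, `unstable_reasons([{"reason": "ltv"}, {"reason": "oops"}])` returns `["oops"]`, so an empty list means every record used a stable reason.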

---

## Commands

### Run Smoke Benchmark
```bash
MICROLM_DISABLE_RAILS=1 micro-lm-bench defi \
  --file benches/defi_smoke.jsonl \
  --out .artifacts/defi_smoke_results.jsonl \
  --summary-out .artifacts/defi_smoke_summary.json \
  --gate-metric ok_acc --gate-min 0.66
```

### Verify Summary
```bash
python3 - <<'PY'
import json, sys

# Load the summary written by --summary-out in the benchmark run.
with open(".artifacts/defi_smoke_summary.json") as f:
    s = json.load(f)

acc = s.get("ok_acc", 0.0)  # a missing metric counts as 0.0 and fails the gate
print(f"[bench] ok={s['ok']} total={s['total']} acc={acc:.2f}")
sys.exit(0 if acc >= 0.66 else 1)  # non-zero exit status when below the gate
PY
```

---

## Results (Frozen)
```
total: 3
ok: 2
ok_acc: 0.667
label_acc: 1.0
expect_ok_acc: 1.0
exact_acc: 1.0
```

**Gate:** `ok_acc ≥ 0.66`
**Status:** PASS ✅
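
The frozen numbers are consistent with `ok_acc = ok / total`, which is assumed here rather than confirmed from the `micro-lm-bench` source. Under that assumption, a minimal sketch of the gate arithmetic (the function names are illustrative):

```python
def ok_accuracy(ok: int, total: int) -> float:
    """Fraction of benchmark cases that came back ok (assumed definition of ok_acc)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return ok / total


def gate_passes(ok: int, total: int, gate_min: float = 0.66) -> bool:
    """Apply the gate from this report: ok_acc >= gate_min."""
    return ok_accuracy(ok, total) >= gate_min


# Frozen results from this report: 2 of 3 cases ok.
print(f"ok_acc={ok_accuracy(2, 3):.3f} pass={gate_passes(2, 3)}")
# prints: ok_acc=0.667 pass=True
```

With only three cases the gate leaves no slack: one more failure gives 1/3, which is below 0.66.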

---

## Notes
- Stage 5 is the **freeze point**. Later changes must not regress below this benchmark (gate: `ok_acc ≥ 0.66`).
- This milestone sets the baseline for Stage 6 integration benchmarks.
- The artifacts (`.artifacts/defi_smoke_results.jsonl`, `.artifacts/defi_smoke_summary.json`) are the reference outputs.
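
One way to enforce the freeze point is to compare a later summary against the frozen values. The following is a sketch under assumptions, not an existing `micro-lm-bench` feature: the `regressions` helper and its tolerance are hypothetical, and the baseline dict copies the metrics from the Results section.

```python
# Frozen Stage 5 metrics, copied from the Results section of this report.
FROZEN = {
    "ok_acc": 0.667,
    "label_acc": 1.0,
    "expect_ok_acc": 1.0,
    "exact_acc": 1.0,
}


def regressions(baseline: dict, current: dict, tol: float = 1e-3) -> list:
    """Return the names of metrics that dropped below the baseline.

    `tol` absorbs rounding in the frozen values (0.667 is 2/3 rounded).
    A metric missing from `current` is treated as 0.0, so it counts as a regression.
    """
    return [
        name
        for name, frozen_value in baseline.items()
        if current.get(name, 0.0) + tol < frozen_value
    ]


# A later run matching the frozen results reports no regressions.
same = {"ok_acc": 2 / 3, "label_acc": 1.0, "expect_ok_acc": 1.0, "exact_acc": 1.0}
print(regressions(FROZEN, same))  # prints: []
```

A later summary loaded from `.artifacts/defi_smoke_summary.json` could be passed as `current`; any non-empty result would indicate a regression below the frozen baseline.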
