Commit 66de6e9

Merge pull request #135 from skel84/feat/evaluate-snapshot-file
docs: record snapshot file seam evaluation
2 parents 18bef24 + f78d96a commit 66de6e9

4 files changed, with 134 additions and 3 deletions
docs/README.md

Lines changed: 1 addition & 0 deletions
@@ -28,6 +28,7 @@
 - [Reservation Engine Semantics](./reservation-semantics.md)
 - [Reservation Runtime Seam Evaluation](./reservation-runtime-seam-evaluation.md)
 - [Runtime Extraction Roadmap](./runtime-extraction-roadmap.md)
+- [Snapshot File Seam Evaluation](./snapshot-file-seam-evaluation.md)
 - [Revoke Safety Slice](./revoke-safety-slice.md)
 - [Operator Runbook](./operator-runbook.md)
 - [KubeVirt Jepsen Report](./kubevirt-jepsen-report.md)

docs/runtime-extraction-roadmap.md

Lines changed: 9 additions & 1 deletion
@@ -42,7 +42,8 @@ Scope:
 - `retire_queue`
 - `wal`
 - `wal_file`
-- `snapshot_file` only if the file-level discipline stays separable from snapshot schemas
+- evaluate `snapshot_file`, but extract it only if the file-level discipline stays separable from
+snapshot schemas across the full engine family

 Non-goals:

@@ -140,6 +141,13 @@ Do this next:
 3. `M12-T03` shared `wal_file`
 4. only then decide whether `snapshot_file` is still clean enough to extract

+Result:
+
+- `retire_queue`, `wal`, and `wal_file` were extracted successfully
+- `snapshot_file` was evaluated and deferred because the seam is still only clean inside the
+`quota-core` / `reservation-core` pair, not across all three engines
+- the next correct move is now `M13`
+
 Do not do this next:

 - public framework branding

docs/snapshot-file-seam-evaluation.md

Lines changed: 122 additions & 0 deletions
@@ -0,0 +1,122 @@
# Snapshot File Seam Evaluation

## Purpose

This document closes `M12-T04` by evaluating whether `snapshot_file` is now clean enough to
extract as another shared internal runtime module after:

- shared `retire_queue`
- shared `wal`
- shared `wal_file`

The question is deliberately narrower than "can snapshot persistence be shared in theory?" The
question is whether the current three-engine code on `main` justifies a real extraction now.

## Decision

Do not extract a shared `snapshot_file` crate yet.

The seam is real only inside the smaller `quota-core` / `reservation-core` pair. It is not yet a
clean three-engine runtime boundary.

The correct outcome for this slice is:

- record that `snapshot_file` is not ready for extraction
- keep each engine's `snapshot_file` local
- move on to `M13`, the internal engine authoring boundary

## What Is Shared

All three engines share the same high-level persistence discipline:

- one snapshot file per engine
- temp-file write, sync, rename, and parent-directory sync
- snapshot bytes loaded before WAL replay
- fail-closed behavior on decode or integrity errors

That means there is still real family resemblance at the discipline level.
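
The write path in that list can be sketched as follows. This is a minimal illustration of the temp-file, sync, rename, and parent-directory-sync sequence using only the Rust standard library, not the engines' actual code. The function names `write_snapshot_atomically` and `load_snapshot_bytes` and the `.tmp` naming are assumptions, and the directory sync relies on Unix behavior where a directory can be opened and fsynced.

```rust
use std::fs::{self, File, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: replace `path` with `bytes` so that a crash leaves
/// either the old snapshot or the new one, never a partial file.
pub fn write_snapshot_atomically(path: &Path, bytes: &[u8]) -> io::Result<()> {
    // The temp file sits next to the target so the rename stays on one
    // filesystem. The ".tmp" extension is an illustrative assumption.
    let tmp = path.with_extension("tmp");
    {
        let mut file = OpenOptions::new()
            .write(true)
            .create(true)
            .truncate(true)
            .open(&tmp)?;
        file.write_all(bytes)?;
        // Flush file contents and metadata before the rename publishes them.
        file.sync_all()?;
    }
    fs::rename(&tmp, path)?;

    // Sync the parent directory so the rename itself is durable (Unix only).
    let parent = match path.parent() {
        Some(p) if !p.as_os_str().is_empty() => p,
        _ => Path::new("."),
    };
    File::open(parent)?.sync_all()?;
    Ok(())
}

/// Hypothetical loader: returns `None` when no snapshot exists yet and
/// propagates every other I/O error, so callers can fail closed before
/// any WAL replay starts.
pub fn load_snapshot_bytes(path: &Path) -> io::Result<Option<Vec<u8>>> {
    match fs::read(path) {
        Ok(bytes) => Ok(Some(bytes)),
        Err(err) if err.kind() == io::ErrorKind::NotFound => Ok(None),
        Err(err) => Err(err),
    }
}
```

A recovery routine built on this sketch would call `load_snapshot_bytes` first, decode the result, and only then replay the WAL, which matches the ordering listed above.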

## Where The Seam Breaks

### `allocdb-core` uses a simpler file format

`allocdb-core` still stores only encoded snapshot bytes:

- no footer
- no checksum
- no explicit max-bytes bound
- decode-time corruption detection only

That is materially different from the newer engines.

### `quota-core` and `reservation-core` share a stronger format

`quota-core` and `reservation-core` both use the same stronger file-level discipline:

- footer magic
- persisted payload length
- CRC32C checksum
- explicit `max_snapshot_bytes`
- oversize rejection before decode

Those two modules are close enough to share helpers later, but that is not the same thing as a
repository-wide extraction candidate.
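
To make the stronger format concrete, here is a hedged sketch of a footer-sealed snapshot file. The byte layout (payload, then a 4-byte magic, an 8-byte little-endian length, and a 4-byte little-endian CRC32C), the magic value `SNPF`, the error variant names, and the functions `seal_snapshot` and `open_snapshot` are all assumptions made for illustration. Only the list of checks comes from the evaluation above. The CRC32C routine is a plain bitwise implementation of the Castagnoli polynomial, not a tuned one.

```rust
/// Hypothetical footer layout: magic (4) + payload length (8) + CRC32C (4).
const FOOTER_MAGIC: [u8; 4] = *b"SNPF";
const FOOTER_LEN: usize = 4 + 8 + 4;

/// Illustrative integrity-specific error variants.
#[derive(Debug, PartialEq, Eq)]
pub enum SnapshotFileError {
    TooLarge,
    Truncated,
    BadMagic,
    LengthMismatch,
    ChecksumMismatch,
}

/// Bitwise CRC32C (reflected Castagnoli polynomial 0x82F63B78).
fn crc32c(data: &[u8]) -> u32 {
    let mut crc: u32 = !0;
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0x82F6_3B78 & mask);
        }
    }
    !crc
}

/// Append the footer to an already-encoded snapshot payload.
pub fn seal_snapshot(
    payload: &[u8],
    max_snapshot_bytes: usize,
) -> Result<Vec<u8>, SnapshotFileError> {
    if payload.len() > max_snapshot_bytes {
        return Err(SnapshotFileError::TooLarge);
    }
    let mut out = Vec::with_capacity(payload.len() + FOOTER_LEN);
    out.extend_from_slice(payload);
    out.extend_from_slice(&FOOTER_MAGIC);
    out.extend_from_slice(&(payload.len() as u64).to_le_bytes());
    out.extend_from_slice(&crc32c(payload).to_le_bytes());
    Ok(out)
}

/// Validate the footer and return the payload slice. Every check runs
/// before the engine-specific decoder would see a single byte.
pub fn open_snapshot(
    file: &[u8],
    max_snapshot_bytes: usize,
) -> Result<&[u8], SnapshotFileError> {
    if file.len() < FOOTER_LEN {
        return Err(SnapshotFileError::Truncated);
    }
    let payload_len = file.len() - FOOTER_LEN;
    // Oversize rejection happens first, before any parsing or hashing.
    if payload_len > max_snapshot_bytes {
        return Err(SnapshotFileError::TooLarge);
    }
    let (payload, footer) = file.split_at(payload_len);
    if footer[..4] != FOOTER_MAGIC[..] {
        return Err(SnapshotFileError::BadMagic);
    }
    let mut len_bytes = [0u8; 8];
    len_bytes.copy_from_slice(&footer[4..12]);
    if u64::from_le_bytes(len_bytes) != payload_len as u64 {
        return Err(SnapshotFileError::LengthMismatch);
    }
    let mut crc_bytes = [0u8; 4];
    crc_bytes.copy_from_slice(&footer[12..16]);
    if u32::from_le_bytes(crc_bytes) != crc32c(payload) {
        return Err(SnapshotFileError::ChecksumMismatch);
    }
    Ok(payload)
}
```

By contrast, a file in the simpler `allocdb-core` style would pass the raw bytes straight to the decoder, so none of these error variants or the size-bound argument would have a natural counterpart there.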

### The remaining commonality is below the current file wrapper

The shared part is mostly:

- temp-file naming
- write, sync, rename, and parent-directory sync
- footer read/write mechanics for the newer engines

But the live module boundary still mixes those mechanics with engine-specific constructor and error
surface choices:

- `allocdb-core` has no size-bound constructor argument
- `quota-core` and `reservation-core` expose integrity-specific error variants
- the three wrappers are still tied to engine-local snapshot schemas and recovery expectations

That makes a forced crate extraction likely to create awkward generic plumbing rather than reduce
maintenance cost.

## Why Extraction Is Premature

Extracting now would create a misleading shared layer:

- it would either erase the real allocdb-vs-quota/reservation format difference
- or it would introduce configuration branches that mostly exist to paper over that difference

That is the wrong direction for this roadmap. `M12` is about extracting only what is already
mechanically shared, not about normalizing divergent modules by force.

The current evidence supports:

- shared `retire_queue`
- shared `wal`
- shared `wal_file`

It does not yet support:

- shared `snapshot_file`

## What Would Change The Answer Later

Revisit this seam only if one of these becomes true:

- `allocdb-core` adopts the same footer/checksum/max-bytes discipline as the newer engines
- repeated snapshot-file fixes land independently in multiple engines
- a later authoring pass shows the snapshot-file helper boundary can stay below engine-local error
and schema surfaces

Until then, local duplication is still cheaper than a fake shared abstraction.

## Recommended Next Step

Treat `M12` as complete after this readout.

The next step is `M13`, not more extraction pressure:

1. define the internal engine authoring boundary
2. write the runtime-vs-engine contract
3. reassess whether a fourth-engine or reduced-copy proof is still required

docs/status.md

Lines changed: 2 additions & 2 deletions
@@ -216,5 +216,5 @@
 `lease_safety-control` and full `1800s` `lease_safety-crash-restart` evidence on `allocdb-a` with `blockers=0`
 - the next recommended step remains downstream real-cluster e2e work such as `gpu_control_plane`, not more unplanned lease-kernel semantics work; the current deployment slice covers a first in-cluster `StatefulSet` shape, but bootstrap-primary routing, failover/rejoin orchestration, and background maintenance remain operator work, and the current staging unblock path is to publish `skel84/allocdb` from GitHub Actions rather than relying on the local Docker engine
 - PR `#107` merged the `M10` quota-engine proof on `main`, and PRs `#116`, `#117`, and `#118` merged the full `M11` reservation-core chain on `main`: the repository now has a second and third deterministic engine with bounded command sets, logical-slot refill/expiry, and snapshot/WAL recovery proofs
-- the `M10-T05` and `M11-T05` readouts still defer broad shared-runtime extraction: `retire_queue` is the first justified internal extraction candidate, while `wal`, `wal_file`, and `snapshot_file` remain the next likely seams only after that micro-extraction lands
-- the next roadmap is now explicit in `runtime-extraction-roadmap.md`: start with `retire_queue`, then `wal`, then `wal_file`, and only then decide whether `snapshot_file` is still clean enough to extract before defining the internal authoring contract and asking for a fourth-engine or reduced-copy proof
+- PRs `#132`, `#133`, and `#134` merged the first `M12` runtime extractions on `main`: `retire_queue`, `wal`, and `wal_file` are now shared internal substrate instead of copied engine-local modules, while `M12-T04` closed as a defer decision because `snapshot_file` is still only a clean seam inside the `quota-core` / `reservation-core` pair and `allocdb-core` keeps the simpler file format
+- the next roadmap step is now `M13`: define the internal engine authoring boundary in `runtime-extraction-roadmap.md` and stop extraction pressure until that contract is written down
