Commit d65af3f

docs: add runtime extraction roadmap
1 parent c664964 commit d65af3f

3 files changed: +155 -6 lines changed

docs/README.md

Lines changed: 1 addition & 0 deletions
@@ -27,6 +27,7 @@
 - [Reservation Engine Plan](./reservation-engine-plan.md)
 - [Reservation Engine Semantics](./reservation-semantics.md)
 - [Reservation Runtime Seam Evaluation](./reservation-runtime-seam-evaluation.md)
+- [Runtime Extraction Roadmap](./runtime-extraction-roadmap.md)
 - [Revoke Safety Slice](./revoke-safety-slice.md)
 - [Operator Runbook](./operator-runbook.md)
 - [KubeVirt Jepsen Report](./kubevirt-jepsen-report.md)

docs/runtime-extraction-roadmap.md

Lines changed: 148 additions & 0 deletions
@@ -0,0 +1,148 @@
+# Runtime Extraction Roadmap
+
+## Purpose
+
+This document defines the path from the current engine family to something that can honestly be
+called a general internal DB-building library.
+
+The current state is:
+
+- `allocdb-core`, `quota-core`, and `reservation-core` all exist on `main`
+- the engine thesis is proven strongly enough
+- a broad shared runtime is still premature
+- `retire_queue` is the first justified micro-extraction candidate
+
+The goal is not to market a framework early. The goal is to extract only the runtime substrate that
+has actually stabilized under multiple engines.
+
+## End State
+
+We should only call this a general internal DB-building library when all of the following are true:
+
+- more than one runtime module is shared cleanly across engines
+- the shared-vs-domain boundary is explicit and stable
+- a new engine or engine slice can be built with materially less copy-paste
+- extraction reduces maintenance cost more than it adds abstraction cost
+
+Until then, the honest description remains:
+
+- multiple deterministic engines
+- emerging shared runtime
+
+## Milestone Shape
+
+### M12: First Internal Runtime Extractions
+
+Goal:
+
+- extract the smallest runtime pieces that are already mechanically shared
+
+Scope:
+
+- `retire_queue`
+- `wal`
+- `wal_file`
+- `snapshot_file` only if the file-level discipline stays separable from snapshot schemas
+
+Non-goals:
+
+- no public framework story
+- no snapshot schema extraction
+- no recovery API extraction
+- no state-machine trait layer
+
+Exit criteria:
+
+- extracted modules are used by all applicable engines
+- behavior is unchanged
+- tests stay green without new abstraction leaks
+
+### M13: Internal Engine Authoring Contract
+
+Goal:
+
+- define the stable boundary between shared runtime and engine-local semantics
+
+Scope:
+
+- one internal runtime contract note
+- explicit ownership of:
+  - bounded collections
+  - durable frame/file helpers
+  - snapshot-file discipline
+  - recovery helper seams, if any
+- explicit non-ownership of:
+  - command schemas
+  - result surfaces
+  - snapshot schemas
+  - state-machine semantics
+
+Exit criteria:
+
+- the contract is clear enough that another engine authoring pass is constrained by it
+
+### M14: Fourth-Engine Or Reduced-Copy Proof
+
+Goal:
+
+- prove that the extracted substrate lowers authoring cost rather than only moving code around
+
+Acceptable proof shapes:
+
+- build a fourth engine against the extracted substrate, or
+- retrofit one substantial new engine slice against the extracted substrate with clearly reduced
+  copy-paste and no correctness regression
+
+Exit criteria:
+
+- one new engine or engine slice uses the extracted substrate directly
+- the reduction in duplicated runtime code is obvious
+- the authoring contract survives contact with real implementation work
+
+## Recommended Issue Shape
+
+### M12
+
+- `M12`: Extract the first internal shared runtime substrate from the three-engine family
+- `M12-T01`: Extract shared `retire_queue`
+- `M12-T02`: Extract shared `wal`
+- `M12-T03`: Extract shared `wal_file`
+- `M12-T04`: Evaluate and, if still clean, extract shared `snapshot_file`
+
+### M13
+
+- `M13`: Define the internal engine authoring boundary after the first extractions
+- `M13-T01`: Write the internal runtime-vs-engine contract
+- `M13-T02`: Reassess whether a fourth-engine proof is still required or whether the extracted
+  substrate already lowered authoring cost enough
+
+### M14
+
+- `M14`: Prove the extracted substrate lowers engine-authoring cost
+- `M14-T01`: Build one new engine or engine slice against the extracted substrate
+- `M14-T02`: Re-evaluate whether the repository can now honestly claim an internal DB-building
+  library
+
+## Execution Rules
+
+- extract smallest-first
+- after each micro-extraction, stop and verify before continuing
+- if one extraction introduces awkward generic plumbing, stop and reassess rather than force the
+  sequence
+- keep domain logic local even if runtime discipline is shared
+
+## Current Recommendation
+
+Do this next:
+
+1. `M12-T01` shared `retire_queue`
+2. `M12-T02` shared `wal`
+3. `M12-T03` shared `wal_file`
+4. only then decide whether `snapshot_file` is still clean enough to extract
+
+Do not do this next:
+
+- public framework branding
+- generic state-machine APIs
+- generic snapshot schemas
+- extracting recovery entry points before the lower layers stabilize
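
Reading aid (not part of the commit): the roadmap's split between shared "bounded collections" and engine-local semantics might look roughly like the sketch below. Everything here is an assumption made for illustration: the commit does not show `retire_queue`'s real API, so the `RetireQueue<T>` type, its slot-ordered semantics, the `PushError` enum, and the `BucketId` / `HoldId` payload types are all hypothetical, and Rust is assumed only from the crate-style names.

```rust
use std::collections::VecDeque;

/// Hypothetical shared runtime piece: a bounded FIFO of payloads that become
/// retirable at a logical slot. It knows nothing about what `T` means.
#[derive(Debug)]
pub struct RetireQueue<T> {
    capacity: usize,
    /// (retire_at_slot, engine-owned payload), kept in non-decreasing slot order.
    entries: VecDeque<(u64, T)>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum PushError {
    /// The queue already holds `capacity` entries.
    Full,
    /// The new slot is earlier than the most recently queued slot.
    OutOfOrder,
}

impl<T> RetireQueue<T> {
    pub fn with_capacity(capacity: usize) -> Self {
        Self {
            capacity,
            entries: VecDeque::with_capacity(capacity),
        }
    }

    /// Queue `item` for retirement at `retire_at_slot`. Slots must be
    /// non-decreasing so that FIFO order matches slot order.
    pub fn push(&mut self, retire_at_slot: u64, item: T) -> Result<(), PushError> {
        if self.entries.len() >= self.capacity {
            return Err(PushError::Full);
        }
        if let Some(last) = self.entries.back() {
            if retire_at_slot < last.0 {
                return Err(PushError::OutOfOrder);
            }
        }
        self.entries.push_back((retire_at_slot, item));
        Ok(())
    }

    /// Remove and return every payload whose slot is <= `current_slot`,
    /// in insertion order.
    pub fn drain_due(&mut self, current_slot: u64) -> Vec<T> {
        let mut due = Vec::new();
        while self
            .entries
            .front()
            .map_or(false, |entry| entry.0 <= current_slot)
        {
            if let Some((_, item)) = self.entries.pop_front() {
                due.push(item);
            }
        }
        due
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }

    pub fn is_empty(&self) -> bool {
        self.entries.is_empty()
    }
}

// Engine-local payload types (hypothetical names). Their meaning, and what
// "retiring" one does to engine state, stays inside each engine.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BucketId(pub u32);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct HoldId(pub u64);
```

In this shape a quota-style engine would hold a `RetireQueue<BucketId>` and a reservation-style engine a `RetireQueue<HoldId>`; the queue carries only the bounded, slot-ordered discipline, which is the kind of boundary `M13` asks to have written down.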

docs/status.md

Lines changed: 6 additions & 6 deletions
@@ -1,6 +1,6 @@
 # AllocDB Status
 ## Current State
-- Phase: replicated implementation with external Jepsen gate closed, M9 lease-kernel follow-on live-validated, M10 second-engine proof merged, and M11 third-engine proof merged
+- Phase: replicated implementation with external Jepsen gate closed, M9 lease-kernel follow-on live-validated, M10 second-engine proof merged, M11 third-engine proof merged, and M12 runtime-extraction roadmap staged
 - Planning IDs: tasks use `M#-T#`; spikes use `M#-S#`
 - Current milestone status:
   - `M0` semantics freeze: complete enough for core work
@@ -16,6 +16,7 @@
   - `M9` generic lease-kernel follow-on: implementation merged on `main`
   - `M10` second-engine proof: merged on `main`; shared runtime extraction deferred
   - `M11` third-engine proof: merged on `main`; broad shared runtime still deferred, first micro-extraction now justified
+  - `M12` first internal runtime extractions: planned
 - Latest completed implementation chunks:
   - `4156a80` `Bootstrap AllocDB core and docs`
   - `f84a641` `Add WAL file and snapshot recovery primitives`
@@ -212,9 +213,8 @@
   simulation coverage are now all in the mainline implementation
 - PR `#97` merged issue `#96`, extending Jepsen history generation and analysis for bundle
   reserve, revoke/reclaim, and stale-holder lease paths, then closing the loop with live KubeVirt
-  `lease_safety-control` and full `1800s` `lease_safety-crash-restart` evidence on `allocdb-a`,
-  both with `blockers=0`
+  `lease_safety-control` and full `1800s` `lease_safety-crash-restart` evidence on `allocdb-a` with `blockers=0`
 - the next recommended step remains downstream real-cluster e2e work such as `gpu_control_plane`, not more unplanned lease-kernel semantics work; the current deployment slice covers a first in-cluster `StatefulSet` shape, but bootstrap-primary routing, failover/rejoin orchestration, and background maintenance remain operator work, and the current staging unblock path is to publish `skel84/allocdb` from GitHub Actions rather than relying on the local Docker engine
-- PR `#107` merged the `M10` quota-engine proof on `main`: `quota-core` now proves a second deterministic engine in-repo with bounded `CreateBucket` / `Debit`, logical-slot refill, and snapshot/WAL recovery; the `M10-T05` seam evaluation still concludes that shared runtime extraction is premature, with `retire_queue` the closest candidate and the rest still engine-local
-- PRs `#116`, `#117`, and `#118` merged the full `M11` reservation-core chain on `main`: scaffold, deterministic hold lifecycle, logical-slot overdue expiry, and expiry/recovery proof are now in the mainline implementation
-- PR `#118` also closes the third-engine readout: `retire_queue` is now the first justified internal extraction candidate across all three engines, while a broad `dsm-runtime` or public DB-building library is still premature; `wal`, `wal_file`, and `snapshot_file` are the next likely internal seams only after that micro-extraction lands
+- PR `#107` merged the `M10` quota-engine proof on `main`, and PRs `#116`, `#117`, and `#118` merged the full `M11` reservation-core chain on `main`: the repository now has a second and third deterministic engine with bounded command sets, logical-slot refill/expiry, and snapshot/WAL recovery proofs
+- the `M10-T05` and `M11-T05` readouts still defer broad shared-runtime extraction: `retire_queue` is the first justified internal extraction candidate, while `wal`, `wal_file`, and `snapshot_file` remain the next likely seams only after that micro-extraction lands
+- the next roadmap is now explicit in `runtime-extraction-roadmap.md`: start with `retire_queue`, then `wal`, then `wal_file`, and only then decide whether `snapshot_file` is still clean enough to extract before defining the internal authoring contract and asking for a fourth-engine or reduced-copy proof
