
Commit b66733e

docs: define engine authoring boundary
1 parent 66de6e9

3 files changed, +203 -1 lines changed


docs/README.md

Lines changed: 1 addition & 0 deletions
@@ -17,6 +17,7 @@
 ## Engineering Docs
 
 - [Architecture](./architecture.md)
+- [Engine Authoring Boundary](./engine-authoring-boundary.md)
 - [Fault Model](./fault-model.md)
 - [Jepsen Refactor Plan](./jepsen-refactor-plan.md)
 - [Lease Kernel Design Decisions](./lease-kernel-design.md)

docs/engine-authoring-boundary.md

Lines changed: 201 additions & 0 deletions
@@ -0,0 +1,201 @@
# Engine Authoring Boundary

## Purpose

This document closes the first `M13` question: after the `M12` micro-extractions, what is the stable boundary between the shared runtime substrate and engine-local semantics?

This is an internal authoring rule, not a public framework story.

## Decision

The shared runtime owns only mechanically shared durability and bounded-state substrate.

Everything that defines domain meaning stays engine-local.

That means:

- use the extracted runtime crates where the seam is already proven
- keep new engine semantics local by default
- treat any new generic abstraction above the current substrate as suspect until repeated pressure proves otherwise

## What The Shared Runtime Owns Today

The shared runtime currently owns only the modules that are already shared on `main`.

### Shared and stable enough now

- `allocdb-retire-queue`
  - bounded retirement queue discipline
  - no domain meaning beyond ordered retirement bookkeeping
- `allocdb-wal-frame`
  - WAL frame versioning
  - frame header/footer validation
  - checksum verification
  - torn-tail and corruption detection at the frame level
- `allocdb-wal-file`
  - append-only durable file handle
  - replace/rewrite discipline
  - truncation and reopen behavior
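
For orientation, here is a short sketch of the kind of mechanics that count as "bounded retirement queue discipline". It is a hypothetical illustration, not the `allocdb-retire-queue` API. The `RetireQueue` type, its methods, and the `(id, retire_at)` entry shape are all assumptions. The queue knows only a capacity bound and a non-decreasing retirement order. What an entry id refers to stays with the engine.

```rust
use std::collections::VecDeque;

/// Hypothetical bounded FIFO of (entry_id, retire_at) pairs (not the real crate API).
/// It knows only capacity and ordering; entry ids are opaque u64s.
#[derive(Debug)]
pub struct RetireQueue {
    entries: VecDeque<(u64, u64)>,
    capacity: usize,
}

#[derive(Debug, PartialEq, Eq)]
pub enum PushError {
    /// The bound is reached; the caller decides how to react.
    Full,
    /// Retirement times must be non-decreasing so the front is always next.
    OutOfOrder,
}

impl RetireQueue {
    pub fn with_capacity(capacity: usize) -> Self {
        Self { entries: VecDeque::with_capacity(capacity), capacity }
    }

    /// Enqueue an entry, enforcing the bound and the ordering discipline.
    pub fn push(&mut self, id: u64, retire_at: u64) -> Result<(), PushError> {
        if self.entries.len() == self.capacity {
            return Err(PushError::Full);
        }
        if let Some(&(_, last)) = self.entries.back() {
            if retire_at < last {
                return Err(PushError::OutOfOrder);
            }
        }
        self.entries.push_back((id, retire_at));
        Ok(())
    }

    /// Pop every entry whose retirement time is <= `now`, oldest first.
    pub fn drain_due(&mut self, now: u64) -> Vec<u64> {
        let mut due = Vec::new();
        while let Some(&(id, retire_at)) = self.entries.front() {
            if retire_at > now {
                break;
            }
            self.entries.pop_front();
            due.push(id);
        }
        due
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }

    pub fn is_empty(&self) -> bool {
        self.entries.is_empty()
    }
}
```

The engine decides what to do with the ids returned by `drain_due`. The queue itself attaches no meaning to them.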

### What these modules are allowed to know

Only substrate concerns:

- bytes
- lengths
- checksums
- file paths and file handles
- bounded queue mechanics
- ordering and truncation discipline

These modules must not know:

- command schemas
- result codes
- resource, bucket, pool, or hold semantics
- snapshot schemas
- engine-specific invariants
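
Below is a minimal sketch of a frame codec that stays within the "allowed" list. It sees a version byte, a length, payload bytes and a checksum, and nothing else. The frame layout, `encode_frame`/`decode_frame`, and the FNV-1a checksum are assumptions, chosen to keep the example dependency-free. They are not the `allocdb-wal-frame` format. The payload stays opaque, so command schemas and result codes never enter this layer.

```rust
// Hypothetical frame layout (not the real allocdb-wal-frame format):
// [version: u8][payload_len: u32 LE][payload bytes][checksum: u32 LE]
const FRAME_VERSION: u8 = 1;
const HEADER_LEN: usize = 1 + 4;
const FOOTER_LEN: usize = 4;

/// FNV-1a 32-bit, used here only as a dependency-free stand-in checksum.
fn fnv1a32(bytes: &[u8]) -> u32 {
    let mut hash: u32 = 0x811c_9dc5;
    for &byte in bytes {
        hash ^= u32::from(byte);
        hash = hash.wrapping_mul(0x0100_0193);
    }
    hash
}

#[derive(Debug, PartialEq, Eq)]
pub enum DecodeError {
    /// Not enough bytes for a whole frame, e.g. a torn tail at the end of the log.
    Torn,
    /// Unknown version byte.
    BadVersion(u8),
    /// The frame is complete but its checksum does not match.
    Corrupt,
}

/// Append one frame around an opaque payload; the checksum covers header + payload.
pub fn encode_frame(payload: &[u8], out: &mut Vec<u8>) {
    assert!(payload.len() <= u32::MAX as usize, "payload too large for a u32 length");
    let start = out.len();
    out.push(FRAME_VERSION);
    out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    out.extend_from_slice(payload);
    let checksum = fnv1a32(&out[start..]);
    out.extend_from_slice(&checksum.to_le_bytes());
}

/// Decode the frame at the start of `buf`, returning (payload, bytes consumed).
pub fn decode_frame(buf: &[u8]) -> Result<(&[u8], usize), DecodeError> {
    if buf.len() < HEADER_LEN {
        return Err(DecodeError::Torn);
    }
    if buf[0] != FRAME_VERSION {
        return Err(DecodeError::BadVersion(buf[0]));
    }
    let len = u32::from_le_bytes([buf[1], buf[2], buf[3], buf[4]]) as usize;
    let body_end = HEADER_LEN.saturating_add(len);
    let total = body_end.saturating_add(FOOTER_LEN);
    if buf.len() < total {
        return Err(DecodeError::Torn);
    }
    let expected = u32::from_le_bytes([
        buf[body_end],
        buf[body_end + 1],
        buf[body_end + 2],
        buf[body_end + 3],
    ]);
    if fnv1a32(&buf[..body_end]) != expected {
        return Err(DecodeError::Corrupt);
    }
    Ok((&buf[HEADER_LEN..body_end], total))
}
```

When the buffer is short, `Torn` lets a recovery loop stop at an incomplete tail. `Corrupt` flags a complete frame whose bytes do not match its checksum. The caller decides what to do about either case.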

## What Stays Engine-Local

Each engine still owns the parts that define the database itself.

### Domain contract surface

Keep local:

- command enums
- command codecs above raw frame transport
- result codes and read models
- config surfaces tied to domain semantics

### Persistence schema

Keep local:

- snapshot encoding and decoding
- snapshot file wrappers while file formats still differ
- engine-specific recovery error surfaces

### State machine semantics

Keep local:

- apply rules
- invariants
- derived indexes
- logical-slot effects such as refill, expiry, revoke, reclaim, or fencing
- any internal command semantics above raw WAL framing

### Recovery entry points

Keep local:

- top-level recovery APIs
- replay orchestration that depends on engine-specific command decoding
- operational logging tied to one engine's semantics
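
The split above can be sketched from the engine side. In this hypothetical example, the shared substrate hands over payload bytes that have already been validated at the frame level. Everything that gives those bytes meaning lives in the engine crate: the command enum, its codec, the apply rule, the error surface, and the replay loop. The `Command` variants, byte layout, `Engine`, and `recover` are illustrative assumptions, not code from any of the engines.

```rust
use std::collections::HashMap;

/// Hypothetical engine-local command set; tags, fields, and layout are illustrative.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Command {
    Refill { bucket: u64, amount: u64 },
    Debit { bucket: u64, amount: u64 },
}

/// Engine-local recovery error surface.
#[derive(Debug, PartialEq, Eq)]
pub enum ReplayError {
    UnknownTag(u8),
    BadLength(usize),
}

/// Encoded size: 1 tag byte + two little-endian u64 fields.
const ENCODED_LEN: usize = 17;

impl Command {
    /// Engine-local codec sitting above the raw frame payload.
    pub fn encode(&self) -> Vec<u8> {
        let (tag, bucket, amount) = match *self {
            Command::Refill { bucket, amount } => (1u8, bucket, amount),
            Command::Debit { bucket, amount } => (2u8, bucket, amount),
        };
        let mut out = Vec::with_capacity(ENCODED_LEN);
        out.push(tag);
        out.extend_from_slice(&bucket.to_le_bytes());
        out.extend_from_slice(&amount.to_le_bytes());
        out
    }

    pub fn decode(payload: &[u8]) -> Result<Self, ReplayError> {
        if payload.len() != ENCODED_LEN {
            return Err(ReplayError::BadLength(payload.len()));
        }
        let mut word = [0u8; 8];
        word.copy_from_slice(&payload[1..9]);
        let bucket = u64::from_le_bytes(word);
        word.copy_from_slice(&payload[9..17]);
        let amount = u64::from_le_bytes(word);
        match payload[0] {
            1 => Ok(Command::Refill { bucket, amount }),
            2 => Ok(Command::Debit { bucket, amount }),
            tag => Err(ReplayError::UnknownTag(tag)),
        }
    }
}

/// Engine-local state; the hypothetical invariant is that a balance never underflows.
#[derive(Debug, Default)]
pub struct Engine {
    balances: HashMap<u64, u64>,
}

impl Engine {
    /// Engine-local apply rule: a debit larger than the balance is rejected (no-op).
    pub fn apply(&mut self, cmd: &Command) {
        match *cmd {
            Command::Refill { bucket, amount } => {
                let balance = self.balances.entry(bucket).or_insert(0);
                *balance = balance.saturating_add(amount);
            }
            Command::Debit { bucket, amount } => {
                let balance = self.balances.entry(bucket).or_insert(0);
                if *balance >= amount {
                    *balance -= amount;
                }
            }
        }
    }

    pub fn balance(&self, bucket: u64) -> u64 {
        self.balances.get(&bucket).copied().unwrap_or(0)
    }
}

/// Engine-local recovery entry point: decoding and apply order are decided here,
/// not in the shared substrate that produced the payloads.
pub fn recover<'a, I>(payloads: I) -> Result<Engine, ReplayError>
where
    I: IntoIterator<Item = &'a [u8]>,
{
    let mut engine = Engine::default();
    for payload in payloads {
        engine.apply(&Command::decode(payload)?);
    }
    Ok(engine)
}
```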

## Authoring Rules For Future Work

### Rule 1: Start local unless the seam is already proven

When adding a new engine or engine slice:

- use the shared runtime crates only for seams already extracted
- keep new runtime-adjacent code local until at least two engines want the same thing in the same shape

### Rule 2: Do not generalize state-machine APIs

Do not introduce:

- generic state-machine traits
- generic apply pipelines
- generic snapshot schemas
- generic recovery entry points

Those layers still carry domain meaning. Generalizing them would create abstraction debt faster than it would provide maintenance relief.

### Rule 3: Extract only below the semantic line

A module is a good runtime candidate only if it can stay below the line where domain meaning starts.

Good examples:

- bytes-on-disk framing
- bounded retirement bookkeeping
- file rewrite/truncate mechanics

Bad examples:

- "generic reserve/confirm/release" APIs
- "generic bucket/pool/resource" models
- "generic engine config" layers
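
"File rewrite/truncate mechanics" can stay below the semantic line because they need only a path and opaque bytes. The sketch below uses the common standard-library pattern of writing a temp file, calling `sync_all`, then calling `rename`. `replace_file`, `truncate_to`, and the `.tmp` suffix are hypothetical and are not the `allocdb-wal-file` API.

```rust
use std::ffi::OsString;
use std::fs::{self, File, OpenOptions};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical sibling temp path: "<path>.tmp".
fn temp_path(path: &Path) -> PathBuf {
    let mut name: OsString = path.as_os_str().to_owned();
    name.push(".tmp");
    PathBuf::from(name)
}

/// Hypothetical helper: atomically replace `path` with `bytes`.
/// Only a path and opaque bytes are visible here; their meaning is the engine's.
pub fn replace_file(path: &Path, bytes: &[u8]) -> io::Result<()> {
    let tmp = temp_path(path);
    {
        let mut file = File::create(&tmp)?;
        file.write_all(bytes)?;
        // Make the new contents durable before they become visible under `path`.
        file.sync_all()?;
    }
    fs::rename(&tmp, path)?;
    // On Unix, also sync the parent directory so the rename itself is durable.
    #[cfg(unix)]
    {
        let parent = match path.parent() {
            Some(dir) if !dir.as_os_str().is_empty() => dir,
            _ => Path::new("."),
        };
        File::open(parent)?.sync_all()?;
    }
    Ok(())
}

/// Hypothetical helper: truncate a file to `len` bytes, e.g. to drop a torn tail.
pub fn truncate_to(path: &Path, len: u64) -> io::Result<()> {
    let file = OpenOptions::new().write(true).open(path)?;
    file.set_len(len)?;
    file.sync_all()
}
```

Nothing here knows whether the bytes are a snapshot, a WAL segment, or something else. That decision stays in the engine's persistence schema.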

### Rule 4: Prefer duplication over dishonest abstraction

If a candidate seam requires:

- engine-specific branches
- feature flags that mirror engine names
- generic types that only one engine can actually use

then it is not ready.

### Rule 5: New extractions need multi-engine pressure

Do not extract a new runtime module unless at least one of these is true:

- the code is already mechanically identical across engines
- the same fix or improvement is landing independently in multiple engines
- a new engine authoring pass would clearly need less copy-paste by using the shared layer

## Current Boundary Map

### Shared runtime

- `allocdb-retire-queue`
- `allocdb-wal-frame`
- `allocdb-wal-file`

### Deferred seams

- `snapshot_file`
  - deferred because the seam is still only clean inside the `quota-core` / `reservation-core` pair
- bounded collections beyond `retire_queue`
  - still need proof that the common surface is stable enough
- recovery helpers above file/frame mechanics
  - still too tied to engine-local replay contracts

### Explicit non-goals

- no public database-building library claim yet
- no renaming the repository around framework identity
- no generic engine kit above the current substrate
180+
## Practical Consequence
181+
182+
A future engine author should think in this order:
183+
184+
1. write engine-local semantics first
185+
2. consume the existing shared runtime only for proven substrate
186+
3. copy new runtime-adjacent code locally if the seam is not already explicit
187+
4. extract later only if repeated pressure proves the boundary
188+
189+
That keeps the repository honest:
190+
191+
- shared where the code is actually shared
192+
- local where the semantics are still the database
193+
194+
## Next Step
195+
196+
With this boundary in place, the next `M13` step is narrower:
197+
198+
1. write the focused runtime-vs-engine contract note
199+
2. decide whether that contract already makes a reduced-copy proof likely enough
200+
3. only then choose whether `M14` still needs a full fourth-engine or can use a smaller engine
201+
slice proof

docs/status.md

Lines changed: 1 addition & 1 deletion
@@ -217,4 +217,4 @@
 - the next recommended step remains downstream real-cluster e2e work such as `gpu_control_plane`, not more unplanned lease-kernel semantics work; the current deployment slice covers a first in-cluster `StatefulSet` shape, but bootstrap-primary routing, failover/rejoin orchestration, and background maintenance remain operator work, and the current staging unblock path is to publish `skel84/allocdb` from GitHub Actions rather than relying on the local Docker engine
 - PR `#107` merged the `M10` quota-engine proof on `main`, and PRs `#116`, `#117`, and `#118` merged the full `M11` reservation-core chain on `main`: the repository now has a second and third deterministic engine with bounded command sets, logical-slot refill/expiry, and snapshot/WAL recovery proofs
 - PRs `#132`, `#133`, and `#134` merged the first `M12` runtime extractions on `main`: `retire_queue`, `wal`, and `wal_file` are now shared internal substrate instead of copied engine-local modules, while `M12-T04` closed as a defer decision because `snapshot_file` is still only a clean seam inside the `quota-core` / `reservation-core` pair and `allocdb-core` keeps the simpler file format
-- the next roadmap step is now `M13`: define the internal engine authoring boundary in `runtime-extraction-roadmap.md` and stop extraction pressure until that contract is written down
+- the next roadmap step is now `M13`: define the internal engine authoring boundary in `runtime-extraction-roadmap.md` and stop extraction pressure until that contract is written down; the authoring rule is to keep shared runtime below the semantic line and keep command surfaces, snapshot schemas, recovery entry points, and state-machine meaning engine-local
