Commit d7fd66b

Merge pull request #115 from skel84/feat/reservation-core-plan
docs: add reservation-core third-engine plan
2 parents 37c29e1 + e9986c5 commit d7fd66b

4 files changed (+540, -5 lines)

docs/README.md

Lines changed: 2 additions & 0 deletions
@@ -24,6 +24,8 @@
 - [Quota Engine Plan](./quota-engine-plan.md)
 - [Quota Engine Semantics](./quota-semantics.md)
 - [Quota Runtime Seam Evaluation](./quota-runtime-seam-evaluation.md)
+- [Reservation Engine Plan](./reservation-engine-plan.md)
+- [Reservation Engine Semantics](./reservation-semantics.md)
 - [Revoke Safety Slice](./revoke-safety-slice.md)
 - [Operator Runbook](./operator-runbook.md)
 - [KubeVirt Jepsen Report](./kubevirt-jepsen-report.md)

docs/reservation-engine-plan.md

Lines changed: 323 additions & 0 deletions
@@ -0,0 +1,323 @@
# Reservation Engine Plan

## Purpose

This document defines the next engine experiment after `quota-core`.

The goal is not to chase the biggest market first. The goal is to apply stronger pressure to the engine thesis:

- can the current bounded, deterministic, replay-safe runtime discipline support a third engine with a different lifecycle shape
- can that third engine expose cleaner shared-runtime seams than `quota-core` did
- if the experiment succeeds, does building quota/credits on top of a later extracted runtime look materially easier

The canonical v1 semantic boundary lives in [`reservation-semantics.md`](./reservation-semantics.md).

## Decision

The next engine experiment should be `reservation-core` in the existing workspace.

The initial shape is:

- keep the repository identity as `allocdb`
- add a sibling crate such as `crates/reservation-core`
- do not extract a shared runtime crate first
- do not rebrand the repo as a framework yet
- use this engine to test extraction pressure more aggressively than `quota-core` did

## Why Reservation-Core Is The Right Next Experiment

`quota-core` proved that a second deterministic engine can live in-repo. It did not prove that the shared runtime is ready to extract.

`reservation-core` is the better next pressure test because it changes the semantic shape while still reusing the same core discipline:

- active objects with terminal retirement
- idempotent retry
- deterministic expiry
- snapshot plus WAL recovery
- replay through the same apply path
- bounded live state

Compared with quota semantics, reservation semantics should stress:

- expiry queues more directly
- state transitions across multiple terminal outcomes
- hold lifecycle invariants instead of balance arithmetic
- recovery around confirm/release/expire transitions

If the engine thesis is real, this experiment should make the shared-vs-domain boundary clearer.

## Hypothesis

If `reservation-core` lands cleanly, then a later extracted runtime should make quota/credits materially easier to build on top of it.

What "easier" means here:

- bounded storage, WAL, snapshot, replay, retirement, and recovery should already exist as shared substrate
- quota should mainly reintroduce domain semantics such as balances, refill, and quota-specific result codes
- the remaining hard work should be semantic, not runtime duplication

If that does not happen, the library thesis is weaker than it currently appears.

## Success Criteria

The experiment is successful if all of the following become true:

- `reservation-core` has one deterministic command surface with replay-equivalent results
- the engine uses the same runtime discipline as the existing engines:
  - WAL as durable source of truth
  - snapshot plus WAL recovery
  - logical `request_slot`
  - bounded maps, queues, and operation dedupe state
  - fail-closed recovery
- at least two runtime slices become obviously reusable across all three engines
- the shared-vs-domain boundary becomes narrower and more defensible after the third engine

The experiment is unsuccessful if any of the following happen:

- reservation semantics require materially different durability or recovery contracts
- copied runtime pieces diverge again and stay divergent
- extraction pressure still points at only one tiny helper rather than a coherent substrate
- the engine forces awkward generic abstractions into domain logic

## Non-Goals

The first reservation experiment should not try to do any of the following:

- build a booking or commerce product
- add payment, carts, pricing, or customer workflow logic
- add background daemons or wall-clock timers
- build cross-pool or multi-object transactions
- extract a framework before the third-engine readout
- promise a public database-building library yet

## Initial Repository Shape

The expected next layout is:

- `crates/allocdb-core`
- `crates/quota-core`
- `crates/reservation-core`
- `crates/allocdb-node`

Possible later layout, only after the third-engine readout:

- `crates/dsm-runtime`
- `crates/allocdb-core`
- `crates/quota-core`
- `crates/reservation-core`

## Phase 0: Freeze The Reservation Boundary

Before code, freeze the experiment boundary in docs.

Deliverables:

- one design note for reservation semantics and non-goals
- one explicit statement that the experiment stays in the current workspace
- one explicit extraction hypothesis tied to a later quota/credits build

Exit criteria:

- the reservation experiment has a narrow v1 scope
- the repo still does not promise a generic framework yet

## Phase A: Add Reservation-Core As A Sibling Engine

Start with the smallest viable scarce-capacity hold model.

### First domain model

- `PoolId`
- `PoolRecord`
- `HoldId`
- `HoldRecord`
- `OperationRecord`

### First commands

- `CreatePool`
- `PlaceHold`
- `ConfirmHold`
- `ReleaseHold`
- `ExpireHold`

### First result codes

- `Ok`
- `AlreadyExists`
- `PoolTableFull`
- `PoolNotFound`
- `HoldTableFull`
- `HoldNotFound`
- `InsufficientCapacity`
- `HoldExpired`
- `InvalidState`
- `OperationConflict`
- `OperationTableFull`
- `SlotOverflow`

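To make the Phase A surface concrete, here is a minimal Rust sketch of how these types might be declared. Every field name, the integer widths, the `ttl_slots` field on `PlaceHold`, the `HoldState` enum, and storing the full command inside `OperationRecord` for payload comparison are assumptions for illustration; the canonical shapes belong in `reservation-semantics.md` and the crate itself.

```rust
// Hypothetical v1 domain types for `reservation-core`. Field names, widths,
// and `ttl_slots` are illustrative assumptions, not the crate's real API.

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct PoolId(pub u64);

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct HoldId(pub u64);

/// Capacity accounting for one pool.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PoolRecord {
    pub pool_id: PoolId,
    pub total_capacity: u64,
    pub held: u64,
    pub consumed: u64,
}

/// One active state plus three terminal outcomes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HoldState {
    Held,
    Confirmed,
    Released,
    Expired,
}

impl HoldState {
    /// Terminal holds never change state again and become retirement candidates.
    pub fn is_terminal(self) -> bool {
        !matches!(self, HoldState::Held)
    }
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct HoldRecord {
    pub hold_id: HoldId,
    pub pool_id: PoolId,
    pub quantity: u64,
    pub state: HoldState,
    /// Logical slot at which the hold becomes eligible for expiry.
    pub deadline_slot: u64,
}

/// The five v1 commands.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Command {
    CreatePool { pool_id: PoolId, total_capacity: u64 },
    PlaceHold { pool_id: PoolId, hold_id: HoldId, quantity: u64, ttl_slots: u64 },
    ConfirmHold { hold_id: HoldId },
    ReleaseHold { hold_id: HoldId },
    ExpireHold { hold_id: HoldId },
}

/// The v1 result codes, one variant per listed code.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ResultCode {
    Ok,
    AlreadyExists,
    PoolTableFull,
    PoolNotFound,
    HoldTableFull,
    HoldNotFound,
    InsufficientCapacity,
    HoldExpired,
    InvalidState,
    OperationConflict,
    OperationTableFull,
    SlotOverflow,
}

/// Dedupe entry: the command an `operation_id` was first seen with, and its outcome.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct OperationRecord {
    pub operation_id: u128,
    pub command: Command,
    pub result: ResultCode,
}
```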
### Rules

- no wall-clock time reads
- no background sweeper inside v1
- no multi-pool reservations
- no partial confirms
- no cross-command convenience APIs before the lifecycle is proven

Exit criteria:

- one pool can be created
- one hold can be placed, confirmed, released, or expired deterministically
- duplicate retry returns the cached outcome
- conflicting `operation_id` reuse returns `OperationConflict`

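The exit criteria above amount to a small state machine: only a `Held` hold can move, and each of the three terminal outcomes is final. The sketch below expresses that as a pure transition function. The names `HoldState`, `HoldEvent`, and `transition` are hypothetical, and so is the mapping that reports `HoldExpired` for any action on an expired hold and `InvalidState` for every other move out of a terminal state.

```rust
// Sketch of the v1 hold lifecycle as a pure transition function.
// Assumption: acting on an expired hold reports `HoldExpired`; any other
// move out of a terminal state reports `InvalidState`.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HoldState {
    Held,
    Confirmed,
    Released,
    Expired,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HoldEvent {
    Confirm,
    Release,
    Expire,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ResultCode {
    HoldExpired,
    InvalidState,
}

/// Returns the next state, or a rejection code. Only `Held` may transition,
/// so every terminal outcome is reached at most once per hold.
pub fn transition(state: HoldState, event: HoldEvent) -> Result<HoldState, ResultCode> {
    match (state, event) {
        (HoldState::Held, HoldEvent::Confirm) => Ok(HoldState::Confirmed),
        (HoldState::Held, HoldEvent::Release) => Ok(HoldState::Released),
        (HoldState::Held, HoldEvent::Expire) => Ok(HoldState::Expired),
        (HoldState::Expired, _) => Err(ResultCode::HoldExpired),
        (HoldState::Confirmed | HoldState::Released, _) => Err(ResultCode::InvalidState),
    }
}
```

Because the function is pure, live apply and replay can call it unchanged. A duplicate `ConfirmHold` retried under the same `operation_id` would be answered from the dedupe state before reaching this function; only a confirm under a new `operation_id` would hit the `InvalidState` arm.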
## Phase B: Copy Runtime Pieces Locally, Without Extraction

Copy the minimum runtime and persistence pieces needed by `reservation-core`.

Candidate copied pieces:

- WAL frame codec and scan discipline
- snapshot write/load discipline
- recovery monotonicity checks
- bounded queue helpers
- retire queue helpers
- command context carrying `lsn` and `request_slot`
- operation dedupe storage and retirement mechanics

Rules:

- copy first, do not extract first
- keep naming close enough for later diffing
- allow the copies to diverge while reservation semantics settle

Exit criteria:

- `reservation-core` can run its own live apply and replay loop with copied runtime pieces
- no shared crate is introduced yet

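For orientation, the first copied piece, a WAL frame codec whose scan tolerates a torn tail, might look roughly like this. The frame layout (`[len: u32][checksum: u32][payload]`, little-endian), the FNV-1a checksum standing in for whatever checksum the existing engines use, and the names `encode_frame`, `scan`, `Scan`, and `ScanError` are all assumptions. The scan treats a damaged final frame as a torn tail to truncate, but fails closed when a damaged frame is followed by more bytes.

```rust
// Sketch of a length-prefixed WAL frame codec with torn-tail detection.
// Hypothetical layout: [len: u32 LE][checksum: u32 LE][payload bytes].
// FNV-1a is a stand-in checksum; the real codec may use something else.

const HEADER_LEN: usize = 8;

fn fnv1a32(bytes: &[u8]) -> u32 {
    let mut hash: u32 = 0x811c_9dc5;
    for &b in bytes {
        hash ^= u32::from(b);
        hash = hash.wrapping_mul(0x0100_0193);
    }
    hash
}

/// Appends one frame. Assumes payloads fit in a u32 length.
pub fn encode_frame(payload: &[u8], out: &mut Vec<u8>) {
    out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    out.extend_from_slice(&fnv1a32(payload).to_le_bytes());
    out.extend_from_slice(payload);
}

#[derive(Debug, PartialEq, Eq)]
pub enum ScanError {
    /// A bad frame is followed by more bytes: not a torn tail, so fail closed.
    CorruptFrame { offset: usize },
}

/// Decoded payloads plus the length of the valid prefix; bytes past
/// `valid_len` are a torn tail for the caller to truncate.
#[derive(Debug, PartialEq, Eq)]
pub struct Scan {
    pub payloads: Vec<Vec<u8>>,
    pub valid_len: usize,
}

pub fn scan(buf: &[u8]) -> Result<Scan, ScanError> {
    let mut payloads = Vec::new();
    let mut offset = 0;
    while offset < buf.len() {
        let rest = &buf[offset..];
        if rest.len() < HEADER_LEN {
            break; // partial header at the end: torn tail
        }
        let len = u32::from_le_bytes([rest[0], rest[1], rest[2], rest[3]]) as usize;
        let checksum = u32::from_le_bytes([rest[4], rest[5], rest[6], rest[7]]);
        let end = match HEADER_LEN.checked_add(len) {
            Some(end) if end <= rest.len() => end,
            _ => break, // payload runs past the end of the buffer: torn tail
        };
        let payload = &rest[HEADER_LEN..end];
        if fnv1a32(payload) != checksum {
            if offset + end == buf.len() {
                break; // damaged final frame: torn tail
            }
            return Err(ScanError::CorruptFrame { offset });
        }
        payloads.push(payload.to_vec());
        offset += end;
    }
    Ok(Scan { payloads, valid_len: offset })
}
```

This sketch would still misread a corrupted length field in the middle of the log as a torn tail; a real codec needs a stronger guard there so that confirmed state is never dropped silently.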
## Phase C: Prove Hold Lifecycle, Dedupe, And Replay

This phase is the real third-engine test.

Required invariants:

- `0 <= held + consumed <= total_capacity`
- a successful `PlaceHold(op, qty)` reserves capacity exactly once
- `ConfirmHold` moves held capacity to consumed exactly once
- `ReleaseHold` returns held capacity exactly once
- same `operation_id` plus same payload returns the stored outcome
- same `operation_id` plus different payload returns `OperationConflict`
- replay from snapshot plus WAL produces the same pool and hold state as the live path

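The capacity invariants can be sketched as pool arithmetic. `Pool`, `place`, `confirm`, `release`, and `check_invariant` are hypothetical names, and the sketch covers only the accounting: the "exactly once" guarantee comes from the hold lifecycle and operation dedupe, which must ensure `confirm` and `release` are only called for a hold that is still held.

```rust
// Sketch of pool capacity accounting behind the Phase C invariants.
// Hypothetical API; quantities are u64, so `0 <=` holds by construction.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ResultCode {
    InsufficientCapacity,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Pool {
    pub total_capacity: u64,
    pub held: u64,
    pub consumed: u64,
}

impl Pool {
    pub fn new(total_capacity: u64) -> Self {
        Pool { total_capacity, held: 0, consumed: 0 }
    }

    /// Capacity that is neither held nor consumed.
    pub fn available(&self) -> u64 {
        self.total_capacity - self.held - self.consumed
    }

    /// `PlaceHold`: reserve `qty`, or reject without touching state.
    pub fn place(&mut self, qty: u64) -> Result<(), ResultCode> {
        if qty > self.available() {
            return Err(ResultCode::InsufficientCapacity);
        }
        self.held += qty;
        Ok(())
    }

    /// `ConfirmHold`: move a held quantity to consumed. The caller must pass
    /// the quantity of a hold that is still in the held state.
    pub fn confirm(&mut self, qty: u64) {
        self.held -= qty;
        self.consumed += qty;
    }

    /// `ReleaseHold` or `ExpireHold`: return a held quantity to the pool.
    pub fn release(&mut self, qty: u64) {
        self.held -= qty;
    }

    /// `held + consumed <= total_capacity`, checked without overflow.
    pub fn check_invariant(&self) -> bool {
        self.held
            .checked_add(self.consumed)
            .map_or(false, |used| used <= self.total_capacity)
    }
}
```

A property test could run random command sequences against this accounting and assert `check_invariant()` after every step, in both the live path and replay.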

Required tests:

- duplicate retry after successful hold placement
- duplicate retry after failed hold placement
- duplicate confirm and duplicate release
- conflicting retry with the same `operation_id`
- deterministic invalid-state rejection
- replay equivalence across snapshot and WAL restore
- crash/restart around hold lifecycle boundaries

Exit criteria:

- lifecycle semantics are boring and deterministic
- replay uses the same apply logic as the live path

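The duplicate-retry and conflicting-retry tests all exercise one mechanism: a bounded table keyed by `operation_id` that stores each outcome, failures included, along with the payload it was computed for. A minimal sketch follows; the `u128` operation id, comparing payloads as encoded bytes, the `OperationTable` name, and the use of a `BTreeMap` for ordered snapshot iteration are assumptions, and retirement of old entries is omitted.

```rust
use std::collections::BTreeMap;

// Sketch of bounded operation dedupe. Hypothetical: u128 operation ids,
// payloads compared as encoded bytes, no retirement of old entries.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ResultCode {
    Ok,
    InsufficientCapacity,
    OperationConflict,
    OperationTableFull,
}

struct OperationRecord {
    payload: Vec<u8>,
    result: ResultCode,
}

pub struct OperationTable {
    capacity: usize,
    records: BTreeMap<u128, OperationRecord>,
}

impl OperationTable {
    pub fn new(capacity: usize) -> Self {
        OperationTable { capacity, records: BTreeMap::new() }
    }

    /// Runs `execute` at most once per `operation_id`. A retry with the same
    /// payload gets the stored outcome; a different payload is a conflict.
    pub fn apply(
        &mut self,
        operation_id: u128,
        payload: &[u8],
        execute: impl FnOnce() -> ResultCode,
    ) -> ResultCode {
        if let Some(record) = self.records.get(&operation_id) {
            return if record.payload == payload {
                record.result
            } else {
                ResultCode::OperationConflict
            };
        }
        if self.records.len() >= self.capacity {
            // Reject before executing so no state change goes unrecorded.
            return ResultCode::OperationTableFull;
        }
        let result = execute();
        self.records.insert(operation_id, OperationRecord { payload: payload.to_vec(), result });
        result
    }
}
```

Caching failed outcomes is what makes "duplicate retry after failed hold placement" deterministic: the retry returns the original rejection even if capacity has since been freed.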
## Phase D: Add Deterministic Expiry With Logical Time Only

Only after lifecycle semantics are stable should expiry be added.

### Expiry rules

Expiry must be a pure function of:

- persisted hold state
- persisted deadline slot
- `request_slot`

Expiry must not read:

- wall-clock time
- process time
- external timers

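Read literally, these rules reduce expiry to a predicate over persisted state plus `request_slot`, together with a deadline queue drained in a fixed order. The sketch below assumes a hold is due once `request_slot >= deadline_slot`, computes deadlines with `checked_add` so overflow yields `SlotOverflow` instead of wrapping, and breaks ties by hold id; `deadline_slot`, `is_due`, and `ExpiryQueue` are hypothetical names.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// Sketch of logical-slot expiry. Assumptions: due when
// `request_slot >= deadline_slot`; ties broken by hold id for a fixed order.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HoldState {
    Held,
    Confirmed,
    Released,
    Expired,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ResultCode {
    SlotOverflow,
}

/// Deadline derived from the placing command's slot; overflow is rejected.
pub fn deadline_slot(request_slot: u64, ttl_slots: u64) -> Result<u64, ResultCode> {
    request_slot.checked_add(ttl_slots).ok_or(ResultCode::SlotOverflow)
}

/// Pure predicate over persisted state, persisted deadline, and `request_slot`.
pub fn is_due(state: HoldState, deadline_slot: u64, request_slot: u64) -> bool {
    state == HoldState::Held && request_slot >= deadline_slot
}

/// Min-queue of (deadline_slot, hold_id) pairs.
#[derive(Default)]
pub struct ExpiryQueue {
    heap: BinaryHeap<Reverse<(u64, u64)>>,
}

impl ExpiryQueue {
    pub fn push(&mut self, deadline_slot: u64, hold_id: u64) {
        self.heap.push(Reverse((deadline_slot, hold_id)));
    }

    /// Pops every entry whose deadline is at or before `request_slot`,
    /// in (deadline, hold_id) order. Large slot gaps just drain more entries.
    pub fn drain_due(&mut self, request_slot: u64) -> Vec<u64> {
        let mut due = Vec::new();
        while let Some(Reverse((deadline, hold_id))) = self.heap.peek().copied() {
            if deadline > request_slot {
                break;
            }
            self.heap.pop();
            due.push(hold_id);
        }
        due
    }
}
```

`drain_due` only proposes candidates. Because a queued hold may have been confirmed or released since it was enqueued, the caller still re-checks `is_due` against the hold table before applying `ExpireHold`.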
### Required tests

- due-at-boundary expiry
- large slot gaps
- retry behavior near expiry boundaries
- replay equivalence across expiry boundaries
- crash/restart with overdue holds still pending expiry application

Exit criteria:

- expiry stays deterministic under replay
- no wall-clock dependency enters the state machine

## Phase E: Durability And Recovery Proof

`reservation-core` must inherit the same recovery rigor as the existing engines.

Required proof points:

- snapshot plus WAL recovery preserves hold states exactly
- torn-tail WAL truncation never fabricates or drops confirmed state silently
- rewound `request_slot` progress is rejected fail-closed
- dedupe cache outcomes survive checkpoint and replay
- expiry decisions after recovery match the live path

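The fail-closed `request_slot` check and the single apply path can live in one replay loop. This sketch assumes `entries` is the WAL suffix after the snapshot, that LSNs are contiguous from `snapshot_lsn + 1`, and that `request_slot` may repeat but never decrease; `EntryHeader`, `RecoveryError`, and `replay` are hypothetical names. On any violation it returns an error instead of guessing, and the caller should discard the partially rebuilt state.

```rust
// Sketch of fail-closed monotonicity checks during snapshot + WAL replay.
// Hypothetical: contiguous LSNs after the snapshot; `request_slot` never rewinds.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct EntryHeader {
    pub lsn: u64,
    pub request_slot: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum RecoveryError {
    LsnGap { expected: u64, found: u64 },
    SlotRewound { lsn: u64, previous: u64, found: u64 },
}

/// Replays `entries` on top of a snapshot taken at (`snapshot_lsn`,
/// `snapshot_slot`) using the same `apply` as the live path. Returns the
/// last (lsn, request_slot) on success; on error, `state` must be discarded.
pub fn replay<S>(
    state: &mut S,
    snapshot_lsn: u64,
    snapshot_slot: u64,
    entries: &[EntryHeader],
    mut apply: impl FnMut(&mut S, &EntryHeader),
) -> Result<(u64, u64), RecoveryError> {
    let (mut last_lsn, mut last_slot) = (snapshot_lsn, snapshot_slot);
    for entry in entries {
        let expected = last_lsn + 1;
        if entry.lsn != expected {
            return Err(RecoveryError::LsnGap { expected, found: entry.lsn });
        }
        if entry.request_slot < last_slot {
            return Err(RecoveryError::SlotRewound {
                lsn: entry.lsn,
                previous: last_slot,
                found: entry.request_slot,
            });
        }
        apply(&mut *state, entry);
        last_lsn = entry.lsn;
        last_slot = entry.request_slot;
    }
    Ok((last_lsn, last_slot))
}
```

Passing the live `apply` function in, rather than writing a restore-specific one, is one way to keep the "no alternate apply path for restore" exit criterion structurally true.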

Exit criteria:

- recovery correctness is demonstrated, not inferred
- there is no alternate apply path for restore

## Phase F: Third-Engine Seam Evaluation

After `reservation-core` is stable, perform a new seam readout.

Questions to answer:

- which modules are now copied nearly unchanged across all three engines
- which modules still only look similar but differ in important ways
- is there a small shared runtime worth extracting now
- would building quota/credits on top of that shared runtime be materially simpler

Required output:

- one short seam-evaluation document
- one clear decision: extract now, defer again, or abandon the library thesis

## Recommended Issue Shape

- `M11`: third-engine proof with `reservation-core`
- `M11-T01`: freeze reservation boundary and v1 scope
- `M11-T02`: scaffold `reservation-core` sibling crate with copied substrate
- `M11-T03`: implement `CreatePool`, `PlaceHold`, `ConfirmHold`, and `ReleaseHold`
- `M11-T04`: add logical-slot expiry and durability/recovery proof
- `M11-T05`: evaluate shared runtime seams after the third engine

## Stop Conditions

Stop and reassess if any of the following happen:

- `reservation-core` needs cross-object transactions to feel coherent
- expiry cannot stay deterministic without introducing wall-clock coupling
- the copied runtime diverges in ways that make future extraction less plausible
- the experiment starts pulling the repo toward product workflow instead of engine truth

## Current Recommendation

Do this next if the goal is architecture truth and library extraction pressure.

Do not do it next if the goal is immediate market pull or fastest path to a product wedge.
