| 1 | +# Reservation Engine Plan |
| 2 | + |
| 3 | +## Purpose |
| 4 | + |
| 5 | +This document defines the next engine experiment after `quota-core`. |
| 6 | + |
| 7 | +The goal is not to chase the biggest market first. The goal is to apply stronger pressure to the |
| 8 | + engine thesis: |
| 9 | + |
| 10 | +- can the current bounded, deterministic, replay-safe runtime discipline support a third engine |
| 11 | + with a different lifecycle shape? |
| 12 | +- can that third engine expose cleaner shared-runtime seams than `quota-core` did? |
| 13 | +- if the experiment succeeds, does building quota/credits on top of a later extracted runtime look |
| 14 | + materially easier? |
| 15 | + |
| 16 | +The canonical v1 semantic boundary lives in |
| 17 | +[`reservation-semantics.md`](./reservation-semantics.md). |
| 18 | + |
| 19 | +## Decision |
| 20 | + |
| 21 | +The next engine experiment should be `reservation-core` in the existing workspace. |
| 22 | + |
| 23 | +The initial shape is: |
| 24 | + |
| 25 | +- keep the repository identity as `allocdb` |
| 26 | +- add a sibling crate such as `crates/reservation-core` |
| 27 | +- do not extract a shared runtime crate first |
| 28 | +- do not rebrand the repo as a framework yet |
| 29 | +- use this engine to test extraction pressure more aggressively than `quota-core` did |
| 30 | + |
| 31 | +## Why Reservation-Core Is The Right Next Experiment |
| 32 | + |
| 33 | +`quota-core` proved that a second deterministic engine can live in-repo. It did not prove that the |
| 34 | + shared runtime is ready to extract. |
| 35 | + |
| 36 | +`reservation-core` is the better next pressure test because it changes the semantic shape while |
| 37 | + still reusing the same core discipline: |
| 38 | + |
| 39 | +- active objects with terminal retirement |
| 40 | +- idempotent retry |
| 41 | +- deterministic expiry |
| 42 | +- snapshot plus WAL recovery |
| 43 | +- replay through the same apply path |
| 44 | +- bounded live state |
| 45 | + |
| 46 | +Compared with quota semantics, reservation semantics should stress: |
| 47 | + |
| 48 | +- expiry queues more directly |
| 49 | +- state transitions across multiple terminal outcomes |
| 50 | +- hold lifecycle invariants instead of balance arithmetic |
| 51 | +- recovery around confirm/release/expire transitions |
| 52 | + |
| 53 | +If the engine thesis is real, this experiment should make the shared-vs-domain boundary clearer. |
| 54 | + |
| 55 | +## Hypothesis |
| 56 | + |
| 57 | +If `reservation-core` lands cleanly, then a later extracted runtime should make quota/credits |
| 58 | + materially easier to build on top of it. |
| 59 | + |
| 60 | +What "easier" means here: |
| 61 | + |
| 62 | +- bounded storage, WAL, snapshot, replay, retirement, and recovery should already exist as shared |
| 63 | + substrate |
| 64 | +- quota should mainly reintroduce domain semantics such as balances, refill, and quota-specific |
| 65 | + result codes |
| 66 | +- the remaining hard work should be semantic, not runtime duplication |
| 67 | + |
| 68 | +If that does not happen, the library thesis is weaker than it currently appears. |
| 69 | + |
| 70 | +## Success Criteria |
| 71 | + |
| 72 | +The experiment is successful if all of the following become true: |
| 73 | + |
| 74 | +- `reservation-core` has one deterministic command surface with replay-equivalent results |
| 75 | +- the engine uses the same runtime discipline as the existing engines: |
| 76 | + - WAL as durable source of truth |
| 77 | + - snapshot plus WAL recovery |
| 78 | + - logical `request_slot` |
| 79 | + - bounded maps, queues, and operation dedupe state |
| 80 | + - fail-closed recovery |
| 81 | +- at least two runtime slices become obviously reusable across all three engines |
| 82 | +- the shared-vs-domain boundary becomes narrower and more defensible after the third engine |
| 83 | + |
| 84 | +The experiment is unsuccessful if any of the following happen: |
| 85 | + |
| 86 | +- reservation semantics require materially different durability or recovery contracts |
| 87 | +- copied runtime pieces diverge again and stay divergent |
| 88 | +- extraction pressure still points at only one tiny helper rather than a coherent substrate |
| 89 | +- the engine forces awkward generic abstractions into domain logic |
| 90 | + |
| 91 | +## Non-Goals |
| 92 | + |
| 93 | +The first reservation experiment should not try to do any of the following: |
| 94 | + |
| 95 | +- build a booking or commerce product |
| 96 | +- add payment, carts, pricing, or customer workflow logic |
| 97 | +- add background daemons or wall-clock timers |
| 98 | +- build cross-pool or multi-object transactions |
| 99 | +- extract a framework before the third-engine readout |
| 100 | +- promise a public database-building library yet |
| 101 | + |
| 102 | +## Initial Repository Shape |
| 103 | + |
| 104 | +The expected next layout is: |
| 105 | + |
| 106 | +- `crates/allocdb-core` |
| 107 | +- `crates/quota-core` |
| 108 | +- `crates/reservation-core` |
| 109 | +- `crates/allocdb-node` |
| 110 | + |
| 111 | +Possible later layout, only after the third-engine readout: |
| 112 | + |
| 113 | +- `crates/dsm-runtime` |
| 114 | +- `crates/allocdb-core` |
| 115 | +- `crates/quota-core` |
| 116 | +- `crates/reservation-core` |
| 117 | + |
| 118 | +## Phase 0: Freeze The Reservation Boundary |
| 119 | + |
| 120 | +Before writing any code, freeze the experiment boundary in docs. |
| 121 | + |
| 122 | +Deliverables: |
| 123 | + |
| 124 | +- one design note for reservation semantics and non-goals |
| 125 | +- one explicit statement that the experiment stays in the current workspace |
| 126 | +- one explicit extraction hypothesis tied to a later quota/credits build |
| 127 | + |
| 128 | +Exit criteria: |
| 129 | + |
| 130 | +- the reservation experiment has a narrow v1 scope |
| 131 | +- the repo still does not promise a generic framework yet |
| 132 | + |
| 133 | +## Phase A: Add Reservation-Core As A Sibling Engine |
| 134 | + |
| 135 | +Start with the smallest viable scarce-capacity hold model. |
| 136 | + |
| 137 | +### First domain model |
| 138 | + |
| 139 | +- `PoolId` |
| 140 | +- `PoolRecord` |
| 141 | +- `HoldId` |
| 142 | +- `HoldRecord` |
| 143 | +- `OperationRecord` |
| 144 | + |
| 145 | +### First commands |
| 146 | + |
| 147 | +- `CreatePool` |
| 148 | +- `PlaceHold` |
| 149 | +- `ConfirmHold` |
| 150 | +- `ReleaseHold` |
| 151 | +- `ExpireHold` |
| 152 | + |
| 153 | +### First result codes |
| 154 | + |
| 155 | +- `Ok` |
| 156 | +- `AlreadyExists` |
| 157 | +- `PoolTableFull` |
| 158 | +- `PoolNotFound` |
| 159 | +- `HoldTableFull` |
| 160 | +- `HoldNotFound` |
| 161 | +- `InsufficientCapacity` |
| 162 | +- `HoldExpired` |
| 163 | +- `InvalidState` |
| 164 | +- `OperationConflict` |
| 165 | +- `OperationTableFull` |
| 166 | +- `SlotOverflow` |
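
The domain model, commands, and result codes listed above could be sketched as plain Rust types. Only the type, command, and result-code names come from this plan; every field, integer width, and the `is_capacity_bound` helper are assumptions made for illustration, not a settled layout.

```rust
// Sketch only: names come from the plan's lists; fields and widths are assumptions.

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct PoolId(pub u64);

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct HoldId(pub u64);

/// First commands. All payload fields are hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Command {
    CreatePool { pool_id: PoolId, total_capacity: u64 },
    PlaceHold { pool_id: PoolId, hold_id: HoldId, quantity: u64, deadline_slot: u64 },
    ConfirmHold { hold_id: HoldId },
    ReleaseHold { hold_id: HoldId },
    ExpireHold { hold_id: HoldId },
}

/// First result codes, in the order the plan lists them.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ResultCode {
    Ok,
    AlreadyExists,
    PoolTableFull,
    PoolNotFound,
    HoldTableFull,
    HoldNotFound,
    InsufficientCapacity,
    HoldExpired,
    InvalidState,
    OperationConflict,
    OperationTableFull,
    SlotOverflow,
}

impl ResultCode {
    /// Hypothetical helper: true for codes that report a bounded table at capacity.
    pub fn is_capacity_bound(self) -> bool {
        matches!(
            self,
            ResultCode::PoolTableFull | ResultCode::HoldTableFull | ResultCode::OperationTableFull
        )
    }
}
```

Keeping the commands as one closed enum keeps the command surface in a single place, which is what makes "one deterministic command surface" easy to audit.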
| 167 | + |
| 168 | +### Rules |
| 169 | + |
| 170 | +- no wall-clock time reads |
| 171 | +- no background sweeper inside v1 |
| 172 | +- no multi-pool reservations |
| 173 | +- no partial confirms |
| 174 | +- no cross-command convenience APIs before the lifecycle is proven |
| 175 | + |
| 176 | +Exit criteria: |
| 177 | + |
| 178 | +- one pool can be created |
| 179 | +- one hold can be placed, confirmed, released, or expired deterministically |
| 180 | +- duplicate retry returns the cached outcome |
| 181 | +- conflicting `operation_id` reuse returns conflict |
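
The last two exit criteria describe operation dedupe. A minimal sketch of that rule, assuming a bounded table keyed by a `u128` operation id and a caller-supplied payload fingerprint (both hypothetical, as is the `apply_once` name):

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ResultCode {
    Ok,
    InsufficientCapacity,
    OperationConflict,
    OperationTableFull,
}

/// Stored outcome of an operation that was already applied.
struct OperationRecord {
    payload_fingerprint: u64, // assumption: a stable hash of the command payload
    outcome: ResultCode,
}

/// Bounded dedupe table; `max_entries` is a hypothetical fixed limit.
pub struct OperationTable {
    max_entries: usize,
    records: BTreeMap<u128, OperationRecord>,
}

impl OperationTable {
    pub fn new(max_entries: usize) -> Self {
        OperationTable { max_entries, records: BTreeMap::new() }
    }

    /// Runs `apply` at most once per `operation_id`.
    /// Same id and same payload returns the cached outcome, including cached failures.
    /// Same id and a different payload returns `OperationConflict`.
    pub fn apply_once(
        &mut self,
        operation_id: u128,
        payload_fingerprint: u64,
        apply: impl FnOnce() -> ResultCode,
    ) -> ResultCode {
        if let Some(record) = self.records.get(&operation_id) {
            return if record.payload_fingerprint == payload_fingerprint {
                record.outcome
            } else {
                ResultCode::OperationConflict
            };
        }
        if self.records.len() >= self.max_entries {
            return ResultCode::OperationTableFull;
        }
        let outcome = apply();
        self.records.insert(operation_id, OperationRecord { payload_fingerprint, outcome });
        outcome
    }
}
```

Retirement of old records is left out here; the sketch only shows the lookup order that makes retries idempotent.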
| 182 | + |
| 183 | +## Phase B: Copy Runtime Pieces Locally, Without Extraction |
| 184 | + |
| 185 | +Copy the minimum runtime and persistence pieces needed by `reservation-core`. |
| 186 | + |
| 187 | +Candidate copied pieces: |
| 188 | + |
| 189 | +- WAL frame codec and scan discipline |
| 190 | +- snapshot write/load discipline |
| 191 | +- recovery monotonicity checks |
| 192 | +- bounded queue helpers |
| 193 | +- retire queue helpers |
| 194 | +- command context carrying `lsn` and `request_slot` |
| 195 | +- operation dedupe storage and retirement mechanics |
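
To make "WAL frame codec and scan discipline" concrete, here is a small sketch of a length-prefixed, checksummed frame and a scan that stops at the first incomplete or mismatched frame. The frame layout (`u32` length, `u32` checksum, payload) and the FNV-1a checksum are assumptions chosen to keep the example dependency-free; they are not the existing engines' format.

```rust
/// FNV-1a, a stand-in checksum for this sketch; a real WAL would likely use a CRC.
fn checksum(bytes: &[u8]) -> u32 {
    let mut hash: u32 = 0x811c_9dc5;
    for &byte in bytes {
        hash ^= u32::from(byte);
        hash = hash.wrapping_mul(0x0100_0193);
    }
    hash
}

fn read_u32(bytes: &[u8], at: usize) -> u32 {
    u32::from_le_bytes([bytes[at], bytes[at + 1], bytes[at + 2], bytes[at + 3]])
}

/// Encodes one frame as: payload length (LE u32), checksum (LE u32), payload.
pub fn encode_frame(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + payload.len());
    out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    out.extend_from_slice(&checksum(payload).to_le_bytes());
    out.extend_from_slice(payload);
    out
}

/// Returns every complete, checksum-valid frame from the start of `wal`,
/// plus the byte length of that valid prefix. Scanning stops at the first
/// frame that is truncated or fails its checksum.
pub fn scan(wal: &[u8]) -> (Vec<&[u8]>, usize) {
    let mut frames = Vec::new();
    let mut offset = 0;
    while wal.len() - offset >= 8 {
        let len = read_u32(wal, offset) as usize;
        let expected = read_u32(wal, offset + 4);
        let start = offset + 8;
        let end = match start.checked_add(len) {
            Some(end) if end <= wal.len() => end,
            _ => break, // header claims more bytes than exist: torn tail
        };
        let payload = &wal[start..end];
        if checksum(payload) != expected {
            break;
        }
        frames.push(payload);
        offset = end;
    }
    (frames, offset)
}
```

The returned prefix length is what a recovery path would truncate to. This sketch treats any bad frame as the end of the log; a fuller version would need to tell a torn tail apart from mid-log corruption and fail closed on the latter.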
| 196 | + |
| 197 | +Rules: |
| 198 | + |
| 199 | +- copy first, do not extract first |
| 200 | +- keep naming close enough for later diffing |
| 201 | +- allow the copies to diverge while reservation semantics settle |
| 202 | + |
| 203 | +Exit criteria: |
| 204 | + |
| 205 | +- `reservation-core` can run its own live apply and replay loop with copied runtime pieces |
| 206 | +- no shared crate is introduced yet |
| 207 | + |
| 208 | +## Phase C: Prove Hold Lifecycle, Dedupe, And Replay |
| 209 | + |
| 210 | +This phase is the real third-engine test. |
| 211 | + |
| 212 | +Required invariants: |
| 213 | + |
| 214 | +- `0 <= held + consumed <= total_capacity` |
| 215 | +- a successful `PlaceHold(op, qty)` reserves capacity exactly once |
| 216 | +- `ConfirmHold` moves held capacity to consumed exactly once |
| 217 | +- `ReleaseHold` returns held capacity exactly once |
| 218 | +- same `operation_id` plus same payload returns the stored outcome |
| 219 | +- same `operation_id` plus different payload returns `OperationConflict` |
| 220 | +- replay from snapshot plus WAL produces the same pool and hold state as the live path |
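
The capacity invariants above can be sketched as a small state machine. The record fields, the `HoldState` enum, and the method names are assumptions for illustration; only `held`, `consumed`, `total_capacity`, and the result-code names appear in this plan.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ResultCode {
    Ok,
    InsufficientCapacity,
    InvalidState,
}

/// Hypothetical hold lifecycle states (expiry is omitted in this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HoldState {
    Held,
    Confirmed,
    Released,
}

#[derive(Debug)]
pub struct PoolRecord {
    pub total_capacity: u64,
    pub held: u64,
    pub consumed: u64,
}

#[derive(Debug)]
pub struct HoldRecord {
    pub quantity: u64,
    pub state: HoldState,
}

impl PoolRecord {
    /// Checks `0 <= held + consumed <= total_capacity` without overflowing.
    pub fn invariant_holds(&self) -> bool {
        match self.held.checked_add(self.consumed) {
            Some(used) => used <= self.total_capacity,
            None => false,
        }
    }

    /// Reserves `quantity` once, or rejects without touching state.
    pub fn place_hold(&mut self, quantity: u64) -> Result<HoldRecord, ResultCode> {
        let free = self.total_capacity - self.held - self.consumed;
        if quantity > free {
            return Err(ResultCode::InsufficientCapacity);
        }
        self.held += quantity;
        Ok(HoldRecord { quantity, state: HoldState::Held })
    }

    /// Moves held capacity to consumed; only legal from `Held`.
    pub fn confirm_hold(&mut self, hold: &mut HoldRecord) -> ResultCode {
        if hold.state != HoldState::Held {
            return ResultCode::InvalidState;
        }
        self.held -= hold.quantity;
        self.consumed += hold.quantity;
        hold.state = HoldState::Confirmed;
        ResultCode::Ok
    }

    /// Returns held capacity to the pool; only legal from `Held`.
    pub fn release_hold(&mut self, hold: &mut HoldRecord) -> ResultCode {
        if hold.state != HoldState::Held {
            return ResultCode::InvalidState;
        }
        self.held -= hold.quantity;
        hold.state = HoldState::Released;
        ResultCode::Ok
    }
}
```

In this sketch, "exactly once" falls out of the state check: a second confirm or release under a new operation id sees a terminal state and is rejected, while a retry under the same operation id would be answered from the dedupe table before reaching these methods.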
| 221 | + |
| 222 | +Required tests: |
| 223 | + |
| 224 | +- duplicate retry after successful hold placement |
| 225 | +- duplicate retry after failed hold placement |
| 226 | +- duplicate confirm and duplicate release |
| 227 | +- conflicting retry with the same `operation_id` |
| 228 | +- deterministic invalid-state rejection |
| 229 | +- replay equivalence across snapshot and WAL restore |
| 230 | +- crash/restart around hold lifecycle boundaries |
| 231 | + |
| 232 | +Exit criteria: |
| 233 | + |
| 234 | +- lifecycle semantics are boring and deterministic |
| 235 | +- replay uses the same apply logic as the live path |
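
One way to read "replay uses the same apply logic as the live path" is that recovery is just a loop over the same `apply` function. The sketch below uses a deliberately simplified, quantity-only command set (an assumption; the real commands carry hold ids) to show the shape of a replay-equivalence check.

```rust
/// Simplified commands for this sketch only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Command {
    Place(u64),
    Confirm(u64),
    Release(u64),
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PoolState {
    pub total_capacity: u64,
    pub held: u64,
    pub consumed: u64,
}

impl PoolState {
    /// The single apply path. Rejected commands leave state untouched
    /// and return false, so live and replay runs reject identically.
    pub fn apply(&mut self, command: Command) -> bool {
        match command {
            Command::Place(qty) => {
                if qty > self.total_capacity - self.held - self.consumed {
                    return false;
                }
                self.held += qty;
            }
            Command::Confirm(qty) => {
                if qty > self.held {
                    return false;
                }
                self.held -= qty;
                self.consumed += qty;
            }
            Command::Release(qty) => {
                if qty > self.held {
                    return false;
                }
                self.held -= qty;
            }
        }
        true
    }
}

/// Recovery: start from a snapshot and push the WAL tail through `apply`.
pub fn recover(snapshot: &PoolState, wal_tail: &[Command]) -> PoolState {
    let mut state = snapshot.clone();
    for &command in wal_tail {
        state.apply(command);
    }
    state
}
```

A replay-equivalence test then compares the live state against `recover(snapshot, tail)` for a snapshot taken at any point in the command stream.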
| 236 | + |
| 237 | +## Phase D: Add Deterministic Expiry With Logical Time Only |
| 238 | + |
| 239 | +Only after lifecycle semantics are stable should expiry be added. |
| 240 | + |
| 241 | +### Expiry rules |
| 242 | + |
| 243 | +Expiry must be a pure function of: |
| 244 | + |
| 245 | +- persisted hold state |
| 246 | +- persisted deadline slot |
| 247 | +- `request_slot` |
| 248 | + |
| 249 | +Expiry must not read: |
| 250 | + |
| 251 | +- wall-clock time |
| 252 | +- process time |
| 253 | +- external timers |
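
A minimal sketch of expiry as a pure function of the three inputs listed above. The inclusive boundary (`request_slot >= deadline_slot`), the `HoldState` variants, and the `deadline_slot` helper are assumptions; the canonical boundary belongs in `reservation-semantics.md`.

```rust
/// Hypothetical hold states; only `Held` is eligible for expiry.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HoldState {
    Held,
    Confirmed,
    Released,
    Expired,
}

#[derive(Debug, Clone, Copy)]
pub struct HoldRecord {
    pub state: HoldState,
    pub deadline_slot: u64,
}

/// Pure expiry decision: depends only on persisted hold state,
/// the persisted deadline slot, and the logical `request_slot`.
/// Assumption: a hold is due at the boundary slot itself.
pub fn is_due(hold: &HoldRecord, request_slot: u64) -> bool {
    hold.state == HoldState::Held && request_slot >= hold.deadline_slot
}

/// Computes a deadline from the placement slot and a time-to-live in slots.
/// `None` models the overflow case, which a command handler could map
/// to `SlotOverflow` (that mapping is an assumption).
pub fn deadline_slot(request_slot: u64, ttl_slots: u64) -> Option<u64> {
    request_slot.checked_add(ttl_slots)
}
```

Because neither function reads a clock, the same inputs give the same answer on the live path and during replay, including across large slot gaps.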
| 254 | + |
| 255 | +### Required tests |
| 256 | + |
| 257 | +- due-at-boundary expiry |
| 258 | +- large slot gaps |
| 259 | +- retry behavior near expiry boundaries |
| 260 | +- replay equivalence across expiry boundaries |
| 261 | +- crash/restart with overdue holds still pending expiry application |
| 262 | + |
| 263 | +Exit criteria: |
| 264 | + |
| 265 | +- expiry stays deterministic under replay |
| 266 | +- no wall-clock dependency enters the state machine |
| 267 | + |
| 268 | +## Phase E: Durability And Recovery Proof |
| 269 | + |
| 270 | +Reservation-core must inherit the same recovery rigor as the existing engines. |
| 271 | + |
| 272 | +Required proof points: |
| 273 | + |
| 274 | +- snapshot plus WAL recovery preserves hold states exactly |
| 275 | +- torn-tail WAL truncation never fabricates or drops confirmed state silently |
| 276 | +- rewound `request_slot` progress is rejected fail-closed |
| 277 | +- dedupe cache outcomes survive checkpoint and replay |
| 278 | +- expiry decisions after recovery match the live path |
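
The "rewound `request_slot` progress is rejected fail-closed" proof point can be sketched as a cursor that admits replayed records only in order. The ordering rules assumed here (strictly increasing `lsn`, non-decreasing `request_slot`) and all type names are illustrative assumptions, not the existing engines' contract.

```rust
/// Context carried by each replayed command.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CommandContext {
    pub lsn: u64,
    pub request_slot: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum RecoveryError {
    NonMonotonicLsn { previous: u64, found: u64 },
    RewoundRequestSlot { previous: u64, found: u64 },
}

/// Tracks the last admitted context during recovery.
pub struct RecoveryCursor {
    last: Option<CommandContext>,
}

impl RecoveryCursor {
    pub fn new() -> Self {
        RecoveryCursor { last: None }
    }

    /// Admits `next` only if it moves forward. On error the cursor is left
    /// unchanged, and the caller is expected to abort recovery instead of
    /// skipping the record.
    pub fn admit(&mut self, next: CommandContext) -> Result<(), RecoveryError> {
        if let Some(previous) = self.last {
            if next.lsn <= previous.lsn {
                return Err(RecoveryError::NonMonotonicLsn {
                    previous: previous.lsn,
                    found: next.lsn,
                });
            }
            if next.request_slot < previous.request_slot {
                return Err(RecoveryError::RewoundRequestSlot {
                    previous: previous.request_slot,
                    found: next.request_slot,
                });
            }
        }
        self.last = Some(next);
        Ok(())
    }
}
```

Returning an error instead of clamping or skipping is what makes the check fail closed: recovery stops at the first record that would move logical time backwards.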
| 279 | + |
| 280 | +Exit criteria: |
| 281 | + |
| 282 | +- recovery correctness is demonstrated, not inferred |
| 283 | +- there is no alternate apply path for restore |
| 284 | + |
| 285 | +## Phase F: Third-Engine Seam Evaluation |
| 286 | + |
| 287 | +After `reservation-core` is stable, perform a new seam readout. |
| 288 | + |
| 289 | +Questions to answer: |
| 290 | + |
| 291 | +- which modules are now copied nearly unchanged across all three engines? |
| 292 | +- which modules still only look similar but differ in important ways? |
| 293 | +- is there a small shared runtime worth extracting now? |
| 294 | +- would building quota/credits on top of that shared runtime be materially simpler? |
| 295 | + |
| 296 | +Required output: |
| 297 | + |
| 298 | +- one short seam-evaluation document |
| 299 | +- one clear decision: extract now, defer again, or abandon the library thesis |
| 300 | + |
| 301 | +## Recommended Issue Shape |
| 302 | + |
| 303 | +- `M11`: third-engine proof with `reservation-core` |
| 304 | +- `M11-T01`: freeze reservation boundary and v1 scope |
| 305 | +- `M11-T02`: scaffold `reservation-core` sibling crate with copied substrate |
| 306 | +- `M11-T03`: implement `CreatePool`, `PlaceHold`, `ConfirmHold`, and `ReleaseHold` |
| 307 | +- `M11-T04`: add logical-slot expiry and durability/recovery proof |
| 308 | +- `M11-T05`: evaluate shared runtime seams after the third engine |
| 309 | + |
| 310 | +## Stop Conditions |
| 311 | + |
| 312 | +Stop and reassess if any of the following happen: |
| 313 | + |
| 314 | +- `reservation-core` needs cross-object transactions to feel coherent |
| 315 | +- expiry cannot stay deterministic without introducing wall-clock coupling |
| 316 | +- the copied runtime diverges in ways that make future extraction less plausible |
| 317 | +- the experiment starts pulling the repo toward product workflow instead of engine truth |
| 318 | + |
| 319 | +## Current Recommendation |
| 320 | + |
| 321 | +Do this next if the goal is architecture truth and library extraction pressure. |
| 322 | + |
| 323 | +Do not do it next if the goal is immediate market pull or fastest path to a product wedge. |