| 1 | +# Quota Runtime Seam Evaluation |
| 2 | + |
| 3 | +## Purpose |
| 4 | + |
| 5 | +This document closes `M10-T05` by evaluating whether `allocdb-core` and `quota-core` now justify a |
| 6 | +shared runtime extraction. |
| 7 | + |
| 8 | +The answer is based on the code as merged in `allocdb#107`, not on a framework-first plan. |
| 9 | + |
| 10 | +## Decision |
| 11 | + |
| 12 | +Do not extract a shared runtime crate yet. |
| 13 | + |
| 14 | +The second-engine proof succeeded, but the overlap is not yet stable enough to justify a |
| 15 | +`dsm-runtime` or similar crate on `main`. |
| 16 | + |
| 17 | +The correct outcome at this point is: |
| 18 | + |
| 19 | +- keep `allocdb-core` and `quota-core` as sibling engines in the same repository |
| 20 | +- keep copied runtime pieces local to each engine for now |
| 21 | +- record which seams look real |
| 22 | +- defer extraction until repeated maintenance pressure proves it is worth the churn |
| 23 | + |
| 24 | +## What The Second Engine Proved |
| 25 | + |
| 26 | +The engine thesis is now materially stronger than it was before `quota-core` existed. |
| 27 | + |
| 28 | +Both engines now demonstrate the same execution discipline: |
| 29 | + |
| 30 | +- bounded in-memory hot-path structures |
| 31 | +- WAL-backed durable ordering |
| 32 | +- snapshot plus WAL replay through the live apply path |
| 33 | +- logical `request_slot` |
| 34 | +- operation dedupe with bounded retirement |
| 35 | +- fail-closed recovery on corruption or monotonicity violations |
| 36 | + |
| 37 | +That is enough to say there is a real engine family here, not just one special-case lease kernel. |
| 38 | + |
| 39 | +## What Is Actually Shared |
| 40 | + |
| 41 | +### Clearly shared discipline |
| 42 | + |
| 43 | +The following ideas are genuinely common across both engines: |
| 44 | + |
| 45 | +- ordered frame append and replay |
| 46 | +- bounded probe-table storage |
| 47 | +- bounded retirement queues |
| 48 | +- snapshot plus WAL recovery orchestration |
| 49 | +- monotonic LSN and request-slot enforcement |
| 50 | +- deterministic retry semantics |
| 51 | + |
| 52 | +### Closest to mechanical extraction |
| 53 | + |
| 54 | +These modules are the closest to being extractable later with low semantic risk: |
| 55 | + |
| 56 | +- `retire_queue` |
| 57 | +- parts of `fixed_map` |
| 58 | +- parts of `wal` |
| 59 | +- parts of `wal_file` |
| 60 | + |
| 61 | +`retire_queue` is the strongest example. It is effectively the same data structure in both |
| 62 | +engines, differing only in the surrounding key types and local tests. |
| 63 | + |
| 64 | +`fixed_map`, `wal`, and `wal_file` do the same job in both engines, but they already diverge in |
| 65 | +details that matter: |
| 66 | + |
| 67 | +- `fixed_map` in `allocdb-core` carries richer trace logging and more key types |
| 68 | +- `wal` and `wal_file` are similar in shape, but their error surfaces and tests already differ |
| 69 | +- the extraction point would need to preserve boundedness and fail-closed behavior without |
| 70 | + introducing generic abstraction noise |
| 71 | + |
| 72 | +## What Is Not Shared Enough |
| 73 | + |
| 74 | +The following should remain engine-local. |
| 75 | + |
| 76 | +### Command and result surfaces |
| 77 | + |
| 78 | +Do not extract: |
| 79 | + |
| 80 | +- command enums |
| 81 | +- command codecs |
| 82 | +- result codes and domain outcomes |
| 83 | +- config types |
| 84 | + |
| 85 | +These are runtime-adjacent, but they are still domain contracts, not generic substrate. |
| 86 | + |
| 87 | +### Snapshot schema |
| 88 | + |
| 89 | +Do not extract snapshot encoding logic yet. |
| 90 | + |
| 91 | +The persistence discipline is shared, but the actual on-disk schema is not: |
| 92 | + |
| 93 | +- `allocdb-core` carries a richer allocator-specific snapshot layout |
| 94 | +- `quota-core` carries a much smaller bucket/operation layout |
| 95 | +- forcing a generic snapshot schema would either add indirection or erase useful domain structure |
| 96 | + |
| 97 | +The most that could be extracted later is helper machinery, not the schema itself. |
| 98 | + |
| 99 | +### Recovery API surface |
| 100 | + |
| 101 | +Do not extract recovery orchestration yet. |
| 102 | + |
| 103 | +The top-level recovery flow is recognizably similar, but the differences are already meaningful: |
| 104 | + |
| 105 | +- `allocdb-core` has richer replay error variants and slot-overflow reporting |
| 106 | +- `allocdb-core` has more operational logging |
| 107 | +- the restore path and replay details are still closely tied to engine-specific command decoding and |
| 108 | + state-machine APIs |
| 109 | + |
| 110 | +There may be a later helper seam here, but not a good generic crate boundary today. |
| 111 | + |
| 112 | +### State machine logic |
| 113 | + |
| 114 | +Do not extract any state-machine layer. |
| 115 | + |
| 116 | +The commonality is only at the discipline level: |
| 117 | + |
| 118 | +- deterministic apply |
| 119 | +- bounded state |
| 120 | +- retry cache |
| 121 | +- logical time |
| 122 | + |
| 123 | +The actual state transitions, invariants, and read models are completely different and should stay |
| 124 | +separate. |
| 125 | + |
| 126 | +## Why Extraction Is Premature Now |
| 127 | + |
| 128 | +Extraction would create cost immediately: |
| 129 | + |
| 130 | +- more crate boundaries |
| 131 | +- more generic traits and type plumbing |
| 132 | +- more public internal APIs to stabilize |
| 133 | +- more coordination every time one engine evolves |
| 134 | + |
| 135 | +But the benefit is still limited: |
| 136 | + |
| 137 | +- only one module, `retire_queue`, is close to a mechanical copy today |
| 138 | +- most other overlap is still “same shape, different details” |
| 139 | +- there is no repeated maintenance pain yet from fixing the same bug in both engines |
| 140 | + |
| 141 | +So the code is similar enough to reveal seams, but not similar enough to deserve a shared runtime |
| 142 | +crate yet. |
| 143 | + |
| 144 | +## Extraction Triggers Later |
| 145 | + |
| 146 | +Revisit extraction only when at least one of these becomes true: |
| 147 | + |
| 148 | +- the same runtime bug or improvement lands independently in both engines more than once |
| 149 | +- `fixed_map`, `wal`, or `wal_file` stay structurally stable across several follow-on slices |
| 150 | +- a third engine appears and wants the same substrate |
| 151 | +- the repo starts paying obvious maintenance cost for duplicated runtime fixes |
| 152 | + |
| 153 | +Until then, duplication is cheaper than premature abstraction. |
| 154 | + |
| 155 | +## Recommended Next Step |
| 156 | + |
| 157 | +Treat `M10` as complete. |
| 158 | + |
| 159 | +The next step is not more framework work. The next step is either: |
| 160 | + |
| 161 | +- stop here and keep both engines local while they stabilize, or |
| 162 | +- start a new domain/engine experiment only if there is a strong reason to test a third point in |
| 163 | + the design space |
| 164 | + |
| 165 | +If extraction is revisited later, start with the smallest possible mechanical move: |
| 166 | + |
| 167 | +1. `retire_queue` |
| 168 | +2. selected `fixed_map` helpers |
| 169 | +3. selected `wal` / `wal_file` helpers |
| 170 | + |
| 171 | +Do not start with snapshot schemas, command codecs, or state-machine traits. |
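
To make the `retire_queue` recommendation concrete, here is a minimal sketch of what a bounded retirement queue that is generic over its key type might look like. It is not the code in either engine: the type names, the `u64` slot representation, the error variants, and the out-of-order check are all assumptions made for illustration.

```rust
use std::collections::VecDeque;

/// Hypothetical error surface; the real engines may report these differently.
#[derive(Debug, PartialEq, Eq)]
pub enum RetireQueueError {
    /// The queue is at its fixed capacity; the caller must fail closed.
    Full,
    /// The new entry would retire before the current tail (assumed invariant).
    OutOfOrder,
}

/// Illustrative bounded FIFO of keys awaiting retirement.
/// The only engine-specific part is the key type `K`.
#[derive(Debug)]
pub struct RetireQueue<K> {
    entries: VecDeque<(K, u64)>,
    capacity: usize,
}

impl<K> RetireQueue<K> {
    /// Allocates storage once; the queue never grows past `capacity`.
    pub fn with_capacity(capacity: usize) -> Self {
        Self {
            entries: VecDeque::with_capacity(capacity),
            capacity,
        }
    }

    /// Enqueues `key` for retirement once the logical slot reaches
    /// `retire_after_slot`. Rejects the push instead of growing or reordering.
    pub fn push(&mut self, key: K, retire_after_slot: u64) -> Result<(), RetireQueueError> {
        if self.entries.len() == self.capacity {
            return Err(RetireQueueError::Full);
        }
        if let Some((_, last_slot)) = self.entries.back() {
            if retire_after_slot < *last_slot {
                return Err(RetireQueueError::OutOfOrder);
            }
        }
        self.entries.push_back((key, retire_after_slot));
        Ok(())
    }

    /// Pops the oldest key if it is due at `current_slot`, otherwise `None`.
    pub fn pop_due(&mut self, current_slot: u64) -> Option<K> {
        let due = matches!(self.entries.front(), Some((_, slot)) if *slot <= current_slot);
        if due {
            self.entries.pop_front().map(|(key, _)| key)
        } else {
            None
        }
    }

    /// Number of keys currently awaiting retirement.
    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

In a sketch like this, each engine would instantiate `RetireQueue` with its own key type and keep its own tests, which matches the observation above that the two copies differ only in surrounding key types. A shape this small is what "smallest possible mechanical move" is meant to suggest; anything that needs traits for commands, snapshots, or state transitions is outside it.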