This document defines the approved follow-on data model, command semantics, consistency model, and product-level API rules for AllocDB's generic lease kernel.
For the transport-neutral request and response surface, see api.md.
The current implementation still exposes a reservation-centric alpha compatibility path. That path
is expected to converge toward the lease-centric model defined here as M9 lands.
Related planning docs:
A resource is the allocatable object identified by resource_id.
Examples:
- seat_21A
- gpu_node_14
- hotel_room_204
- inventory_unit_8932
In the trusted core, a resource has only allocatable state. Arbitrary metadata is out of scope.
Suggested record:
resource_id : u128
current_state : enum { available, reserved, active, revoking }
current_lease_id : u128 | 0
version : u64
version increments on every resource state transition. It is a read-side observation field, not
a write precondition.
A lease is the authoritative ownership object for one scarce-resource claim by one holder.
A lease may cover:
- one resource
- or one all-or-nothing bundle of multiple resources
Suggested record:
lease_id : u128
holder_id : u128
state : enum { reserved, active, revoking, released, expired, revoked }
lease_epoch : u64
created_lsn : u64
deadline_slot : u64 | 0
released_lsn : u64 | 0
retire_after_slot : u64 | 0
member_count : u32
Rules:
- lease state is separate from resource state
- released, expired, and revoked are terminal lease-history states, not live resource states
- revoking withdraws holder authority but does not yet make the resources reusable
- terminal lease history is retained only until retire_after_slot
A lease member ties one resource_id to one parent lease_id.
Suggested record:
lease_id : u128
resource_id : u128
member_index : u32
Rules:
- member records exist only to describe the resource set owned by one lease
- a member has no independent holder, lifecycle, or public identity
- member history retires with the parent lease
Every client-visible write carries an operation_id.
Suggested record:
operation_id : u128
command_hash : u128
result_code : enum
result_lease_id : u128 | 0
result_deadline_slot : u64 | 0
applied_lsn : u64
retire_after_slot : u64
The operation table provides bounded idempotency, not permanent dedupe.
The trusted core stores operation outcomes so duplicate operation_id retries can return the
original result while the dedupe window is still open.
resource_id is externally supplied and stable.
lease_id is derived from committed log position:
lease_id = (shard_id << 64) | lsn
For the single-node deployment, shard_id is always 0.
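The derivation above can be sketched in a few lines of Rust. This is a minimal illustration, not the engine's code: the function names are hypothetical, and it assumes shard_id is a 64-bit value, which the shift by 64 into a u128 suggests but the text does not state.

```rust
/// Pack a shard and a committed log position into one lease_id.
/// Assumption: shard_id fits in 64 bits and occupies the high half.
fn derive_lease_id(shard_id: u64, lsn: u64) -> u128 {
    ((shard_id as u128) << 64) | (lsn as u128)
}

/// Split a lease_id back into (shard_id, lsn).
/// The `as u64` cast keeps only the low 64 bits, which hold the LSN.
fn split_lease_id(lease_id: u128) -> (u64, u64) {
    ((lease_id >> 64) as u64, lease_id as u64)
}
```

With shard_id fixed at 0, the lease_id is numerically equal to the LSN that committed the reserve command.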
operation_id is client supplied and must be unique within the configured dedupe window.
- A resource has at most one active lease authority at a time.
- A successful bundle commit becomes visible atomically or not at all.
- Every active lease member belongs to exactly one active lease.
- active and revoking leases cannot be expired.
- A terminal lease never becomes active again.
- A successful reserve command creates exactly one lease object.
- Every holder-authorized mutation is checked against the current (lease_id, lease_epoch).
- Revoke may withdraw authority immediately, but reuse may happen only after explicit safe reclaim.
- Every committed command has exactly one log position and one replay result.
- Replaying the same snapshot plus WAL yields the same state and the same command outcomes.
- Expiration or reclaim can make a resource reusable late, but never early.
- Reads that claim strict consistency must observe a specific applied LSN.
- Bounded retention rules must be sufficient to preserve the advertised API semantics.
TTL is represented in logical slots, not wall-clock timestamps, inside the trusted core.
External APIs may accept duration-style input, but the WAL and state machine operate on:
- ttl_slots
- deadline_slot
TTL policy:
- product-level maximum reserve TTL is 1 hour
- deployments may configure a lower maximum
- TTL applies to the reserved pre-activation state
- longer-lived ownership should use active leases plus explicit revoke and reclaim, not ever-longer timer-driven reuse inside the state machine
All client-visible state-changing commands use the same outer envelope.
operation_id : u128
client_id : u128
request_slot : u64
command_kind : enum
payload ...
The command signatures below show payload fields only.
The approved follow-on result-code set is:
- ok
- noop
- already_exists
- resource_table_full
- resource_not_found
- resource_busy
- bundle_too_large
- ttl_out_of_range
- lease_table_full
- lease_member_table_full
- lease_not_found
- lease_retired
- expiration_index_full
- operation_table_full
- operation_conflict
- invalid_state
- holder_mismatch
- stale_epoch
- slot_overflow
operation_table_full is a pre-commit failure: the allocator cannot accept a new deduped client
command because it has no bounded space to remember the outcome.
slot_overflow is the trusted-core guard result when a derived deadline or retirement slot would
exceed u64::MAX. The single-node engine rejects the same condition before WAL commit as a
definite submission failure.
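A minimal sketch of how a deadline could be derived while guarding both the TTL bounds and the u64 range. The function name, the error enum, and the min_ttl_slots / max_ttl_slots parameters are hypothetical stand-ins for whatever the deployment configures.

```rust
#[derive(Debug, PartialEq, Eq)]
enum DeadlineError {
    TtlOutOfRange, // maps to ttl_out_of_range
    SlotOverflow,  // maps to slot_overflow
}

/// Compute deadline_slot = request_slot + ttl_slots without wrapping.
/// The TTL bounds are assumed to come from deployment configuration.
fn compute_deadline_slot(
    request_slot: u64,
    ttl_slots: u64,
    min_ttl_slots: u64,
    max_ttl_slots: u64,
) -> Result<u64, DeadlineError> {
    if ttl_slots < min_ttl_slots || ttl_slots > max_ttl_slots {
        return Err(DeadlineError::TtlOutOfRange);
    }
    // checked_add returns None instead of wrapping past u64::MAX.
    request_slot
        .checked_add(ttl_slots)
        .ok_or(DeadlineError::SlotOverflow)
}
```

Using checked arithmetic keeps the guard deterministic: the same request_slot and ttl_slots always produce the same result on replay.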
create_resource(resource_id)
Success effect:
- inserts a new resource in available
Failure cases:
- already_exists
- resource_table_full
reserve_bundle(resource_ids[], holder_id, ttl_slots)
Preconditions:
- resource_ids is non-empty
- resource_ids.len() is within the configured maximum bundle size
- every resource_id is unique within the command
- every resource exists
- every resource state is available
- ttl_slots is within configured bounds
Success effect:
- derive lease_id from the committed lsn
- compute deadline_slot = request_slot + ttl_slots
- create a lease with state = reserved and lease_epoch = 1
- create one bounded member record for each resource
- attach all member resources to the new lease atomically
Failure cases:
- resource_not_found
- resource_busy
- bundle_too_large
- ttl_out_of_range
- lease_table_full
- lease_member_table_full
- expiration_index_full
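The all-or-nothing rule can be illustrated with a validate-then-mutate sketch over an in-memory map. Everything here is an assumption made for illustration: the types, the order in which preconditions are checked, and the Malformed variant used for empty or duplicate input (the result-code set above does not name a code for those cases). Lease, member, and TTL handling are omitted.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ResourceState { Available, Reserved }

#[derive(Debug, PartialEq, Eq)]
enum ReserveError { Malformed, BundleTooLarge, ResourceNotFound, ResourceBusy }

/// Check every precondition first, then mutate, so a failed bundle
/// leaves all resources untouched.
fn reserve_bundle(
    resources: &mut HashMap<u128, ResourceState>,
    resource_ids: &[u128],
    max_bundle_size: usize,
) -> Result<(), ReserveError> {
    if resource_ids.is_empty() {
        return Err(ReserveError::Malformed);
    }
    if resource_ids.len() > max_bundle_size {
        return Err(ReserveError::BundleTooLarge);
    }
    let mut seen = HashSet::new();
    for id in resource_ids {
        if !seen.insert(*id) {
            return Err(ReserveError::Malformed); // duplicate resource_id
        }
    }
    for id in resource_ids {
        match resources.get(id) {
            None => return Err(ReserveError::ResourceNotFound),
            Some(ResourceState::Available) => {}
            Some(_) => return Err(ReserveError::ResourceBusy),
        }
    }
    // All checks passed: attach every member in one step.
    for id in resource_ids {
        resources.insert(*id, ResourceState::Reserved);
    }
    Ok(())
}
```

Because no state changes until the final loop, a bundle that names one busy resource leaves its other members available.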
Compatibility rule:
- the current single-resource reserve(resource_id, holder_id, ttl_slots) command is the compatibility form of reserve_bundle([resource_id], holder_id, ttl_slots)
activate(lease_id, holder_id, lease_epoch)
Preconditions:
- lease exists
- lease state is reserved
- holder_id matches
- lease_epoch matches the current live lease record
Success effect:
- lease state becomes active
- every member resource state becomes active
Failure cases:
- lease_not_found
- lease_retired
- invalid_state
- holder_mismatch
- stale_epoch
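A minimal sketch of the holder-authorization check for activate, assuming the lease has already been looked up. The struct, the enum names, and the order in which the three checks run are illustrative assumptions; the text fixes the preconditions but not their precedence. Member resource updates are left out.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LeaseState { Reserved, Active }

#[derive(Debug, PartialEq, Eq)]
enum Rejection { InvalidState, HolderMismatch, StaleEpoch }

struct Lease {
    holder_id: u128,
    lease_epoch: u64,
    state: LeaseState,
}

/// Apply activate only if state, holder, and epoch all match.
fn activate(lease: &mut Lease, holder_id: u128, lease_epoch: u64) -> Result<(), Rejection> {
    if lease.state != LeaseState::Reserved {
        return Err(Rejection::InvalidState);
    }
    if lease.holder_id != holder_id {
        return Err(Rejection::HolderMismatch);
    }
    if lease.lease_epoch != lease_epoch {
        return Err(Rejection::StaleEpoch);
    }
    lease.state = LeaseState::Active;
    Ok(())
}
```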
Compatibility rule:
- the current single-resource confirm(reservation_id, holder_id) command is a compatibility form of activate(lease_id, holder_id, lease_epoch) for a bundle of size 1
release(lease_id, holder_id, lease_epoch)
Preconditions:
- lease exists
- lease state is reserved or active
- holder_id matches
- lease_epoch matches the current live lease record
Success effect:
- lease state becomes released
- lease_epoch increments to invalidate further holder authority
- all member resources return to available
Failure cases:
- lease_not_found
- lease_retired
- invalid_state
- holder_mismatch
- stale_epoch
Compatibility rule:
- the current single-resource release(reservation_id, holder_id) command is a compatibility form of release(lease_id, holder_id, lease_epoch) for a bundle of size 1
revoke(lease_id)
Preconditions:
- lease exists
- lease state is active
Success effect:
- lease state becomes revoking
- lease_epoch increments
- member resources remain unavailable for reuse
Failure cases:
- lease_not_found
- lease_retired
- invalid_state
reclaim(lease_id)
Preconditions:
- lease exists
- lease state is revoking
Success effect:
- lease state becomes revoked
- all member resources return to available
Failure cases:
- lease_not_found
- lease_retired
- invalid_state
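The externally submitted lease commands (activate, release, revoke, reclaim) form a small state machine, summarized in the sketch below. The enum and function names are hypothetical; a None result stands for the invalid_state rejection, and holder and epoch checks are left out.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LeaseState { Reserved, Active, Revoking, Released, Expired, Revoked }

#[derive(Debug, Clone, Copy)]
enum Command { Activate, Release, Revoke, Reclaim }

/// Return the next lease state, or None when the command is not
/// valid in the current state.
fn next_state(state: LeaseState, cmd: Command) -> Option<LeaseState> {
    match (state, cmd) {
        (LeaseState::Reserved, Command::Activate) => Some(LeaseState::Active),
        (LeaseState::Reserved, Command::Release) => Some(LeaseState::Released),
        (LeaseState::Active, Command::Release) => Some(LeaseState::Released),
        (LeaseState::Active, Command::Revoke) => Some(LeaseState::Revoking),
        (LeaseState::Revoking, Command::Reclaim) => Some(LeaseState::Revoked),
        // Terminal states (released, expired, revoked) accept nothing.
        _ => None,
    }
}
```

Note that revoke followed by reclaim is the only path from active back to reusable resources that does not depend on the holder.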
expire(lease_id, deadline_slot)
This is an internal command written through the same WAL path as external commands.
Preconditions:
- lease exists
- lease state is reserved
- deadline_slot matches the lease record
Success effect:
- lease state becomes expired
- lease_epoch increments to invalidate any future stale holder use
- all member resources return to available
Deterministic no-op cases:
- lease already active
- lease already revoking
- lease already released
- lease already expired
- lease already revoked
- lease does not match the expected deadline
get_resource(resource_id)
get_lease(lease_id)
Rules:
- strict reads are tied to an applied LSN
- reserved leases created by reserve_bundle, along with active and revoking leases, are always queryable
- terminal leases remain queryable until retire_after_slot
- after the live record retires, reads return lease_retired via bounded retained metadata
- that retained metadata is conservative: once full history is dropped, older shard-local lease_id values at or below the retired watermark also read as lease_retired
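One way to picture the conservative retired watermark is a lookup that consults retained records first and falls back to a single watermark value. This is an assumption-laden sketch: the map of retained records, the Option-typed watermark, and the string state labels are all hypothetical simplifications.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq)]
enum LeaseRead {
    Found(&'static str), // state label of a retained lease record
    Retired,             // maps to lease_retired
    NotFound,            // maps to lease_not_found
}

/// Look up a lease among retained records, then fall back to the watermark.
fn get_lease(
    retained: &HashMap<u128, &'static str>,
    retired_watermark: Option<u128>,
    lease_id: u128,
) -> LeaseRead {
    if let Some(state) = retained.get(&lease_id) {
        return LeaseRead::Found(*state);
    }
    match retired_watermark {
        // Any id at or below the watermark reads as retired,
        // even if no lease with that id ever existed.
        Some(w) if lease_id <= w => LeaseRead::Retired,
        _ => LeaseRead::NotFound,
    }
}
```

The single watermark is what keeps the retained metadata bounded: the cost is that the core can no longer distinguish "retired" from "never existed" below it.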
Compatibility rule:
- the current get_reservation(reservation_id) read is superseded by get_lease(lease_id) and remains only as a compatibility surface during the implementation transition
The approved lease model supersedes the older reservation-centric naming.
During the implementation transition:
- reserve(resource_id, holder_id, ttl_slots) remains as the compatibility form of one-member reserve_bundle
- confirm(reservation_id, holder_id) remains as the compatibility form of one-member activate
- release(reservation_id, holder_id) remains as the compatibility form of one-member release
- get_reservation(reservation_id) remains as the compatibility form of get_lease(lease_id)
The semantic rule is that these compatibility commands must not diverge from the lease model.
The current version targets strict serializable behavior on a single shard with one executor.
All state changes are applied in one deterministic order.
same snapshot + same WAL -> same state + same command results
"Exactly once" in AllocDB means:
- the same operation_id with the same command returns the original result
- the same operation_id with different command contents returns operation_conflict
- the guarantee holds within a configured retention window W
Infinite dedupe is not a goal because it conflicts with bounded storage.
If the operation table itself is full, the allocator returns operation_table_full and does not
accept the command into the deduped execution path.
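The dedupe rules above can be sketched as an admission check against a bounded table. The OperationTable type, its capacity field, the numeric result code, and the Admission enum are hypothetical; retirement of entries after retire_after_slot is omitted.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq)]
enum Admission {
    Fresh,          // not seen before: execute and record the result
    Duplicate(u32), // same id and hash: return the stored result code
    Conflict,       // same id, different hash: operation_conflict
    TableFull,      // no room to remember a new outcome: operation_table_full
}

struct OperationTable {
    capacity: usize,
    // operation_id -> (command_hash, result_code)
    entries: HashMap<u128, (u128, u32)>,
}

impl OperationTable {
    fn new(capacity: usize) -> Self {
        OperationTable { capacity, entries: HashMap::new() }
    }

    /// Decide how to treat an incoming command before it executes.
    fn admit(&self, operation_id: u128, command_hash: u128) -> Admission {
        match self.entries.get(&operation_id) {
            Some(&(hash, result_code)) if hash == command_hash => Admission::Duplicate(result_code),
            Some(_) => Admission::Conflict,
            None if self.entries.len() >= self.capacity => Admission::TableFull,
            None => Admission::Fresh,
        }
    }

    /// Remember the outcome of a command that was admitted as Fresh.
    fn record(&mut self, operation_id: u128, command_hash: u128, result_code: u32) {
        self.entries.insert(operation_id, (command_hash, result_code));
    }
}
```

A retry of a known operation_id is answered from the table even when the table is full, since it needs no new entry.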
The transport-level outcome of a write is not always the same as the state-machine outcome.
Clients must distinguish:
- definite success: the command committed and the result is known
- definite failure: the command was rejected before commit and did not take effect
- indefinite outcome: the client cannot tell whether the command committed
Indefinite outcomes are expected under timeout, disconnect, process crash, or reply loss.
The rule is:
- clients resolve indefinite outcomes by retrying the same operation_id within the dedupe window
- the server returns the original result if that operation already committed
- the server never invents a fresh second execution for the same operation_id
- the server does not promise to resolve ambiguity after dedupe retention has expired
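From the client side, the retry rule might look like the loop below. It is a sketch under assumptions: SubmitOutcome, submit_with_retry, and the send callback are hypothetical, and a real client would also need backoff and a way to know that the dedupe window is still open.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SubmitOutcome {
    DefiniteSuccess,
    DefiniteFailure,
    Indefinite,
}

/// Retry with the SAME operation_id until the outcome is definite
/// or the attempt budget runs out.
fn submit_with_retry<F>(operation_id: u128, max_attempts: u32, mut send: F) -> SubmitOutcome
where
    F: FnMut(u128) -> SubmitOutcome,
{
    for _ in 0..max_attempts {
        match send(operation_id) {
            // Timeout, disconnect, or reply loss: try again, same id.
            SubmitOutcome::Indefinite => continue,
            definite => return definite,
        }
    }
    // Still ambiguous; the caller must not assume either result.
    SubmitOutcome::Indefinite
}
```

Reusing the operation_id is what makes the retry safe: a fresh id would be a new command and could execute a second time.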
Current single-node engine rule:
- malformed request, payload-too-large, overload, and slot-overflow errors are definite pre-commit failures
- lsn_exhausted is a definite write rejection once the engine has committed u64::MAX and no further LSN can be assigned
- WAL write failure halts the live engine and is an indefinite submission failure
- after a WAL-path failure, the live engine refuses further writes with engine_halted
- clients resolve that ambiguity by recovering the node, then retrying the same operation_id
- if the failed attempt reached durable WAL, the retry returns the original stored result
- if the failed attempt did not reach durable WAL, the retry executes once as a fresh command
- engine_halted remains an indefinite submission failure because the halted node refuses to claim whether a prior ambiguous write committed
- while the engine is halted, live reads also fail closed because in-memory state may lag durable WAL until recovery rebuilds the node
- the retry contract only holds while the dedupe retention window is still open
- the future wire protocol must preserve that distinction explicitly instead of flattening all submission errors into one generic failure class
This is the practical meaning of reliable submission. It keeps command handling bounded without pretending that transport failures do not exist.
The following rules are fixed:
- holder-authorized commands carry both holder_id and the current lease_epoch
- lease authority is keyed by lease_id, not resource.version
- the resource version field is observable but is not a write guard
- lease lookup history is bounded and returns lease_retired after retirement
- resource metadata is excluded from the trusted core
- indefinite write outcomes are resolved by client retry with the same operation_id
- liveness observation stays outside the trusted core; the core consumes explicit revoke and reclaim commands instead of wall-clock-derived lease timeouts