feat: implement reservation-core hold lifecycle#117
Conversation
Summary by CodeRabbit
WalkthroughImplements client-scoped operation lifecycle: dedupe, conflict detection, operation-table capacity checks, retire scheduling, and command dispatch/handlers (create/ place/ confirm/ release/ expire). Operation keys and snapshot encoding were changed to include Changes
Sequence DiagramsequenceDiagram
participant Client
participant OpTable as Operations Table
participant Dispatcher as Command Dispatcher
participant Handler as Command Handler
participant RetireQueue as Retirement Queue
participant Outcome as CommandOutcome
Client->>OpTable: Lookup (client_id, operation_id)
alt Cache hit & command matches
OpTable-->>Client: Return cached Outcome
else Cache hit & command mismatches
OpTable-->>Client: Return OperationConflict
else Cache hit but expired
OpTable->>OpTable: Evict entry
Client->>Dispatcher: Submit command
else Cache miss
Client->>Dispatcher: Submit command
end
Dispatcher->>Dispatcher: Enforce max_operations
alt Table full
Dispatcher-->>Client: Return OperationTableFull
else Capacity available
Dispatcher->>Handler: Route to handler (Create/Place/Confirm/Release/Expire)
Handler->>Handler: Validate & update pool/hold accounting
Handler-->>Dispatcher: Return ResultCode
Dispatcher->>OpTable: Insert OperationRecord (client_id, applied_lsn, retire_after_slot)
Dispatcher->>RetireQueue: Enqueue retirement keyed by (client_id, operation_id)
Dispatcher-->>Client: Return computed Outcome
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
crates/reservation-core/src/state_machine.rs (1)
247-269: Successful lifecycle transitions needinfo!logs.The reject paths emit
warn!, but successfulCreatePool,PlaceHold,ConfirmHold,ReleaseHold, andExpireHoldtransitions are silent. These are the state changes operators will need when reconstructing reservation history or debugging capacity drift.As per coding guidelines,
**/*.rs: Add extensive logging where it materially improves debuggability or operational clarity. Use the right log level:infofor meaningful lifecycle and state-transition events;warnfor degraded but expected conditions such as rejected requests. Logging must be structured and purposeful.Also applies to: 271-331, 334-437
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/reservation-core/src/state_machine.rs` around lines 247 - 269, The successful lifecycle transition in apply_create_pool is missing an info-level structured log; add an info! call after inserting the PoolRecord (and similarly in the PlaceHold, ConfirmHold, ReleaseHold, and ExpireHold handlers) that logs the event name and key fields (e.g., pool_id via pool_id.get(), total_capacity, and any hold ids/capacity changes) so operators can reconstruct history; place the info! immediately after calling insert_pool (or after state mutation in the respective functions) and keep the reject warn! logs as-is.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/reservation-core/src/state_machine.rs`:
- Around line 225-236: The PlaceHold branch in apply_command (Command::PlaceHold
-> apply_place_hold) currently drops context.request_slot and uses assert! on
request data, allowing quantity == 0 and accepting deadline_slot <= request_slot
which can panic or create already-expired holds; change apply_place_hold (and
any assertion-using helpers called from it) to validate inputs up-front: reject
quantity == 0 by returning a CommandOutcome error/result instead of panicking,
require deadline_slot > context.request_slot (or otherwise deterministically
treat holds with deadline <= request_slot as immediately expired and return the
appropriate CommandOutcome), and replace assert! checks with explicit error
branches that produce deterministic failure outcomes. Ensure you reference
context.request_slot when making the deadline comparison and return a
non-panicking outcome for invalid inputs.
- Around line 759-1080: Tests exercise live behavior but do not assert replay
equivalence after snapshot/restore; add one or more recovery tests that: build a
ReservationDb, feed the same sequence of commands via
ReservationDb::apply_client (use context(...) and request(...)) as in existing
tests (e.g., create pool, place/confirm/release/expire holds, and duplicate
retries), capture the live outcomes and db.snapshot(), then create a fresh
ReservationDb, restore it from the saved snapshot (or simulate restore + replay
of recorded operations), replay the recorded operations in the same order and
assert the replayed outcomes match the original outcomes and final snapshot
state matches the saved snapshot (check pools, holds, operations, and cached
outcome behavior/retirement windows); focus especially on commands that rely on
cached results (duplicate requests) and expiration/retirement behavior to ensure
deterministic replay.
---
Nitpick comments:
In `@crates/reservation-core/src/state_machine.rs`:
- Around line 247-269: The successful lifecycle transition in apply_create_pool
is missing an info-level structured log; add an info! call after inserting the
PoolRecord (and similarly in the PlaceHold, ConfirmHold, ReleaseHold, and
ExpireHold handlers) that logs the event name and key fields (e.g., pool_id via
pool_id.get(), total_capacity, and any hold ids/capacity changes) so operators
can reconstruct history; place the info! immediately after calling insert_pool
(or after state mutation in the respective functions) and keep the reject warn!
logs as-is.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d7028925-028f-4ab7-bc63-9daa9b2aa6c1
📒 Files selected for processing (1)
crates/reservation-core/src/state_machine.rs
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: semgrep-cloud-platform/scan
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit inference engine (AGENTS.md)
**/*.rs: Write extensive tests for every meaningful behavior change. Favor invariant tests, negative-path tests, recovery tests, and regression tests over shallow happy-path coverage.
Add extensive logging where it materially improves debuggability or operational clarity. Use the right log level:errorfor invariant breaks, corruption, and failed operations that require intervention;warnfor degraded but expected conditions such as overload, lag, or rejected requests;infofor meaningful lifecycle and state-transition events;debugfor detailed execution traces useful in development;traceonly for very high-volume diagnostic detail.
Logging must be structured and purposeful. Do not add noisy logs that obscure signal or hide bugs.
Files:
crates/reservation-core/src/state_machine.rs
🔇 Additional comments (1)
crates/reservation-core/src/state_machine.rs (1)
159-178: Operation ID idempotency is correctly keyed globally byoperation_idper the protocol contract.The documentation (reservation-engine-plan.md, reservation-semantics.md) specifies idempotency semantics as "same
operation_id+ same payload returns cached result" and "sameoperation_id+ different payload returnsOperationConflict" without any client scoping. The current cache keying byOperationIdalone (line 159) correctly implements this global contract. Theclient_idfield inClientRequestis present but not used for deduplication, which aligns with the protocol design.> Likely an incorrect or invalid review comment.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/reservation-core/src/state_machine.rs`:
- Around line 159-179: The idempotency lookup uses only request.operation_id
causing cross-client collisions; change the operations map to key by
(request.client_id, request.operation_id) (update the map type and all access
sites), modify lookup/removal/insert in apply_client (the block using
self.operations.get(request.operation_id), the removal call
self.operations.remove(request.operation_id), and the analogous code at the
later block mentioned) to use the tuple key, and add a two-client regression
test that reuses the same operation_id from different client_ids to assert
separate outcomes (cached result for same client, no cross-client cache or false
OperationConflict).
- Around line 297-302: The code currently returns
CommandOutcome::new(ResultCode::PoolNotFound) /
CommandOutcome::new(ResultCode::HoldNotFound) which clears the known ids; change
those returns to use the helper constructors that preserve the id context (e.g.
replace CommandOutcome::new(ResultCode::PoolNotFound) with
CommandOutcome::with_pool_not_found(pool_id, ResultCode::PoolNotFound) and
similarly use CommandOutcome::with_hold_not_found(hold_id,
ResultCode::HoldNotFound) where applicable), updating the branches around the
pools.get(...) failure and the hold lookups (the place_hold and related branches
referenced) so the returned CommandOutcome keeps the requested ids for
persistence.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 54ed0587-f11c-4408-a1ea-5764d76f6ec2
📒 Files selected for processing (1)
crates/reservation-core/src/state_machine.rs
📜 Review details
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit inference engine (AGENTS.md)
**/*.rs: Write extensive tests for every meaningful behavior change. Favor invariant tests, negative-path tests, recovery tests, and regression tests over shallow happy-path coverage.
Add extensive logging where it materially improves debuggability or operational clarity. Use the right log level:errorfor invariant breaks, corruption, and failed operations that require intervention;warnfor degraded but expected conditions such as overload, lag, or rejected requests;infofor meaningful lifecycle and state-transition events;debugfor detailed execution traces useful in development;traceonly for very high-volume diagnostic detail.
Logging must be structured and purposeful. Do not add noisy logs that obscure signal or hide bugs.
Files:
crates/reservation-core/src/state_machine.rs
🧠 Learnings (2)
📚 Learning: 2026-03-12T15:18:53.086Z
Learnt from: CR
Repo: skel84/allocdb PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-12T15:18:53.086Z
Learning: Applies to **/*.rs : Write extensive tests for every meaningful behavior change. Favor invariant tests, negative-path tests, recovery tests, and regression tests over shallow happy-path coverage.
Applied to files:
crates/reservation-core/src/state_machine.rs
📚 Learning: 2026-03-12T15:18:53.086Z
Learnt from: CR
Repo: skel84/allocdb PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-12T15:18:53.086Z
Learning: Treat CodeRabbit as part of the required review path when it is enabled on the repository. Wait for its status to complete before merge. If it completes without a visible review comment or review thread, request visible output with `coderabbitai summary`. Address every substantive CodeRabbit comment explicitly before merge by either applying the change or documenting why it is not being applied. Apply correctness, safety, recovery, test, and docs-alignment feedback by default; document why you reject suggestions that would weaken determinism, boundedness, or trusted-core discipline.
Applied to files:
crates/reservation-core/src/state_machine.rs
🔇 Additional comments (1)
crates/reservation-core/src/state_machine.rs (1)
1141-1220: Extend replay coverage to the operation-retirement boundary.This test stops at slots 3-5, but the cached prefix operations only retire at slots 9 and 10 with the current config. A broken retire-queue rebuild or an off-by-one in retirement would still pass here. Please drive both live and restored DBs past that boundary and assert they stop returning cached outcomes on the same slot.
As per coding guidelines,
**/*.rs: Write extensive tests for every meaningful behavior change. Favor invariant tests, negative-path tests, recovery tests, and regression tests over shallow happy-path coverage.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/reservation-core/src/snapshot.rs (1)
13-14:⚠️ Potential issue | 🔴 CriticalSnapshot format changed without version bump.
Adding
client_id(16 bytes) to the operation record encoding changes the binary format, butVERSIONremains1. Old snapshots will fail to decode correctly (the decode function will read stale bytes asclient_id), and new snapshots won't be recognized as incompatible by old code.Bump
VERSIONto2and either reject old versions during decode or add migration logic to handle version 1 snapshots if backward compatibility is required.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/reservation-core/src/snapshot.rs` around lines 13 - 14, The snapshot binary layout changed by adding client_id to the operation record but VERSION stayed at 1; update VERSION from 1 to 2 and update decoding logic in snapshot.rs: bump the const VERSION to 2, and in the decode/read routine (the function that checks MAGIC and VERSION and deserializes operation records) either explicitly reject snapshots with VERSION == 1 (returning an error) or implement migration logic to read the old format (without client_id) and populate client_id appropriately before continuing; ensure the decoder uses the VERSION constant and the MAGIC constant (MAGIC) to gate parsing so old/new formats are handled correctly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@crates/reservation-core/src/snapshot.rs`:
- Around line 13-14: The snapshot binary layout changed by adding client_id to
the operation record but VERSION stayed at 1; update VERSION from 1 to 2 and
update decoding logic in snapshot.rs: bump the const VERSION to 2, and in the
decode/read routine (the function that checks MAGIC and VERSION and deserializes
operation records) either explicitly reject snapshots with VERSION == 1
(returning an error) or implement migration logic to read the old format
(without client_id) and populate client_id appropriately before continuing;
ensure the decoder uses the VERSION constant and the MAGIC constant (MAGIC) to
gate parsing so old/new formats are handled correctly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 0a341425-bc34-4c30-8c2f-16427d2e7c44
📒 Files selected for processing (5)
crates/reservation-core/src/fixed_map.rscrates/reservation-core/src/ids.rscrates/reservation-core/src/snapshot.rscrates/reservation-core/src/snapshot_file.rscrates/reservation-core/src/state_machine.rs
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: semgrep-cloud-platform/scan
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit inference engine (AGENTS.md)
**/*.rs: Write extensive tests for every meaningful behavior change. Favor invariant tests, negative-path tests, recovery tests, and regression tests over shallow happy-path coverage.
Add extensive logging where it materially improves debuggability or operational clarity. Use the right log level:errorfor invariant breaks, corruption, and failed operations that require intervention;warnfor degraded but expected conditions such as overload, lag, or rejected requests;infofor meaningful lifecycle and state-transition events;debugfor detailed execution traces useful in development;traceonly for very high-volume diagnostic detail.
Logging must be structured and purposeful. Do not add noisy logs that obscure signal or hide bugs.
Files:
crates/reservation-core/src/fixed_map.rscrates/reservation-core/src/snapshot_file.rscrates/reservation-core/src/ids.rscrates/reservation-core/src/snapshot.rscrates/reservation-core/src/state_machine.rs
🧠 Learnings (2)
📚 Learning: 2026-03-12T15:18:53.086Z
Learnt from: CR
Repo: skel84/allocdb PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-12T15:18:53.086Z
Learning: Applies to **/*.rs : Write extensive tests for every meaningful behavior change. Favor invariant tests, negative-path tests, recovery tests, and regression tests over shallow happy-path coverage.
Applied to files:
crates/reservation-core/src/state_machine.rs
📚 Learning: 2026-03-12T15:18:53.086Z
Learnt from: CR
Repo: skel84/allocdb PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-12T15:18:53.086Z
Learning: Treat CodeRabbit as part of the required review path when it is enabled on the repository. Wait for its status to complete before merge. If it completes without a visible review comment or review thread, request visible output with `coderabbitai summary`. Address every substantive CodeRabbit comment explicitly before merge by either applying the change or documenting why it is not being applied. Apply correctness, safety, recovery, test, and docs-alignment feedback by default; document why you reject suggestions that would weaken determinism, boundedness, or trusted-core discipline.
Applied to files:
crates/reservation-core/src/state_machine.rs
🔇 Additional comments (12)
crates/reservation-core/src/snapshot_file.rs (1)
189-236: LGTM!The test fixture correctly adds the
client_idfield to match the updatedOperationRecordstructure. The seeded snapshot now aligns with the client-scoped operation encoding insnapshot.rs.crates/reservation-core/src/fixed_map.rs (1)
27-33: LGTM!The
FixedKeyimplementation forClientOperationKeycorrectly combines both fields into a single hash. The XOR with rotation breaks symmetry betweenclient_idandoperation_id, and thesplitmix64finalization ensures good distribution even for sequential IDs.crates/reservation-core/src/ids.rs (1)
27-42: LGTM!The
ClientOperationKeystruct is well-designed as a composite key with appropriate derives for map usage. Theconst fn newconstructor with#[must_use]follows Rust idioms.crates/reservation-core/src/snapshot.rs (1)
538-553: LGTM!Test data correctly updated to include
client_idfield in the seededOperationRecord.crates/reservation-core/src/state_machine.rs (8)
6-6: LGTM!The structural changes correctly implement client-scoped operation storage.
OperationRecordnow includesclient_id, and both the operations map and retirement queue are keyed byClientOperationKey.Also applies to: 37-37, 84-86
159-214: LGTM!The
apply_clientlifecycle implementation is well-structured:
- Correctly handles expired cached results by removing and re-executing.
- Returns cached outcomes for idempotent retries.
- Detects and rejects operation ID conflicts.
- Enforces capacity limits before command execution.
- Properly records and schedules retirement for new operations.
229-255: LGTM!The command dispatcher cleanly routes to dedicated handlers and correctly passes
request_slotwhere needed for deadline validation.
281-350: LGTM!
apply_place_holdcorrectly:
- Validates
quantity > 0anddeadline_slot > request_slot(addresses past review comment).- Preserves
pool_idandhold_idin all outcome paths for correlation.- Uses
checked_subwith invariant-based expect for capacity calculation.
352-455: LGTM!The hold lifecycle handlers (
confirm,release,expire) correctly:
- Preserve
hold_idinHoldNotFoundoutcomes (addresses past review comment).- Handle state transitions with proper capacity accounting.
- Trigger expiry on late confirm attempts (
request_slot >= deadline_slot).- Reject early expire attempts (
request_slot < deadline_slot).
790-1283: LGTM!Excellent test coverage addressing all PR acceptance criteria:
- Idempotent retry semantics (success and failure paths).
- Operation conflict detection.
- Client-scoped operation isolation.
- Input validation for invalid place_hold parameters.
- Deterministic capacity limit rejection.
- Snapshot restore + replay equivalence.
This directly addresses the recovery test gap flagged in a previous review.
570-618: LGTM!The internal helper functions correctly encapsulate insert/replace operations with appropriate panic guards for invariant violations. These panics are justified as they indicate programming errors rather than recoverable runtime conditions.
638-657: LGTM!The
retire_statefunction correctly processes the retirement queue and includes awarnlog when encountering missing operations (which could indicate a bug or unexpected state), aligning with the coding guidelines for log levels.
Summary
Closes #112
Refs #109
Refs #113
Refs #114
Verification
cargo test -p reservation-corecargo clippy -p reservation-core --all-targets --all-features -- -D warnings