@bosconi bosconi commented Jan 14, 2026

This change moves blob decoding from the main tokio runtime to the
isolated runtime, following the same pattern already used for encoding.

Previously, decode_batch_part_blob was synchronous and ran on the
calling runtime. When multiple large blob decodes occurred concurrently,
they could saturate runtime workers and prevent heartbeat tasks from
running, leading to persist reader lease expirations.

The fix pre-decodes hollow batch parts in BatchFetcher::fetch_leased_part
on the isolated runtime before creating FetchedBlob. The heavy CPU work
then runs on separate OS threads that the OS scheduler can preempt,
so heartbeat tasks on the main runtime stay live.

Key changes:

  • decode_batch_part_blob is now async and spawns its work on the isolated runtime
  • FetchedBlobBuf::Hollow stores the pre-decoded EncodedPart instead of raw bytes
  • EncodedPart now implements Clone to support FetchedBlob Clone
  • isolated_runtime is threaded through ReadHandle, Consolidator, etc.
  • Various T: Sync bounds added where required by the async decode
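
For readers unfamiliar with the pattern, here is a minimal std-only sketch of the idea behind these changes: decode on a dedicated thread, then hand the caller an already-decoded, cloneable value. It is an analogy, not the code in this PR. `IsolatedWorker`, `DecodedPart`, `decode`, and `fetch_and_decode` are hypothetical names, the byte format is invented, and the real change uses an async decode on persist's isolated tokio runtime rather than a blocking `recv`.

```rust
use std::sync::mpsc;
use std::thread;

// A boxed job that can be sent to the worker thread.
type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical stand-in for an isolated runtime: one dedicated OS thread
/// that runs jobs in the order they are submitted.
struct IsolatedWorker {
    jobs: mpsc::Sender<Job>,
}

impl IsolatedWorker {
    fn new() -> Self {
        let (jobs, rx) = mpsc::channel::<Job>();
        // The loop ends once every sender has been dropped.
        thread::spawn(move || {
            for job in rx {
                job();
            }
        });
        IsolatedWorker { jobs }
    }

    /// Run `f` on the worker thread and return a receiver for its result.
    fn spawn<T, F>(&self, f: F) -> mpsc::Receiver<T>
    where
        T: Send + 'static,
        F: FnOnce() -> T + Send + 'static,
    {
        let (tx, rx) = mpsc::channel();
        let job: Job = Box::new(move || {
            // Ignore the error if the caller stopped waiting for the result.
            let _ = tx.send(f());
        });
        self.jobs.send(job).expect("worker thread exited");
        rx
    }
}

/// Hypothetical decoded part. Deriving Clone mirrors the idea that the
/// pre-decoded value must be cloneable if its container is.
#[derive(Clone, Debug, PartialEq)]
struct DecodedPart {
    rows: Vec<u32>,
}

/// Toy decode (an assumption for illustration): little-endian u32 values.
fn decode(raw: &[u8]) -> DecodedPart {
    let rows = raw
        .chunks_exact(4)
        .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect();
    DecodedPart { rows }
}

/// Decode off the caller's thread before building the fetched value.
/// This sketch blocks on `recv`; an async caller would await instead.
fn fetch_and_decode(worker: &IsolatedWorker, raw: Vec<u8>) -> DecodedPart {
    worker
        .spawn(move || decode(&raw))
        .recv()
        .expect("decode job was dropped")
}
```

Calling `fetch_and_decode(&IsolatedWorker::new(), bytes)` returns the decoded rows while the decode itself runs on the worker's thread. The closure has to be `Send + 'static` to cross threads, which is the same kind of constraint that tends to force extra bounds on generic parameters.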

…starvation

These tests demonstrate that CPU-bound decode work can block heartbeat tasks
when run on the main tokio runtime, but not when run on the isolated runtime.

The `decode_blocking_starves_heartbeat_task` test shows that synchronous
CPU-bound work on a single-threaded runtime delays heartbeats by 800ms or more.

The `isolated_runtime_does_not_block_heartbeat` test shows that the same
work on the isolated runtime keeps heartbeat delays under 12ms.

This validates the approach of moving blob decoding to the isolated runtime
to prevent persist reader lease expirations.
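
The effect these tests measure can be reproduced in miniature without tokio. The sketch below is not the test code from this PR: `heavy_decode` and `max_heartbeat_gap` are hypothetical names, the loop stands in for a single runtime worker, and the 5ms tick and 200ms decode cost are arbitrary assumptions. It records the largest gap between heartbeat ticks when the heavy work runs inline versus on a separate OS thread.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Simulate CPU-bound decode work by spinning for `cost`.
fn heavy_decode(cost: Duration) {
    let start = Instant::now();
    while start.elapsed() < cost {
        std::hint::spin_loop();
    }
}

/// Run a heartbeat loop and return the largest gap between two ticks.
/// With `isolate == false` the decode runs inline and blocks the loop;
/// with `isolate == true` it runs on its own OS thread.
fn max_heartbeat_gap(isolate: bool) -> Duration {
    let tick = Duration::from_millis(5);
    let cost = Duration::from_millis(200);
    let mut offloaded = None;
    let mut last = Instant::now();
    let mut max_gap = Duration::ZERO;

    for i in 0..20 {
        if i == 5 {
            if isolate {
                // Offload: the heartbeat loop keeps ticking meanwhile.
                offloaded = Some(thread::spawn(move || heavy_decode(cost)));
            } else {
                // Inline: no tick can happen until the decode finishes.
                heavy_decode(cost);
            }
        }
        thread::sleep(tick);
        let now = Instant::now();
        max_gap = max_gap.max(now - last);
        last = now;
    }

    if let Some(handle) = offloaded {
        handle.join().expect("decode thread panicked");
    }
    max_gap
}
```

Comparing `max_heartbeat_gap(false)` with `max_heartbeat_gap(true)` shows the same shape as the two tests: the inline run has one gap at least as long as the decode, while the offloaded run is limited only by how the OS schedules the two threads. Exact numbers depend on the machine.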

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@bosconi bosconi requested a review from a team as a code owner January 14, 2026 04:46
@bosconi bosconi changed the title from "persist: add tests demonstrating isolated runtime prevents heartbeat starvation" to "persist: use isolated runtime for decode to prevent heartbeat starvation" Jan 14, 2026
@bosconi bosconi changed the title from "persist: use isolated runtime for decode to prevent heartbeat starvation" to "persist: move blob decoding to isolated runtime to avoid tokio task deadlock" Jan 14, 2026
@bosconi bosconi changed the title from "persist: move blob decoding to isolated runtime to avoid tokio task deadlock" to "persist: move blob decoding to isolated runtime to avoid heartbeat starvation" Jan 14, 2026
@bosconi bosconi requested a review from antiguru January 14, 2026 14:16

bosconi commented Jan 14, 2026

@antiguru code comments have been expanded to explain the motivation more.
