persist: move blob decoding to isolated runtime to avoid heartbeat starvation #34699

bosconi · 2026-01-14T04:46:10Z

This change moves blob decoding from the main tokio runtime to the
isolated runtime, following the same pattern already used for encoding.

Previously, decode_batch_part_blob was synchronous and ran on the
calling runtime. When multiple large blob decodes occurred concurrently,
they could saturate runtime workers and prevent heartbeat tasks from
running, leading to persist reader lease expirations.

The fix pre-decodes hollow batch parts in BatchFetcher::fetch_leased_part
on the isolated runtime before creating FetchedBlob. This ensures the
heavy CPU work happens on separate OS threads where the scheduler can
context-switch, preserving liveness for heartbeat tasks.

Key changes:

- decode_batch_part_blob is now async and spawns work on isolated runtime
- FetchedBlobBuf::Hollow stores the pre-decoded EncodedPart instead of raw bytes
- EncodedPart now implements Clone to support FetchedBlob Clone
- isolated_runtime is threaded through ReadHandle, Consolidator, etc.
- Various T: Sync bounds added where required by the async decode
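
For readers unfamiliar with the pattern, here is a minimal, dependency-free sketch of offloading a CPU-heavy decode to dedicated OS threads. It is not the code in this PR: `IsolatedWorker`, `spawn_decode`, and `decode_blob` are hypothetical names, and a plain `std::thread` with channels stands in for the isolated tokio runtime. The caller submits raw bytes and gets back a handle to the already-decoded result, which loosely mirrors storing a pre-decoded part instead of raw bytes.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for a CPU-heavy blob decode: sums the bytes.
fn decode_blob(raw: &[u8]) -> u64 {
    raw.iter().map(|b| u64::from(*b)).sum()
}

/// A decode job: the raw bytes plus a channel to send the result back on.
type Job = (Vec<u8>, mpsc::Sender<u64>);

/// Hypothetical stand-in for the isolated runtime: one dedicated OS thread
/// that does nothing but decode work.
struct IsolatedWorker {
    jobs: mpsc::Sender<Job>,
}

impl IsolatedWorker {
    fn new() -> Self {
        let (jobs, rx) = mpsc::channel::<Job>();
        thread::spawn(move || {
            // The loop ends once every sender (the `IsolatedWorker`) is dropped.
            for (raw, reply) in rx {
                // Ignore the error if the caller stopped waiting for the result.
                let _ = reply.send(decode_blob(&raw));
            }
        });
        IsolatedWorker { jobs }
    }

    /// Submits a decode and returns immediately with a receiver for the
    /// result, so the calling thread stays free for other work.
    fn spawn_decode(&self, raw: Vec<u8>) -> mpsc::Receiver<u64> {
        let (reply, result) = mpsc::channel();
        self.jobs.send((raw, reply)).expect("worker thread exited");
        result
    }
}
```

A caller would create one `IsolatedWorker`, call `spawn_decode(bytes)`, and read the decoded value from the returned receiver once it is needed. In an async setting the equivalent step would be awaiting a task handle rather than blocking on `recv`.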

…starvation

These tests demonstrate that CPU-bound decode work can block heartbeat tasks when run on the main tokio runtime, but not when run on the isolated runtime.

The `decode_blocking_starves_heartbeat_task` test shows that synchronous CPU-bound work on a single-threaded runtime delays heartbeats by 800ms+. The `isolated_runtime_does_not_block_heartbeat` test shows that the same work on the isolated runtime keeps heartbeat delays under 12ms.

This validates the approach of moving blob decoding to the isolated runtime to prevent persist reader lease expirations.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
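
The effect these tests measure can be approximated with the standard library alone. The sketch below is not the PR's test code: `busy_decode` and `max_heartbeat_gap` are hypothetical names, a spin loop stands in for blob decoding, and a sleeping loop on the calling thread stands in for the heartbeat task. It records the largest gap between consecutive heartbeats when the CPU-bound work runs inline versus on a separate OS thread.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a CPU-bound decode: spins for `d`.
fn busy_decode(d: Duration) {
    let start = Instant::now();
    while start.elapsed() < d {
        std::hint::spin_loop();
    }
}

/// Runs 20 heartbeats roughly 1ms apart and returns the largest gap observed
/// between two consecutive heartbeats. At the fifth tick the decode work is
/// started either inline (`offload == false`) or on its own OS thread.
fn max_heartbeat_gap(offload: bool, work: Duration) -> Duration {
    let mut worker = None;
    let mut last = Instant::now();
    let mut max_gap = Duration::ZERO;
    for tick in 0..20 {
        if tick == 5 {
            if offload {
                // The heartbeat loop keeps running while this thread spins.
                worker = Some(thread::spawn(move || busy_decode(work)));
            } else {
                // The heartbeat loop cannot tick until this returns.
                busy_decode(work);
            }
        }
        thread::sleep(Duration::from_millis(1));
        let now = Instant::now();
        max_gap = max_gap.max(now.duration_since(last));
        last = now;
    }
    if let Some(handle) = worker {
        handle.join().expect("decode thread panicked");
    }
    max_gap
}
```

With inline work the largest gap is at least as long as the work itself, because the loop is stuck inside `busy_decode`. With offloading, the gap depends on how the operating system schedules the two threads, so the exact figure varies by machine and load.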


src/persist-client/src/fetch.rs

bosconi · 2026-01-14T14:19:01Z

@antiguru code comments have been expanded to explain the motivation more.

bosconi requested a review from a team as a code owner January 14, 2026 04:46

antiguru reviewed Jan 14, 2026

src/persist-client/src/fetch.rs

bosconi changed the title ~~persist: add tests demonstrating isolated runtime prevents heartbeat starvation~~ persist: use isolated runtime for decode to prevent heartbeat starvation Jan 14, 2026

bosconi changed the title ~~persist: use isolated runtime for decode to prevent heartbeat starvation~~ persist: move blob decoding to isolated runtime to avoid tokio task deadlock Jan 14, 2026

bosconi changed the title ~~persist: move blob decoding to isolated runtime to avoid tokio task deadlock~~ persist: move blob decoding to isolated runtime to avoid heartbeat starvation Jan 14, 2026

more justification in comments

ae5e929

bosconi requested a review from antiguru January 14, 2026 14:16
