perf: offload blocking RocksDB reads from tokio runtime by shuowang12 · Pull Request #3271 · MystenLabs/walrus

shuowang12 · 2026-04-08T21:35:48Z

Description

Describe the changes or additions included in this PR.

Test plan

How did you test the new or updated feature?

Release notes

Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.
For each box you select, include information after the relevant heading that describes the impact of your changes that
a user might notice and any actions they must take to implement updates. (Add release notes after the colon for each item)

Storage node:
Aggregator:
Publisher:
CLI:

shuowang12 · 2026-04-08T21:36:23Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5ff7d9caa4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

crates/walrus-service/src/node.rs

…cking

…_in_background

clud-bot

Review: Offload blocking RocksDB reads from tokio runtime

The core async transformation is sound — the epoch-ordering bug in is_blob_registered is fixed (epoch is read after the blocking task completes), unwrap_or_resume_unwind correctly propagates panics, and all call sites are properly awaited. The create_storage_for_shards_locked fix is also correct — the old code had spawn_blocking(move || async move { ... }) which would return a future without executing it.

However, the PR is incomplete in its coverage: several RocksDB reads in async code paths remain synchronous.

Critical: RocksDB reads still on tokio threads

Location	Call	Priority
`prepare_sliver_for_storage` (node.rs:3097-3099)	`shard_storage.is_sliver_type_stored(...)` — sync `contains_key` on the sliver write path	major
`sliver_status` (node.rs:4601)	`shard_storage.is_sliver_stored::<A>(blob_id)` — sync read in async request handler	major
`recover_sliver` (blob_sync.rs:846)	`shard_storage.is_sliver_stored::<A>(&self.blob_id)?` — sync read in async recovery, runs concurrently	major
`is_stored_at_shard` (storage.rs:1141-1145)	`is_sliver_pair_stored(blob_id)?` — two sync `contains_key` calls, used in confirmation path via `is_stored_at_all_shards_at_latest_epoch`	major
`sync_blob_for_all_shards` (blob_sync.rs:403)	`node.storage.node_status()` — sync read	minor
`health_info` (node.rs:4576-4584)	`self.storage.node_status()` and `get_event_cursor_progress()` — admin endpoint, lower priority	minor

I'd recommend either wrapping these remaining call sites, or moving the offload boundary down into Storage/ShardStorage with async wrappers (e.g. is_sliver_stored_async) so future code can't accidentally regress.

Multiple sequential spawn_blocking hops

Some paths now do 2+ blocking hops for one logical operation:

retrieve_metadata: is_blob_registered (via validate_blob_access) then get_metadata
compute_storage_confirmation: is_blob_registered → get_per_object_info → get_per_object_pooled_info

Fine for now since each hop is IO-dominated, but batching into single blocking closures would reduce scheduling overhead on hot paths.

Cancellation safety

For short one-shot reads, the cancellation behavior is fine. For the longer handle_sync_shard_request batch fetch loop (storage.rs:1200-1230) and create_storage_for_shards_locked, if the parent future is dropped the blocking task keeps running and holds its resources (including StorageShardLock). Worth documenting that these are intentionally non-cancellable.

Nit

The new doc comment on is_blob_not_certified explaining the rayon context is good. The same reasoning applies to is_blob_certified which it delegates to — worth a similar note there.

…calling tokio from futures::executor

chatgpt-codex-connector bot reviewed Apr 8, 2026

View reviewed changes

crates/walrus-service/src/node.rs Outdated Show resolved Hide resolved

shuowang12 force-pushed the shuo/spawn_blocking_db_access branch 2 times, most recently from 1c1abc5 to 8edc4fc Compare April 8, 2026 21:58

perf: offload blocking RocksDB reads from tokio runtime via spawn_blo…

7af01b9

…cking

shuowang12 force-pushed the shuo/spawn_blocking_db_access branch from 8edc4fc to 8b31d51 Compare April 8, 2026 23:25

shuowang12 added 3 commits April 8, 2026 16:35

use unwrap_or_resume_unwind for spawn_blocking panic handling

2f48091

remove async block inside spawn_blocking in create_storage_for_shards…

388bf1b

…_in_background

fmt

c9b44a9

shuowang12 force-pushed the shuo/spawn_blocking_db_access branch from 8b31d51 to c9b44a9 Compare April 8, 2026 23:36

clud-bot bot reviewed Apr 8, 2026

View reviewed changes

shuowang12 added 3 commits April 8, 2026 17:48

cover more

9b117da

guard is_stored_at_shard spawn_blocking with cfg(not(msim)) to avoid …

aaca245

…calling tokio from futures::executor

offload blocking RocksDB reads in blob event processor

79a43aa

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: offload blocking RocksDB reads from tokio runtime#3271

perf: offload blocking RocksDB reads from tokio runtime#3271
shuowang12 wants to merge 7 commits intomainfrom
shuo/spawn_blocking_db_access

shuowang12 commented Apr 8, 2026

Uh oh!

shuowang12 commented Apr 8, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

Uh oh!

clud-bot bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

shuowang12 commented Apr 8, 2026

Description

Test plan

Release notes

Uh oh!

shuowang12 commented Apr 8, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

clud-bot bot left a comment

Choose a reason for hiding this comment

Review: Offload blocking RocksDB reads from tokio runtime

Critical: RocksDB reads still on tokio threads

Multiple sequential spawn_blocking hops

Cancellation safety

Nit

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant