[ISSUE #6506] ♻️ Refactor consume message concurrency handling to improve task management and shutdown process #6507
Conversation
🔊 @mxsm 🚀 Thanks for your contribution 🎉! 💡 CodeRabbit (AI) will review your code first 🔥!

Note: 🚨 The code review suggestions from CodeRabbit are to be used as a reference only, and the PR submitter can decide whether to make changes based on their own judgment. Ultimately, the project management personnel will conduct the final code review 💥.
Walkthrough

The changes refactor concurrent message consumption handling by replacing a runtime-backed spawning mechanism with semaphore-based concurrency control for improved task management and graceful shutdown. Additionally, crate-level recursion limits are increased across multiple files to support deeper macro expansion during compilation.

Changes
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 3 passed, ❌ 2 failed (1 warning, 1 inconclusive)
🧹 Nitpick comments (3)
rocketmq-client/src/consumer/consumer_impl/consume_message_concurrently_service.rs (3)
187-199: `submit_consume_request_later` doesn't check the cancellation token, but the callee does — acceptable.

The delayed retry task will be harmlessly rejected by `submit_consume_request` at line 319 when the shutdown token is already cancelled. No issue here, but be aware these ghost tasks can fire after `shutdown()` returns (since shutdown doesn't await them). If strict post-shutdown silence is needed, you could select on the token inside the delay:

Optional: cancel delayed retries on shutdown

```diff
 fn submit_consume_request_later(
     &self,
     msgs: Vec<ArcMut<MessageExt>>,
     this: ArcMut<Self>,
     process_queue: Arc<ProcessQueue>,
     message_queue: MessageQueue,
 ) {
+    let token = self.shutdown_token.clone();
     tokio::spawn(async move {
-        tokio::time::sleep(Duration::from_secs(5)).await;
-        this.submit_consume_request(this.clone(), msgs, process_queue, message_queue, true)
-            .await;
+        tokio::select! {
+            _ = tokio::time::sleep(Duration::from_secs(5)) => {
+                this.submit_consume_request(this.clone(), msgs, process_queue, message_queue, true)
+                    .await;
+            }
+            _ = token.cancelled() => {}
+        }
     });
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rocketmq-client/src/consumer/consumer_impl/consume_message_concurrently_service.rs` around lines 187 - 199, The delayed-retry helper submit_consume_request_later currently unconditionally sleeps then calls submit_consume_request, which can run after shutdown; to cancel those ghost tasks earlier, change submit_consume_request_later to await either the delay or the shutdown cancellation and only call submit_consume_request when the delay completed (use tokio::select! between tokio::time::sleep(Duration::from_secs(5)) and the shutdown token's cancelled() future), keeping the call to submit_consume_request(this.clone(), msgs, process_queue, message_queue, true). Ensure you reference the same shutdown cancellation token used by submit_consume_request so the delayed task observes shutdown and returns without calling submit_consume_request if cancelled.
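The race-the-delay-against-shutdown pattern suggested above can be illustrated with a std-only sketch. The name `delay_or_cancel` is hypothetical, and a channel's `recv_timeout` stands in for `tokio::select!` racing `tokio::time::sleep` against `CancellationToken::cancelled()`:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Wait out a retry delay, but bail early if a shutdown signal arrives first.
/// Returns true if the retry should proceed, false if it was cancelled.
fn delay_or_cancel(cancel_rx: &mpsc::Receiver<()>, delay: Duration) -> bool {
    match cancel_rx.recv_timeout(delay) {
        // A message (or a dropped sender) signals shutdown: skip the retry.
        Ok(()) | Err(mpsc::RecvTimeoutError::Disconnected) => false,
        // The full delay elapsed with no cancellation: proceed with the retry.
        Err(mpsc::RecvTimeoutError::Timeout) => true,
    }
}

fn main() {
    // Shutdown arrives before the delay expires: the retry is skipped.
    let (cancel_tx, cancel_rx) = mpsc::channel();
    let handle = thread::spawn(move || delay_or_cancel(&cancel_rx, Duration::from_secs(5)));
    cancel_tx.send(()).unwrap();
    assert!(!handle.join().unwrap());

    // No cancellation: the delay completes and the retry runs.
    let (_keep_alive_tx, cancel_rx) = mpsc::channel::<()>();
    assert!(delay_or_cancel(&cancel_rx, Duration::from_millis(10)));
    println!("ok");
}
```

The key property, as in the suggested diff, is that the shutdown signal short-circuits the sleeping task instead of letting it fire after `shutdown()` returns.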
357-394: Permit lifecycle is well-managed; consider the retry accumulation scenario.

The `_permit` binding correctly ties the semaphore permit to the task's lifetime, ensuring release on both normal completion and panics.

One edge-case to consider: under sustained backpressure, each saturation event spawns a new delayed retry via `submit_consume_request_later`, and if those retries also hit saturation, the cycle compounds — potentially accumulating many sleeping `tokio::spawn` tasks. This matches the Java SDK's behavior, but you might want to monitor the retry queue depth in production.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rocketmq-client/src/consumer/consumer_impl/consume_message_concurrently_service.rs` around lines 357 - 394, spawn_consume_task currently calls submit_consume_request_later on semaphore saturation which can spawn many sleeping retry tasks under sustained backpressure; change the retry strategy so submit_consume_request_later enqueues the (msgs, this, process_queue, message_queue) into a single bounded retry queue (e.g., an mpsc::channel) and have one background retry worker task drain that queue and retry after the 5s delay (or apply backoff), rather than spawning a new tokio::spawn per saturation event; update submit_consume_request_later and add a retry worker initialised by ConsumeMessageConcurrentlyService to avoid unbounded sleeping tasks.
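The single-worker bounded retry queue proposed in this prompt can be sketched with std primitives. `RetryRequest`, `start_retry_worker`, and the millisecond delay are hypothetical stand-ins for the real (msgs, process_queue, message_queue) payload and the 5s retry delay; a `sync_channel` plays the role of a bounded `mpsc::channel`:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// A retry request; the id stands in for (msgs, process_queue, message_queue).
struct RetryRequest {
    id: u32,
}

/// Bounded retry queue: saturation events enqueue into a sync_channel instead
/// of each spawning its own sleeping task. A single worker drains the queue,
/// applying the delay once per request and bounding memory use.
fn start_retry_worker(
    capacity: usize,
    delay: Duration,
) -> (mpsc::SyncSender<RetryRequest>, thread::JoinHandle<Vec<u32>>) {
    let (tx, rx) = mpsc::sync_channel::<RetryRequest>(capacity);
    let worker = thread::spawn(move || {
        let mut retried = Vec::new();
        for req in rx {
            thread::sleep(delay); // stand-in for the 5s retry delay
            retried.push(req.id); // stand-in for submit_consume_request(...)
        }
        retried // the loop ends when every sender is dropped
    });
    (tx, worker)
}

fn main() {
    let (tx, worker) = start_retry_worker(2, Duration::from_millis(1));
    // try_send never blocks; when the queue is full it returns Err,
    // which is where a caller could drop or escalate instead of piling up tasks.
    tx.try_send(RetryRequest { id: 1 }).unwrap();
    tx.try_send(RetryRequest { id: 2 }).unwrap();
    drop(tx); // close the queue so the worker exits
    assert_eq!(worker.join().unwrap(), vec![1, 2]);
    println!("ok");
}
```

The design point matches the prompt: one sleeping worker replaces N sleeping spawned tasks, and the channel capacity makes the backlog explicit and bounded.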
369-383: Note: `spawn_blocking` in `ConsumeRequest::run` uses Tokio's blocking pool, not the semaphore-bounded pool.

The semaphore correctly limits the number of concurrent logical consume tasks, but each task's actual blocking work (`listener.consume_message` at line 464) runs on Tokio's default blocking thread pool (up to 512 threads). If `consume_thread_max` is much smaller than 512, the semaphore is the effective bottleneck — which is the intent. Just be aware that under extreme load, the blocking pool configuration (`max_blocking_threads`) may also need tuning independently of `consume_thread_max`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rocketmq-client/src/consumer/consumer_impl/consume_message_concurrently_service.rs` around lines 369 - 383, The current code acquires a semaphore permit via try_acquire_owned and then tokio::spawn's an async task, but ConsumeRequest::run itself uses tokio::task::spawn_blocking for listener.consume_message, so the semaphore does not bound the runtime's blocking pool; either move the blocking execution under the semaphore permit or avoid an inner spawn_blocking: refactor ConsumeRequest::run so the blocking listener.consume_message is executed while the permit is held (e.g., perform the blocking call directly or have the outer task use tokio::task::spawn_blocking instead of tokio::spawn), or document and tune the runtime's max_blocking_threads to match consume_thread_max; key symbols: try_acquire_owned, tokio::spawn, ConsumeRequest::run, listener.consume_message, spawn_blocking, consume_thread_max, max_blocking_threads.
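The try-acquire/permit-on-drop lifecycle discussed here can be modeled without Tokio. This hypothetical `Limiter` mirrors the semantics of `Semaphore::try_acquire_owned` — non-blocking acquisition, release on drop even across panics — and is a conceptual sketch, not the actual service code:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// A try-acquire counting limiter: at most `max` permits are out at once,
/// and acquisition never blocks (saturation is reported, not waited out).
struct Limiter {
    max: usize,
    active: AtomicUsize,
}

/// Holding a Permit keeps one slot occupied; dropping it releases the slot.
/// Release-on-drop covers both normal completion and panics, matching the
/// `_permit` binding discussed above.
struct Permit<'a> {
    limiter: &'a Limiter,
}

impl Limiter {
    fn new(max: usize) -> Self {
        Limiter { max, active: AtomicUsize::new(0) }
    }

    /// Returns a permit if one is free, else None — the saturation branch
    /// that would trigger the delayed retry in the service.
    fn try_acquire(&self) -> Option<Permit<'_>> {
        let mut cur = self.active.load(Ordering::Acquire);
        loop {
            if cur >= self.max {
                return None;
            }
            match self.active.compare_exchange(cur, cur + 1, Ordering::AcqRel, Ordering::Acquire) {
                Ok(_) => return Some(Permit { limiter: self }),
                Err(actual) => cur = actual,
            }
        }
    }
}

impl Drop for Permit<'_> {
    fn drop(&mut self) {
        self.limiter.active.fetch_sub(1, Ordering::AcqRel);
    }
}

fn main() {
    let limiter = Limiter::new(2);
    let p1 = limiter.try_acquire().expect("first permit");
    let _p2 = limiter.try_acquire().expect("second permit");
    assert!(limiter.try_acquire().is_none()); // saturated: would schedule a retry
    drop(p1); // releasing a permit frees a slot
    assert!(limiter.try_acquire().is_some());
    println!("ok");
}
```

The caveat from the comment still applies: a limiter like this only bounds work executed *while the permit is held*; handing the work off to a separate pool (as `spawn_blocking` does) moves the real bottleneck to that pool's own limit.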
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- rocketmq-client/benches/concurrent_optimization_benchmark.rs
- rocketmq-client/src/consumer/consumer_impl/consume_message_concurrently_service.rs
- rocketmq-client/tests/integration_tests.rs
- rocketmq-tools/rocketmq-admin/rocketmq-admin-core/examples/admin_builder_pattern.rs
- rocketmq-tools/rocketmq-admin/rocketmq-admin-core/src/lib.rs
Codecov Report

❌ Patch coverage is

Additional details and impacted files

```
@@            Coverage Diff             @@
##             main    #6507      +/-   ##
==========================================
- Coverage   42.17%   42.16%   -0.02%
==========================================
  Files         946      946
  Lines      132075   132130      +55
==========================================
  Hits        55708    55708
- Misses      76367    76422      +55
```

☔ View full report in Codecov by Sentry.
rocketmq-rust-bot
left a comment
LGTM - All CI checks passed ✅
Which Issue(s) This PR Fixes (Closes)

Closes #6506
Brief Description
How Did You Test This Change?
Summary by CodeRabbit

Release Notes

Refactor
- Replaced the runtime-backed task spawning mechanism with semaphore-based concurrency control for concurrent message consumption, improving task management and graceful shutdown.

Chores
- Increased crate-level recursion limits across multiple files to support deeper macro expansion during compilation.