
[ISSUE #6506] ♻️ Refactor consume message concurrency handling to improve task management and shutdown process #6507

Merged
rocketmq-rust-bot merged 1 commit into main from refactor-6506 on Feb 24, 2026

Conversation

@mxsm (Owner) commented Feb 24, 2026

Which Issue(s) This PR Fixes (Closes)

Brief Description

How Did You Test This Change?

Summary by CodeRabbit

Release Notes

  • Refactor

    • Improved message consumption concurrency control with semaphore-based limits and graceful shutdown coordination.
    • Enhanced error logging for broadcast failures and message consumption issues.
  • Chores

    • Increased compiler recursion limit in multiple modules to support complex macro expansions.

@rocketmq-rust-bot (Collaborator)

🔊@mxsm 🚀Thanks for your contribution🎉!

💡CodeRabbit(AI) will review your code first🔥!

Note

🚨The code review suggestions from CodeRabbit are for reference only; the PR submitter can decide whether to apply them based on their own judgment. Ultimately, the project maintainers will conduct the final code review💥.

@coderabbitai (Contributor) commented Feb 24, 2026

Walkthrough

The changes refactor concurrent message consumption handling by replacing a runtime-backed spawning mechanism with semaphore-based concurrency control for improved task management and graceful shutdown. Additionally, crate-level recursion limits are increased across multiple files to support deeper macro expansion during compilation.

Changes

Cohort: Recursion Limit Attributes
  • Files: rocketmq-client/benches/concurrent_optimization_benchmark.rs, rocketmq-client/tests/integration_tests.rs, rocketmq-tools/rocketmq-admin/rocketmq-admin-core/examples/admin_builder_pattern.rs, rocketmq-tools/rocketmq-admin/rocketmq-admin-core/src/lib.rs
  • Summary: Added the crate-level #![recursion_limit = "256"] attribute to increase the allowed macro recursion depth during compilation.
Cohort: Concurrent Message Consumption Refactoring
  • Files: rocketmq-client/src/consumer/consumer_impl/consume_message_concurrently_service.rs
  • Summary: Replaced RocketMQRuntime-based spawning with semaphore-backed concurrency control; introduced consume_semaphore and shutdown_token for backpressure and graceful shutdown; refactored spawn points to use tokio::spawn with permit acquisition; enhanced error handling and hook-execution gating; updated clustering backoff logic and split message processing into blocking and async paths.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

refactor♻️

Suggested reviewers

  • SpaceXCN
  • TeslaRustor
  • rocketmq-rust-bot

Poem

🐰✨ With semaphores spun and tokens held tight,
Concurrency bounded, shutdown runs right,
The rabbit hops through recursive compile,
Old runtimes fade—new grace all the while! 🎉

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)
  • Docstring Coverage — ⚠️ Warning: docstring coverage is 30.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
  • Out of Scope Changes check — ❓ Inconclusive: the recursion limit increases in four files appear tangentially related to the refactoring but are not core to the stated objective of improving concurrency handling and shutdown. Resolution: clarify whether the recursion limit adjustments are necessary dependencies of the refactoring or should land in a separate PR focused on compilation fixes.

✅ Passed checks (3 passed)
  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title check — ✅ Passed: the title accurately describes the main refactoring work: replacing RocketMQRuntime with semaphore-based concurrency control and improving shutdown semantics.
  • Linked Issues check — ✅ Passed: linked issue #6506 provides only a high-level refactoring objective without specific requirements; the PR implements concurrency-handling improvements (semaphore-based control, graceful shutdown, backpressure) that align with the stated goal.


@coderabbitai (Contributor) left a comment

🧹 Nitpick comments (3)
rocketmq-client/src/consumer/consumer_impl/consume_message_concurrently_service.rs (3)

187-199: submit_consume_request_later doesn't check the cancellation token, but the callee does — acceptable.

The delayed retry task will be harmlessly rejected by submit_consume_request at line 319 when the shutdown token is already cancelled. No issue here, but be aware these ghost tasks can fire after shutdown() returns (since shutdown doesn't await them). If strict post-shutdown silence is needed, you could select on the token inside the delay:

Optional: cancel delayed retries on shutdown
 fn submit_consume_request_later(
     &self,
     msgs: Vec<ArcMut<MessageExt>>,
     this: ArcMut<Self>,
     process_queue: Arc<ProcessQueue>,
     message_queue: MessageQueue,
 ) {
+    let token = self.shutdown_token.clone();
     tokio::spawn(async move {
-        tokio::time::sleep(Duration::from_secs(5)).await;
-        this.submit_consume_request(this.clone(), msgs, process_queue, message_queue, true)
-            .await;
+        tokio::select! {
+            _ = tokio::time::sleep(Duration::from_secs(5)) => {
+                this.submit_consume_request(this.clone(), msgs, process_queue, message_queue, true)
+                    .await;
+            }
+            _ = token.cancelled() => {}
+        }
     });
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@rocketmq-client/src/consumer/consumer_impl/consume_message_concurrently_service.rs`
around lines 187 - 199, The delayed-retry helper submit_consume_request_later
currently unconditionally sleeps then calls submit_consume_request, which can
run after shutdown; to cancel those ghost tasks earlier, change
submit_consume_request_later to await either the delay or the shutdown
cancellation and only call submit_consume_request when the delay completed (use
tokio::select! between tokio::time::sleep(Duration::from_secs(5)) and the
shutdown token's cancelled() future), keeping the call to
submit_consume_request(this.clone(), msgs, process_queue, message_queue, true).
Ensure you reference the same shutdown cancellation token used by
submit_consume_request so the delayed task observes shutdown and returns without
calling submit_consume_request if cancelled.

357-394: Permit lifecycle is well-managed; consider the retry accumulation scenario.

The _permit binding correctly ties the semaphore permit to the task's lifetime, ensuring release on both normal completion and panics.

One edge-case to consider: under sustained backpressure, each saturation event spawns a new delayed retry via submit_consume_request_later, and if those retries also hit saturation, the cycle compounds — potentially accumulating many sleeping tokio::spawn tasks. This matches the Java SDK's behavior, but you might want to monitor the retry queue depth in production.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@rocketmq-client/src/consumer/consumer_impl/consume_message_concurrently_service.rs`
around lines 357 - 394, spawn_consume_task currently calls
submit_consume_request_later on semaphore saturation which can spawn many
sleeping retry tasks under sustained backpressure; change the retry strategy so
submit_consume_request_later enqueues the (msgs, this, process_queue,
message_queue) into a single bounded retry queue (e.g., an mpsc::channel) and
have one background retry worker task drain that queue and retry after the 5s
delay (or apply backoff), rather than spawning a new tokio::spawn per saturation
event; update submit_consume_request_later and add a retry worker initialised by
ConsumeMessageConcurrentlyService to avoid unbounded sleeping tasks.

369-383: Note: spawn_blocking in ConsumeRequest::run uses Tokio's blocking pool, not the semaphore-bounded pool.

The semaphore correctly limits the number of concurrent logical consume tasks, but each task's actual blocking work (listener.consume_message at line 464) runs on Tokio's default blocking thread pool (up to 512 threads). If consume_thread_max is much smaller than 512, the semaphore is the effective bottleneck — which is the intent. Just be aware that under extreme load, the blocking pool configuration (max_blocking_threads) may also need tuning independently of consume_thread_max.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@rocketmq-client/src/consumer/consumer_impl/consume_message_concurrently_service.rs`
around lines 369 - 383, The current code acquires a semaphore permit via
try_acquire_owned and then tokio::spawn's an async task, but ConsumeRequest::run
itself uses tokio::task::spawn_blocking for listener.consume_message, so the
semaphore does not bound the runtime's blocking pool; either move the blocking
execution under the semaphore permit or avoid an inner spawn_blocking: refactor
ConsumeRequest::run so the blocking listener.consume_message is executed while
the permit is held (e.g., perform the blocking call directly or have the outer
task use tokio::task::spawn_blocking instead of tokio::spawn), or document and
tune the runtime's max_blocking_threads to match consume_thread_max; key
symbols: try_acquire_owned, tokio::spawn, ConsumeRequest::run,
listener.consume_message, spawn_blocking, consume_thread_max,
max_blocking_threads.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f53d131 and c8ed7dc.

📒 Files selected for processing (5)
  • rocketmq-client/benches/concurrent_optimization_benchmark.rs
  • rocketmq-client/src/consumer/consumer_impl/consume_message_concurrently_service.rs
  • rocketmq-client/tests/integration_tests.rs
  • rocketmq-tools/rocketmq-admin/rocketmq-admin-core/examples/admin_builder_pattern.rs
  • rocketmq-tools/rocketmq-admin/rocketmq-admin-core/src/lib.rs

@codecov

codecov bot commented Feb 24, 2026

Codecov Report

❌ Patch coverage is 0% with 135 lines in your changes missing coverage. Please review.
✅ Project coverage is 42.16%. Comparing base (f53d131) to head (c8ed7dc).
⚠️ Report is 1 commit behind head on main.

Files with missing lines | Patch % | Lines
...sumer_impl/consume_message_concurrently_service.rs | 0.00% | 135 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6507      +/-   ##
==========================================
- Coverage   42.17%   42.16%   -0.02%     
==========================================
  Files         946      946              
  Lines      132075   132130      +55     
==========================================
  Hits        55708    55708              
- Misses      76367    76422      +55     

☔ View full report in Codecov by Sentry.

@rocketmq-rust-bot (Collaborator) left a comment

LGTM - All CI checks passed ✅

@rocketmq-rust-bot rocketmq-rust-bot merged commit 80b0d8d into main Feb 24, 2026
18 of 20 checks passed
@rocketmq-rust-bot added the approved label and removed the ready to review and waiting-review labels Feb 24, 2026

Labels

AI review first (AI review PR first) · approved (PR has approved) · auto merge · refactor♻️ (refactor code)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Refactor♻️] Refactor consume message concurrency handling to improve task management and shutdown process

3 participants