improve the scheduler #989
Conversation
Summary of Changes
Hello @shihaobai, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request aims to enhance the overall performance and responsiveness of the LightLLM server by refining its internal scheduling and request handling. It introduces adaptive strategies for managing request queues and processing, which should lead to improved throughput and more efficient resource utilization, particularly in distributed environments.
Highlights
- Router Scheduling Optimization: I've introduced a dynamic scheduling interval in the router. This allows for a faster 10ms scheduling cycle when the system is idle to quickly pick up new requests, and a scaled interval (default 30ms multiplied by the data parallelism size per node) when active, which helps in better load balancing and request distribution in data parallelism (DP) mode.
- Visual Server Request Reception Enhancement: The visual server's request reception mechanism has been optimized. It now uses an adaptive batching approach, allowing it to receive multiple requests (up to 256) in a single loop iteration. The batch size dynamically adjusts based on the queue load, increasing when there's a backlog and resetting when the queue clears, improving efficiency.
- ZeroMQ Context Adjustment: In the visual server, the ZeroMQ context creation has been changed from zmq.asyncio.Context to zmq.Context. This subtle change might affect how ZMQ operations interact with the asyncio event loop or ensure compatibility with the new adaptive request reception logic.
Summary of Changes
Hello @shihaobai, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly improves the system's scheduling and request handling efficiency. It introduces dynamic adjustments to the core scheduler's frequency based on current workload and implements an adaptive batching strategy for processing incoming visual requests, leading to better resource utilization and a more responsive system overall.
Highlights
- Dynamic Scheduler Interval: I've implemented a dynamic scheduling interval in the router manager (lightllm/server/router/manager.py). The system now uses a shorter interval (10ms) when no batches are actively running to quickly pick up new requests, and a longer, scaled interval (default 30ms multiplied by dp_size_in_node) when batches are active. This aims to improve request balancing, especially in distributed data parallelism (DP) setups.
- Adaptive Visual Request Batching: In the visual server (lightllm/server/visualserver/manager.py), I've introduced an adaptive batching mechanism for receiving visual requests. loop_for_netio_req now attempts to receive multiple requests (starting at 64, up to 256) in a single iteration. This batch size dynamically increases under load and resets on ZMQ errors, optimizing throughput for incoming visual data (see the sketch after this list).
- ZMQ Context Initialization Refinement: I've refined the ZeroMQ (ZMQ) context initialization in lightllm/server/visualserver/manager.py by changing zmq.asyncio.Context to zmq.Context. This change might address specific concurrency patterns or optimize ZMQ socket handling within the visual server.
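To make the adaptive batching behavior concrete, here is a minimal standalone sketch of the growth-and-reset dynamic described above. The constants mirror the PR (floor of 64, x1.3 growth, cap of 256), but the helper function and driver loop are hypothetical illustrations, not code from the repository:

```python
# Hypothetical sketch of the adaptive receive budget (not the PR's actual code).

BATCH_FLOOR = 64   # initial/reset receive budget
BATCH_CAP = 256    # upper bound on the receive budget
GROWTH = 1.3       # multiplicative growth per fully consumed batch

def next_budget(current: float, queue_drained: bool) -> float:
    """Return the receive budget for the next loop iteration."""
    if queue_drained:
        # The queue emptied mid-batch: snap back to the floor.
        return BATCH_FLOOR
    # Sustained backlog: grow the budget, capped at BATCH_CAP.
    return min(current * GROWTH, BATCH_CAP)

budget = float(BATCH_FLOOR)
for drained in [False, False, False, True, False]:
    budget = next_budget(budget, drained)
    print(int(budget))  # 83, 108, 140, 64, 83
```

This multiplicative-increase/reset shape resembles AIMD-style congestion control: ramp up while work keeps arriving, back off as soon as the queue drains.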
Code Review
This pull request aims to improve the scheduler by introducing dynamic scheduling intervals and adaptive request batching.
My review has identified several issues:
- In router/manager.py, the new dynamic scheduling logic is defined but not actually used, making the change ineffective. There are also hardcoded values that should be made configurable for better maintainability.
- In visualserver/manager.py, critical bugs have been introduced. A change from an asynchronous to a synchronous ZMQ context will likely break the async-based server. Furthermore, the new adaptive request fetching logic is implemented in a way that will cause a TypeError and does not function as intended due to a missing flag in a ZMQ call.
It's important to fix these to ensure the stability and correctness of the application.
```diff
         visual_model_rpc_ports,
     ):
-        context = zmq.asyncio.Context(2)
+        context = zmq.Context(2)
```
The ZMQ context has been changed from zmq.asyncio.Context to zmq.Context. This is a critical issue because the sockets created from this context are used in an async environment.
- self.recv_from_httpserver is used with await self.recv_from_httpserver.recv_pyobj(). A socket from a synchronous context is not awaitable and will block the event loop or raise an error.
- self.send_to_next_module.send_pyobj() is called from an async function. Using a blocking socket here will block the entire event loop, severely impacting performance and responsiveness.
Please revert this change to use the async context.
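For reference, here is a minimal sketch of the async pattern the reviewer is describing (illustrative only; the endpoint and message flow are hypothetical, not taken from this repository):

```python
import zmq
import zmq.asyncio

async def recv_loop():
    # Sockets must come from zmq.asyncio.Context for recv_pyobj() to be awaitable.
    ctx = zmq.asyncio.Context(2)
    sock = ctx.socket(zmq.PULL)
    sock.bind("tcp://127.0.0.1:5555")  # hypothetical endpoint
    while True:
        # Yields control to the event loop while waiting for a message.
        obj = await sock.recv_pyobj()
        print("received:", obj)

# With a plain zmq.Context, recv_pyobj() blocks the whole event loop and
# returns the object directly; `await`-ing that result raises a TypeError.
```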
Suggested change:

```python
context = zmq.asyncio.Context(2)
```

```python
    for _ in range(self.visual_recv_max_count):
        recv_req: GroupReqIndexes = await self.recv_from_httpserver.recv_pyobj()
        if isinstance(recv_req, GroupReqIndexes):
            self.waiting_reqs.append(recv_req)
        else:
            assert False, f"Error Req Inf {recv_req}"
    self.visual_recv_max_count = min(self.visual_recv_max_count * 1.3, 256)
except zmq.ZMQError:
    # When the queue starts to drain, scale the per-iteration receive count back down
    self.visual_recv_max_count = 64
```
There are two critical issues in this block that will cause bugs:
- The recv_pyobj() call is missing the zmq.NOBLOCK flag. Without it, the call will block if the ZMQ queue is empty, and the except zmq.ZMQError block will never be reached. This breaks the adaptive batching logic.
- On line 164, self.visual_recv_max_count is multiplied by 1.3, which can result in a float. This float value is then used in range(self.visual_recv_max_count) in the next iteration, which will raise a TypeError. You should cast the result to an int.
Suggested change:

```python
    for _ in range(int(self.visual_recv_max_count)):
        recv_req: GroupReqIndexes = await self.recv_from_httpserver.recv_pyobj(zmq.NOBLOCK)
        if isinstance(recv_req, GroupReqIndexes):
            self.waiting_reqs.append(recv_req)
        else:
            assert False, f"Error Req Inf {recv_req}"
    self.visual_recv_max_count = min(int(self.visual_recv_max_count * 1.3), 256)
except zmq.ZMQError:
    # When the queue starts to drain, scale the per-iteration receive count back down
    self.visual_recv_max_count = 64
```
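As a quick sanity check on the NOBLOCK behavior, the following standalone sketch (illustrative; the endpoint is hypothetical) shows that a non-blocking receive on an empty queue raises zmq.Again, a subclass of zmq.ZMQError, which is what allows the except branch above to fire:

```python
import zmq

ctx = zmq.Context()
sock = ctx.socket(zmq.PULL)
sock.bind("tcp://127.0.0.1:5556")  # hypothetical endpoint; nothing ever sends to it

try:
    # With NOBLOCK, an empty queue raises immediately instead of blocking.
    sock.recv_pyobj(zmq.NOBLOCK)
except zmq.Again:
    # zmq.Again subclasses zmq.ZMQError, so `except zmq.ZMQError` also catches it.
    print("queue empty; scale the receive budget back down")
finally:
    sock.close()
    ctx.term()
```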
lightllm/server/router/manager.py (Outdated)
```python
def _get_schedule_time_interval(self):
    if self.running_batch is None:
        # When there is no running batch, trigger request scheduling every 10ms
        return 0.01

    # In DP mode, a longer scheduling interval gives better load balancing,
    # since more requests can be collected before each scheduling pass
    return self.schedule_time_interval * self.dp_size_in_node
```
This new method introduces scheduling logic, but it appears to be unused. The main scheduling loop in loop_for_fwd still contains a hardcoded await asyncio.sleep(0.03). For this new logic to take effect, you should call this function, for example: await asyncio.sleep(self._get_schedule_time_interval()).
Additionally, the value 0.01 is a magic number. It's better to define it as a named constant to improve readability and maintainability, e.g., IDLE_SCHEDULE_INTERVAL = 0.01. This could also be made a configurable parameter.
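For illustration, the wiring the reviewer suggests might look roughly like this. This is a sketch under assumptions: the class shape and the loop body are simplified stand-ins, not the actual contents of lightllm/server/router/manager.py:

```python
import asyncio

IDLE_SCHEDULE_INTERVAL = 0.01  # named constant for the 10ms idle cycle

class RouterManagerSketch:
    """Simplified stand-in for the router manager; the real work is elided."""

    def __init__(self, schedule_time_interval=0.03, dp_size_in_node=1):
        self.running_batch = None
        self.schedule_time_interval = schedule_time_interval
        self.dp_size_in_node = dp_size_in_node

    def _get_schedule_time_interval(self):
        if self.running_batch is None:
            return IDLE_SCHEDULE_INTERVAL
        return self.schedule_time_interval * self.dp_size_in_node

    async def loop_for_fwd(self):
        while True:
            # ... schedule waiting requests / step the running batch here ...
            # Dynamic interval instead of a hardcoded asyncio.sleep(0.03):
            await asyncio.sleep(self._get_schedule_time_interval())
```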
lightllm/server/router/manager.py (Outdated)
```python
self.nnodes = args.nnodes
self.node_rank = args.node_rank
self.dp_size = args.dp
self.schedule_time_interval = 0.03  # default 30ms scheduling period
```
```python
if not hasattr(self, "visual_recv_max_count"):
    self.visual_recv_max_count = 64
```
It is better practice to initialize instance attributes like visual_recv_max_count within the __init__ method. This makes the code more organized and easier to understand, as all initial state is defined in one place. Lazily initializing it here can make it harder to track the state of the object.
Consider moving self.visual_recv_max_count = 64 to the __init__ method.
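Concretely, the suggested reorganization might look like this sketch (the surrounding constructor arguments and setup are elided and hypothetical):

```python
class VisualManagerSketch:
    def __init__(self):
        # ... existing socket and queue setup ...
        # Define the initial receive budget up front, instead of lazily
        # via hasattr() inside loop_for_netio_req.
        self.visual_recv_max_count = 64

    async def loop_for_netio_req(self):
        # No hasattr() guard needed: the attribute is guaranteed to exist.
        ...
```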