
Conversation

@shihaobai
Collaborator

No description provided.

@shihaobai shihaobai changed the title improve the schedule improve the scheduler Jul 25, 2025

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Summary of Changes

Hello @shihaobai, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the system's scheduling and request handling efficiency. It introduces dynamic adjustments to the core scheduler's frequency based on current workload and implements an adaptive batching strategy for processing incoming visual requests, leading to better resource utilization and a more responsive system overall.

Highlights

  • Dynamic Scheduler Interval: I've implemented a dynamic scheduling interval in the router manager (lightllm/server/router/manager.py). The system now uses a shorter interval (10ms) when no batches are actively running to quickly pick up new requests, and a longer, scaled interval (default 30ms multiplied by dp_size_in_node) when batches are active. This aims to improve request balancing, especially in distributed data parallelism (DP) setups.
  • Adaptive Visual Request Batching: In the visual server (lightllm/server/visualserver/manager.py), I've introduced an adaptive batching mechanism for receiving visual requests. The loop_for_netio_req now attempts to receive multiple requests (starting at 64, up to 256) in a single iteration. This batch size dynamically increases under load and resets on ZMQ errors, optimizing throughput for incoming visual data.
  • ZMQ Context Initialization Refinement: I've refined the ZeroMQ (ZMQ) context initialization in lightllm/server/visualserver/manager.py by changing zmq.asyncio.Context to zmq.Context. This change might address specific concurrency patterns or optimize ZMQ socket handling within the visual server.
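For intuition, here is how the adaptive receive budget described above evolves under sustained load, assuming the integer cast proposed later in this review (a back-of-the-envelope sketch, not code from the PR):

    # Illustrative only: budget starts at 64, grows by 1.3x per loaded
    # iteration, and is capped at 256.
    count = 64
    steps = [count]
    while count < 256:
        count = min(int(count * 1.3), 256)
        steps.append(count)
    print(steps)  # [64, 83, 107, 139, 180, 234, 256]

The cap is therefore reached after about six consecutive loaded iterations, and an empty queue (a ZMQ error on a non-blocking receive) resets the budget to 64.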

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request aims to improve the scheduler by introducing dynamic scheduling intervals and adaptive request batching.

My review has identified several issues:

  • In router/manager.py, the new dynamic scheduling logic is defined but not actually used, making the change ineffective. There are also hardcoded values that should be made configurable for better maintainability.
  • In visualserver/manager.py, there are critical bugs introduced. A change from an asynchronous to a synchronous ZMQ context will likely break the async-based server. Furthermore, the new adaptive request fetching logic is implemented in a way that will cause a TypeError and does not function as intended due to a missing flag in a ZMQ call.

It's important to fix these to ensure the stability and correctness of the application.

      visual_model_rpc_ports,
  ):
-     context = zmq.asyncio.Context(2)
+     context = zmq.Context(2)

critical

The ZMQ context has been changed from zmq.asyncio.Context to zmq.Context. This is a critical issue because the sockets created from this context are used in an async environment.

  • self.recv_from_httpserver is used with await self.recv_from_httpserver.recv_pyobj(). A socket from a synchronous context is not awaitable and will block the event loop or raise an error.
  • self.send_to_next_module.send_pyobj() is called from an async function. Using a blocking socket here will block the entire event loop, severely impacting performance and responsiveness.

Please revert this change to use the async context:
context = zmq.asyncio.Context(2)
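For reference, a minimal sketch of the async-context pattern this comment asks to restore; the address, socket type, and function name here are illustrative, not taken from the PR:

    import zmq
    import zmq.asyncio

    context = zmq.asyncio.Context(2)
    recv_from_httpserver = context.socket(zmq.PULL)
    recv_from_httpserver.bind("tcp://127.0.0.1:5555")  # illustrative address

    async def recv_one():
        # With zmq.asyncio, recv_pyobj() returns an awaitable, so the event
        # loop stays free while waiting. A socket from a plain zmq.Context
        # returns the object synchronously and would block the whole loop.
        return await recv_from_httpserver.recv_pyobj()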

Comment on lines 158 to 167
                for _ in range(self.visual_recv_max_count):
                    recv_req: GroupReqIndexes = await self.recv_from_httpserver.recv_pyobj()
                    if isinstance(recv_req, GroupReqIndexes):
                        self.waiting_reqs.append(recv_req)
                    else:
                        assert False, f"Error Req Inf {recv_req}"
                self.visual_recv_max_count = min(self.visual_recv_max_count * 1.3, 256)
            except zmq.ZMQError:
                # When the queue starts to drain, scale the per-iteration receive count back down
                self.visual_recv_max_count = 64

critical

There are two critical issues in this block that will cause bugs:

  1. The recv_pyobj() call is missing the zmq.NOBLOCK flag. Without it, the call will block whenever the ZMQ queue is empty, so the except zmq.ZMQError block is never reached and the adaptive batching logic never runs. (With NOBLOCK set, an empty queue raises zmq.Again, a subclass of zmq.ZMQError, so the existing handler catches it.)
  2. On line 164, self.visual_recv_max_count is multiplied by 1.3, which produces a float. That float is then used in range(self.visual_recv_max_count) on the next iteration, which raises a TypeError. Cast the result to an int.

                for _ in range(int(self.visual_recv_max_count)):
                    recv_req: GroupReqIndexes = await self.recv_from_httpserver.recv_pyobj(zmq.NOBLOCK)
                    if isinstance(recv_req, GroupReqIndexes):
                        self.waiting_reqs.append(recv_req)
                    else:
                        assert False, f"Error Req Inf {recv_req}"
                self.visual_recv_max_count = min(int(self.visual_recv_max_count * 1.3), 256)
            except zmq.ZMQError:
                # When the queue starts to drain, scale the per-iteration receive count back down
                self.visual_recv_max_count = 64

Comment on lines 199 to 205
    def _get_schedule_time_interval(self):
        if self.running_batch is None:
            # With no running batch, trigger request scheduling every 10ms
            return 0.01

        # In dp mode, a longer scheduling interval helps load balancing,
        # since more requests can accumulate before each scheduling pass
        return self.schedule_time_interval * self.dp_size_in_node

high

This new method introduces scheduling logic, but it appears to be unused. The main scheduling loop in loop_for_fwd still contains a hardcoded await asyncio.sleep(0.03). For this new logic to take effect, you should call this function, for example: await asyncio.sleep(self._get_schedule_time_interval()).

Additionally, the value 0.01 is a magic number. It's better to define it as a named constant to improve readability and maintainability, e.g., IDLE_SCHEDULE_INTERVAL = 0.01. This could also be made a configurable parameter.
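As a concrete illustration, a minimal sketch of how the loop could consume the new method; the _step helper is a hypothetical placeholder, not the actual manager code:

    async def loop_for_fwd(self):
        while True:
            await self._step()  # hypothetical placeholder for the schedule/forward work
            # Sleeps 10ms when idle, schedule_time_interval * dp_size_in_node otherwise
            await asyncio.sleep(self._get_schedule_time_interval())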

        self.nnodes = args.nnodes
        self.node_rank = args.node_rank
        self.dp_size = args.dp
        self.schedule_time_interval = 0.03  # default 30ms scheduling period

medium

The schedule_time_interval is hardcoded. Consider making this a configurable parameter in StartArgs (defined in lightllm/server/core/objs/start_args_type.py) for better maintainability. This would allow tuning the scheduler interval without changing the code.
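A minimal sketch of what that could look like, assuming StartArgs is a dataclass (the field shown is a suggestion, not existing code):

    from dataclasses import dataclass, field

    @dataclass
    class StartArgs:
        # ... existing fields ...
        schedule_time_interval: float = field(default=0.03)  # seconds (30ms)

The router manager would then read args.schedule_time_interval instead of hardcoding 0.03.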

Comment on lines +153 to +154
        if not hasattr(self, "visual_recv_max_count"):
            self.visual_recv_max_count = 64

medium

It is better practice to initialize instance attributes like visual_recv_max_count within the __init__ method. This makes the code more organized and easier to understand, as all initial state is defined in one place. Lazily initializing it here can make it harder to track the state of the object.

Consider moving self.visual_recv_max_count = 64 to the __init__ method.
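A minimal sketch of the suggested move; the __init__ signature is abbreviated:

    def __init__(self, args):  # other parameters omitted for brevity
        # ... existing initialization ...
        # Initial per-iteration receive budget for adaptive request batching
        self.visual_recv_max_count = 64

This keeps all initial state in one place and removes the hasattr check from the receive loop.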

@hiworldwzj hiworldwzj merged commit 1c16247 into main Jul 25, 2025
1 check passed
@hiworldwzj hiworldwzj deleted the schedule_interval branch July 25, 2025 08:44