improve the scheduler #989
Conversation
Summary of Changes
Hello @shihaobai, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request aims to enhance the overall performance and responsiveness of the LightLLM server by refining its internal scheduling and request handling. It introduces adaptive strategies for managing request queues and processing, which should lead to improved throughput and more efficient resource utilization, particularly in distributed environments.
Highlights
- Router Scheduling Optimization: I've introduced a dynamic scheduling interval in the router. This allows for a faster 10ms scheduling cycle when the system is idle to quickly pick up new requests, and a scaled interval (default 30ms multiplied by the data parallelism size per node) when active, which helps in better load balancing and request distribution in data parallelism (DP) mode.
- Visual Server Request Reception Enhancement: The visual server's request reception mechanism has been optimized. It now uses an adaptive batching approach, allowing it to receive multiple requests (up to 256) in a single loop iteration. The batch size dynamically adjusts based on the queue load, increasing when there's a backlog and resetting when the queue clears, improving efficiency.
- ZeroMQ Context Adjustment: In the visual server, the ZeroMQ context creation has been changed from zmq.asyncio.Context to zmq.Context. This subtle change might affect how ZMQ operations interact with the asyncio event loop or ensure compatibility with the new adaptive request reception logic.
Summary of Changes
Hello @shihaobai, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly improves the system's scheduling and request handling efficiency. It introduces dynamic adjustments to the core scheduler's frequency based on current workload and implements an adaptive batching strategy for processing incoming visual requests, leading to better resource utilization and a more responsive system overall.
Highlights
- Dynamic Scheduler Interval: I've implemented a dynamic scheduling interval in the router manager (lightllm/server/router/manager.py). The system now uses a shorter interval (10ms) when no batches are actively running to quickly pick up new requests, and a longer, scaled interval (default 30ms multiplied by dp_size_in_node) when batches are active. This aims to improve request balancing, especially in distributed data parallelism (DP) setups.
- Adaptive Visual Request Batching: In the visual server (lightllm/server/visualserver/manager.py), I've introduced an adaptive batching mechanism for receiving visual requests. loop_for_netio_req now attempts to receive multiple requests (starting at 64, up to 256) in a single iteration. This batch size dynamically increases under load and resets on ZMQ errors, optimizing throughput for incoming visual data (see the sketch after this list).
- ZMQ Context Initialization Refinement: I've refined the ZeroMQ (ZMQ) context initialization in lightllm/server/visualserver/manager.py by changing zmq.asyncio.Context to zmq.Context. This change might address specific concurrency patterns or optimize ZMQ socket handling within the visual server.
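To make the adaptive batching behavior concrete, here is a minimal standalone sketch of the growth-and-reset dynamic described above. The constants mirror the PR (floor of 64, x1.3 growth, cap of 256), but the helper function and driver loop are hypothetical illustrations, not code from the repository:

```python
# Hypothetical sketch of the adaptive receive budget (not the PR's actual code).

BATCH_FLOOR = 64   # initial/reset receive budget
BATCH_CAP = 256    # upper bound on the receive budget
GROWTH = 1.3       # multiplicative growth per fully consumed batch

def next_budget(current: float, queue_drained: bool) -> float:
    """Return the receive budget for the next loop iteration."""
    if queue_drained:
        # The queue emptied mid-batch: snap back to the floor.
        return BATCH_FLOOR
    # Sustained backlog: grow the budget, capped at BATCH_CAP.
    return min(current * GROWTH, BATCH_CAP)

budget = float(BATCH_FLOOR)
for drained in [False, False, False, True, False]:
    budget = next_budget(budget, drained)
    print(int(budget))  # 83, 108, 140, 64, 83
```

This multiplicative-increase/reset shape resembles AIMD-style congestion control: ramp up while work keeps arriving, back off as soon as the queue drains.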
Code Review
This pull request aims to improve the scheduler by introducing dynamic scheduling intervals and adaptive request batching.
My review has identified several issues:
- In router/manager.py, the new dynamic scheduling logic is defined but not actually used, making the change ineffective. There are also hardcoded values that should be made configurable for better maintainability.
- In visualserver/manager.py, critical bugs have been introduced. A change from an asynchronous to a synchronous ZMQ context will likely break the async-based server. Furthermore, the new adaptive request fetching logic is implemented in a way that will cause a TypeError and does not function as intended due to a missing flag in a ZMQ call.
It's important to fix these to ensure the stability and correctness of the application.
```diff
         visual_model_rpc_ports,
     ):
-        context = zmq.asyncio.Context(2)
+        context = zmq.Context(2)
```
The ZMQ context has been changed from zmq.asyncio.Context to zmq.Context. This is a critical issue because the sockets created from this context are used in an async environment.
- self.recv_from_httpserver is used with await self.recv_from_httpserver.recv_pyobj(). A socket from a synchronous context is not awaitable and will block the event loop or raise an error.
- self.send_to_next_module.send_pyobj() is called from an async function. Using a blocking socket here will block the entire event loop, severely impacting performance and responsiveness.
Please revert this change to use the async context.
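For reference, here is a minimal sketch of the async pattern the reviewer is describing (illustrative only; the endpoint and message flow are hypothetical, not taken from this repository):

```python
import zmq
import zmq.asyncio

async def recv_loop():
    # Sockets must come from zmq.asyncio.Context for recv_pyobj() to be awaitable.
    ctx = zmq.asyncio.Context(2)
    sock = ctx.socket(zmq.PULL)
    sock.bind("tcp://127.0.0.1:5555")  # hypothetical endpoint
    while True:
        # Yields control to the event loop while waiting for a message.
        obj = await sock.recv_pyobj()
        print("received:", obj)

# With a plain zmq.Context, recv_pyobj() blocks the whole event loop and
# returns the object directly; `await`-ing that result raises a TypeError.
```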
Suggested change:

```python
context = zmq.asyncio.Context(2)
```

```python
    for _ in range(self.visual_recv_max_count):
        recv_req: GroupReqIndexes = await self.recv_from_httpserver.recv_pyobj()
        if isinstance(recv_req, GroupReqIndexes):
            self.waiting_reqs.append(recv_req)
        else:
            assert False, f"Error Req Inf {recv_req}"
    self.visual_recv_max_count = min(self.visual_recv_max_count * 1.3, 256)
except zmq.ZMQError:
    # When the queue starts to drain, scale the per-iteration receive count back down
    self.visual_recv_max_count = 64
```
There are two critical issues in this block that will cause bugs:
- The recv_pyobj() call is missing the zmq.NOBLOCK flag. Without it, the call will block if the ZMQ queue is empty, and the except zmq.ZMQError block will never be reached. This breaks the adaptive batching logic.
- On line 164, self.visual_recv_max_count is multiplied by 1.3, which can result in a float. This float value is then used in range(self.visual_recv_max_count) in the next iteration, which will raise a TypeError. You should cast the result to an int.
Suggested change:

```python
    for _ in range(int(self.visual_recv_max_count)):
        recv_req: GroupReqIndexes = await self.recv_from_httpserver.recv_pyobj(zmq.NOBLOCK)
        if isinstance(recv_req, GroupReqIndexes):
            self.waiting_reqs.append(recv_req)
        else:
            assert False, f"Error Req Inf {recv_req}"
    self.visual_recv_max_count = min(int(self.visual_recv_max_count * 1.3), 256)
except zmq.ZMQError:
    # When the queue starts to drain, scale the per-iteration receive count back down
    self.visual_recv_max_count = 64
```
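As a quick sanity check on the NOBLOCK behavior, the following standalone sketch (illustrative; the endpoint is hypothetical) shows that a non-blocking receive on an empty queue raises zmq.Again, a subclass of zmq.ZMQError, which is what allows the except branch above to fire:

```python
import zmq

ctx = zmq.Context()
sock = ctx.socket(zmq.PULL)
sock.bind("tcp://127.0.0.1:5556")  # hypothetical endpoint; nothing ever sends to it

try:
    # With NOBLOCK, an empty queue raises immediately instead of blocking.
    sock.recv_pyobj(zmq.NOBLOCK)
except zmq.Again:
    # zmq.Again subclasses zmq.ZMQError, so `except zmq.ZMQError` also catches it.
    print("queue empty; scale the receive budget back down")
finally:
    sock.close()
    ctx.term()
```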
lightllm/server/router/manager.py (Outdated)
```python
def _get_schedule_time_interval(self):
    if self.running_batch is None:
        # When there is no running batch, trigger request scheduling every 10ms
        return 0.01

    # In DP mode, a longer scheduling interval gives better load balancing,
    # since more requests can be collected before each scheduling pass
    return self.schedule_time_interval * self.dp_size_in_node
```
This new method introduces scheduling logic, but it appears to be unused. The main scheduling loop in loop_for_fwd still contains a hardcoded await asyncio.sleep(0.03). For this new logic to take effect, you should call this function, for example: await asyncio.sleep(self._get_schedule_time_interval()).
Additionally, the value 0.01 is a magic number. It's better to define it as a named constant to improve readability and maintainability, e.g., IDLE_SCHEDULE_INTERVAL = 0.01. This could also be made a configurable parameter.
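For illustration, the wiring the reviewer suggests might look roughly like this. This is a sketch under assumptions: the class shape and the loop body are simplified stand-ins, not the actual contents of lightllm/server/router/manager.py:

```python
import asyncio

IDLE_SCHEDULE_INTERVAL = 0.01  # named constant for the 10ms idle cycle

class RouterManagerSketch:
    """Simplified stand-in for the router manager; the real work is elided."""

    def __init__(self, schedule_time_interval=0.03, dp_size_in_node=1):
        self.running_batch = None
        self.schedule_time_interval = schedule_time_interval
        self.dp_size_in_node = dp_size_in_node

    def _get_schedule_time_interval(self):
        if self.running_batch is None:
            return IDLE_SCHEDULE_INTERVAL
        return self.schedule_time_interval * self.dp_size_in_node

    async def loop_for_fwd(self):
        while True:
            # ... schedule waiting requests / step the running batch here ...
            # Dynamic interval instead of a hardcoded asyncio.sleep(0.03):
            await asyncio.sleep(self._get_schedule_time_interval())
```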
lightllm/server/router/manager.py (Outdated)
```python
self.nnodes = args.nnodes
self.node_rank = args.node_rank
self.dp_size = args.dp
self.schedule_time_interval = 0.03  # default 30ms scheduling period
```
```python
if not hasattr(self, "visual_recv_max_count"):
    self.visual_recv_max_count = 64
```
It is better practice to initialize instance attributes like visual_recv_max_count within the __init__ method. This makes the code more organized and easier to understand, as all initial state is defined in one place. Lazily initializing it here can make it harder to track the state of the object.
Consider moving self.visual_recv_max_count = 64 to the __init__ method.
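Concretely, the suggested reorganization might look like this sketch (the surrounding constructor arguments and setup are elided and hypothetical):

```python
class VisualManagerSketch:
    def __init__(self):
        # ... existing socket and queue setup ...
        # Define the initial receive budget up front, instead of lazily
        # via hasattr() inside loop_for_netio_req.
        self.visual_recv_max_count = 64

    async def loop_for_netio_req(self):
        # No hasattr() guard needed: the attribute is guaranteed to exist.
        ...
```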