support numa_binds, eg: numactl --physcpubind= --membind= python -m sgl… #1890
poryfly wants to merge 1 commit into kvcache-ai:main
Conversation
Summary of Changes: Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces support for NUMA (Non-Uniform Memory Access) binding, allowing the system to respect the set of NUMA nodes permitted by the process's memory policy.
Code Review
This pull request introduces support for NUMA bindings by detecting the allowed NUMA nodes from the process's memory policy. This is a valuable enhancement for optimizing performance on multi-NUMA systems. The implementation uses a Linux-specific syscall via ctypes to achieve this. My review focuses on improving the robustness and portability of this new functionality.
# SYS_get_mempolicy: 239 on x86-64
SYS_get_mempolicy = 239
The syscall number for get_mempolicy is hardcoded for the x86-64 architecture. This will cause the function to fail on other architectures, such as aarch64, which uses a different syscall number. To ensure portability, you should detect the machine's architecture and use the appropriate syscall number.
# SYS_get_mempolicy: 239 on x86-64, 236 on aarch64
arch = platform.machine()
if arch == "x86_64":
    SYS_get_mempolicy = 239
elif arch == "aarch64":
    SYS_get_mempolicy = 236
else:
    warnings.warn(
        f"NUMA node detection via get_mempolicy is not supported on "
        f"architecture '{arch}'. Falling back to sequential NUMA IDs."
    )
    return None

except Exception:
    return None
The try...except block catches a broad Exception and silently returns None. This can hide underlying issues, making debugging difficult. It would be better to at least log a warning to inform the user that NUMA detection failed and why.
except Exception as e:
    warnings.warn(f"Failed to get NUMA policy via syscall: {e}. Falling back to sequential NUMA IDs.")
    return None
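For context, the two suggestions above can be combined into one helper. The sketch below is an illustration, not the PR's actual implementation: the helper name `get_allowed_numa_nodes` and its return-`None` fallback contract are assumptions. It queries the allowed node set via `get_mempolicy(2)` with `MPOL_F_MEMS_ALLOWED`, handling both the architecture check and the failure warning:

```python
import ctypes
import platform
import warnings

def get_allowed_numa_nodes():
    """Return the NUMA node IDs this process may allocate on, or None.

    Uses the Linux get_mempolicy(2) syscall with MPOL_F_MEMS_ALLOWED.
    Returns None (caller falls back to sequential NUMA IDs) on any failure.
    """
    syscall_numbers = {"x86_64": 239, "aarch64": 236}
    nr = syscall_numbers.get(platform.machine())
    if nr is None:
        warnings.warn("get_mempolicy syscall number unknown for this arch.")
        return None
    MPOL_F_MEMS_ALLOWED = 1 << 2  # ask for the set of allowed nodes
    try:
        libc = ctypes.CDLL(None, use_errno=True)
        maxnode = 4096  # generous upper bound on node count, in bits
        bits_per_word = ctypes.sizeof(ctypes.c_ulong) * 8
        nodemask = (ctypes.c_ulong * (maxnode // bits_per_word))()
        ret = libc.syscall(nr, None, nodemask, maxnode, None,
                           MPOL_F_MEMS_ALLOWED)
        if ret != 0:
            warnings.warn("get_mempolicy failed; falling back.")
            return None
        # Collect the indices of all set bits in the node mask.
        return [i * bits_per_word + b
                for i, word in enumerate(nodemask)
                for b in range(bits_per_word)
                if word >> b & 1]
    except Exception as e:
        warnings.warn(f"Failed to get NUMA policy via syscall: {e}.")
        return None
```

On a typical single-node x86-64 Linux machine this returns `[0]`; under `numactl --membind=1` it would return `[1]`.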
Thanks for raising this issue. The use case of running multiple instances on different NUMA nodes is definitely valid. We've taken a slightly different approach in #1891 + kvcache-ai/sglang#28: instead of auto-detecting the allowed NUMA nodes, each instance is bound explicitly via launch flags.

Usage example, deploying two instances on a dual-NUMA machine:

# Instance 1: bind to NUMA node 0
python -m sglang.launch_server \
--kt-threadpool-count 1 --kt-numa-nodes 0 \
--kt-cpuinfer 48 --port 30000 ...
# Instance 2: bind to NUMA node 1
python -m sglang.launch_server \
--kt-threadpool-count 1 --kt-numa-nodes 1 \
--kt-cpuinfer 48 --port 30001 ...

This way you don't need an external numactl wrapper.

Why explicit over auto-detect:
Would love to hear your feedback on whether this approach covers your use case!
support numa_bind
eg:
numactl --physcpubind=$CPU_CORES_0 --membind=$NUMA_NODE_0 python -m sglang.launch_server...
The machine's resources are partitioned by NUMA node, and multiple service instances are deployed, one per node, to fully utilize the hardware.
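The per-node CPU lists that `numactl --physcpubind` expects can be read from sysfs. Below is a small sketch of that lookup, assuming Linux's `/sys/devices/system/node` layout; `numa_cpu_map` is a hypothetical helper for illustration, not part of this PR, and the port numbering is made up:

```python
import glob
import os

def numa_cpu_map():
    """Map each NUMA node ID to its CPU list string, e.g. {0: '0-47'}."""
    mapping = {}
    for path in sorted(glob.glob("/sys/devices/system/node/node[0-9]*")):
        node_id = int(os.path.basename(path)[len("node"):])
        with open(os.path.join(path, "cpulist")) as f:
            mapping[node_id] = f.read().strip()
    return mapping

# Print one numactl invocation per node (ports chosen for illustration).
for node, cpus in sorted(numa_cpu_map().items()):
    print(f"numactl --physcpubind={cpus} --membind={node} "
          f"python -m sglang.launch_server --port {30000 + node} ...")
```

On a dual-socket machine this prints two invocations, one per NUMA node, matching the deployment pattern described above.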
What does this PR do?
Fixes # (issue)
Before submitting