
support numa_binds, eg: numactl --physcpubind= --membind= python -m sgl… #1890

Open

poryfly wants to merge 1 commit into kvcache-ai:main from poryfly:support_numa_binds_run

Conversation

@poryfly (Contributor) commented Mar 18, 2026

Support numa_bind, e.g.:

numactl --physcpubind=$CPU_CORES_0 --membind=$NUMA_NODE_0 python -m sglang.launch_server...

The machine's resources are partitioned by NUMA node, and multiple sets of services are deployed to fully utilize the hardware.
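The deployment pattern described above can be sketched as a small dry-run script. The core counts, node IDs, and ports below are illustrative placeholders, not values from the PR; the script only prints the commands it would run, so you can check the per-node split before launching anything.

```shell
# Dry-run sketch: print one numactl-wrapped launch command per NUMA node.
# CORES_PER_NODE and the port base are assumptions; replace `echo` with
# the real command (plus your model flags) to actually launch.
CORES_PER_NODE=48
for NODE in 0 1; do
  START=$((NODE * CORES_PER_NODE))
  END=$((START + CORES_PER_NODE - 1))
  echo "numactl --physcpubind=${START}-${END} --membind=${NODE}" \
       "python -m sglang.launch_server --port $((30000 + NODE))"
done
```

Checking the printed core ranges against `numactl --hardware` before launching avoids accidentally splitting one instance across sockets.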


@gemini-code-assist (Contributor)

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces support for NUMA (Non-Uniform Memory Access) binding, allowing the system to respect numactl configurations for CPU buffer management. By detecting the NUMA nodes a process is allowed to access, it ensures that resources are utilized more efficiently, especially in environments where multiple services are deployed across different NUMA nodes to maximize hardware potential.

Highlights

  • NUMA Binding Support: Implemented logic to detect and utilize NUMA node binding specified by numactl --membind or --cpunodebind for CPU buffer management, ensuring that thread pools are mapped to the correct NUMA nodes.
  • New NUMA Detection Function: Added a new helper function _get_allowed_numa_nodes that uses libc.syscall to query the system's memory policy and determine the NUMA nodes accessible to the current process.
  • Dynamic Thread Pool NUMA Mapping: Modified the BaseMoEWrapper to dynamically set subpool_numa_map based on the detected NUMA nodes, falling back to sequential IDs if no explicit binding is active or detection fails, and issuing a warning for misconfigurations.



@gemini-code-assist (bot) left a review comment


Code Review

This pull request introduces support for NUMA bindings by detecting the allowed NUMA nodes from the process's memory policy. This is a valuable enhancement for optimizing performance on multi-NUMA systems. The implementation uses a Linux-specific syscall via ctypes to achieve this. My review focuses on improving the robustness and portability of this new functionality.

Comment on lines +52 to +53
# SYS_get_mempolicy: 239 on x86-64
SYS_get_mempolicy = 239

Severity: high

The syscall number for get_mempolicy is hardcoded for the x86-64 architecture. This will cause the function to fail on other architectures, such as aarch64, which uses a different syscall number. To ensure portability, you should detect the machine's architecture and use the appropriate syscall number.

    # SYS_get_mempolicy: 239 on x86-64, 236 on aarch64
    arch = platform.machine()
    if arch == "x86_64":
        SYS_get_mempolicy = 239
    elif arch == "aarch64":
        SYS_get_mempolicy = 236
    else:
        warnings.warn(
            f"NUMA node detection via get_mempolicy is not supported on "
            f"architecture '{arch}'. Falling back to sequential NUMA IDs."
        )
        return None

Comment on lines +100 to +101
except Exception:
return None

Severity: medium

The try...except block catches a broad Exception and silently returns None. This can hide underlying issues, making debugging difficult. It would be better to at least log a warning to inform the user that NUMA detection failed and why.

    except Exception as e:
        warnings.warn(f"Failed to get NUMA policy via syscall: {e}. Falling back to sequential NUMA IDs.")
        return None

@ErvinXie (Collaborator)

Thanks for raising this issue — the use case of running multiple instances on different NUMA nodes is definitely valid.

We've taken a slightly different approach in #1891 + kvcache-ai/sglang#28: instead of auto-detecting the numactl membind policy via Linux syscalls, we expose an explicit --kt-numa-nodes CLI parameter.

Usage example — deploy two instances on a dual-NUMA machine:

# Instance 1: bind to NUMA node 0
python -m sglang.launch_server \
  --kt-threadpool-count 1 --kt-numa-nodes 0 \
  --kt-cpuinfer 48 --port 30000 ...

# Instance 2: bind to NUMA node 1
python -m sglang.launch_server \
  --kt-threadpool-count 1 --kt-numa-nodes 1 \
  --kt-cpuinfer 48 --port 30001 ...

This way you don't need numactl at all — KTransformers handles the NUMA binding internally via hwloc/libnuma, and you just specify which node(s) to use.

Why explicit over auto-detect:

  • No dependency on x86-64 specific syscall numbers (SYS_get_mempolicy = 239)
  • Works on any Linux architecture (ARM, etc.)
  • Consistent with KTransformers' existing configuration style (explicit params > implicit detection)
  • No need to wrap the process with numactl externally

Would love to hear your feedback on whether this approach covers your use case!

