[feat](kt-kernel): add --kt-numa-nodes for explicit NUMA node mapping#1891
Add numa_nodes parameter to BaseMoEWrapper and all subclasses, allowing users to explicitly specify which NUMA node IDs to use for subpool mapping instead of always defaulting to sequential [0, 1, ..., N-1]. This enables running multiple KTransformers instances on different NUMA nodes of the same machine, e.g. --kt-threadpool-count 1 --kt-numa-nodes 1 to bind to NUMA node 1. Previously this required external numactl workarounds since subpool_numa_map was hardcoded to start from 0.
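The mapping rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the actual BaseMoEWrapper code: the helper name resolve_subpool_numa_map is hypothetical, and only the default ([0, 1, ..., N-1]) and the length check are taken from the PR text.

```python
from typing import List, Optional, Sequence


def resolve_subpool_numa_map(
    threadpool_count: int,
    numa_nodes: Optional[Sequence[int]] = None,
) -> List[int]:
    """Return the NUMA node ID for each subpool (hypothetical helper)."""
    if numa_nodes is None:
        # Default behaviour: subpool i is mapped to NUMA node i.
        return list(range(threadpool_count))
    if len(numa_nodes) != threadpool_count:
        raise ValueError(
            f"numa_nodes length ({len(numa_nodes)}) must match "
            f"threadpool_count ({threadpool_count})"
        )
    # Explicit mapping: subpool i is mapped to numa_nodes[i].
    return list(numa_nodes)


print(resolve_subpool_numa_map(2))       # default mapping
print(resolve_subpool_numa_map(1, [1]))  # single subpool bound to node 1
```

With one subpool and numa_nodes=[1], the only subpool lands on node 1, which is the multi-instance case the description mentions.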
Summary of Changes
This pull request enhances KTransformers' resource management capabilities by introducing explicit NUMA node mapping. It provides users with direct control over where computation subpools are allocated on multi-NUMA systems, offering a more portable and configurable alternative to external …
Add --kt-numa-nodes parameter to ServerArgs and thread it through KTConfig to KTMoEWrapper. This allows users to specify which NUMA node IDs to bind to, enabling multi-instance deployment on different NUMA nodes without external numactl workarounds. Usage: --kt-threadpool-count 1 --kt-numa-nodes 1 (binds to NUMA node 1 instead of defaulting to node 0) Companion to kvcache-ai/ktransformers#1891
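As a rough sketch of how a comma-separated flag such as --kt-numa-nodes 1,0 could be turned into a list of node IDs, the snippet below uses plain argparse. How SGLang's ServerArgs actually declares and parses the flag is not shown in this PR, so parse_numa_nodes and build_parser are hypothetical names used for illustration only.

```python
import argparse
from typing import List


def parse_numa_nodes(value: str) -> List[int]:
    """Turn a comma-separated string such as "1,0" into [1, 0]."""
    try:
        return [int(part) for part in value.split(",")]
    except ValueError:
        raise argparse.ArgumentTypeError(
            f"expected comma-separated integers, got {value!r}"
        )


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing only the two flags discussed here."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--kt-threadpool-count", type=int, default=1)
    # None means "not specified", so the default mapping stays in effect.
    parser.add_argument("--kt-numa-nodes", type=parse_numa_nodes, default=None)
    return parser


args = build_parser().parse_args(
    ["--kt-threadpool-count", "1", "--kt-numa-nodes", "1"]
)
print(args.kt_threadpool_count, args.kt_numa_nodes)
```

Keeping the default at None rather than an empty list lets downstream code distinguish "flag omitted" from "explicit mapping given".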
Code Review
This pull request introduces a numa_nodes parameter to the BaseMoEWrapper and its subclasses, enabling explicit NUMA node mapping for KTransformers instances. This enhancement provides a cleaner and more portable alternative to existing workarounds for NUMA node binding, allowing users to run independent instances on specific NUMA nodes without relying on external numactl configurations. The changes include parameter additions to constructors, validation for numa_nodes length, and modifications to the subpool_numa_map initialization logic.
    if numa_nodes is not None:
        if len(numa_nodes) != threadpool_count:
            raise ValueError(
                f"numa_nodes length ({len(numa_nodes)}) must match "
                f"threadpool_count ({threadpool_count})"
            )
The validation logic for numa_nodes could be improved by adding a check to ensure that the provided NUMA node IDs are valid for the system. This could prevent issues if a user provides an ID that doesn't exist.
    if numa_nodes is not None:
        if len(numa_nodes) != threadpool_count:
            raise ValueError(
                f"numa_nodes length ({len(numa_nodes)}) must match "
                f"threadpool_count ({threadpool_count})"
            )
        if any(node_id >= numa_num_configured_nodes() for node_id in numa_nodes):
            raise ValueError(
                f"Invalid NUMA node ID found in numa_nodes. "
                f"Node IDs must be less than {numa_num_configured_nodes()}."
            )
        subpool_numa_map = list(numa_nodes)
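The suggestion above calls numa_num_configured_nodes() without showing where it would come from in Python. As one possible alternative (a different technique from the reviewer's, and Linux-only), the sketch below reads the online node list from sysfs and checks membership, which also rejects negative IDs and gaps in the node numbering. All three helper names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, Optional, Set


def parse_node_list(text: str) -> Set[int]:
    """Parse a Linux range list such as "0-1" or "0,2-3" into a set of IDs."""
    nodes: Set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-", 1)
            nodes.update(range(int(start), int(end) + 1))
        else:
            nodes.add(int(part))
    return nodes


def online_numa_nodes(
    path: str = "/sys/devices/system/node/online",
) -> Optional[Set[int]]:
    """Return online NUMA node IDs, or None if the sysfs file is unavailable."""
    try:
        return parse_node_list(Path(path).read_text())
    except OSError:
        return None


def check_numa_nodes(numa_nodes: Iterable[int], available: Set[int]) -> None:
    """Raise ValueError if any requested node ID is not in `available`."""
    invalid = sorted(set(numa_nodes) - available)
    if invalid:
        raise ValueError(
            f"Invalid NUMA node ID(s) {invalid}; "
            f"available nodes: {sorted(available)}"
        )


available = online_numa_nodes()
if available is not None:
    print("online NUMA nodes:", sorted(available))
```

When online_numa_nodes() returns None (for example on a non-Linux host), a caller could skip the check rather than fail, so the validation stays optional.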
Summary
- Add numa_nodes parameter to BaseMoEWrapper and all subclasses (AMXMoEWrapper, NativeMoEWrapper, GeneralMoEWrapper, LlamafileMoEWrapper, KTMoEWrapper factory)
- Build subpool_numa_map from numa_nodes instead of hardcoded list(range(threadpool_count))
- Validate that numa_nodes length matches threadpool_count

Motivation
Currently subpool_numa_map is always [0, 1, ..., threadpool_count-1]. This makes it impossible to run a KTransformers instance on a specific NUMA node (e.g., node 1) without external numactl workarounds.

Related to #1890. This PR takes a different approach: instead of auto-detecting the numactl membind policy via Linux syscalls at runtime, we expose an explicit --kt-numa-nodes CLI parameter. This is more portable (no x86-64 syscall dependency) and consistent with KTransformers' existing configuration style.

Usage
Deploy two independent instances on a dual-NUMA machine, each bound to a different NUMA node:
You can also use it with multiple NUMA nodes in a custom order:

    # Reverse NUMA node order (node 1 first, node 0 second)
    python -m sglang.launch_server \
      --kt-threadpool-count 2 --kt-numa-nodes 1,0 \
      ...

If --kt-numa-nodes is not specified, the behavior is unchanged: it defaults to [0, 1, ..., threadpool_count-1].

Companion PR
The SGLang-side changes (adding the --kt-numa-nodes CLI arg): kvcache-ai/sglang#28

Tested on
- … CPUInfer creates worker pool on correct NUMA node
- … --kt-numa-nodes works as before)

Test plan
- from kt_kernel.experts import KTMoEWrapper imports successfully
- … numa_nodes works unchanged
- … numa_nodes length raises ValueError
- subpool_numa_map=[1] creates worker pool at NUMA node 1

🤖 Generated with Claude Code