
[feat](kt-kernel): add --kt-numa-nodes for explicit NUMA node mapping#1891

Open
ErvinXie wants to merge 1 commit into main from feat/kt-numa-nodes

Conversation

@ErvinXie (Collaborator) commented Mar 18, 2026

Summary

  • Add numa_nodes parameter to BaseMoEWrapper and all subclasses (AMXMoEWrapper, NativeMoEWrapper, GeneralMoEWrapper, LlamafileMoEWrapper, KTMoEWrapper factory)
  • When specified, uses the provided NUMA node IDs for subpool_numa_map instead of hardcoded list(range(threadpool_count))
  • Validates that numa_nodes length matches threadpool_count
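The validation and mapping logic described above can be sketched as a standalone function (the function name and free-standing form are illustrative; in the PR this logic lives inside the wrapper constructors):

```python
from typing import List, Optional


def build_subpool_numa_map(
    threadpool_count: int, numa_nodes: Optional[List[int]] = None
) -> List[int]:
    """Return the NUMA node ID assigned to each subpool."""
    if numa_nodes is None:
        # Unchanged default: sequential node IDs [0, 1, ..., N-1].
        return list(range(threadpool_count))
    if len(numa_nodes) != threadpool_count:
        raise ValueError(
            f"numa_nodes length ({len(numa_nodes)}) must match "
            f"threadpool_count ({threadpool_count})"
        )
    return list(numa_nodes)
```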

Motivation

Currently subpool_numa_map is always [0, 1, ..., threadpool_count-1]. This makes it impossible to run a KTransformers instance on a specific NUMA node (e.g., node 1) without external numactl workarounds.

Related to #1890 — this PR takes a different approach: instead of auto-detecting numactl membind policy via Linux syscalls at runtime, we expose an explicit --kt-numa-nodes CLI parameter. This is more portable (no x86-64 syscall dependency) and consistent with KTransformers' existing configuration style.

Usage

Deploy two independent instances on a dual-NUMA machine, each bound to a different NUMA node:

# Instance 1: bind to NUMA node 0
python -m sglang.launch_server \
  --model /path/to/model \
  --kt-threadpool-count 1 --kt-numa-nodes 0 \
  --kt-cpuinfer 48 \
  --port 30000 \
  ...

# Instance 2: bind to NUMA node 1
python -m sglang.launch_server \
  --model /path/to/model \
  --kt-threadpool-count 1 --kt-numa-nodes 1 \
  --kt-cpuinfer 48 \
  --port 30001 \
  ...

You can also use it with multiple NUMA nodes in a custom order:

# Reverse NUMA node order (node 1 first, node 0 second)
python -m sglang.launch_server \
  --kt-threadpool-count 2 --kt-numa-nodes 1,0 \
  ...

If --kt-numa-nodes is not specified, the behavior is unchanged — defaults to [0, 1, ..., threadpool_count-1].
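For illustration, a comma-separated value such as `1,0` could be parsed into a node list as sketched below (the helper name is hypothetical; the actual CLI plumbing lives in the companion SGLang PR):

```python
from typing import List, Optional


def parse_kt_numa_nodes(value: Optional[str]) -> Optional[List[int]]:
    """Parse a comma-separated NUMA node list, e.g. "1,0" -> [1, 0]."""
    if value is None:
        return None  # fall back to the sequential default
    return [int(part) for part in value.split(",")]
```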

Companion PR

The SGLang-side changes (adding --kt-numa-nodes CLI arg): kvcache-ai/sglang#28

Tested on

  • AMD EPYC 9355 dual-socket (2 NUMA nodes, 128 threads)
  • Verified that CPUInfer creates the worker pool on the correct NUMA node
  • Verified backward compatibility (omitting --kt-numa-nodes works as before)

Test plan

  • from kt_kernel.experts import KTMoEWrapper imports successfully
  • Backward compatible: existing usage without numa_nodes works unchanged
  • Validation: mismatched numa_nodes length raises ValueError
  • End-to-end: subpool_numa_map=[1] creates the worker pool on NUMA node 1

🤖 Generated with Claude Code

Add numa_nodes parameter to BaseMoEWrapper and all subclasses, allowing
users to explicitly specify which NUMA node IDs to use for subpool
mapping instead of always defaulting to sequential [0, 1, ..., N-1].

This enables running multiple KTransformers instances on different NUMA
nodes of the same machine, e.g. --kt-threadpool-count 1 --kt-numa-nodes 1
to bind to NUMA node 1. Previously this required external numactl
workarounds since subpool_numa_map was hardcoded to start from 0.
@gemini-code-assist (Contributor)

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances KTransformers' resource management capabilities by introducing explicit NUMA node mapping. It provides users with direct control over where computation subpools are allocated on multi-NUMA systems, offering a more portable and configurable alternative to external numactl workarounds. This change allows for more precise resource isolation and optimized performance for multiple KTransformers instances running concurrently.

Highlights

  • Explicit NUMA Node Mapping: Introduced a new numa_nodes parameter across BaseMoEWrapper and its subclasses, allowing users to explicitly define NUMA node IDs for subpool mapping, rather than relying on a sequential default.
  • Input Validation: Added validation to ensure that the length of the provided numa_nodes list matches the threadpool_count, preventing misconfigurations and raising a ValueError if they do not align.
  • Improved Weight Loading Robustness: Enhanced the load_weights methods in AMXMoEWrapper and NativeMoEWrapper to gracefully handle different model key prefixes (e.g., model.layers vs. model.language_model.layers), improving compatibility with various multimodal models.
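The prefix handling in that last highlight could be sketched as follows (the helper name and checkpoint key layout are assumptions for illustration; the real load_weights implementation may differ):

```python
from typing import Dict


def resolve_layer_prefix(weights: Dict[str, object], layer_idx: int) -> str:
    """Pick whichever layer-key prefix actually exists in the checkpoint."""
    candidates = (
        f"model.layers.{layer_idx}",                 # text-only models
        f"model.language_model.layers.{layer_idx}",  # some multimodal models
    )
    for prefix in candidates:
        # Append "." so layer 1 does not accidentally match layer 10.
        if any(key.startswith(prefix + ".") for key in weights):
            return prefix
    raise KeyError(f"no known layer prefix for layer {layer_idx}")
```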



ErvinXie added a commit to kvcache-ai/sglang that referenced this pull request Mar 18, 2026
Add --kt-numa-nodes parameter to ServerArgs and thread it through
KTConfig to KTMoEWrapper. This allows users to specify which NUMA
node IDs to bind to, enabling multi-instance deployment on different
NUMA nodes without external numactl workarounds.

Usage: --kt-threadpool-count 1 --kt-numa-nodes 1
(binds to NUMA node 1 instead of defaulting to node 0)

Companion to kvcache-ai/ktransformers#1891

@gemini-code-assist bot left a comment


Code Review

This pull request introduces a numa_nodes parameter to the BaseMoEWrapper and its subclasses, enabling explicit NUMA node mapping for KTransformers instances. This enhancement provides a cleaner and more portable alternative to existing workarounds for NUMA node binding, allowing users to run independent instances on specific NUMA nodes without relying on external numactl configurations. The changes include parameter additions to constructors, validation for numa_nodes length, and modifications to the subpool_numa_map initialization logic.

Comment on lines +227 to +232
if numa_nodes is not None:
    if len(numa_nodes) != threadpool_count:
        raise ValueError(
            f"numa_nodes length ({len(numa_nodes)}) must match "
            f"threadpool_count ({threadpool_count})"
        )


Severity: medium

The validation logic for numa_nodes could be improved by adding a check to ensure that the provided NUMA node IDs are valid for the system. This could prevent issues if a user provides an ID that doesn't exist.

if numa_nodes is not None:
    if len(numa_nodes) != threadpool_count:
        raise ValueError(
            f"numa_nodes length ({len(numa_nodes)}) must match "
            f"threadpool_count ({threadpool_count})"
        )
    if any(node_id >= numa_num_configured_nodes() for node_id in numa_nodes):
        raise ValueError(
            f"Invalid NUMA node ID found in numa_nodes. "
            f"Node IDs must be less than {numa_num_configured_nodes()}."
        )
    subpool_numa_map = list(numa_nodes)
