
[feat](kt-kernel): add --kt-numa-nodes for explicit NUMA node mapping#1891

Open
ErvinXie wants to merge 1 commit into main from feat/kt-numa-nodes

Conversation

@ErvinXie (Collaborator) commented Mar 18, 2026

Summary

  • Add numa_nodes parameter to BaseMoEWrapper and all subclasses (AMXMoEWrapper, NativeMoEWrapper, GeneralMoEWrapper, LlamafileMoEWrapper, KTMoEWrapper factory)
  • When specified, uses the provided NUMA node IDs for subpool_numa_map instead of hardcoded list(range(threadpool_count))
  • Validates that numa_nodes length matches threadpool_count
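The validation and mapping logic described above can be sketched as a standalone function (the function name and free-standing form are illustrative; in the PR this logic lives inside the wrapper constructors):

```python
from typing import List, Optional


def build_subpool_numa_map(
    threadpool_count: int, numa_nodes: Optional[List[int]] = None
) -> List[int]:
    """Return the NUMA node ID assigned to each subpool."""
    if numa_nodes is None:
        # Unchanged default: sequential node IDs [0, 1, ..., N-1].
        return list(range(threadpool_count))
    if len(numa_nodes) != threadpool_count:
        raise ValueError(
            f"numa_nodes length ({len(numa_nodes)}) must match "
            f"threadpool_count ({threadpool_count})"
        )
    return list(numa_nodes)
```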

Motivation

Currently subpool_numa_map is always [0, 1, ..., threadpool_count-1]. This makes it impossible to run a KTransformers instance on a specific NUMA node (e.g., node 1) without external numactl workarounds.

Related to #1890 — this PR takes a different approach: instead of auto-detecting numactl membind policy via Linux syscalls at runtime, we expose an explicit --kt-numa-nodes CLI parameter. This is more portable (no x86-64 syscall dependency) and consistent with KTransformers' existing configuration style.

Usage

Deploy two independent instances on a dual-NUMA machine, each bound to a different NUMA node:

# Instance 1: bind to NUMA node 0
python -m sglang.launch_server \
  --model /path/to/model \
  --kt-threadpool-count 1 --kt-numa-nodes 0 \
  --kt-cpuinfer 48 \
  --port 30000 \
  ...

# Instance 2: bind to NUMA node 1
python -m sglang.launch_server \
  --model /path/to/model \
  --kt-threadpool-count 1 --kt-numa-nodes 1 \
  --kt-cpuinfer 48 \
  --port 30001 \
  ...

You can also use it with multiple NUMA nodes in a custom order:

# Reverse NUMA node order (node 1 first, node 0 second)
python -m sglang.launch_server \
  --kt-threadpool-count 2 --kt-numa-nodes 1,0 \
  ...

If --kt-numa-nodes is not specified, the behavior is unchanged — defaults to [0, 1, ..., threadpool_count-1].
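For illustration, a comma-separated value such as `1,0` could be parsed into a node list as sketched below (the helper name is hypothetical; the actual CLI plumbing lives in the companion SGLang PR):

```python
from typing import List, Optional


def parse_kt_numa_nodes(value: Optional[str]) -> Optional[List[int]]:
    """Parse a comma-separated NUMA node list, e.g. "1,0" -> [1, 0]."""
    if value is None:
        return None  # fall back to the sequential default
    return [int(part) for part in value.split(",")]
```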

Companion PR

The SGLang-side changes (adding --kt-numa-nodes CLI arg): kvcache-ai/sglang#28

Tested on

  • AMD EPYC 9355 dual-socket (2 NUMA nodes, 128 threads)
  • Verified that CPUInfer creates the worker pool on the correct NUMA node
  • Verified backward compatibility (omitting --kt-numa-nodes works as before)

Test plan

  • from kt_kernel.experts import KTMoEWrapper imports successfully
  • Backward compatible: existing usage without numa_nodes works unchanged
  • Validation: mismatched numa_nodes length raises ValueError
  • End-to-end: subpool_numa_map=[1] creates the worker pool on NUMA node 1

🤖 Generated with Claude Code

Add numa_nodes parameter to BaseMoEWrapper and all subclasses, allowing
users to explicitly specify which NUMA node IDs to use for subpool
mapping instead of always defaulting to sequential [0, 1, ..., N-1].

This enables running multiple KTransformers instances on different NUMA
nodes of the same machine, e.g. --kt-threadpool-count 1 --kt-numa-nodes 1
to bind to NUMA node 1. Previously this required external numactl
workarounds since subpool_numa_map was hardcoded to start from 0.
@gemini-code-assist (Contributor)

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances KTransformers' resource management capabilities by introducing explicit NUMA node mapping. It provides users with direct control over where computation subpools are allocated on multi-NUMA systems, offering a more portable and configurable alternative to external numactl workarounds. This change allows for more precise resource isolation and optimized performance for multiple KTransformers instances running concurrently.

Highlights

  • Explicit NUMA Node Mapping: Introduced a new numa_nodes parameter across BaseMoEWrapper and its subclasses, allowing users to explicitly define NUMA node IDs for subpool mapping, rather than relying on a sequential default.
  • Input Validation: Added validation to ensure that the length of the provided numa_nodes list matches the threadpool_count, preventing misconfigurations and raising a ValueError if they do not align.
  • Improved Weight Loading Robustness: Enhanced the load_weights methods in AMXMoEWrapper and NativeMoEWrapper to gracefully handle different model key prefixes (e.g., model.layers vs. model.language_model.layers), improving compatibility with various multimodal models.
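The prefix handling in that last highlight could be sketched as follows (the helper name and checkpoint key layout are assumptions for illustration; the real load_weights implementation may differ):

```python
from typing import Dict


def resolve_layer_prefix(weights: Dict[str, object], layer_idx: int) -> str:
    """Pick whichever layer-key prefix actually exists in the checkpoint."""
    candidates = (
        f"model.layers.{layer_idx}",                 # text-only models
        f"model.language_model.layers.{layer_idx}",  # some multimodal models
    )
    for prefix in candidates:
        # Append "." so layer 1 does not accidentally match layer 10.
        if any(key.startswith(prefix + ".") for key in weights):
            return prefix
    raise KeyError(f"no known layer prefix for layer {layer_idx}")
```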



ErvinXie added a commit to kvcache-ai/sglang that referenced this pull request Mar 18, 2026
Add --kt-numa-nodes parameter to ServerArgs and thread it through
KTConfig to KTMoEWrapper. This allows users to specify which NUMA
node IDs to bind to, enabling multi-instance deployment on different
NUMA nodes without external numactl workarounds.

Usage: --kt-threadpool-count 1 --kt-numa-nodes 1
(binds to NUMA node 1 instead of defaulting to node 0)

Companion to kvcache-ai/ktransformers#1891

@gemini-code-assist bot left a comment


Code Review

This pull request introduces a numa_nodes parameter to the BaseMoEWrapper and its subclasses, enabling explicit NUMA node mapping for KTransformers instances. This enhancement provides a cleaner and more portable alternative to existing workarounds for NUMA node binding, allowing users to run independent instances on specific NUMA nodes without relying on external numactl configurations. The changes include parameter additions to constructors, validation for numa_nodes length, and modifications to the subpool_numa_map initialization logic.

Comment on lines +227 to +232
if numa_nodes is not None:
    if len(numa_nodes) != threadpool_count:
        raise ValueError(
            f"numa_nodes length ({len(numa_nodes)}) must match "
            f"threadpool_count ({threadpool_count})"
        )


Severity: medium

The validation logic for numa_nodes could be improved by adding a check to ensure that the provided NUMA node IDs are valid for the system. This could prevent issues if a user provides an ID that doesn't exist.

if numa_nodes is not None:
    if len(numa_nodes) != threadpool_count:
        raise ValueError(
            f"numa_nodes length ({len(numa_nodes)}) must match "
            f"threadpool_count ({threadpool_count})"
        )
    if any(node_id >= numa_num_configured_nodes() for node_id in numa_nodes):
        raise ValueError(
            f"Invalid NUMA node ID found in numa_nodes. "
            f"Node IDs must be less than {numa_num_configured_nodes()}."
        )
    subpool_numa_map = list(numa_nodes)
