
Commit 71f683a

Authored by ErvinXie, ouqingliang, and claude
Support Native Kimi K2 Thinking (#1663)
* [feat]: fix k2 prefill
* Update Kimi-K2-Thinking.md
* Create Kimi-K2-Thinking-Native.md
* Update Kimi-K2-Thinking.md
* Update Kimi-K2-Thinking.md
* Update Kimi-K2-Thinking-Native.md
* [perf] optimize K2 MoE weight loading with per-expert pointers
  - Avoid expensive torch.stack().contiguous() in Python (was ~6.6s)
  - Use per-expert pointer arrays (gate_projs) instead of contiguous memory
  - C++ worker pool performs parallel memcpy for TP slicing
  - Add LOAD_TIME_PROFILE for load_weights timing analysis

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <[email protected]>
Co-authored-by: ouqingliang <[email protected]>
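The weight-loading change described above can be illustrated in miniature. The sketch below is a hypothetical stand-in, not the repository's code: NumPy arrays play the role of torch tensors, a Python thread pool stands in for the C++ worker pool, and the names `slow_path`/`fast_path` are invented for illustration; only `gate_projs` comes from the commit message. The point it demonstrates is that stacking all experts into one contiguous buffer copies every byte once just to build the big tensor, whereas keeping per-expert buffers lets each tensor-parallel rank copy only the rows it needs, one memcpy per expert, in parallel.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shapes: 8 experts, each with a (ROWS, COLS) gate projection.
NUM_EXPERTS, ROWS, COLS = 8, 64, 32
TP_DEGREE = 4  # tensor-parallel degree; each rank owns ROWS // TP_DEGREE rows

# Per-expert weights live in separate buffers, as loaded from the checkpoint.
rng = np.random.default_rng(0)
gate_projs = [rng.random((ROWS, COLS)).astype(np.float32)
              for _ in range(NUM_EXPERTS)]

def slow_path(rank):
    """Stack all experts into one contiguous tensor, then slice.

    This mimics the pattern the commit removes: stack()+contiguous()
    copies the full weights before the TP slice copies them again.
    """
    stacked = np.ascontiguousarray(np.stack(gate_projs))  # (E, ROWS, COLS)
    lo = rank * (ROWS // TP_DEGREE)
    hi = lo + ROWS // TP_DEGREE
    return stacked[:, lo:hi, :].copy()

def fast_path(rank):
    """Copy each expert's TP slice straight from its own buffer,
    in parallel, standing in for the C++ worker pool's memcpy."""
    lo = rank * (ROWS // TP_DEGREE)
    hi = lo + ROWS // TP_DEGREE
    out = np.empty((NUM_EXPERTS, hi - lo, COLS), dtype=np.float32)

    def copy_one(e):
        out[e] = gate_projs[e][lo:hi]  # copies only the needed rows

    with ThreadPoolExecutor() as pool:
        list(pool.map(copy_one, range(NUM_EXPERTS)))
    return out

# Both paths produce identical per-rank shards; only the copy traffic differs.
assert np.array_equal(slow_path(1), fast_path(1))
```

Under these assumptions both paths yield the same `(num_experts, rows_per_rank, cols)` shard; the fast path simply avoids materializing the full stacked tensor first, which is where the ~6.6 s reported in the commit message was being spent.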
1 parent 4850424 commit 71f683a

File tree

5 files changed: +423 −74 lines

doc/en/Kimi-K2-Thinking-Native.md

Lines changed: 1 addition & 0 deletions

@@ -0,0 +1 @@
+ First describe how to install and run it, then add a performance section, then link to the documentation on integrating via Claude Code.

doc/en/Kimi-K2-Thinking.md

Lines changed: 1 addition & 0 deletions

@@ -1,4 +1,5 @@
  # KTransformers+SGLang Inference Deployment
+ Please note: this is the quantized deployment. For native Kimi K2 Thinking deployment, please refer to [here](./Kimi-K2-Thinking-Native.md).

  ## Installation
