Commit 71f683a
Support Native Kimi K2 Thinking (#1663)
* [feat]: fix k2 prefill
* Update Kimi-K2-Thinking.md
* Create Kimi-K2-Thinking-Native.md
* Update Kimi-K2-Thinking.md
* Update Kimi-K2-Thinking.md
* Update Kimi-K2-Thinking-Native.md
* [perf] optimize K2 MoE weight loading with per-expert pointers
- Avoid expensive torch.stack().contiguous() in Python (was ~6.6s)
- Use per-expert pointer arrays (gate_projs) instead of contiguous memory
- C++ worker pool performs parallel memcpy for TP slicing
- Add LOAD_TIME_PROFILE for load_weights timing analysis
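
The per-expert pointer idea in the list above can be sketched in plain Python. This is a minimal illustration, not the kt-kernel implementation: the function names, the one-byte-per-element layout, and the row-wise TP split are all assumptions made for the example. It shows each expert's buffer being kept separate (no stacking into one contiguous block) while a worker pool copies out only the slice a given tensor-parallel rank needs.

```python
from concurrent.futures import ThreadPoolExecutor


def slice_expert_for_tp(expert_buf, rows, cols, tp_rank, tp_size):
    """Copy the row block owned by tp_rank out of one expert's buffer.

    Hypothetical helper: assumes a row-major layout with one byte per
    element and a row-wise split across TP ranks.
    """
    rows_per_rank = rows // tp_size
    start = tp_rank * rows_per_rank * cols
    end = start + rows_per_rank * cols
    # A single slice copy per expert, standing in for the memcpy.
    return bytes(memoryview(expert_buf)[start:end])


def load_tp_shard(expert_bufs, rows, cols, tp_rank, tp_size, workers=4):
    """Slice every expert in parallel without building a stacked copy.

    expert_bufs plays the role of the per-expert pointer array: a list of
    references to separately allocated buffers.
    """
    def task(buf):
        return slice_expert_for_tp(buf, rows, cols, tp_rank, tp_size)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the expert order of the input list.
        return list(pool.map(task, expert_bufs))


if __name__ == "__main__":
    # Three toy experts, each a 4x2 matrix of single-byte values.
    experts = [bytes(e * 16 + i for i in range(8)) for e in range(3)]
    shard = load_tp_shard(experts, rows=4, cols=2, tp_rank=1, tp_size=2)
    print([list(s) for s in shard])
```

The design point is that the only copies made are the per-rank slices themselves; the intermediate stacked tensor that the commit message reports as costing about 6.6s is never materialized.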
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
---------
Co-authored-by: ouqingliang <[email protected]>
Co-authored-by: Claude <[email protected]>
1 parent 4850424 commit 71f683a
File tree
5 files changed, +423 -74 lines changed
- doc/en
- kt-kernel
- operators/amx
- la
- python/utils