Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
openPangu was recently released. It is based on the DeepSeek V3 architecture and is a thinking model:
https://ai.gitcode.com/ascend-tribe/openpangu-ultra-moe-718b-model/blob/main/README_EN.md
However, it is not entirely identical, according to https://www.reddit.com/r/LocalLLaMA/comments/1mhctvk/comment/n6xmva5/:
The Pangu architecture is identical to DeepSeek V3 with the sole exception of a greater hidden size (and a different tokenizer). But unlike Kimi, they rename the architecture and parameters (see the sketch after this list):
- `attention_q_lora_dim` = `q_lora_rank`
- `num_experts_per_tok` = `n_routed_experts`
- `num_dense_layers` = `first_k_dense_replace`
- `attention_qk_dim` = `qk_nope_head_dim`
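
If the architecture really is DeepSeek V3 with renamed config keys, support could amount to translating the keys and reusing the existing DeepSeek V3 conversion path. Below is a minimal sketch of that idea; the mapping just restates the list above (which comes from the Reddit comment and would need to be verified against the released config.json), and the function name and structure are hypothetical, not part of ik_llama.cpp:

```python
# Hypothetical sketch: normalize openPangu config keys to their DeepSeek V3
# equivalents before handing the config to the existing DeepSeek V3 converter.
# Pangu key names are taken from the mapping quoted above and are unverified.

PANGU_TO_DEEPSEEK = {
    "attention_q_lora_dim": "q_lora_rank",
    "num_experts_per_tok": "n_routed_experts",
    "num_dense_layers": "first_k_dense_replace",
    "attention_qk_dim": "qk_nope_head_dim",
}

def normalize_pangu_config(config: dict) -> dict:
    """Return a copy of the HF config with Pangu-specific keys renamed."""
    out = dict(config)
    for pangu_key, ds_key in PANGU_TO_DEEPSEEK.items():
        # Only rename when the DeepSeek-style key is not already present.
        if pangu_key in out and ds_key not in out:
            out[ds_key] = out.pop(pangu_key)
    return out

# Illustrative values only; only Pangu-specific names get translated.
cfg = {"attention_q_lora_dim": 1536, "hidden_size": 7680}
print(normalize_pangu_config(cfg))
# -> {'hidden_size': 7680, 'q_lora_rank': 1536}
```

The different tokenizer would still need its own handling in the conversion script, so this key translation alone is likely not the whole story.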
Motivation
It would be great if it were possible to run, quantize, and generate an imatrix for openPangu with ik_llama.cpp, since ik_llama.cpp gives the best performance for large MoE models using CPU+GPU inference.
Possible Implementation
No response