
Feature Request: openPangu-Ultra-MoE-718B support #671

@Lissanro

Description

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

openPangu was recently released. It is based on the DeepSeek V3 architecture and is a thinking model:
https://ai.gitcode.com/ascend-tribe/openpangu-ultra-moe-718b-model/blob/main/README_EN.md

However, it is not entirely identical, according to https://www.reddit.com/r/LocalLLaMA/comments/1mhctvk/comment/n6xmva5/

The Pangu architecture is identical to DeepSeek V3 with the sole exception of a greater hidden size (and a different tokenizer). But unlike Kimi, they rename the architecture and its parameters (a config remapping sketch follows the list below):

attention_q_lora_dim = q_lora_rank
num_experts_per_tok = n_routed_experts
num_dense_layers = first_k_dense_replace
attention_qk_dim = qk_nope_head_dim
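
A minimal sketch of what the config-side remapping might look like, assuming the Pangu checkpoint ships a standard Hugging Face config.json and that the keys listed above are the only renames. The left-hand names are taken to be the Pangu keys and the right-hand names their DeepSeek V3 equivalents, as in the quoted comment; the function name remap_config and the file paths are hypothetical, and the tokenizer difference plus any tensor-name differences would still need handling in the converter itself:

import json

# Pangu config key -> DeepSeek V3 config key, per the mapping quoted above.
# Illustrative only; the list may be incomplete.
PANGU_TO_DEEPSEEK = {
    "attention_q_lora_dim": "q_lora_rank",
    "num_experts_per_tok": "n_routed_experts",
    "num_dense_layers": "first_k_dense_replace",
    "attention_qk_dim": "qk_nope_head_dim",
}

def remap_config(path_in: str, path_out: str) -> None:
    """Rewrite a Pangu config.json using DeepSeek V3 key names."""
    with open(path_in) as f:
        cfg = json.load(f)
    for pangu_key, ds_key in PANGU_TO_DEEPSEEK.items():
        if pangu_key in cfg:
            cfg[ds_key] = cfg.pop(pangu_key)
    # Assumption: pointing the architecture tag at the DeepSeek V3 class
    # would let existing DeepSeek V3 conversion paths pick the model up.
    cfg["architectures"] = ["DeepseekV3ForCausalLM"]
    with open(path_out, "w") as f:
        json.dump(cfg, f, indent=2)

# Example usage: remap_config("config.json", "config.deepseek.json")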

Motivation

It would be great if it were possible to run, quantize, and generate an imatrix for openPangu with ik_llama.cpp, since ik_llama.cpp gives the best performance for large MoE models using CPU+GPU inference.

Possible Implementation

No response

Labels: enhancement (New feature or request), help wanted (Extra attention is needed)