Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
openPangu was recently released. It is based on the DeepSeek V3 architecture and is a thinking model:
https://ai.gitcode.com/ascend-tribe/openpangu-ultra-moe-718b-model/blob/main/README_EN.md
However, it is not entirely identical, according to https://www.reddit.com/r/LocalLLaMA/comments/1mhctvk/comment/n6xmva5/:
The Pangu architecture is identical to DeepSeek V3 with the sole exception of a greater hidden size (and a different tokenizer). But unlike Kimi, they rename the architecture and parameters (see the sketch after this list):
- `attention_q_lora_dim` = `q_lora_rank`
- `num_experts_per_tok` = `n_routed_experts`
- `num_dense_layers` = `first_k_dense_replace`
- `attention_qk_dim` = `qk_nope_head_dim`
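
If the architecture really is DeepSeek V3 with renamed config keys, support could amount to translating the keys and reusing the existing DeepSeek V3 conversion path. Below is a minimal sketch of that idea; the mapping just restates the list above (which comes from the Reddit comment and would need to be verified against the released config.json), and the function name and structure are hypothetical, not part of ik_llama.cpp:

```python
# Hypothetical sketch: normalize openPangu config keys to their DeepSeek V3
# equivalents before handing the config to the existing DeepSeek V3 converter.
# Pangu key names are taken from the mapping quoted above and are unverified.

PANGU_TO_DEEPSEEK = {
    "attention_q_lora_dim": "q_lora_rank",
    "num_experts_per_tok": "n_routed_experts",
    "num_dense_layers": "first_k_dense_replace",
    "attention_qk_dim": "qk_nope_head_dim",
}

def normalize_pangu_config(config: dict) -> dict:
    """Return a copy of the HF config with Pangu-specific keys renamed."""
    out = dict(config)
    for pangu_key, ds_key in PANGU_TO_DEEPSEEK.items():
        # Only rename when the DeepSeek-style key is not already present.
        if pangu_key in out and ds_key not in out:
            out[ds_key] = out.pop(pangu_key)
    return out

# Illustrative values only; only Pangu-specific names get translated.
cfg = {"attention_q_lora_dim": 1536, "hidden_size": 7680}
print(normalize_pangu_config(cfg))
# -> {'hidden_size': 7680, 'q_lora_rank': 1536}
```

The different tokenizer would still need its own handling in the conversion script, so this key translation alone is likely not the whole story.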
Motivation
It would be great if it were possible to run, quantize, and generate an imatrix for openPangu with ik_llama.cpp, since ik_llama.cpp gives the best performance for large MoE models using CPU+GPU inference.
Possible Implementation
No response