
feat: support GLM-5-FP8 in HYBRID mode#406

Draft
jasonqinzhou wants to merge 4 commits into main from jasonzho/GLM5

Conversation


@jasonqinzhou (Contributor) commented Feb 20, 2026

All GLM-5-specific MLA params are correctly read. Here's a summary of what was done:

New files:

  • model_configs/zai-org--GLM-5-FP8_config.json — 78 layers, 64 heads, 256 experts, MLA with q_lora_rank=2048,
    qk_nope=192, v_head=256
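
  As a rough illustration, the new config file presumably follows the Hugging Face config.json shape. Only the figures quoted above (78 layers, 64 heads, 256 experts, q_lora_rank=2048, qk_nope=192, v_head=256) come from this PR; the key names and everything else in this sketch are assumptions:

  ```json
  {
    "architectures": ["GlmMoeDsaForCausalLM"],
    "num_hidden_layers": 78,
    "num_attention_heads": 64,
    "n_routed_experts": 256,
    "q_lora_rank": 2048,
    "qk_nope_head_dim": 192,
    "v_head_dim": 256
  }
  ```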

common.py:

  • Added DeepSeekMLAConfig dataclass (5 MLA params)
  • Registered GlmMoeDsaForCausalLM → "DEEPSEEK" family
  • Added zai-org/GLM-5-FP8 to DefaultHFModels
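
  A minimal sketch of the common.py additions, assuming HF/DeepSeek-style field names for the five MLA params and a plain dict for the architecture-to-family registry (the PR's actual registry shape is not shown here):

  ```python
  from dataclasses import dataclass

  @dataclass
  class DeepSeekMLAConfig:
      """The five MLA projection dims read from the HF config
      (field names assumed from DeepSeek-style configs)."""
      q_lora_rank: int
      kv_lora_rank: int
      qk_nope_head_dim: int
      qk_rope_head_dim: int
      v_head_dim: int

  # Map the new GLM-5 architecture string onto the existing DEEPSEEK family.
  MODEL_FAMILY = {
      "GlmMoeDsaForCausalLM": "DEEPSEEK",
  }
  ```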

utils.py:

  • For any DEEPSEEK-family architecture with kv_lora_rank in config, populate extra_params as DeepSeekMLAConfig
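
  The detection rule could look roughly like this; the function name, the dict-shaped hf_config, and the exact field names are illustrative assumptions, not the PR's real API:

  ```python
  from dataclasses import dataclass
  from typing import Optional

  @dataclass
  class DeepSeekMLAConfig:  # mirrors the dataclass added in common.py
      q_lora_rank: Optional[int]
      kv_lora_rank: int
      qk_nope_head_dim: int
      qk_rope_head_dim: int
      v_head_dim: int

  def maybe_mla_config(family: str, hf_config: dict) -> Optional[DeepSeekMLAConfig]:
      """For a DEEPSEEK-family architecture whose config carries
      kv_lora_rank, populate extra_params as a DeepSeekMLAConfig."""
      if family != "DEEPSEEK" or "kv_lora_rank" not in hf_config:
          return None
      return DeepSeekMLAConfig(
          q_lora_rank=hf_config.get("q_lora_rank"),
          kv_lora_rank=hf_config["kv_lora_rank"],
          qk_nope_head_dim=hf_config["qk_nope_head_dim"],
          qk_rope_head_dim=hf_config["qk_rope_head_dim"],
          v_head_dim=hf_config["v_head_dim"],
      )
  ```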

models.py:

  • All 3 DEEPSEEK model classes (DeepSeekModel, TrtllmWideEPDeepSeekModel, WideEPDeepSeekModel) now accept mla_config as
    a parameter
  • Replaced all 7 hardcoded MLA constants in DeepSeekModel and TrtllmWideEPDeepSeekModel with expressions derived from
    mla_config (defaulting to DeepSeek-V3 values for backward compat)
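
  The backward-compat defaulting described above might look like this sketch. The DeepSeek-V3 numbers below are the commonly published config values (q_lora_rank=1536, kv_lora_rank=512, qk_nope=128, qk_rope=64, v_head=128) and should be treated as an assumption about what the code previously hardcoded:

  ```python
  from dataclasses import dataclass
  from typing import Optional

  @dataclass
  class DeepSeekMLAConfig:  # mirrors the dataclass added in common.py
      q_lora_rank: int
      kv_lora_rank: int
      qk_nope_head_dim: int
      qk_rope_head_dim: int
      v_head_dim: int

  # Fallback when no mla_config is passed, preserving the old DeepSeek-V3
  # behaviour (values assumed from the published DeepSeek-V3 config).
  DEEPSEEK_V3_MLA = DeepSeekMLAConfig(1536, 512, 128, 64, 128)

  class DeepSeekModel:
      def __init__(self, mla_config: Optional[DeepSeekMLAConfig] = None):
          mla = mla_config or DEEPSEEK_V3_MLA
          # Derived expressions replace the former hardcoded constants.
          self.qk_head_dim = mla.qk_nope_head_dim + mla.qk_rope_head_dim
          self.v_head_dim = mla.v_head_dim
          self.kv_lora_rank = mla.kv_lora_rank
  ```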
  | Model | Architecture | Supported? |
  |---|---|---|
  | GLM-5-FP8 | GlmMoeDsaForCausalLM | ✅ Yes |
  | GLM-4.7-Flash | Glm4MoeLiteForCausalLM | ✅ Yes (just added) |
  | GLM-4.7-FP8 | Glm4MoeForCausalLM | ✅ Yes (just added) |
  | GLM-4.6V-Flash-MLX-8bit | Glm4vForConditionalGeneration | ❌ No (vision + MLX/Apple Silicon) |
  | GLM-4.6V-Flash-MLX-6bit | Glm4vForConditionalGeneration | ❌ No (vision + MLX/Apple Silicon) |

@github-actions github-actions bot added the feat label Feb 20, 2026
@tianhaox
Contributor

GLM-5 needs DSA, same as DS V3.2.

help="Optional end-to-end request latency target (ms). Enables request-latency optimization mode.",
)
parser.add_argument("--prefix", type=int, default=0, help="Prefix cache length. Default to 0.")
parser.add_argument(

Suggest we remove the WideEP support in this PR and do a more complete design in a separate PR.

@@ -0,0 +1,27 @@
{

I suggest we do a manual copy-paste to avoid hallucinated values.

@@ -0,0 +1,21 @@
{

This is incorrect: it is missing the quant field.

@tianhaox
Contributor

I think this whole PR needs a better design of a DEEPSEEK_V32 model family. DS V3.2 and GLM-5 will share that model family; it uses DSA + MoE. I suggest we cancel this PR and redesign after the DSA Op PR I've created. I will start a PR manually to better support this DEEPSEEK_V32 model family.

@jasonqinzhou jasonqinzhou marked this pull request as draft March 12, 2026 00:56
