
feat: support GLM-5-FP8 in HYBRID mode#406

Draft
jasonqinzhou wants to merge 4 commits into main from jasonzho/GLM5

Conversation


@jasonqinzhou (Contributor) commented Feb 20, 2026

All GLM-5-specific MLA params are correctly read. Here's a summary of what was done:

New files:

  • model_configs/zai-org--GLM-5-FP8_config.json — 78 layers, 64 heads, 256 experts, MLA with q_lora_rank=2048,
    qk_nope=192, v_head=256
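
  As a rough illustration, the new config file presumably follows the Hugging Face config.json shape. Only the figures quoted above (78 layers, 64 heads, 256 experts, q_lora_rank=2048, qk_nope=192, v_head=256) come from this PR; the key names and everything else in this sketch are assumptions:

  ```json
  {
    "architectures": ["GlmMoeDsaForCausalLM"],
    "num_hidden_layers": 78,
    "num_attention_heads": 64,
    "n_routed_experts": 256,
    "q_lora_rank": 2048,
    "qk_nope_head_dim": 192,
    "v_head_dim": 256
  }
  ```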

common.py:

  • Added DeepSeekMLAConfig dataclass (5 MLA params)
  • Registered GlmMoeDsaForCausalLM → "DEEPSEEK" family
  • Added zai-org/GLM-5-FP8 to DefaultHFModels
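
  A minimal sketch of the common.py additions, assuming HF/DeepSeek-style field names for the five MLA params and a plain dict for the architecture-to-family registry (the PR's actual registry shape is not shown here):

  ```python
  from dataclasses import dataclass

  @dataclass
  class DeepSeekMLAConfig:
      """The five MLA projection dims read from the HF config
      (field names assumed from DeepSeek-style configs)."""
      q_lora_rank: int
      kv_lora_rank: int
      qk_nope_head_dim: int
      qk_rope_head_dim: int
      v_head_dim: int

  # Map the new GLM-5 architecture string onto the existing DEEPSEEK family.
  MODEL_FAMILY = {
      "GlmMoeDsaForCausalLM": "DEEPSEEK",
  }
  ```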

utils.py:

  • For any DEEPSEEK-family architecture with kv_lora_rank in config, populate extra_params as DeepSeekMLAConfig
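
  The detection rule could look roughly like this; the function name, the dict-shaped hf_config, and the exact field names are illustrative assumptions, not the PR's real API:

  ```python
  from dataclasses import dataclass
  from typing import Optional

  @dataclass
  class DeepSeekMLAConfig:  # mirrors the dataclass added in common.py
      q_lora_rank: Optional[int]
      kv_lora_rank: int
      qk_nope_head_dim: int
      qk_rope_head_dim: int
      v_head_dim: int

  def maybe_mla_config(family: str, hf_config: dict) -> Optional[DeepSeekMLAConfig]:
      """For a DEEPSEEK-family architecture whose config carries
      kv_lora_rank, populate extra_params as a DeepSeekMLAConfig."""
      if family != "DEEPSEEK" or "kv_lora_rank" not in hf_config:
          return None
      return DeepSeekMLAConfig(
          q_lora_rank=hf_config.get("q_lora_rank"),
          kv_lora_rank=hf_config["kv_lora_rank"],
          qk_nope_head_dim=hf_config["qk_nope_head_dim"],
          qk_rope_head_dim=hf_config["qk_rope_head_dim"],
          v_head_dim=hf_config["v_head_dim"],
      )
  ```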

models.py:

  • All 3 DEEPSEEK model classes (DeepSeekModel, TrtllmWideEPDeepSeekModel, WideEPDeepSeekModel) now accept mla_config as
    a parameter
  • Replaced all 7 hardcoded MLA constants in DeepSeekModel and TrtllmWideEPDeepSeekModel with expressions derived from
    mla_config (defaulting to DeepSeek-V3 values for backward compat)
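
  The backward-compat defaulting described above might look like this sketch. The DeepSeek-V3 numbers below are the commonly published config values (q_lora_rank=1536, kv_lora_rank=512, qk_nope=128, qk_rope=64, v_head=128) and should be treated as an assumption about what the code previously hardcoded:

  ```python
  from dataclasses import dataclass
  from typing import Optional

  @dataclass
  class DeepSeekMLAConfig:  # mirrors the dataclass added in common.py
      q_lora_rank: int
      kv_lora_rank: int
      qk_nope_head_dim: int
      qk_rope_head_dim: int
      v_head_dim: int

  # Fallback when no mla_config is passed, preserving the old DeepSeek-V3
  # behaviour (values assumed from the published DeepSeek-V3 config).
  DEEPSEEK_V3_MLA = DeepSeekMLAConfig(1536, 512, 128, 64, 128)

  class DeepSeekModel:
      def __init__(self, mla_config: Optional[DeepSeekMLAConfig] = None):
          mla = mla_config or DEEPSEEK_V3_MLA
          # Derived expressions replace the former hardcoded constants.
          self.qk_head_dim = mla.qk_nope_head_dim + mla.qk_rope_head_dim
          self.v_head_dim = mla.v_head_dim
          self.kv_lora_rank = mla.kv_lora_rank
  ```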
  | Model | Architecture | Supported? |
  |---|---|---|
  | GLM-5-FP8 | GlmMoeDsaForCausalLM | ✅ Yes |
  | GLM-4.7-Flash | Glm4MoeLiteForCausalLM | ✅ Yes (just added) |
  | GLM-4.7-FP8 | Glm4MoeForCausalLM | ✅ Yes (just added) |
  | GLM-4.6V-Flash-MLX-8bit | Glm4vForConditionalGeneration | ❌ No (vision + MLX/Apple Silicon) |
  | GLM-4.6V-Flash-MLX-6bit | Glm4vForConditionalGeneration | ❌ No (vision + MLX/Apple Silicon) |

@github-actions github-actions bot added the feat label Feb 20, 2026
@tianhaox
Contributor

GLM-5 needs DSA, same as DS V3.2.

help="Optional end-to-end request latency target (ms). Enables request-latency optimization mode.",
)
parser.add_argument("--prefix", type=int, default=0, help="Prefix cache length. Default to 0.")
parser.add_argument(

Suggest we remove the WideEP support in this PR and do a more complete design in a separate PR.

@@ -0,0 +1,27 @@
{

I suggest we do a manual copy-paste to avoid hallucinated values.

@@ -0,0 +1,21 @@
{

This is incorrect: it is missing the quant field.

@tianhaox
Contributor

I think this whole PR needs a better design of a DEEPSEEK_V32 model family. DS V3.2 and GLM-5 will share that model family; it uses DSA + MoE. I suggest we cancel this PR and redesign after the DSA Op PR I've created. I will start a PR manually to better support this DEEPSEEK_V32 model family.

@jasonqinzhou jasonqinzhou marked this pull request as draft March 12, 2026 00:56
